IO tag metrics #1653

Merged
dstepanov-yadro merged 4 commits from dstepanov-yadro/frostfs-node:feat/io_tag_metrics into master 2025-03-11 10:57:48 +00:00
  1. Added operation counters per io_tag/operation/shard/type for each shard.
    Each operation changes its status as follows: pending -> in_progress -> completed or resource_exhausted.

  2. For object service and tree service added per io_tag requests counters. I decided not to add labels to the existing metrics, so as not to increase the cardinality.

After this PR I will revert commit 25102d1e1a as limiter counter implemented in frostfs-node.

1. Added operation counters per io_tag/operation/shard/type for each shard. Each operation changes its status as follows: `pending` -> `in_progress` -> `completed` or `resource_exhausted`. 2. For object service and tree service added per io_tag requests counters. I decided not to add labels to the existing metrics, so as not to increase the cardinality. After this PR I will revert commit https://git.frostfs.info/TrueCloudLab/frostfs-qos/commit/25102d1e1aa3e9232af803ce3cd0088c37263de3 as limiter counter implemented in frostfs-node.
dstepanov-yadro added 5 commits 2025-02-24 08:32:52 +00:00
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
[#9999] pilorama: Add IO tag label to metrics
Some checks failed
DCO action / DCO (pull_request) Successful in 41s
Vulncheck / Vulncheck (pull_request) Successful in 51s
Tests and linters / Run gofumpt (pull_request) Successful in 1m6s
Build / Build Components (pull_request) Successful in 1m25s
Pre-commit hooks / Pre-commit (pull_request) Successful in 1m25s
Tests and linters / Lint (pull_request) Failing after 1m45s
Tests and linters / Tests (pull_request) Successful in 2m8s
Tests and linters / Staticcheck (pull_request) Successful in 2m8s
Tests and linters / gopls check (pull_request) Successful in 2m48s
Tests and linters / Tests with -race (pull_request) Successful in 3m11s
03445cdef6
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
dstepanov-yadro force-pushed feat/io_tag_metrics from 03445cdef6 to 8bd6d8dd8c 2025-02-24 08:36:35 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from 8bd6d8dd8c to fabc1fef5c 2025-02-24 08:37:51 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from fabc1fef5c to 1d5a24d9fc 2025-02-28 11:35:48 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from 1d5a24d9fc to 8e2cd99f82 2025-03-03 14:25:11 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from 8e2cd99f82 to 76b6370770 2025-03-05 13:09:03 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from 76b6370770 to e3031359e0 2025-03-06 07:36:40 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from e3031359e0 to 4d5d6edcbe 2025-03-06 07:37:13 +00:00 Compare
dstepanov-yadro changed title from WIP: Extension of metrics with IO tags to WIP: IO tag metrics 2025-03-06 08:04:13 +00:00
dstepanov-yadro changed title from WIP: IO tag metrics to IO tag metrics 2025-03-06 08:16:28 +00:00
requested reviews from storage-core-committers, storage-core-developers 2025-03-06 08:16:28 +00:00
fyrchik reviewed 2025-03-07 11:17:47 +00:00
@ -34,10 +41,6 @@ func NewLimiter(c *limits.Config) (Limiter, error) {
if err := validateConfig(c); err != nil {
return nil, err
}
read, write := c.Read(), c.Write()
Owner

This belongs to the commit [#1653] qos: Add metrics
Could you explain why have you removed there lines?

This belongs to the commit `[#1653] qos: Add metrics` Could you explain why have you removed there lines?
Author
Member

After this commit limiter must be able to report metrics. But noop limiter is just stub. So not to add extra code for noop limiter I just dropped this code.

After this commit limiter must be able to report metrics. But noop limiter is just stub. So not to add extra code for noop limiter I just dropped this code.
fyrchik marked this conversation as resolved
@ -114,0 +130,4 @@
readStats map[string]*stat
writeStats map[string]*stat
shardID atomic.Value // string
Owner

This will contain a pointer to string anyway (implicitly, when setting interface value to string).
Why have you decided to use atomic.Value instead of atomic.Pointer?

This will contain a pointer to string anyway (implicitly, when setting interface value to string). Why have you decided to use `atomic.Value` instead of `atomic.Pointer`?
Author
Member

I scare about pointer dereference. But ok, replaced with atomic.Pointer[shardID]

I scare about pointer dereference. But ok, replaced with `atomic.Pointer[shardID]`
fyrchik marked this conversation as resolved
@ -139,2 +170,3 @@
}
return ReleaseFunc(rel), nil
return func() {
stat.completed.Add(1)
Owner

It is important that stat.completed.Add() is called before rel?
The way I see it is that rel will release resources associated with the request, and we should "complete" request only afterwards. But OK, if it was delibarate.

It is important that `stat.completed.Add()` is called before `rel`? The way I see it is that `rel` will release resources associated with the request, and we should "complete" request only afterwards. But OK, if it was delibarate.
Author
Member

Fixed

Fixed
fyrchik marked this conversation as resolved
@ -0,0 +18,4 @@
// stat presents limiter statistics counters.
//
// Each operation changes its status as follows: `pending` -> `in_progress` -> `completed` or `resource_exhausted`.
Owner

So these metrics do not reflect an instantaneous snapshot, but rather an accumulated value?
It should be noted somewhere, because when I see metric{type="in_progress"}=123, my natural interpretation is "how much requests are currently being handled".
Maybe we could export in_progress - pending instead (without changing this struct)?

So these metrics do not reflect an instantaneous snapshot, but rather an accumulated value? It should be noted somewhere, because when I see `metric{type="in_progress"}=123`, my natural interpretation is "how much requests are currently being handled". Maybe we could export `in_progress - pending` instead (without changing this struct)?
Author
Member

Changed a little struct description.
complatedTotal might be clearer, but it looks cumbersome.
Prometheus metrics have _total suffix like others accumulated counters.
in_progress is also of interest, for example, to check the weights of tags.

Changed a little struct description. `complatedTotal` might be clearer, but it looks cumbersome. Prometheus metrics have `_total` suffix like others accumulated counters. `in_progress` is also of interest, for example, to check the weights of tags.
@ -0,0 +22,4 @@
func getStat(tag string, stats map[string]*stat) *stat {
statTag := tag
if _, ok := statTags[tag]; !ok {
Owner

Is there any reason you use statTags map here instead of stats?
The first returned value could then be returned immediately without additional map access.

Is there any reason you use `statTags` map here instead of `stats`? The first returned value could then be returned immediately without additional map access.
Author
Member

No, as they must contain the same keys. Replaced with stats.

No, as they must contain the same keys. Replaced with `stats`.
fyrchik marked this conversation as resolved
@ -40,0 +46,4 @@
func IOTagFromContext(ctx context.Context) string {
tag, ok := tagging.IOTagFromContext(ctx)
if !ok {
tag = "undefined"
Owner

There is undefined here, and unknown in internal/qos/stats.go.
How do these strings relate and why do they differ?

There is `undefined` here, and `unknown` in `internal/qos/stats.go`. How do these strings relate and why do they differ?
Author
Member

This method used to get tag from context to report metrics. If current context doesn't contain any tag, then undefined placeholder used.

In internal/qos/stats.go before getting tag stats:

	tag, ok := tagging.IOTagFromContext(ctx)
	if !ok {
		tag = IOTagClient.String()
	}

This is because shard assumes that request without tag is client request.
So when when getting tag stat, tag must be defined. But it could be unknown in case of some minor bugs I think. mclock scheduler will return error in this case, but for metrics it is unknown tag.

This method used to get tag from context to report metrics. If current context doesn't contain any tag, then `undefined` placeholder used. In `internal/qos/stats.go` before getting tag stats: ``` tag, ok := tagging.IOTagFromContext(ctx) if !ok { tag = IOTagClient.String() } ``` This is because shard assumes that request without tag is `client` request. So when when getting tag stat, tag must be defined. But it could be unknown in case of some minor bugs I think. `mclock` scheduler will return error in this case, but for metrics it is `unknown` tag.
Author
Member

Also object service and tree service first adjust IO tags and then report metrics, so there must not be unknown tags.

Also object service and tree service first adjust IO tags and then report metrics, so there must not be `unknown` tags.
fyrchik marked this conversation as resolved
dstepanov-yadro force-pushed feat/io_tag_metrics from 4d5d6edcbe to 562a23498c 2025-03-07 12:36:30 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from 562a23498c to baf2a96b6a 2025-03-07 12:39:59 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from baf2a96b6a to ce38eb376b 2025-03-07 12:44:50 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from ce38eb376b to 55c0e96ad1 2025-03-07 12:53:10 +00:00 Compare
dstepanov-yadro force-pushed feat/io_tag_metrics from 55c0e96ad1 to 5e70d3c9c6 2025-03-10 08:05:12 +00:00 Compare
acid-ant reviewed 2025-03-10 11:54:57 +00:00
@ -0,0 +25,4 @@
if _, ok := stats[tag]; !ok {
statTag = unknownStatsTag
}
return stats[statTag]
Member

How about change to this:

	if _, ok := stats[tag]; ok {
		return stats[tag]
	}
	return stats[unknownStatsTag]
How about change to this: ``` if _, ok := stats[tag]; ok { return stats[tag] } return stats[unknownStatsTag] ```
Author
Member

fixed

fixed
dstepanov-yadro force-pushed feat/io_tag_metrics from 5e70d3c9c6 to 23fed45570 2025-03-10 13:44:44 +00:00 Compare
acid-ant approved these changes 2025-03-10 13:47:46 +00:00
fyrchik approved these changes 2025-03-11 10:49:20 +00:00
dstepanov-yadro merged commit 597bce7a87 into master 2025-03-11 10:57:48 +00:00
dstepanov-yadro deleted branch feat/io_tag_metrics 2025-03-11 10:57:50 +00:00
dstepanov-yadro referenced this pull request from a commit 2025-03-11 10:57:50 +00:00
dstepanov-yadro referenced this pull request from a commit 2025-03-11 10:57:51 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1653
No description provided.