IO tag metrics #1653
Reference: TrueCloudLab/frostfs-node#1653
Added operation counters per io_tag/operation/shard/type for each shard. Each operation changes its status as follows: `pending` -> `in_progress` -> `completed` or `resource_exhausted`.

For the object service and tree service, added per-io_tag request counters. I decided not to add labels to the existing metrics, so as not to increase their cardinality.
After this PR I will revert commit 25102d1e1a, as the limiter counter is now implemented in frostfs-node.
@ -34,10 +41,6 @@ func NewLimiter(c *limits.Config) (Limiter, error) {
if err := validateConfig(c); err != nil {
return nil, err
}
read, write := c.Read(), c.Write()
This belongs to the commit `[#1653] qos: Add metrics`.

Could you explain why you have removed these lines?

After this commit the limiter must be able to report metrics, but the noop limiter is just a stub. So, to avoid adding extra code for the noop limiter, I just dropped this code.
@ -114,0 +130,4 @@
readStats map[string]*stat
writeStats map[string]*stat
shardID atomic.Value // string
This will contain a pointer to a string anyway (implicitly, when setting the interface value to a string). Why have you decided to use `atomic.Value` instead of `atomic.Pointer`?

I was wary of a pointer dereference. But OK, replaced with `atomic.Pointer[shardID]`.
@ -139,2 +170,3 @@
}
return ReleaseFunc(rel), nil
return func() {
stat.completed.Add(1)
Is it important that `stat.completed.Add()` is called before `rel`?

The way I see it, `rel` will release the resources associated with the request, and we should "complete" the request only afterwards. But OK, if it was deliberate.

Fixed
@ -0,0 +18,4 @@
// stat presents limiter statistics counters.
//
// Each operation changes its status as follows: `pending` -> `in_progress` -> `completed` or `resource_exhausted`.
So these metrics do not reflect an instantaneous snapshot, but rather an accumulated value? It should be noted somewhere, because when I see `metric{type="in_progress"}=123`, my natural interpretation is "how many requests are currently being handled". Maybe we could export `in_progress - pending` instead (without changing this struct)?

Changed the struct description a little.
`completedTotal` might be clearer, but it looks cumbersome.

Prometheus metrics have the `_total` suffix, like other accumulated counters. `in_progress` is also of interest, for example, to check the weights of tags.

@ -0,0 +22,4 @@
func getStat(tag string, stats map[string]*stat) *stat {
statTag := tag
if _, ok := statTags[tag]; !ok {
Is there any reason you use the `statTags` map here instead of `stats`? The first returned value could then be returned immediately without an additional map access.

No, as they must contain the same keys. Replaced with `stats`.

@ -40,0 +46,4 @@
func IOTagFromContext(ctx context.Context) string {
tag, ok := tagging.IOTagFromContext(ctx)
if !ok {
tag = "undefined"
There is `undefined` here, and `unknown` in `internal/qos/stats.go`. How do these strings relate and why do they differ?

This method is used to get the tag from the context to report metrics. If the current context doesn't contain any tag, then the `undefined` placeholder is used.

In `internal/qos/stats.go`, before getting tag stats: this is because the shard assumes that a request without a tag is a `client` request. So when getting the tag stat, the tag must be defined. But it could be `unknown` in case of some minor bugs, I think. The `mclock` scheduler will return an error in this case, but for metrics it is the `unknown` tag. Also, the object service and tree service first adjust IO tags and then report metrics, so there must not be `unknown` tags.
@ -0,0 +25,4 @@
if _, ok := stats[tag]; !ok {
statTag = unknownStatsTag
}
return stats[statTag]
How about changing to this:

Fixed