[#244] pool/tree: Collect request duration statistic #246
No reviewers
TrueCloudLab/storage-sdk-developers
TrueCloudLab/storage-services-developers
Labels
No labels
P0
P1
P2
P3
good first issue
pool
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
6 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-sdk-go#246
Loading…
Reference in a new issue
No description provided.
Delete branch "mbiryukova/frostfs-sdk-go:feature/tree_pool_stat"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
After each request for tree pool statistic accumulated values are reset to zero
Signed-off-by: Marina Biryukova m.biryukova@yadro.com
58ef4a1014
to751f906dbf
751f906dbf
to8b638c3516
8b638c3516
tof97bf40bd2
@ -161,0 +184,4 @@
m.snapshot.allRequests++
}
func (m *MethodStatus) Reset() {
Why do we need this? Also if we really need this, why do use this only in tree pool?
We need this to return statistic collected between requests, not over all time. For pool we want to do the same, but not in this PR
As we discussed with @a.bogatyrev, cumulative metric is not quite representative, especially in a long run. If delay spike happens, cumulative metric changes very slowly or may not change at all. We want to see reactive change in the metric, therefore we are going to change it for both tree and object metric.
@ -315,2 +357,3 @@
var resp *grpcService.GetNodeByPathResponse
if err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
start := time.Now()
err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
I'm not sure if we want to measure request with retry
Those are different metrics, basically. In this PR we are interested in a combined time spent on request processing, because retries are expected and we would like to measure whole time pool takes to process request.
As for performance issue investigation, combined metric may seen a bit more useful, but we can add 'per-request' metric as well if it will be needed too.
@ -777,1 +851,4 @@
func (p *Pool) incRequests(elapsed time.Duration, method MethodIndex) {
methodStat := p.methods[method]
methodStat.IncRequests(elapsed)
Probably we can write
p.methods[method].IncRequests(elapsed)
@ -441,3 +489,3 @@
var resp *grpcService.AddResponse
if err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
start := time.Now()
What kind of duration do you measure?
grpc request duration can be measured with grpc middleware.
pool method duration must include the whole method body (with
signRequest
and request creation).We would like to have a balance between measuring transmission time (which is grpc request duration) but include all retries in it, therefore
signRequest
is not included and grpc middleware isn't used as well.But I agree, seems like we can just calculate whole execution duration, so adding
signRequest
to a measure seems okay for me.@ -172,0 +199,4 @@
case methodRemoveNode:
return "removeNode"
case methodLast:
return "it's a system name rather than a method"
It will be a surprise if this accidentally is showed in a metric dashboard a a label :) we can just ignore this case and it'll be
unknown
f97bf40bd2
toe83d6b7c6a