[#244] pool/tree: Collect request duration statistic #246

Merged
dkirillov merged 1 commit from mbiryukova/frostfs-sdk-go:feature/tree_pool_stat into master 2024-08-05 06:22:53 +00:00
Member

After each request for tree pool statistic accumulated values are reset to zero

Signed-off-by: Marina Biryukova m.biryukova@yadro.com

After each request for tree pool statistic accumulated values are reset to zero Signed-off-by: Marina Biryukova <m.biryukova@yadro.com>
mbiryukova self-assigned this 2024-07-31 09:53:47 +00:00
mbiryukova added 1 commit 2024-07-31 09:53:54 +00:00
[#244] pool/tree: Collect request duration statistic
All checks were successful
DCO / DCO (pull_request) Successful in 32s
Tests and linters / Tests (1.22) (pull_request) Successful in 44s
Tests and linters / Tests (1.21) (pull_request) Successful in 1m0s
Tests and linters / Lint (pull_request) Successful in 1m47s
58ef4a1014
Signed-off-by: Marina Biryukova <m.biryukova@yadro.com>
mbiryukova requested review from storage-core-committers 2024-07-31 09:57:47 +00:00
mbiryukova requested review from storage-core-developers 2024-07-31 09:57:48 +00:00
mbiryukova requested review from storage-sdk-committers 2024-07-31 09:57:48 +00:00
mbiryukova requested review from storage-sdk-developers 2024-07-31 09:57:49 +00:00
mbiryukova requested review from storage-services-committers 2024-07-31 09:57:50 +00:00
mbiryukova requested review from storage-services-developers 2024-07-31 09:57:50 +00:00
mbiryukova force-pushed feature/tree_pool_stat from 58ef4a1014 to 751f906dbf 2024-07-31 15:14:24 +00:00 Compare
mbiryukova force-pushed feature/tree_pool_stat from 751f906dbf to 8b638c3516 2024-08-01 08:17:37 +00:00 Compare
mbiryukova force-pushed feature/tree_pool_stat from 8b638c3516 to f97bf40bd2 2024-08-01 08:24:02 +00:00 Compare
dkirillov reviewed 2024-08-01 08:39:52 +00:00
@ -161,0 +184,4 @@
m.snapshot.allRequests++
}
func (m *MethodStatus) Reset() {
Member

Why do we need this? Also if we really need this, why do use this only in tree pool?

Why do we need this? Also if we really need this, why do use this only in tree pool?
Author
Member

We need this to return statistic collected between requests, not over all time. For pool we want to do the same, but not in this PR

We need this to return statistic collected between requests, not over all time. For pool we want to do the same, but not in this PR
Owner

As we discussed with @a.bogatyrev, cumulative metric is not quite representative, especially in a long run. If delay spike happens, cumulative metric changes very slowly or may not change at all. We want to see reactive change in the metric, therefore we are going to change it for both tree and object metric.

As we discussed with @a.bogatyrev, cumulative metric is not quite representative, especially in a long run. If delay spike happens, cumulative metric changes very slowly or may not change at all. We want to see reactive change in the metric, therefore we are going to change it for both tree and object metric.
dkirillov marked this conversation as resolved
@ -315,2 +357,3 @@
var resp *grpcService.GetNodeByPathResponse
if err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
start := time.Now()
err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
Member

I'm not sure if we want to measure request with retry

I'm not sure if we want to measure request with retry
Owner

Those are different metrics, basically. In this PR we are interested in a combined time spent on request processing, because retries are expected and we would like to measure whole time pool takes to process request.

As for performance issue investigation, combined metric may seen a bit more useful, but we can add 'per-request' metric as well if it will be needed too.

Those are different metrics, basically. In this PR we are interested in a combined time spent on request processing, because retries are expected and we would like to measure whole time pool takes to process request. As for performance issue investigation, combined metric may seen a bit more useful, but we can add 'per-request' metric as well if it will be needed too.
dkirillov marked this conversation as resolved
@ -777,1 +851,4 @@
func (p *Pool) incRequests(elapsed time.Duration, method MethodIndex) {
methodStat := p.methods[method]
methodStat.IncRequests(elapsed)
Member

Probably we can write p.methods[method].IncRequests(elapsed)

Probably we can write `p.methods[method].IncRequests(elapsed)`
dkirillov marked this conversation as resolved
dstepanov-yadro reviewed 2024-08-01 13:55:21 +00:00
@ -441,3 +489,3 @@
var resp *grpcService.AddResponse
if err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
start := time.Now()

What kind of duration do you measure?
grpc request duration can be measured with grpc middleware.
pool method duration must include the whole method body (with signRequest and request creation).

What kind of duration do you measure? grpc request duration can be measured with grpc middleware. pool method duration must include the whole method body (with `signRequest` and request creation).
Owner

We would like to have a balance between measuring transmission time (which is grpc request duration) but include all retries in it, therefore signRequest is not included and grpc middleware isn't used as well.

But I agree, seems like we can just calculate whole execution duration, so adding signRequest to a measure seems okay for me.

We would like to have a balance between measuring transmission time (which is grpc request duration) but include all retries in it, therefore `signRequest` is not included and grpc middleware isn't used as well. But I agree, seems like we can just calculate whole execution duration, so adding `signRequest` to a measure seems okay for me.
dstepanov-yadro marked this conversation as resolved
alexvanin approved these changes 2024-08-02 06:54:07 +00:00
aarifullin reviewed 2024-08-02 07:59:26 +00:00
@ -172,0 +199,4 @@
case methodRemoveNode:
return "removeNode"
case methodLast:
return "it's a system name rather than a method"
Member

It will be a surprise if this accidentally is showed in a metric dashboard a a label :) we can just ignore this case and it'll be unknown

It will be a surprise if this accidentally is showed in a metric dashboard a a label :) we can just ignore this case and it'll be `unknown`
aarifullin marked this conversation as resolved
mbiryukova force-pushed feature/tree_pool_stat from f97bf40bd2 to e83d6b7c6a 2024-08-02 10:02:41 +00:00 Compare
aarifullin approved these changes 2024-08-02 10:58:36 +00:00
dkirillov approved these changes 2024-08-02 13:43:24 +00:00
elebedeva approved these changes 2024-08-02 14:18:07 +00:00
dstepanov-yadro approved these changes 2024-08-02 15:15:58 +00:00
dkirillov merged commit e83d6b7c6a into master 2024-08-05 06:22:53 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-sdk-developers
TrueCloudLab/storage-services-developers
No milestone
No project
No assignees
6 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-sdk-go#246
No description provided.