[#244] pool/tree: Collect request duration statistic #246

mbiryukova · 2024-07-31T09:53:47Z

mbiryukova commented

2024-07-31 09:53:47 +00:00

After each request for tree pool statistics, the accumulated values are reset to zero.

Signed-off-by: Marina Biryukova <m.biryukova@yadro.com>

mbiryukova self-assigned this 2024-07-31 09:53:47 +00:00

mbiryukova added 1 commit 2024-07-31 09:53:54 +00:00

[#244 ] pool/tree: Collect request duration statistic

DCO / DCO (pull_request) Successful in 32s

Tests and linters / Tests (1.22) (pull_request) Successful in 44s

Tests and linters / Tests (1.21) (pull_request) Successful in 1m0s

Tests and linters / Lint (pull_request) Successful in 1m47s

58ef4a1014

Signed-off-by: Marina Biryukova <m.biryukova@yadro.com>

mbiryukova requested review from storage-core-committers 2024-07-31 09:57:47 +00:00

mbiryukova requested review from storage-core-developers 2024-07-31 09:57:48 +00:00

mbiryukova requested review from storage-sdk-committers 2024-07-31 09:57:48 +00:00

mbiryukova requested review from storage-sdk-developers 2024-07-31 09:57:49 +00:00

mbiryukova requested review from storage-services-committers 2024-07-31 09:57:50 +00:00

mbiryukova requested review from storage-services-developers 2024-07-31 09:57:50 +00:00

mbiryukova force-pushed feature/tree_pool_stat from 58ef4a1014 to 751f906dbf

2024-07-31 15:14:24 +00:00

mbiryukova force-pushed feature/tree_pool_stat from 751f906dbf to 8b638c3516

2024-08-01 08:17:37 +00:00

mbiryukova force-pushed feature/tree_pool_stat from 8b638c3516 to f97bf40bd2

2024-08-01 08:24:02 +00:00

dkirillov reviewed 2024-08-01 08:39:52 +00:00

pool/statistic.go Outdated

					
				@ -161,0 +184,4 @@

					m.snapshot.allRequests++

				}

				func (m *MethodStatus) Reset() {

dkirillov commented

2024-08-01 08:35:58 +00:00

Why do we need this? Also, if we really need this, why do we use it only in the tree pool?

mbiryukova commented

2024-08-01 10:35:46 +00:00

We need this to return statistics collected between requests, not over all time. We want to do the same for the pool, but not in this PR.

alexvanin commented

2024-08-02 06:41:13 +00:00

As we discussed with @a.bogatyrev, a cumulative metric is not quite representative, especially in the long run. If a delay spike happens, a cumulative metric changes very slowly or may not change at all. We want to see a reactive change in the metric, therefore we are going to change it for both tree and object metrics.

dkirillov marked this conversation as resolved

pool/tree/pool.go Outdated

					
				@ -315,2 +357,3 @@

					var resp *grpcService.GetNodeByPathResponse

					if err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {

					start := time.Now()

					err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {

dkirillov commented

2024-08-01 08:39:44 +00:00

I'm not sure we want to measure the request together with its retries.

alexvanin commented

2024-08-02 06:48:30 +00:00

Those are different metrics, basically. In this PR we are interested in the combined time spent on request processing: retries are expected, and we would like to measure the whole time the pool takes to process a request.

For performance issue investigation, the combined metric may seem a bit more useful, but we can add a 'per-request' metric as well if it is needed.

dkirillov marked this conversation as resolved

pool/tree/pool.go Outdated

					
				@ -777,1 +851,4 @@

				func (p *Pool) incRequests(elapsed time.Duration, method MethodIndex) {

					methodStat := p.methods[method]

					methodStat.IncRequests(elapsed)

dkirillov commented

2024-08-01 08:38:06 +00:00

Probably we can write p.methods[method].IncRequests(elapsed)

dkirillov marked this conversation as resolved

dstepanov-yadro reviewed 2024-08-01 13:55:21 +00:00

pool/tree/pool.go Outdated

					
				@ -441,3 +489,3 @@

					var resp *grpcService.AddResponse

					if err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {

					start := time.Now()

dstepanov-yadro commented

2024-08-01 13:55:05 +00:00

What kind of duration do you measure?
gRPC request duration can be measured with gRPC middleware.
Pool method duration must include the whole method body (with signRequest and request creation).

alexvanin commented

2024-08-02 06:53:54 +00:00

We would like to strike a balance: measure transmission time (that is, gRPC request duration) but include all retries in it. That is why signRequest is not included and gRPC middleware is not used.

But I agree, it seems we can just measure the whole execution duration, so including signRequest in the measurement seems okay to me.

dstepanov-yadro marked this conversation as resolved

alexvanin approved these changes 2024-08-02 06:54:07 +00:00

aarifullin reviewed 2024-08-02 07:59:26 +00:00

pool/tree/pool.go Outdated

					
				@ -172,0 +199,4 @@

					case methodRemoveNode:

						return "removeNode"

					case methodLast:

						return "it's a system name rather than a method"