[#185] Control timeout of tree service operations #290

Merged
alexvanin merged 1 commit from nzinkevich/frostfs-sdk-go:tree_request_timeout into master 2024-11-06 08:12:37 +00:00
Member

Implemented context timeout for all tree service operations except those that return a gRPC stream

Signed-off-by: Nikita Zinkevich n.zinkevich@yadro.com

Implemented context timeout for all tree service operations except those that return a gRPC stream Signed-off-by: Nikita Zinkevich <n.zinkevich@yadro.com>
nzinkevich self-assigned this 2024-10-23 14:38:46 +00:00
nzinkevich added 1 commit 2024-10-23 14:38:47 +00:00
[#185] Control timeout of tree service operations
Some checks failed
DCO / DCO (pull_request) Successful in 58s
Tests and linters / Lint (pull_request) Failing after 1m10s
Tests and linters / Tests (pull_request) Successful in 1m14s
ce086fc234
Implemented context timeout for all tree service operations except those that return a GRPC stream

Signed-off-by: Nikita Zinkevich <n.zinkevich@yadro.com>
nzinkevich requested review from storage-services-committers 2024-10-23 14:38:58 +00:00
nzinkevich requested review from storage-services-developers 2024-10-23 14:38:59 +00:00
nzinkevich force-pushed tree_request_timeout from ce086fc234 to 16a4c016fe 2024-10-23 14:40:22 +00:00 Compare
dkirillov reviewed 2024-10-23 14:48:43 +00:00
@ -490,3 +498,3 @@
var resp *grpcService.AddResponse
err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
resp, inErr = client.Add(ctx, request)
reqCtx, cancel := getTimeoutContext(ctx, p.connTimeout)
Member

Why don't invoke this function inside p.requestWithRetry?

Why don't invoke this function inside `p.requestWithRetry`?
Author
Member

GetSubTree should be ignored, because it returns a stream to the client. If I create a timeout context for this operation, it will not only cancel the method call, but it will also close the stream, which is using the same context

GetSubTree should be ignored, because it returns a stream to the client. If I create a timeout context for this operation, it will not only cancel the method call, but it will also close the stream, which is using the same context
Member

I meant:

diff --git a/pool/tree/pool.go b/pool/tree/pool.go
index be2cb2d..3d89146 100644
--- a/pool/tree/pool.go
+++ b/pool/tree/pool.go
@@ -362,10 +362,8 @@ func (p *Pool) GetNodes(ctx context.Context, prm GetNodesParams) ([]*grpcService
 	}
 
 	var resp *grpcService.GetNodeByPathResponse
-	err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) {
-		reqCtx, cancel := getTimeoutContext(ctx, p.connTimeout)
-		defer cancel()
-		resp, inErr = client.GetNodeByPath(reqCtx, request)
+	err := p.requestWithRetry(ctx, func(ctx context.Context, client grpcService.TreeServiceClient) (inErr error) {
+		resp, inErr = client.GetNodeByPath(ctx, request)
 		// Pool wants to do retry 'GetNodeByPath' request if result is empty.
 		// Empty result is expected due to delayed tree service sync.
 		// Return an error there to trigger retry and ignore it after,
@@ -820,7 +818,7 @@ func (p *Pool) setStartIndices(i, j int) {
 	p.startIndicesMtx.Unlock()
 }
 
-func (p *Pool) requestWithRetry(ctx context.Context, fn func(client grpcService.TreeServiceClient) error) error {
+func (p *Pool) requestWithRetry(ctx context.Context, fn func(context.Context, grpcService.TreeServiceClient) error) error {
 	var (
 		err, finErr error
 		cl          grpcService.TreeServiceClient
@@ -848,7 +846,9 @@ LOOP:
 			attempts--
 
 			if cl, err = p.innerPools[indexI].clients[indexJ].serviceClient(); err == nil {
-				err = fn(cl)
+				reqCtx, cancel := getTimeoutContext(ctx, p.connTimeout)
+				err = fn(reqCtx, cl)
+				cancel()
 			}
 			if !shouldTryAgain(err) {
 				if startI != indexI || startJ != indexJ {

I meant: ```diff diff --git a/pool/tree/pool.go b/pool/tree/pool.go index be2cb2d..3d89146 100644 --- a/pool/tree/pool.go +++ b/pool/tree/pool.go @@ -362,10 +362,8 @@ func (p *Pool) GetNodes(ctx context.Context, prm GetNodesParams) ([]*grpcService } var resp *grpcService.GetNodeByPathResponse - err := p.requestWithRetry(ctx, func(client grpcService.TreeServiceClient) (inErr error) { - reqCtx, cancel := getTimeoutContext(ctx, p.connTimeout) - defer cancel() - resp, inErr = client.GetNodeByPath(reqCtx, request) + err := p.requestWithRetry(ctx, func(ctx context.Context, client grpcService.TreeServiceClient) (inErr error) { + resp, inErr = client.GetNodeByPath(ctx, request) // Pool wants to do retry 'GetNodeByPath' request if result is empty. // Empty result is expected due to delayed tree service sync. // Return an error there to trigger retry and ignore it after, @@ -820,7 +818,7 @@ func (p *Pool) setStartIndices(i, j int) { p.startIndicesMtx.Unlock() } -func (p *Pool) requestWithRetry(ctx context.Context, fn func(client grpcService.TreeServiceClient) error) error { +func (p *Pool) requestWithRetry(ctx context.Context, fn func(context.Context, grpcService.TreeServiceClient) error) error { var ( err, finErr error cl grpcService.TreeServiceClient @@ -848,7 +846,9 @@ LOOP: attempts-- if cl, err = p.innerPools[indexI].clients[indexJ].serviceClient(); err == nil { - err = fn(cl) + reqCtx, cancel := getTimeoutContext(ctx, p.connTimeout) + err = fn(reqCtx, cl) + cancel() } if !shouldTryAgain(err) { if startI != indexI || startJ != indexJ { ```
Author
Member

In that case all treeService methods which use requestWithRetry will have timeout, but getSubTree shouldn't have it.

In that case all treeService methods which use requestWithRetry will have timeout, but getSubTree shouldn't have it.
Member

Oh, I see. Ok

Oh, I see. Ok
dkirillov marked this conversation as resolved
nzinkevich force-pushed tree_request_timeout from 16a4c016fe to 00eb080f50 2024-10-24 06:50:55 +00:00 Compare
dkirillov reviewed 2024-10-25 13:46:36 +00:00
@ -66,6 +66,7 @@ type InitParameters struct {
logger *zap.Logger
nodeDialTimeout time.Duration
nodeStreamTimeout time.Duration
connTimeout time.Duration
Member

We should add default value for this param in fillDefaultInitParams

We should add default value for this param in `fillDefaultInitParams`
dkirillov marked this conversation as resolved
fyrchik reviewed 2024-10-31 11:40:14 +00:00
@ -66,6 +66,7 @@ type InitParameters struct {
logger *zap.Logger
nodeDialTimeout time.Duration
nodeStreamTimeout time.Duration
Owner

I am a bit out of loop: for unary RPC, isn't nodeStreamTimeout what we need?

I am a bit out of loop: for unary RPC, isn't `nodeStreamTimeout` what we need?
Author
Member

In that case this value cannot be made small enough compared to the rebalance interval, because this value also tunes streaming.
And I think it would be less surprising from the component configuration perspective to set separate parameters for that.

In that case this value cannot be made small enough compared to the rebalance interval, because this value also tunes streaming. And I think it would be less surprising from the component configuration perspective to set separate parameters for that.
Owner

Stream timeout on the API level is in fact a timeout to send 1 RPC message (oneshot or inside a stream).

Stream timeout on the API level is in fact a timeout to send 1 RPC message (oneshot or inside a stream).
Author
Member

Then it's okay to use nodeStreamTimeout if it works this way

Then it's okay to use `nodeStreamTimeout` if it works this way
nzinkevich force-pushed tree_request_timeout from 00eb080f50 to 55f5409203 2024-11-01 06:35:14 +00:00 Compare
dkirillov approved these changes 2024-11-05 07:03:26 +00:00
dkirillov left a comment
Member

LGTM,
But is it possible to add some tests?

LGTM, But is it possible to add some tests?
aarifullin approved these changes 2024-11-05 08:05:49 +00:00
alexvanin requested changes 2024-11-06 07:45:22 +00:00
Dismissed
alexvanin left a comment
Owner

Let's get a bit deeper in this PR.

streamTimeout now controls timeout for unary gRPC operations. However there are stream operations too, e.g. GetSubTree. Stream is a set of messages, and gRPC allows to set timeout for whole stream. We would like to control timeouts for every message in a stream.

I want you to check api/rpc/client package. This package re-implements client methods by using RPC client from api/rpc/client package.

This packages defines read/write wrappers, that control timeout for every single message transmission.

I suggest you to keep this commit and try to add tree service implementation in api/rpc package the same way other services are defined there.

Let's get a bit deeper in this PR. `streamTimeout` now controls timeout for unary gRPC operations. However there are stream operations too, e.g. `GetSubTree`. Stream is a set of messages, and gRPC allows to set timeout for whole stream. We would like to control timeouts for every message in a stream. I want you to check [api/rpc/client](https://git.frostfs.info/TrueCloudLab/frostfs-sdk-go/src/branch/master/api/rpc) package. This package re-implements client methods by using RPC client from `api/rpc/client` package. This packages defines [read/write wrappers]([url](https://git.frostfs.info/TrueCloudLab/frostfs-sdk-go/src/branch/master/api/rpc/client/stream_wrapper.go)), that control timeout for every single message transmission. I suggest you to keep this commit and try to add tree service implementation in `api/rpc` package the same way other services are defined there.
alexvanin approved these changes 2024-11-06 08:12:31 +00:00
alexvanin left a comment
Owner

With @nzinkevich we've decided to approach stream timeouts later in separate issue, because it may be a big diff. Let's keep it separate.

With @nzinkevich we've decided to approach stream timeouts later in separate issue, because it may be a big diff. Let's keep it separate.
alexvanin merged commit 43d5c8dbac into master 2024-11-06 08:12:37 +00:00
alexvanin deleted branch tree_request_timeout 2024-11-06 08:12:39 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-services-developers
No milestone
No project
No assignees
5 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-sdk-go#290
No description provided.