cli: Use grpc.WaitForReady while initializing SDK client #1441

Merged
dstepanov-yadro merged 1 commit from a-savchuk/frostfs-node:correct-timeout-for-cli into master 2024-10-26 11:30:27 +00:00
Member

Before, when the target RPC server was unavailable, requests made by CLI didn't wait for a timeout specified by the --timeout option if the timeout was more than 20 seconds. It's because of the gRPC default backoff strategy. Adding this option fixes that behavior.

Before

$ time ./frostfs-cli -c cli.yaml --timeout 30s container list
rpc error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout"

real    0m20.443s
user    0m0.417s
sys     0m0.046s

After

$ time ./frostfs-cli -c cli.yaml --timeout 30s container list
can't create API client: can't init SDK client: context deadline exceeded

real    0m30.495s
user    0m0.479s
sys     0m0.048s
Before, when the target RPC server was unavailable, requests made by CLI didn't wait for a timeout specified by the `--timeout` option if the timeout was more than 20 seconds. It's because of the gRPC default backoff strategy. Adding this option fixes that behavior. Before ``` $ time ./frostfs-cli -c cli.yaml --timeout 30s container list rpc error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout" real 0m20.443s user 0m0.417s sys 0m0.046s ``` After ``` $ time ./frostfs-cli -c cli.yaml --timeout 30s container list can't create API client: can't init SDK client: context deadline exceeded real 0m30.495s user 0m0.479s sys 0m0.048s ```
a-savchuk added 1 commit 2024-10-22 07:31:52 +00:00
[#xx] cli: Use grpc.WaitForReady while initializing SDK client
Some checks failed
DCO action / DCO (pull_request) Failing after 38s
Tests and linters / Run gofumpt (pull_request) Successful in 2m13s
Tests and linters / Staticcheck (pull_request) Successful in 2m57s
Pre-commit hooks / Pre-commit (pull_request) Successful in 3m27s
Build / Build Components (pull_request) Successful in 3m21s
Vulncheck / Vulncheck (pull_request) Successful in 3m17s
Tests and linters / gopls check (pull_request) Successful in 4m6s
Tests and linters / Tests with -race (pull_request) Successful in 4m40s
Tests and linters / Lint (pull_request) Successful in 6m4s
Tests and linters / Tests (pull_request) Successful in 9m15s
89d5d387fc
Before, when the target RPC server wasn't available, requests made
by CLI didn't wait for a timeout specified by the '--timeout' option
because of the default backoff strategy. Adding this option fixes
that behavior.

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
a-savchuk force-pushed correct-timeout-for-cli from 89d5d387fc to d673ed73ba 2024-10-22 07:32:35 +00:00 Compare
a-savchuk force-pushed correct-timeout-for-cli from d673ed73ba to 0a752752ec 2024-10-22 07:36:59 +00:00 Compare
a-savchuk changed title from WIP: [#xx] cli: Use grpc.WaitForReady while initializing SDK client to cli: Use grpc.WaitForReady while initializing SDK client 2024-10-22 07:44:18 +00:00
a-savchuk force-pushed correct-timeout-for-cli from 0a752752ec to 969d3ba028 2024-10-22 07:45:08 +00:00 Compare
a-savchuk requested review from storage-core-developers 2024-10-22 07:50:58 +00:00
a-savchuk requested review from storage-core-committers 2024-10-22 07:50:58 +00:00
dstepanov-yadro requested changes 2024-10-22 10:45:26 +00:00
Dismissed
@ -58,6 +58,7 @@ func GetSDKClient(ctx context.Context, cmd *cobra.Command, key *ecdsa.PrivateKey
GRPCDialOptions: []grpc.DialOption{
grpc.WithChainUnaryInterceptor(tracing.NewUnaryClientInteceptor()),
grpc.WithChainStreamInterceptor(tracing.NewStreamClientInterceptor()),
grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),

what about tree service and grpc clients (object, container)?

what about tree service and grpc clients (object, container)?
Author
Member

Done

Before:

$ time ./frostfs-cli -c cli.yaml -r 10.78.130.238:8080 --timeout 30s \
        tree list --cid 9HAXqftkMZAPRpStA1pQHLqxL7gj9mdihY5K9hv4Fvr
failed to call treeList rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout"

real    0m20.671s
user    0m0.654s
sys     0m0.041s

After:

$ time ./frostfs-cli -c cli.yaml -r 10.78.130.238:8080 --timeout 30s \
        tree list --cid 9HAXqftkMZAPRpStA1pQHLqxL7gj9mdihY5K9hv4Fvr
failed to call treeList rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout"

real    0m30.724s
user    0m0.688s
sys     0m0.065s
Done Before: ``` $ time ./frostfs-cli -c cli.yaml -r 10.78.130.238:8080 --timeout 30s \ tree list --cid 9HAXqftkMZAPRpStA1pQHLqxL7gj9mdihY5K9hv4Fvr failed to call treeList rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout" real 0m20.671s user 0m0.654s sys 0m0.041s ``` After: ``` $ time ./frostfs-cli -c cli.yaml -r 10.78.130.238:8080 --timeout 30s \ tree list --cid 9HAXqftkMZAPRpStA1pQHLqxL7gj9mdihY5K9hv4Fvr failed to call treeList rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout" real 0m30.724s user 0m0.688s sys 0m0.065s ```
Owner

What about internal client in node?
It seems not only CLI is affected.

What about internal client in node? It seems not only CLI is affected.

Here too:

grpcOpts := []grpc.DialOption{

Here too: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/8b6ec57c6147e5b784d78bc891144dd55493503d/pkg/network/cache/multi.go#L63
Author
Member

I thought we've decided not to change a client in node. As for me, it's okay when a client fails as soon as it knows it can't dial connection because the target host is unavailable

I thought we've decided not to change a client in node. As for me, it's okay when a client fails as soon as it knows it can't dial connection because the target host is unavailable
Author
Member

The idea of the task was to make a timeout that the CLI (only CLI) waits for and a timeout specified by user equal

The idea of the task was to make a timeout that the CLI (only CLI) waits for and a timeout specified by user equal

But what for client -> node1 -> node2 scenario?

But what for `client -> node1 -> node2` scenario?
Author
Member

I can do that but either way it won't work as we want

The help info for frostfs-cli says that a user specifies timeout for an operation. Suppose a dial timeout for client is specified by the user but node1 creates its own client with timeouts configured for node1, so the entire operation may fail before the user specified deadline exceeded

I can do that but either way it won't work as we want The help info for `frostfs-cli` says that a user specifies timeout for *an operation*. Suppose a dial timeout for `client` is specified by the user but `node1` creates its own client with timeouts configured for `node1`, so the entire operation may fail before the user specified deadline exceeded
Author
Member

So, what's the final decision? To do or not to do?

So, what's the final decision? To do or not to do?
Author
Member

Also added this option to the node's internal client

Also added this option to the node's internal client
a-savchuk force-pushed correct-timeout-for-cli from 969d3ba028 to 602c42a237 2024-10-22 12:19:39 +00:00 Compare
acid-ant approved these changes 2024-10-23 08:18:56 +00:00
Dismissed
a-savchuk added 2 commits 2024-10-23 11:05:47 +00:00
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
[#1441] services/tree: Use grpc.WaitForReady option when creating client
All checks were successful
DCO action / DCO (pull_request) Successful in 1m32s
Pre-commit hooks / Pre-commit (pull_request) Successful in 2m11s
Tests and linters / Run gofumpt (pull_request) Successful in 2m7s
Vulncheck / Vulncheck (pull_request) Successful in 2m13s
Build / Build Components (pull_request) Successful in 2m30s
Tests and linters / gopls check (pull_request) Successful in 2m49s
Tests and linters / Staticcheck (pull_request) Successful in 3m8s
Tests and linters / Lint (pull_request) Successful in 3m49s
Tests and linters / Tests (pull_request) Successful in 4m17s
Tests and linters / Tests with -race (pull_request) Successful in 6m0s
3914369e96
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
a-savchuk dismissed acid-ant's review 2024-10-23 11:05:47 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

fyrchik approved these changes 2024-10-23 11:23:49 +00:00
acid-ant approved these changes 2024-10-23 11:35:58 +00:00
a-savchuk force-pushed correct-timeout-for-cli from 3914369e96 to 65a4320c75 2024-10-23 11:45:47 +00:00 Compare
dstepanov-yadro approved these changes 2024-10-23 13:02:23 +00:00
dstepanov-yadro merged commit 65a4320c75 into master 2024-10-23 13:02:45 +00:00
Sign in to join this conversation.
No reviewers
TrueCloudLab/storage-core-developers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1441
No description provided.