cli: Use grpc.WaitForReady while initializing SDK client #1441

a-savchuk · 2024-10-22T07:31:51Z

a-savchuk commented

2024-10-22 07:31:51 +00:00

Previously, when the target RPC server was unavailable, requests made by the CLI did not wait for the full timeout given by the --timeout option if that timeout was longer than 20 seconds; they failed after about 20 seconds instead. This is caused by the gRPC default backoff strategy. Adding the grpc.WaitForReady option fixes that behavior.

Before

$ time ./frostfs-cli -c cli.yaml --timeout 30s container list
rpc error: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout"

real    0m20.443s
user    0m0.417s
sys     0m0.046s

After

$ time ./frostfs-cli -c cli.yaml --timeout 30s container list
can't create API client: can't init SDK client: context deadline exceeded

real    0m30.495s
user    0m0.479s
sys     0m0.048s


a-savchuk force-pushed correct-timeout-for-cli from 89d5d387fc to d673ed73ba

2024-10-22 07:32:35 +00:00


a-savchuk force-pushed correct-timeout-for-cli from d673ed73ba to 0a752752ec

2024-10-22 07:36:59 +00:00


a-savchuk changed title from ~~WIP: [#xx] cli: Use grpc.WaitForReady while initializing SDK client~~ to cli: Use grpc.WaitForReady while initializing SDK client

2024-10-22 07:44:18 +00:00

a-savchuk force-pushed correct-timeout-for-cli from 0a752752ec to 969d3ba028

2024-10-22 07:45:08 +00:00


a-savchuk referenced this pull request from TrueCloudLab/frostfs-sdk-go

2024-10-22 07:47:06 +00:00

WIP: client: Pass grpc.CallOption options on dial #287

a-savchuk referenced this pull request from TrueCloudLab/frostfs-api-go

2024-10-22 07:47:16 +00:00

WIP: rpc/client: Allow to pass custom grpc.CallOption options #124

requested reviews from storage-core-developers, storage-core-committers

2024-10-22 07:50:58 +00:00

dstepanov-yadro requested changes 2024-10-22 10:45:26 +00:00

Dismissed

cmd/frostfs-cli/internal/client/sdk.go Outdated

					
@@ -58,6 +58,7 @@ func GetSDKClient(ctx context.Context, cmd *cobra.Command, key *ecdsa.PrivateKey
 		GRPCDialOptions: []grpc.DialOption{
 			grpc.WithChainUnaryInterceptor(tracing.NewUnaryClientInteceptor()),
 			grpc.WithChainStreamInterceptor(tracing.NewStreamClientInterceptor()),
+			grpc.WithDefaultCallOptions(grpc.WaitForReady(true)),

dstepanov-yadro commented

2024-10-22 10:45:22 +00:00

What about the tree service and the gRPC clients (object, container)?

👍 2

a-savchuk commented

2024-10-22 12:19:45 +00:00

Done

Before:

$ time ./frostfs-cli -c cli.yaml -r 10.78.130.238:8080 --timeout 30s \
        tree list --cid 9HAXqftkMZAPRpStA1pQHLqxL7gj9mdihY5K9hv4Fvr
failed to call treeList rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout"

real    0m20.671s
user    0m0.654s
sys     0m0.041s

After:

$ time ./frostfs-cli -c cli.yaml -r 10.78.130.238:8080 --timeout 30s \
        tree list --cid 9HAXqftkMZAPRpStA1pQHLqxL7gj9mdihY5K9hv4Fvr
failed to call treeList rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = "transport: Error while dialing: dial tcp 10.78.130.238:8080: i/o timeout"

real    0m30.724s
user    0m0.688s
sys     0m0.065s


fyrchik commented

2024-10-22 12:24:56 +00:00

What about the internal client in the node?
It seems the CLI is not the only component affected.

dstepanov-yadro commented

2024-10-22 12:35:35 +00:00

Here too:

https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/8b6ec57c6147e5b784d78bc891144dd55493503d/pkg/network/cache/multi.go#L63

grpcOpts := []grpc.DialOption{

a-savchuk commented

2024-10-22 12:40:50 +00:00

I thought we had decided not to change the client in the node. In my opinion, it's fine for a client to fail as soon as it knows it can't establish a connection because the target host is unavailable.

a-savchuk commented

2024-10-22 12:44:58 +00:00

The idea of the task was to make the time the CLI (and only the CLI) actually waits equal to the timeout specified by the user.

dstepanov-yadro commented

2024-10-22 12:49:21 +00:00

But what about the client -> node1 -> node2 scenario?

👍 1

a-savchuk commented

2024-10-22 13:52:31 +00:00

I can do that, but either way it won't work the way we want.

The help text for frostfs-cli says that the user specifies a timeout for an operation. Suppose the dial timeout for client is set by the user, but node1 creates its own client with the timeouts configured for node1. The entire operation may then fail before the user-specified deadline is exceeded.

a-savchuk commented

2024-10-23 08:24:38 +00:00

So, what's the final decision? To do or not to do?

a-savchuk commented

2024-10-23 12:59:18 +00:00

Also added this option to the node's internal client