Support active RPC limiting #1639

Merged
dstepanov-yadro merged 5 commits from a-savchuk/frostfs-node:rps-limiting into master 2025-02-28 11:08:10 +00:00
Member

Adopt TrueCloudLab/frostfs-qos#4 to limit the number of active RPCs

  • Eliminate pool usage for Put/PutSingle RPCs
  • Allow configuration of active RPC limits
  • Apply RPC limiting for all services except the control service

I used a virtual cluster to check that the limits are applied; the limits config looks as follows:

```yaml
rpc:
  limits:
    - methods:
      - /neo.fs.v2.object.ObjectService/Put
      - /neo.fs.v2.object.ObjectService/PutSingle
      max_ops: 100
    - methods:
      - /neo.fs.v2.object.ObjectService/Delete
      max_ops: 100
```

Since this limiter limits the number of active RPCs rather than the rate, I tried to estimate that number using the existing metrics, calculating it as the product of the RPC rate and the RPC latency. You can see the results below.
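For example, at roughly 200 requests per second with an average latency of 0.5 s, this estimate gives 200 × 0.5 = 100 concurrently active RPCs (illustrative numbers, not taken from the plots below).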

<img src="/attachments/4185927c-6ba2-4c82-8688-92aaa8fcfcaf" alt="pic" width="400"/>
<img src="/attachments/338cc36b-586b-45cb-8373-77cb2ccb4600" alt="pic" width="400"/>

I also checked the ability to change limits on a service reload, changing the config from this

```yaml
rpc:
  limits:
    - methods:
      - /neo.fs.v2.object.ObjectService/Put
      - /neo.fs.v2.object.ObjectService/PutSingle
      max_ops: 100
```

to this

```yaml
rpc:
  limits:
    - methods:
      - /neo.fs.v2.object.ObjectService/Put
      - /neo.fs.v2.object.ObjectService/PutSingle
      max_ops: 50
```

See results below.

<img src="/attachments/1bc24ece-21c3-4f6c-8b6b-c5c9c6f106c5" alt="pic" width="400"/>
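For reference, here is a minimal sketch of how such a reload can be wired (the `NewSemaphoreLimiter` constructor name and the atomic `limiter` field are assumptions based on the code quoted in the review below, not necessarily the exact frostfs-node code):

```go
// Rebuild the limiter from the current config and publish it atomically;
// the gRPC interceptors pick up the new limits via limiter.Load() on each call.
func (c *cfg) reloadLimits() error {
	var limits []limiting.KeyLimit
	for _, l := range rpcconfig.Limits(c.appCfg) {
		limits = append(limits, limiting.KeyLimit{Keys: l.Methods, Limit: l.MaxOps})
	}
	limiter, err := limiting.NewSemaphoreLimiter(limits) // assumed constructor
	if err != nil {
		return err
	}
	c.cfgGRPC.limiter.Store(limiter)
	return nil
}
```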
a-savchuk added 6 commits 2025-02-07 13:56:18 +00:00
Separated `replicator.pool_size` and `object.put.remote_pool_size` settings.

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Removed `object.put.remote_pool_size` and `object.put.local_pool_size` settings.

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
[#xx] node: Support active RPC limiting
Some checks failed
DCO action / DCO (pull_request) Failing after 33s
Tests and linters / Tests with -race (pull_request) Failing after 33s
Build / Build Components (pull_request) Failing after 42s
Tests and linters / Run gofumpt (pull_request) Successful in 35s
Tests and linters / Tests (pull_request) Failing after 42s
Vulncheck / Vulncheck (pull_request) Failing after 57s
Tests and linters / Staticcheck (pull_request) Failing after 1m8s
Tests and linters / Lint (pull_request) Failing after 1m34s
Pre-commit hooks / Pre-commit (pull_request) Failing after 1m41s
Tests and linters / gopls check (pull_request) Failing after 2m37s
bc8616323d
- Allow configuration of active RPC limits for method groups
- Apply RPC limiting for all services except the control service

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
a-savchuk changed title from WIP: node: Support active RPC limiting to WIP: Support active RPC limiting 2025-02-07 13:56:29 +00:00
a-savchuk force-pushed rps-limiting from bc8616323d to 36844001c3 2025-02-07 13:59:31 +00:00 Compare
a-savchuk force-pushed rps-limiting from 36844001c3 to 94c2d596c1 2025-02-07 14:26:26 +00:00 Compare
a-savchuk force-pushed rps-limiting from 94c2d596c1 to 81ba3de84a 2025-02-07 14:32:08 +00:00 Compare
a-savchuk force-pushed rps-limiting from 81ba3de84a to 084218f100 2025-02-20 08:20:20 +00:00 Compare
a-savchuk force-pushed rps-limiting from 084218f100 to bbc56251cb 2025-02-21 08:34:47 +00:00 Compare
a-savchuk force-pushed rps-limiting from bbc56251cb to da330ff5cb 2025-02-21 08:50:50 +00:00 Compare
a-savchuk force-pushed rps-limiting from da330ff5cb to 02bbacb071 2025-02-21 11:05:20 +00:00 Compare
a-savchuk changed title from WIP: Support active RPC limiting to Support active RPC limiting 2025-02-21 11:12:14 +00:00
requested reviews from storage-core-committers, storage-core-developers 2025-02-21 11:12:14 +00:00
dstepanov-yadro requested changes 2025-02-21 13:25:10 +00:00
Dismissed
@ -52,0 +79,4 @@
if !ok {
return new(apistatus.ResourceExhausted)
}
defer release()

The current behavior is as follows: the counter increases, the handler is called, the counter decreases, but `ss grpc.ServerStream` will continue to work.
You need to make a wrapper over `ss grpc.ServerStream` that will decrease the counter when the stream is closed.

Author
Member

Are you sure? I always thought an interceptor's handler is called only once. I don't need to track each message, so I can reduce the counter on the handler exit, can't I?

Now I'm not sure:) So if you checked and it works as expected, then it is great!

Author
Member

I'll check it one more time

Author
Member

I think the handler calls `RecvMsg` and other stream methods itself, so if I don't need to process each stream message, I can simply call the handler.

- handler invocation in the grpc-go source: https://github.com/grpc/grpc-go/blob/65c6718afb5d5d41a277f4b3c6439ac80bb38292/server.go#L1570-L1579 and https://github.com/grpc/grpc-go/blob/65c6718afb5d5d41a277f4b3c6439ac80bb38292/server.go#L1694-L1703
- server interceptor example: https://github.com/grpc/grpc-go/blob/65c6718afb5d5d41a277f4b3c6439ac80bb38292/examples/features/interceptor/server/main.go
dstepanov-yadro marked this conversation as resolved
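A minimal sketch of the stream interceptor shape discussed in the thread above (the `Acquire` method on the limiter is an assumption; the interceptor name and the `ResourceExhausted` return come from the quoted diff):

```go
func NewMaxActiveRPCLimiterStreamServerInterceptor(getLimiter func() limiting.Limiter) grpc.StreamServerInterceptor {
	return func(srv any, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
		// Take a slot for this method; reject with ResourceExhausted when the limit is reached.
		release, ok := getLimiter().Acquire(info.FullMethod)
		if !ok {
			return new(apistatus.ResourceExhausted)
		}
		// The handler drives RecvMsg/SendMsg itself, so releasing when the
		// handler returns covers the whole lifetime of the stream.
		defer release()
		return handler(srv, ss)
	}
}
```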
@ -20,4 +20,0 @@
// LogWorkerPoolError writes debug error message of object worker pool to provided logger.
func LogWorkerPoolError(ctx context.Context, l *logger.Logger, req string, err error) {
l.Error(ctx, logs.UtilCouldNotPushTaskToWorkerPool,

Also drop `logs.UtilCouldNotPushTaskToWorkerPool`

Author
Member

Done

dstepanov-yadro marked this conversation as resolved
a-savchuk force-pushed rps-limiting from 02bbacb071 to 72685e35ee 2025-02-21 14:05:23 +00:00 Compare
dstepanov-yadro approved these changes 2025-02-21 14:20:21 +00:00
Dismissed
requested reviews from storage-core-committers, storage-core-developers 2025-02-24 08:21:24 +00:00
aarifullin reviewed 2025-02-25 15:21:23 +00:00
@ -29,2 +31,4 @@
// PoolSize returns the value of "pool_size" config parameter
// from "replicator" section.
//
// Returns PoolSizeDefault if the value is not positive integer.
Member

`not positive` -> `non-positive` :)

Author
Member

Fixed

aarifullin marked this conversation as resolved
aarifullin reviewed 2025-02-25 15:24:01 +00:00
@ -0,0 +22,4 @@
var limits []LimitConfig
var i uint64
for ; ; i++ {
Member
Optional change request

How about `for i := uint64(0); ; i++`, as we don't use this variable outside?

Author
Member

Done

aarifullin marked this conversation as resolved
a-savchuk force-pushed rps-limiting from 72685e35ee to ad3b32eade 2025-02-26 08:01:29 +00:00 Compare
a-savchuk dismissed dstepanov-yadro's review 2025-02-26 08:01:30 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

fyrchik reviewed 2025-02-26 10:58:34 +00:00
@ -134,11 +136,13 @@ func getGrpcServerOpts(ctx context.Context, c *cfg, sc *grpcconfig.Config) ([]gr
qos.NewUnaryServerInterceptor(),
metrics.NewUnaryServerInterceptor(),
tracing.NewUnaryServerInterceptor(),
qosInternal.NewMaxActiveRPCLimiterUnaryServerInterceptor(func() limiting.Limiter { return c.cfgGRPC.limiter.Load() }),
Owner

We could use `c.cfgGRPC.limiter.Load` instead of `func() limiting.Limiter { return c.cfgGRPC.limiter.Load() }`. Was your choice deliberate?

Author
Member

`NewMaxActiveRPCLimiterUnaryServerInterceptor` receives the `Limiter` interface, but `c.cfgGRPC.limiter.Load` returns the `*SemaphoreLimiter`:

```
cannot use c.cfgGRPC.limiter.Load (value of type func() *limiting.SemaphoreLimiter) as func() limiting.Limiter value in argument to qosInternal.NewMaxActiveRPCLimiterStreamServerInterceptor
```
fyrchik marked this conversation as resolved
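A small illustration of the type mismatch mentioned above (a sketch fragment using the names from the diff, not project code): Go function types are invariant, so a `func() *SemaphoreLimiter` is not assignable to a `func() limiting.Limiter`, even though `*SemaphoreLimiter` implements the interface.

```go
// Fragment, assuming the fields and packages referenced in the diff above.
var limiter atomic.Pointer[limiting.SemaphoreLimiter]

// Does not compile: limiter.Load has type func() *limiting.SemaphoreLimiter,
// which is not assignable to func() limiting.Limiter.
//	_ = qosInternal.NewMaxActiveRPCLimiterUnaryServerInterceptor(limiter.Load)

// Compiles: the closure performs the interface conversion at call time.
var _ = qosInternal.NewMaxActiveRPCLimiterUnaryServerInterceptor(
	func() limiting.Limiter { return limiter.Load() },
)
```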
@ -52,0 +68,4 @@
}
}
//nolint:contextcheck (false positive)
Owner

`false positive` implies a bug in the linter. Here we have no bug in the linter, just a cumbersome gRPC API.

Author
Member

Made the message more clear

fyrchik marked this conversation as resolved
@ -52,0 +69,4 @@
}
//nolint:contextcheck (false positive)
func NewMaxActiveRPCLimiterStreamServerInterceptor(getLimiter func() limiting.Limiter) grpc.StreamServerInterceptor {
Owner

Refs #1656.

fyrchik reviewed 2025-02-26 11:03:50 +00:00
@ -1416,0 +1413,4 @@
func (c *cfg) reloadLimits() error {
var limits []limiting.KeyLimit
for _, l := range rpcconfig.Limits(c.appCfg) {
limits = append(limits, limiting.KeyLimit{Keys: l.Methods, Limit: l.MaxOps})
Owner

We currently do not check for `Keys` uniqueness inside the `limits` array. This may be either a feature or a misconfiguration. I don't think we need such a feature; at the very least, the last item should take priority. And without wildcards, such a feature has very limited use.

So, what about checking for uniqueness? Also, there may be typos; proto names are not short. Do we have any solution for this?

Owner

`c.cfgGRPC.servers[0].Server.GetServiceInfo()` could be of use.

Owner

As an example of where this will be useful: the future transition from the `neo.fs` to the `frostfs` namespace.

Author
Member

> So, what about checking for uniqueness?

The limiter already does this check (see https://git.frostfs.info/TrueCloudLab/frostfs-qos/src/commit/356851eed3bf77e13c87bca9606935c8e7c58769/limiting/limiter.go#L55-L63):

```go
func (lr *SemaphoreLimiter) addLimit(limit *KeyLimit, sem *semaphore.Semaphore) error {
	for _, key := range limit.Keys {
		if _, exists := lr.m[key]; exists {
			return fmt.Errorf("duplicate key %q", key)
		}
		lr.m[key] = sem
	}
	return nil
}
```
Author
Member

> Also, there may be typos, proto names are not short. Do we have any solution for this?

I'll add it
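One possible shape for that check, sketched against the standard grpc-go API (`validateMethods` and its wiring are hypothetical, not the PR's actual code):

```go
import (
	"fmt"

	"google.golang.org/grpc"
)

// validateMethods checks every configured method name against the methods
// actually registered on the gRPC server, catching typos and stale names.
func validateMethods(srv *grpc.Server, configured []string) error {
	known := make(map[string]struct{})
	for svc, info := range srv.GetServiceInfo() {
		for _, m := range info.Methods {
			known["/"+svc+"/"+m.Name] = struct{}{}
		}
	}
	for _, m := range configured {
		if _, ok := known[m]; !ok {
			return fmt.Errorf("unknown method in rpc.limits config: %q", m)
		}
	}
	return nil
}
```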
a-savchuk force-pushed rps-limiting from ad3b32eade to 611b96f54b 2025-02-26 11:24:55 +00:00 Compare
dstepanov-yadro approved these changes 2025-02-28 08:48:19 +00:00
fyrchik approved these changes 2025-02-28 10:51:47 +00:00
dstepanov-yadro merged commit dae0949f6e into master 2025-02-28 11:08:10 +00:00
a-savchuk deleted branch rps-limiting 2025-02-28 14:29:10 +00:00