Support active RPC limiting #1639

Merged
dstepanov-yadro merged 5 commits from a-savchuk/frostfs-node:rps-limiting into master 2025-02-28 11:08:10 +00:00
Member

Adopt TrueCloudLab/frostfs-qos#4 to limit the number of active RPCs

  • Eliminate pool usage for Put/PutSingle RPCs
  • Allow configuration of active RPC limits
  • Apply RPC limiting for all services except the control service

I used a virtual cluster to check that the limits are applied; the limits config looks as follows:

```yaml
rpc:
  limits:
    - methods:
      - /neo.fs.v2.object.ObjectService/Put
      - /neo.fs.v2.object.ObjectService/PutSingle
      max_ops: 100
    - methods:
      - /neo.fs.v2.object.ObjectService/Delete
      max_ops: 100
```

Since this limiter limits the number of active RPCs rather than the rate, I tried to estimate that number using the existing metrics, calculating it as the product of the RPC rate and the RPC latency. You can see the results below.
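For example, at roughly 200 requests per second with an average latency of 0.5 s, this estimate gives 200 × 0.5 = 100 concurrently active RPCs (illustrative numbers, not taken from the plots below).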

<img src="/attachments/4185927c-6ba2-4c82-8688-92aaa8fcfcaf" alt="pic" width="400"/>
<img src="/attachments/338cc36b-586b-45cb-8373-77cb2ccb4600" alt="pic" width="400"/>

I also checked the ability to change limits on a service reload, changing the config from this

```yaml
rpc:
  limits:
    - methods:
      - /neo.fs.v2.object.ObjectService/Put
      - /neo.fs.v2.object.ObjectService/PutSingle
      max_ops: 100
```

to this

```yaml
rpc:
  limits:
    - methods:
      - /neo.fs.v2.object.ObjectService/Put
      - /neo.fs.v2.object.ObjectService/PutSingle
      max_ops: 50
```

See results below.

<img src="/attachments/1bc24ece-21c3-4f6c-8b6b-c5c9c6f106c5" alt="pic" width="400"/>
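For reference, here is a minimal sketch of how such a reload can be wired (the `NewSemaphoreLimiter` constructor name and the atomic `limiter` field are assumptions based on the code quoted in the review below, not necessarily the exact frostfs-node code):

```go
// Rebuild the limiter from the current config and publish it atomically;
// the gRPC interceptors pick up the new limits via limiter.Load() on each call.
func (c *cfg) reloadLimits() error {
	var limits []limiting.KeyLimit
	for _, l := range rpcconfig.Limits(c.appCfg) {
		limits = append(limits, limiting.KeyLimit{Keys: l.Methods, Limit: l.MaxOps})
	}
	limiter, err := limiting.NewSemaphoreLimiter(limits) // assumed constructor
	if err != nil {
		return err
	}
	c.cfgGRPC.limiter.Store(limiter)
	return nil
}
```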
a-savchuk added 6 commits 2025-02-07 13:56:18 +00:00
Separated `replicator.pool_size` and `object.put.remote_pool_size` settings.

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Removed `object.put.remote_pool_size` and `object.put.local_pool_size` settings.

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
[#xx] node: Support active RPC limiting
Some checks failed
DCO action / DCO (pull_request) Failing after 33s
Tests and linters / Tests with -race (pull_request) Failing after 33s
Build / Build Components (pull_request) Failing after 42s
Tests and linters / Run gofumpt (pull_request) Successful in 35s
Tests and linters / Tests (pull_request) Failing after 42s
Vulncheck / Vulncheck (pull_request) Failing after 57s
Tests and linters / Staticcheck (pull_request) Failing after 1m8s
Tests and linters / Lint (pull_request) Failing after 1m34s
Pre-commit hooks / Pre-commit (pull_request) Failing after 1m41s
Tests and linters / gopls check (pull_request) Failing after 2m37s
bc8616323d
- Allow configuration of active RPC limits for method groups
- Apply RPC limiting for all services except the control service

Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
a-savchuk changed title from WIP: node: Support active RPC limiting to WIP: Support active RPC limiting 2025-02-07 13:56:29 +00:00
a-savchuk force-pushed rps-limiting from bc8616323d to 36844001c3 2025-02-07 13:59:31 +00:00 Compare
a-savchuk force-pushed rps-limiting from 36844001c3 to 94c2d596c1 2025-02-07 14:26:26 +00:00 Compare
a-savchuk force-pushed rps-limiting from 94c2d596c1 to 81ba3de84a 2025-02-07 14:32:08 +00:00 Compare
a-savchuk force-pushed rps-limiting from 81ba3de84a to 084218f100 2025-02-20 08:20:20 +00:00 Compare
a-savchuk force-pushed rps-limiting from 084218f100 to bbc56251cb 2025-02-21 08:34:47 +00:00 Compare
a-savchuk force-pushed rps-limiting from bbc56251cb to da330ff5cb 2025-02-21 08:50:50 +00:00 Compare
a-savchuk force-pushed rps-limiting from da330ff5cb to 02bbacb071 2025-02-21 11:05:20 +00:00 Compare
a-savchuk changed title from WIP: Support active RPC limiting to Support active RPC limiting 2025-02-21 11:12:14 +00:00
requested reviews from storage-core-committers, storage-core-developers 2025-02-21 11:12:14 +00:00
dstepanov-yadro requested changes 2025-02-21 13:25:10 +00:00
Dismissed
@ -52,0 +79,4 @@
if !ok {
return new(apistatus.ResourceExhausted)
}
defer release()

The current behavior is as follows: the counter increases, the handler is called, the counter decreases, but `ss grpc.ServerStream` will continue to work.
You need to make a wrapper over `ss grpc.ServerStream` that will decrease the counter when the stream is closed.

Author
Member

Are you sure? I always thought an interceptor's handler is called only once. I don't need to track each message, so I can reduce the counter on the handler exit, can't I?

Now I'm not sure:) So if you checked and it works as expected, then it is great!

Author
Member

I'll check it one more time

Author
Member

I think the handler calls `RecvMsg` and other stream methods itself, so if I don't need to process each stream message, I can simply call the handler.

- handler invocation in the grpc-go source: https://github.com/grpc/grpc-go/blob/65c6718afb5d5d41a277f4b3c6439ac80bb38292/server.go#L1570-L1579 and https://github.com/grpc/grpc-go/blob/65c6718afb5d5d41a277f4b3c6439ac80bb38292/server.go#L1694-L1703
- server interceptor example: https://github.com/grpc/grpc-go/blob/65c6718afb5d5d41a277f4b3c6439ac80bb38292/examples/features/interceptor/server/main.go
dstepanov-yadro marked this conversation as resolved
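A minimal sketch of the stream interceptor shape discussed in the thread above (the `Acquire` method on the limiter is an assumption; the interceptor name and the `ResourceExhausted` return come from the quoted diff):

```go
func NewMaxActiveRPCLimiterStreamServerInterceptor(getLimiter func() limiting.Limiter) grpc.StreamServerInterceptor {
	return func(srv any, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
		// Take a slot for this method; reject with ResourceExhausted when the limit is reached.
		release, ok := getLimiter().Acquire(info.FullMethod)
		if !ok {
			return new(apistatus.ResourceExhausted)
		}
		// The handler drives RecvMsg/SendMsg itself, so releasing when the
		// handler returns covers the whole lifetime of the stream.
		defer release()
		return handler(srv, ss)
	}
}
```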
@ -20,4 +20,0 @@
// LogWorkerPoolError writes debug error message of object worker pool to provided logger.
func LogWorkerPoolError(ctx context.Context, l *logger.Logger, req string, err error) {
l.Error(ctx, logs.UtilCouldNotPushTaskToWorkerPool,

Also drop `logs.UtilCouldNotPushTaskToWorkerPool`

Author
Member

Done

dstepanov-yadro marked this conversation as resolved
a-savchuk force-pushed rps-limiting from 02bbacb071 to 72685e35ee 2025-02-21 14:05:23 +00:00 Compare
dstepanov-yadro approved these changes 2025-02-21 14:20:21 +00:00
Dismissed
requested reviews from storage-core-committers, storage-core-developers 2025-02-24 08:21:24 +00:00
aarifullin reviewed 2025-02-25 15:21:23 +00:00
@ -29,2 +31,4 @@
// PoolSize returns the value of "pool_size" config parameter
// from "replicator" section.
//
// Returns PoolSizeDefault if the value is not positive integer.
Member

`not positive` -> `non-positive` :)

Author
Member

Fixed

aarifullin marked this conversation as resolved
aarifullin reviewed 2025-02-25 15:24:01 +00:00
@ -0,0 +22,4 @@
var limits []LimitConfig
var i uint64
for ; ; i++ {
Member
Optional change request

How about `for i := uint64(0); ; i++`, as we don't use this variable outside?

Author
Member

Done

aarifullin marked this conversation as resolved
a-savchuk force-pushed rps-limiting from 72685e35ee to ad3b32eade 2025-02-26 08:01:29 +00:00 Compare
a-savchuk dismissed dstepanov-yadro's review 2025-02-26 08:01:30 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

fyrchik reviewed 2025-02-26 10:58:34 +00:00
@ -134,11 +136,13 @@ func getGrpcServerOpts(ctx context.Context, c *cfg, sc *grpcconfig.Config) ([]gr
qos.NewUnaryServerInterceptor(),
metrics.NewUnaryServerInterceptor(),
tracing.NewUnaryServerInterceptor(),
qosInternal.NewMaxActiveRPCLimiterUnaryServerInterceptor(func() limiting.Limiter { return c.cfgGRPC.limiter.Load() }),
Owner

We could use `c.cfgGRPC.limiter.Load` instead of `func() limiting.Limiter { return c.cfgGRPC.limiter.Load() }`. Was your choice deliberate?

Author
Member

`NewMaxActiveRPCLimiterUnaryServerInterceptor` receives the `Limiter` interface, but `c.cfgGRPC.limiter.Load` returns the `*SemaphoreLimiter`:

```
cannot use c.cfgGRPC.limiter.Load (value of type func() *limiting.SemaphoreLimiter) as func() limiting.Limiter value in argument to qosInternal.NewMaxActiveRPCLimiterStreamServerInterceptor
```
fyrchik marked this conversation as resolved
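A small illustration of the type mismatch mentioned above (a sketch fragment using the names from the diff, not project code): Go function types are invariant, so a `func() *SemaphoreLimiter` is not assignable to a `func() limiting.Limiter`, even though `*SemaphoreLimiter` implements the interface.

```go
// Fragment, assuming the fields and packages referenced in the diff above.
var limiter atomic.Pointer[limiting.SemaphoreLimiter]

// Does not compile: limiter.Load has type func() *limiting.SemaphoreLimiter,
// which is not assignable to func() limiting.Limiter.
//	_ = qosInternal.NewMaxActiveRPCLimiterUnaryServerInterceptor(limiter.Load)

// Compiles: the closure performs the interface conversion at call time.
var _ = qosInternal.NewMaxActiveRPCLimiterUnaryServerInterceptor(
	func() limiting.Limiter { return limiter.Load() },
)
```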
@ -52,0 +68,4 @@
}
}
//nolint:contextcheck (false positive)
Owner

`false positive` implies a bug in the linter. Here we have no bug in the linter, just a cumbersome gRPC API.

Author
Member

Made the message more clear

fyrchik marked this conversation as resolved
@ -52,0 +69,4 @@
}
//nolint:contextcheck (false positive)
func NewMaxActiveRPCLimiterStreamServerInterceptor(getLimiter func() limiting.Limiter) grpc.StreamServerInterceptor {
Owner

Refs #1656.

fyrchik reviewed 2025-02-26 11:03:50 +00:00
@ -1416,0 +1413,4 @@
func (c *cfg) reloadLimits() error {
var limits []limiting.KeyLimit
for _, l := range rpcconfig.Limits(c.appCfg) {
limits = append(limits, limiting.KeyLimit{Keys: l.Methods, Limit: l.MaxOps})
Owner

We currently do not check for `Keys` uniqueness inside the `limits` array. This may be either a feature or a misconfiguration. I don't think we need such a feature; at the very least, the last item should take priority. And without wildcards, such a feature has very limited use.

So, what about checking for uniqueness? Also, there may be typos; proto names are not short. Do we have any solution for this?

Owner

`c.cfgGRPC.servers[0].Server.GetServiceInfo()` could be of use.

Owner

As an example of where this will be useful: the future transition from the `neo.fs` to the `frostfs` namespace.

Author
Member

> So, what about checking for uniqueness?

The limiter already does this check (see https://git.frostfs.info/TrueCloudLab/frostfs-qos/src/commit/356851eed3bf77e13c87bca9606935c8e7c58769/limiting/limiter.go#L55-L63):

```go
func (lr *SemaphoreLimiter) addLimit(limit *KeyLimit, sem *semaphore.Semaphore) error {
	for _, key := range limit.Keys {
		if _, exists := lr.m[key]; exists {
			return fmt.Errorf("duplicate key %q", key)
		}
		lr.m[key] = sem
	}
	return nil
}
```
Author
Member

> Also, there may be typos, proto names are not short. Do we have any solution for this?

I'll add it
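One possible shape for that check, sketched against the standard grpc-go API (`validateMethods` and its wiring are hypothetical, not the PR's actual code):

```go
import (
	"fmt"

	"google.golang.org/grpc"
)

// validateMethods checks every configured method name against the methods
// actually registered on the gRPC server, catching typos and stale names.
func validateMethods(srv *grpc.Server, configured []string) error {
	known := make(map[string]struct{})
	for svc, info := range srv.GetServiceInfo() {
		for _, m := range info.Methods {
			known["/"+svc+"/"+m.Name] = struct{}{}
		}
	}
	for _, m := range configured {
		if _, ok := known[m]; !ok {
			return fmt.Errorf("unknown method in rpc.limits config: %q", m)
		}
	}
	return nil
}
```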
a-savchuk force-pushed rps-limiting from ad3b32eade to 611b96f54b 2025-02-26 11:24:55 +00:00 Compare
dstepanov-yadro approved these changes 2025-02-28 08:48:19 +00:00
fyrchik approved these changes 2025-02-28 10:51:47 +00:00
dstepanov-yadro merged commit dae0949f6e into master 2025-02-28 11:08:10 +00:00
a-savchuk deleted branch rps-limiting 2025-02-28 14:29:10 +00:00