Improve registry performance #136

Closed
opened 2024-05-02 15:59:07 +00:00 by dstepanov-yadro · 6 comments

There is an output example:

     data_received..............: 0 B     0 B/s
     data_sent..................: 57 GB   16 MB/s
     frostfs_obj_put_bytes......: 57 GB   16 MB/s
     frostfs_obj_put_duration...: avg=38.52ms  min=8.2ms    med=38.74ms  max=278.16ms p(90)=46.32ms  p(95)=49.25ms
     frostfs_obj_put_success....: 6945463 1929.087151/s
     iteration_duration.........: avg=310.98ms min=453.48µs med=313.15ms max=882.79ms p(90)=331.01ms p(95)=336.3ms
     iterations.................: 6945463 1929.087151/s
     vus........................: 600     min=600       max=600

Registry save latency is iteration_duration(p90) - frostfs_obj_put_duration(p90) = 331 - 46 = 285 ms. It looks like the registry is several times slower than storage (285 ms vs. 46 ms at p90, roughly 6x).
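
The estimate above can be reproduced with a few lines of Go. This is only a sketch: `registryOverheadMs` is a hypothetical helper, and subtracting one percentile from another is an approximation, because the p(90) of a difference is not in general the difference of the p(90) values.

```go
package main

import "fmt"

// registryOverheadMs estimates the per-iteration time spent outside the
// object PUT (attributed here to the registry save), in milliseconds.
// Hypothetical helper, not part of any existing tool.
func registryOverheadMs(iterationMs, putMs float64) float64 {
	return iterationMs - putMs
}

func main() {
	// p(90) values copied from the k6 summary above.
	const iterationP90, putP90 = 331.01, 46.32

	overhead := registryOverheadMs(iterationP90, putP90)
	fmt.Printf("overhead: about %.0f ms, %.1fx the PUT latency\n",
		overhead, overhead/putP90)
}
```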

The task is to research and implement the use of another storage engine for the registry (badger, bitcask, etc.).

dstepanov-yadro added the bug, perfomance labels 2024-05-02 15:59:23 +00:00

We should first try using `Batch` instead of `Update`, together with async persist.
That is a lot cheaper than switching storage engines and can provide substantial improvements.
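
To illustrate why batching can help, here is a minimal sketch that uses only the standard library. The `store` type is a hypothetical stand-in for a transactional key-value database, not the registry's actual code or any real DB API; it only counts commits, on the assumption that each commit is the expensive step.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical in-memory stand-in for a transactional DB.
// commits counts how many (assumed expensive) commits were performed.
type store struct {
	mu      sync.Mutex
	data    map[string]string
	commits int
}

func newStore() *store {
	return &store{data: make(map[string]string)}
}

// Update writes one record and commits immediately: one commit per record.
func (s *store) Update(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	s.commits++
}

// Batch writes many records under a single commit.
func (s *store) Batch(records map[string]string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range records {
		s.data[k] = v
	}
	s.commits++
}

// commitsWithUpdate saves n records one at a time and returns the commit count.
func commitsWithUpdate(n int) int {
	s := newStore()
	for i := 0; i < n; i++ {
		s.Update(fmt.Sprintf("obj-%d", i), "created")
	}
	return s.commits
}

// commitsWithBatch saves n records in one batch and returns the commit count.
func commitsWithBatch(n int) int {
	s := newStore()
	records := make(map[string]string, n)
	for i := 0; i < n; i++ {
		records[fmt.Sprintf("obj-%d", i)] = "created"
	}
	s.Batch(records)
	return s.commits
}

func main() {
	fmt.Println("commits with Update:", commitsWithUpdate(1000))
	fmt.Println("commits with Batch: ", commitsWithBatch(1000))
}
```

In a real store the saving depends on how many concurrent writes end up sharing one commit, so the gain would have to be measured rather than assumed.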

Poster
Collaborator

Asynchronous persist has its own problems: when the async queue is full (which is quite realistic in long-running tests), the write speed will drop, since you can't just drop objects. This will introduce side effects into the test results that are difficult to interpret correctly.
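
The backpressure effect described here can be sketched with a buffered channel. Everything below is hypothetical (the `asyncPersister` type, the simulated save delay, the queue capacity); it only shows that once the queue fills up, the producer is stalled by the slow writer instead of losing records.

```go
package main

import (
	"fmt"
	"time"
)

// asyncPersister is a hypothetical background saver with a bounded queue.
type asyncPersister struct {
	queue chan string
	done  chan struct{}
	saved int // written only by the worker, read after done is closed
}

func newAsyncPersister(capacity int, saveDelay time.Duration) *asyncPersister {
	p := &asyncPersister{
		queue: make(chan string, capacity),
		done:  make(chan struct{}),
	}
	go func() {
		defer close(p.done)
		for range p.queue {
			time.Sleep(saveDelay) // simulated slow registry write
			p.saved++
		}
	}()
	return p
}

// Enqueue blocks while the queue is full (records are never dropped)
// and returns how long the caller had to wait.
func (p *asyncPersister) Enqueue(rec string) time.Duration {
	start := time.Now()
	p.queue <- rec
	return time.Since(start)
}

// Close drains the queue, stops the worker and returns the saved count.
func (p *asyncPersister) Close() int {
	close(p.queue)
	<-p.done
	return p.saved
}

// enqueueAll pushes n records and reports the number of saved records
// and the total time the producer spent blocked, in milliseconds.
func enqueueAll(capacity, n, saveDelayMs int) (saved int, blockedMs int64) {
	p := newAsyncPersister(capacity, time.Duration(saveDelayMs)*time.Millisecond)
	var blocked time.Duration
	for i := 0; i < n; i++ {
		blocked += p.Enqueue(fmt.Sprintf("obj-%d", i))
	}
	return p.Close(), blocked.Milliseconds()
}

func main() {
	saved, blockedMs := enqueueAll(2, 6, 20)
	fmt.Printf("saved=%d, producer blocked for about %d ms\n", saved, blockedMs)
}
```

In a load test, that blocked time would show up in the iteration duration, which is the interpretation problem raised above.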


Badger won't solve our problems either: it is still a DB operation whose duration is included in the metrics. Also, the queue doesn't need to be bounded by a static size; it may be bounded only by RAM.
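
A queue without a static size limit could look roughly like the sketch below. The `unboundedQueue` type is hypothetical; it grows a slice on demand, so `Push` never waits for the consumer and the only limit is available memory.

```go
package main

import (
	"fmt"
	"sync"
)

// unboundedQueue is a hypothetical FIFO limited only by available RAM.
type unboundedQueue struct {
	mu     sync.Mutex
	cond   *sync.Cond
	items  []string
	closed bool
}

func newUnboundedQueue() *unboundedQueue {
	q := &unboundedQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Push appends an item; it never waits for the consumer.
func (q *unboundedQueue) Push(item string) {
	q.mu.Lock()
	q.items = append(q.items, item)
	q.mu.Unlock()
	q.cond.Signal()
}

// Pop blocks until an item is available. It returns false once the
// queue has been closed and fully drained.
func (q *unboundedQueue) Pop() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && !q.closed {
		q.cond.Wait()
	}
	if len(q.items) == 0 {
		return "", false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

// Close marks the queue as finished and wakes up any waiting consumer.
func (q *unboundedQueue) Close() {
	q.mu.Lock()
	q.closed = true
	q.mu.Unlock()
	q.cond.Broadcast()
}

// drain pushes n items while a background consumer pops them,
// and returns how many items the consumer received.
func drain(n int) int {
	q := newUnboundedQueue()
	done := make(chan int)
	go func() {
		count := 0
		for {
			if _, ok := q.Pop(); !ok {
				break
			}
			count++
		}
		done <- count
	}()
	for i := 0; i < n; i++ {
		q.Push(fmt.Sprintf("obj-%d", i))
	}
	q.Close()
	return <-done
}

func main() {
	fmt.Println("records consumed:", drain(1000))
}
```

The trade-off is that a producer that stays faster than the consumer makes memory use grow without limit for the duration of the test.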


I am not opposed to the badger idea, I just think there are less expensive things to try first.
And there are possible problems with badger too: currently the registry is a single file, which may be copied as-is by some existing pipelines, and those pipelines would break.

Poster
Collaborator

RAM is bounded too.
"Badger won't solve our problems either": agreed. The main idea is to improve registry saving performance while keeping registry saving latency as constant as possible.

Poster
Collaborator

@fyrchik Should this task be closed?
