Use clever batches for blobovnicza and metabase #627

Closed

opened 2023-08-21 10:20:24 +00:00 by fyrchik · 4 comments

fyrchik commented

2023-08-21 10:20:24 +00:00

Owner

In pilorama we have a custom batching scheme. The difference is that our batch can accept new requests _after_ the configured timeout but before the batch has actually started executing. This makes sense because we spend lots of time in Fdatasync.

The suggestion is:

1. Try to come up with a common interface. Pilorama batches know about Log operations, but blobovnicza and metabase batches are different.
2. Regardless of whether we come up with a single interface, try adopting the new batches in metabase and blobovnicza (batches in blobovnicza should be simpler, but batches in metabase can have an immediate observable effect on performance, because each shard has a single metabase).
3. As for performance tests, we need these to be long enough, so native Go benchmarks are desirable but won't show us the full picture.

fyrchik added this to the v0.37.0 milestone 2023-08-21 10:20:24 +00:00

fyrchik added the

frostfs-node

triage

labels 2023-08-21 10:20:24 +00:00

fyrchik commented

2023-08-21 10:22:45 +00:00

Author

Owner

Metabase batches are also a bit harder, because operations can fail with logical errors, so take this into account.

fyrchik commented

2023-08-21 10:25:32 +00:00

Author

Owner

As a nice side-effect, we can handle `context.Context` cancelation better.

fyrchik commented

2023-08-21 11:01:07 +00:00

Author

Owner

Actually, for blobovnicza we can go even further:
It has a simple key-value structure, so we can manage the transaction manually and cache the bucket (`tx.Bucket()` always returns the same value). This may alleviate our problems with degradation, because we can perform the PUT immediately, amortizing the cost over time (instead of waiting for the batch delay and then doing it). In the current scheme, `Update`/`Batch` lock the database exclusively.

For metabase, a similar optimization can be done, but we need to ensure that all GET operations (where a logical error can occur) are done before the PUT ones.
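The "GETs before PUTs" ordering for metabase could look roughly like the sketch below. It is one possible reading of the requirement, not the metabase implementation: a plain map stands in for the open transaction, and `putOp`, `applyBatch` and `errAlreadyExists` are hypothetical. Every operation is validated first, and writes happen only afterwards, so a logical error never has to undo a write that was already made.

```go
package main

import "errors"

// errAlreadyExists is a hypothetical logical error, used for illustration.
var errAlreadyExists = errors.New("object already exists")

// putOp is one write request; err receives its individual outcome.
type putOp struct {
	key, value string
	err        error
}

// applyBatch validates every operation against the store (GET phase) and
// only then writes the ones that passed (PUT phase).
func applyBatch(store map[string]string, ops []*putOp) {
	// GET phase: read-only checks, where logical errors can occur.
	seen := make(map[string]bool)
	for _, op := range ops {
		if _, ok := store[op.key]; ok || seen[op.key] {
			op.err = errAlreadyExists
			continue
		}
		seen[op.key] = true
	}

	// PUT phase: no logical errors are expected from here on.
	for _, op := range ops {
		if op.err == nil {
			store[op.key] = op.value
		}
	}
}
```

Each operation gets its own error, so one failing request does not fail the rest of the batch.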

fyrchik modified the milestone from v0.37.0 to v0.38.0

2023-08-21 14:08:45 +00:00

fyrchik referenced this issue

2023-08-22 08:30:42 +00:00

Improve batching in badger #636

fyrchik modified the milestone from v0.38.0 to v0.39.0

2023-10-02 12:48:56 +00:00

fyrchik commented

2024-05-14 14:14:12 +00:00

Author

Owner

We will eventually move to a badger store or another storage which doesn't panic on disk removal, so closing this.