Optimize existence check in the metabase #874

Closed
opened 2023-12-18 08:51:38 +00:00 by fyrchik · 4 comments
Owner

To check whether an object exists we check multiple buckets (locked, gc, graveyard etc.)
Maybe it would be beneficial to store object status by key separately. This way status and existence checks will consult a single bucket.
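
As a rough illustration of the idea, here is a minimal sketch that compares the number of lookups in the two layouts. It uses plain in-memory maps as a stand-in for metabase buckets; the `Status` type, the bucket names, and the lookup order are assumptions made for the example and are not taken from the frostfs-node code.

```go
package main

import "fmt"

// Status is a hypothetical per-object status record (illustrative names only).
type Status uint8

const (
	StatusAvailable Status = iota
	StatusLocked
	StatusInGraveyard
	StatusInGarbage
)

// multiBucketDB mimics a layout where each status lives in its own bucket.
// Maps stand in for real buckets; reading from a nil map is safe in Go.
type multiBucketDB struct {
	primary, locked, graveyard, garbage map[string]struct{}
}

// status walks the buckets one by one and reports how many lookups it needed.
func (db *multiBucketDB) status(addr string) (st Status, found bool, lookups int) {
	lookups++
	if _, ok := db.graveyard[addr]; ok {
		return StatusInGraveyard, true, lookups
	}
	lookups++
	if _, ok := db.garbage[addr]; ok {
		return StatusInGarbage, true, lookups
	}
	lookups++
	if _, ok := db.locked[addr]; ok {
		return StatusLocked, true, lookups
	}
	lookups++
	_, ok := db.primary[addr]
	return StatusAvailable, ok, lookups
}

// singleBucketDB mimics the proposed layout: one status record per key.
type singleBucketDB struct {
	statuses map[string]Status
}

// lookup always needs exactly one bucket access.
func (db *singleBucketDB) lookup(addr string) (st Status, found bool, lookups int) {
	st, found = db.statuses[addr]
	return st, found, 1
}

func main() {
	multi := &multiBucketDB{primary: map[string]struct{}{"obj": {}}}
	single := &singleBucketDB{statuses: map[string]Status{"obj": StatusAvailable}}

	_, _, n := multi.status("obj")
	fmt.Println("multi-bucket lookups:", n)
	_, _, n = single.lookup("obj")
	fmt.Println("single-bucket lookups:", n)
}
```

The trade-off is that every write now has to keep the extra status record in sync, which is not modelled here.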

fyrchik added the
enhancement
frostfs-node
labels 2023-12-18 08:51:38 +00:00

The metabase existence check is already pretty fast: 5 ms at the 95th percentile at 1.2K RPS. So is checking multiple buckets really a problem?

![image](/attachments/af18ed49-2e5e-48e8-bfae-21c09ed071f9)

![image](/attachments/b00657c1-15c7-4645-926c-d8cc554e19d6)
Author
Owner

This task is about supporting >60 shards with a rack. We currently check each shard linearly, so even a tiny improvement is multiplied by the number of shards; benchmarks should therefore be done in that setup.
With big objects, I can already see existence checks becoming noticeable in terms of the number of allocated objects in this scenario.


OK, but one more metabase bucket will increase write latency and space amplification; we need to keep that in mind.

It is also possible to run the existence check concurrently. I read the comment in the source code and in the issue https://github.com/nspcc-dev/neofs-node/issues/1146, but I also checked it myself, and it looks like with a lot of shards concurrency can lead to a significant performance improvement.

Here are benchmark results from my laptop:

master:

goos: linux
goarch: amd64
pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/engine
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkExists/2_shards-8                 57301             19638 ns/op            5456 B/op        120 allocs/op
BenchmarkExists/4_shards-8                 28542             37502 ns/op           10496 B/op        230 allocs/op
BenchmarkExists/8_shards-8                 16347             72513 ns/op           20576 B/op        450 allocs/op
BenchmarkExists/12_shards-8                10393            103917 ns/op           30224 B/op        661 allocs/op
BenchmarkExists/16_shards-8                10435            110501 ns/op           30657 B/op        670 allocs/op
PASS
ok      git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/engine      59.305s

with concurrency:

goos: linux
goarch: amd64
pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/engine
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkExists/2_shards-8                 47755             24069 ns/op            5893 B/op        129 allocs/op
BenchmarkExists/4_shards-8                 33543             35749 ns/op           11140 B/op        243 allocs/op
BenchmarkExists/8_shards-8                 22914             51463 ns/op           21636 B/op        471 allocs/op
BenchmarkExists/12_shards-8                18244             68309 ns/op           31701 B/op        690 allocs/op
BenchmarkExists/16_shards-8                18285             65104 ns/op           32132 B/op        699 allocs/op
PASS
ok      git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/engine      59.434s
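
To read the two tables side by side, the following small sketch computes the speedup (master ns/op divided by concurrent ns/op) for each shard count, using only the numbers reported above. The `speedup` helper is just illustrative arithmetic. By these figures the concurrent version is slower at 2 shards (about 0.82x) and roughly 1.7x faster at 16 shards, at the cost of slightly more allocations per operation (for example 120 versus 129 allocs/op at 2 shards).

```go
package main

import "fmt"

// speedup returns how many times faster the concurrent run is than master.
// Values below 1 mean the concurrent run is slower.
func speedup(masterNs, concurrentNs float64) float64 {
	return masterNs / concurrentNs
}

func main() {
	// ns/op values copied from the benchmark output above.
	shards := []int{2, 4, 8, 12, 16}
	master := []float64{19638, 37502, 72513, 103917, 110501}
	concurrent := []float64{24069, 35749, 51463, 68309, 65104}

	for i, n := range shards {
		fmt.Printf("%2d shards: %.2fx\n", n, speedup(master[i], concurrent[i]))
	}
}
```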
fyrchik added this to the v0.38.0 milestone 2023-12-22 07:22:24 +00:00
fyrchik added the
perfomance
label 2024-01-11 20:44:40 +00:00
dstepanov-yadro self-assigned this 2024-01-16 05:32:28 +00:00

The existence check is now performed concurrently.

Reference: TrueCloudLab/frostfs-node#874