Blobovnicza tree optimizations #1337

Merged
dstepanov-yadro merged 1 commit from dstepanov-yadro/frostfs-node:feat/blobovnicza_shrink into master 2024-10-26 11:30:25 +00:00
  1. Rebuild process adds temp file with .rebuild suffix. This allow to reduce DB count to open on blobovnicza tree initialization.

  2. Add frostfs-cli control shards rebuild command to rebuild blobovniczas to reduce disk usage.

  3. Drop completely background rebuild as now frostfs-cli control shards rebuild command exists and blobovnicza tree works after config change.

1. Rebuild process adds temp file with `.rebuild` suffix. This allow to reduce DB count to open on blobovnicza tree initialization. 2. Add `frostfs-cli control shards rebuild` command to rebuild blobovniczas to reduce disk usage. 3. Drop completely background rebuild as now `frostfs-cli control shards rebuild` command exists and blobovnicza tree works after config change.
dstepanov-yadro added 1 commit 2024-08-27 13:00:31 +00:00
[#9999] blobovniczatree: Add .rebuild temp files
All checks were successful
DCO action / DCO (pull_request) Successful in 1m37s
Tests and linters / Run gofumpt (pull_request) Successful in 1m37s
Vulncheck / Vulncheck (pull_request) Successful in 2m28s
Pre-commit hooks / Pre-commit (pull_request) Successful in 2m51s
Tests and linters / Tests (1.22) (pull_request) Successful in 2m53s
Build / Build Components (1.22) (pull_request) Successful in 3m3s
Build / Build Components (1.23) (pull_request) Successful in 3m2s
Tests and linters / Tests (1.23) (pull_request) Successful in 3m30s
Tests and linters / Staticcheck (pull_request) Successful in 3m33s
Tests and linters / Lint (pull_request) Successful in 3m41s
Tests and linters / gopls check (pull_request) Successful in 3m44s
Tests and linters / Tests with -race (pull_request) Successful in 4m7s
73b6bf5b96
This allows to reduce open/close DBs to check incompleted rebuilds.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 73b6bf5b96 to 79c8c89638 2024-08-29 09:13:19 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 79c8c89638 to 27a4a6a882 2024-08-29 10:10:24 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 27a4a6a882 to f48a68c274 2024-08-29 10:27:20 +00:00 Compare
dstepanov-yadro added 1 commit 2024-08-29 10:55:17 +00:00
[#1337] config: Move rebuild_worker_count to shard section
Some checks failed
Tests and linters / Run gofumpt (pull_request) Successful in 1m53s
DCO action / DCO (pull_request) Successful in 2m4s
Pre-commit hooks / Pre-commit (pull_request) Failing after 2m56s
Vulncheck / Vulncheck (pull_request) Successful in 2m41s
Tests and linters / Tests (1.23) (pull_request) Successful in 2m52s
Tests and linters / Tests (1.22) (pull_request) Successful in 2m55s
Build / Build Components (1.22) (pull_request) Successful in 2m54s
Build / Build Components (1.23) (pull_request) Successful in 3m7s
Tests and linters / Lint (pull_request) Successful in 3m40s
Tests and linters / Staticcheck (pull_request) Successful in 3m34s
Tests and linters / Tests with -race (pull_request) Successful in 4m14s
Tests and linters / gopls check (pull_request) Successful in 4m22s
6fc79de12f
This makes it simple to limit performance degradation for every shard
because of rebuild.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 6fc79de12f to 10a11315cc 2024-08-30 08:14:15 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 10a11315cc to 032ef546b8 2024-08-30 16:25:22 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 032ef546b8 to 74bc4eb1e0 2024-09-01 09:41:11 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 74bc4eb1e0 to 49e4c246dd 2024-09-01 09:54:24 +00:00 Compare
dstepanov-yadro changed title from WIP: blobovnicza optimizations to Blobovnicza tree optimizations 2024-09-01 09:54:45 +00:00
dstepanov-yadro requested review from storage-core-committers 2024-09-01 09:56:50 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-09-01 09:56:58 +00:00
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 49e4c246dd to 07501bfda2 2024-09-04 07:24:25 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 07501bfda2 to e94c30a98f 2024-09-04 07:53:10 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from e94c30a98f to 54a39dd2db 2024-09-05 05:47:53 +00:00 Compare
dstepanov-yadro reviewed 2024-09-05 06:42:40 +00:00
@ -110,0 +135,4 @@
return nil, fmt.Errorf("invalid fill percent value %d: must be (0; 100]", target)
}
result := make(map[string]struct{})
if err := b.iterateDeepest(ctx, oid.Address{}, func(lvlPath string) (bool, error) {
Author
Member

iterateDeepest iterates blobovnicza tree leaf directories only. It is ok as blobovniczas with schema change collected before.

`iterateDeepest` iterates blobovnicza tree leaf directories only. It is ok as blobovniczas with schema change collected before.
dstepanov-yadro reviewed 2024-09-05 06:44:11 +00:00
@ -110,0 +157,4 @@
if !hasDBs {
return false, nil
}
for _, e := range entries {
Author
Member

I know that it could be done in single iteration, but code looks more complicated.

I know that it could be done in single iteration, but code looks more complicated.
dstepanov-yadro reviewed 2024-09-05 06:51:16 +00:00
@ -414,4 +401,1 @@
// WithDisabledRebuild returns an option to disable a shard rebuild.
// For testing purposes only.
func WithDisabledRebuild() Option {
Author
Member

Background rebuild disabled now.

Background rebuild disabled now.
fyrchik reviewed 2024-09-05 08:29:41 +00:00
fyrchik left a comment
Owner
  1. If I abort the rebuild and manually remove .rebuild file what is the worse thing that could happen?
  2. Same question, but know I add rebuild file by hand.
1. If I abort the rebuild and manually remove `.rebuild` file what is the worse thing that could happen? 2. Same question, but know I add rebuild file by hand.
@ -0,0 +18,4 @@
var shardsRebuildCmd = &cobra.Command{
Use: "rebuild",
Short: "Rebuild shards",
Long: "Rebuild reclaims storage occupied by dead objects",
Owner

It also, well, rebuilds the tree according to the width/depth from the configuration.

It also, well, rebuilds the tree according to the width/depth from the configuration.
Author
Member

Fixed

Fixed
@ -193,3 +193,3 @@
}
func (b *Blobovniczas) iterateExistingDBPathsDFS(ctx context.Context, path string, f func(string) (bool, error)) (bool, error) {
func (b *Blobovniczas) iterateExistingDBPathsDFS(ctx context.Context, path string, f func(string) (bool, error), fileFilter func(path string) bool) (bool, error) {
Owner

We now use iterateExistingDB function to iterate over .rebuild files, which is not DB, using an additional parameter.
How about renaming?

We now use `iterateExistingDB` function to iterate over `.rebuild` files, which is not `DB`, using an additional parameter. How about renaming?
Author
Member

Fixed

Fixed
@ -114,0 +187,4 @@
return false, err
}
defer shDB.Close()
return blz.FillPercent() < targetFillPercent || blz.IsOverflowed(), nil
Owner

Could you explain how these conditions play together (have the same return value)?
They seem opposite -- FiilPercent() < ... is true if the blobovnicza is empty and IsOverflowed is true if it is full.

Could you explain how these conditions play together (have the same return value)? They seem opposite -- `FiilPercent() < ...` is true if the blobovnicza is empty and `IsOverflowed` is true if it is full.
Author
Member

Assume blobovnicza has size_limit = 40mb
If actual data size is 30mb, targetFillPercent equals 90% and maxObjSize equals 100kb, then blobovnicza will be rebuilded.
If actual data size is 39mb, targetFillPercent equals 90% and maxObjSize equals 100kb, then blobovnicza will not be rebuilded.
If actual data size is 45mb, targetFillPercent equals 90% and maxObjSize equals 100kb (this could happen if size_limit was changed from 45mb to 40mb), then blobovnicza will be rebuilded.

Assume blobovnicza has `size_limit = 40mb` If actual data size is 30mb, `targetFillPercent` equals 90% and `maxObjSize` equals 100kb, then blobovnicza will be rebuilded. If actual data size is 39mb, `targetFillPercent` equals 90% and `maxObjSize` equals 100kb, then blobovnicza will not be rebuilded. If actual data size is 45mb, `targetFillPercent` equals 90% and `maxObjSize` equals 100kb (this could happen if `size_limit` was changed from 45mb to 40mb), then blobovnicza will be rebuilded.
Author
Member

Fixed

Fixed
@ -155,6 +164,7 @@ func testRebuildFailoverValidate(t *testing.T, dir string, obj *objectSDK.Object
rRes, err := b.Rebuild(context.Background(), common.RebuildPrm{
MetaStorage: metaStub,
WorkerLimiter: &rebuildLimiterStub{},
FillPercent: 1,
Owner

How is this change related to disable rebuild commit message?
Describe it in the commit message.

How is this change related to `disable rebuild` commit message? Describe it in the commit message.
Author
Member

Fixed

Fixed
fyrchik marked this conversation as resolved
@ -0,0 +31,4 @@
ctx, span := tracing.StartSpanFromContext(ctx, "StorageEngine.Rebuild",
trace.WithAttributes(
attribute.Int("shard_id_count", len(prm.ShardIDs)),
attribute.Int64("target_fill_percent", int64(prm.TargetFillPercent)),
Owner

Could you explain what tradeoff this setting represents?
I have some random tree.
What will the result be if target_fill_percent = 50, what will it be if target_fill_percent = 100?

Could you explain what tradeoff this setting represents? I have some random tree. What will the result be if `target_fill_percent = 50`, what will it be if `target_fill_percent = 100`?
Author
Member

Assume blobovnicza has size_limit = 40mb and actual data size in some blobovnicza is 30mb.
If target_fill_percent equals 50, then rebuild will skip this blobovnicza to rebuild: 100 * (30mb / 40mb) = 75%, 75% > 50%.
If target_fill_percent equals 100, then rebuild will move objects from this blobovnicza to other blobovnicza(s).

This option defines how 'aggressive' disk space will be reclaimed. If there is enough free disk space, but user wants to reclaim disk space periodically, then fill percent value could be around 50%. If there is not enough free disk space and the point is that disk space will end completely, then fill percent value of 90-100% could be used.

Assume blobovnicza has `size_limit = 40mb` and actual data size in some blobovnicza is 30mb. If `target_fill_percent` equals 50, then rebuild will skip this blobovnicza to rebuild: 100 * (30mb / 40mb) = 75%, 75% > 50%. If `target_fill_percent` equals 100, then rebuild will move objects from this blobovnicza to other blobovnicza(s). This option defines how 'aggressive' disk space will be reclaimed. If there is enough free disk space, but user wants to reclaim disk space periodically, then fill percent value could be around 50%. If there is not enough free disk space and the point is that disk space will end completely, then fill percent value of 90-100% could be used.
fyrchik marked this conversation as resolved
Author
Member
  1. If I abort the rebuild and manually remove .rebuild file what is the worse thing that could happen?
  2. Same question, but know I add rebuild file by hand.
  1. The worst thing is if object exists in 'old' and 'new' blobovniczas at the same time. If such object is deleted, then it will be deleted only from single blobovnicza ('old' or 'new' depends on the fact if storageID updated or not). After deleting GC mark (or tombstone) and metabase resync this object will be available again.

  2. Nothing happens: blobovnicza tree will work as usual, but next rebuild will open corresponding blobovnicza, try to find any incomplete move, then drops this file. I assume that file added by hand has valid name, otherwise frostfs-node will panic.

> 1. If I abort the rebuild and manually remove `.rebuild` file what is the worse thing that could happen? > 2. Same question, but know I add rebuild file by hand. 1. The worst thing is if object exists in 'old' and 'new' blobovniczas at the same time. If such object is deleted, then it will be deleted only from single blobovnicza ('old' or 'new' depends on the fact if storageID updated or not). After deleting GC mark (or tombstone) and metabase resync this object will be available again. 2. Nothing happens: blobovnicza tree will work as usual, but next rebuild will open corresponding blobovnicza, try to find any incomplete move, then drops this file. I assume that file added by hand has valid name, otherwise frostfs-node will panic.
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 54a39dd2db to 25380d8de4 2024-09-05 08:58:47 +00:00 Compare
Owner

otherwise frostfs-node will panic

this concerns me a bit, is it easy to fix?

>otherwise frostfs-node will panic this concerns me a bit, is it easy to fix?
Author
Member

otherwise frostfs-node will panic

this concerns me a bit, is it easy to fix?

This behavior was from the very beginning. Why fix now?

Also panic signals that something wrong happens and blobovnicza tree has invalid structure.

> >otherwise frostfs-node will panic > > this concerns me a bit, is it easy to fix? This behavior was from the very beginning. Why fix now? Also panic signals that something wrong happens and blobovnicza tree has invalid structure.
Owner

This behavior was from the very beginning. Why fix now?

missed that part

Also panic signals that something wrong happens and blobovnicza tree has invalid structure.

there are other ways to signal that, especially given that corruption happens

>This behavior was from the very beginning. Why fix now? missed that part >Also panic signals that something wrong happens and blobovnicza tree has invalid structure. there are other ways to signal that, especially given that corruption happens
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 25380d8de4 to 6a5abcb8ac 2024-09-05 13:30:22 +00:00 Compare
aarifullin approved these changes 2024-09-06 08:34:01 +00:00
Dismissed
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 6a5abcb8ac to 9875704415 2024-09-06 10:34:28 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from 9875704415 to f56f613982 2024-09-06 10:57:35 +00:00 Compare
dstepanov-yadro force-pushed feat/blobovnicza_shrink from f56f613982 to d3b209c8e1 2024-09-06 12:22:03 +00:00 Compare
dstepanov-yadro dismissed aarifullin's review 2024-09-06 12:22:04 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

fyrchik approved these changes 2024-09-06 12:35:27 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-09-06 12:39:07 +00:00
dstepanov-yadro requested review from storage-core-committers 2024-09-06 12:39:08 +00:00
aarifullin approved these changes 2024-09-06 12:47:52 +00:00
dstepanov-yadro merged commit d3b209c8e1 into master 2024-09-06 12:49:21 +00:00
dstepanov-yadro deleted branch feat/blobovnicza_shrink 2024-09-06 12:49:23 +00:00
acid-ant approved these changes 2024-09-09 08:27:40 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1337
No description provided.