Blobovnicza tree rebuild (support) #703

dstepanov-yadro · 2023-09-22T10:13:14Z

dstepanov-yadro commented

2023-09-22 10:13:14 +00:00

Relates #661

Relates #698

Relates #661 Relates #698

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 1105bec70a to 9bf021ea1f

2023-09-22 13:06:39 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 9bf021ea1f to 74f2ee699a

2023-09-22 13:51:12 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 839e08c489 to 87548e3a4d

2023-09-22 14:29:41 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 87548e3a4d to 8f9f1de48c

2023-09-25 07:48:24 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 8f9f1de48c to 79ad3f767f

2023-09-27 14:15:08 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 79ad3f767f to a323c175e7

2023-09-27 15:04:58 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from dfc3db24ab to 9b6c5004fa

2023-09-28 16:08:02 +00:00

Compare

dstepanov-yadro changed title from ~~WIP: Blobovnicza tree rebuild (support)~~ to Blobovnicza tree rebuild (support)

2023-10-02 08:00:59 +00:00

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 9b6c5004fa to c623b13a3a

2023-10-02 11:34:28 +00:00

Compare

fyrchik commented

2023-10-02 12:45:20 +00:00

Tests with race time out, could there be a deadlock?

acid-ant reviewed 2023-10-02 13:28:44 +00:00

pkg/local_object_storage/blobstor/blobovniczatree/control_test.go Outdated

					
				@ -0,0 +72,4 @@

					}

				}

				func TestObjectsAvailableAfterDepthAndWithEdit(t *testing.T) {

acid-ant commented

2023-10-02 13:28:44 +00:00

Typo With -> Width.

Typo `With -> Width`.

dstepanov-yadro commented

2023-10-03 09:06:43 +00:00

fixed

fyrchik marked this conversation as resolved

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from c623b13a3a to 61e8a0cf31

2023-10-02 15:48:27 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 60a68e1b4b to 16d157154c

2023-10-03 09:06:22 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 16d157154c to 1edcbe9865

2023-10-03 09:12:05 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 1edcbe9865 to f999a36880

2023-10-03 09:47:24 +00:00

Compare

dstepanov-yadro commented

2023-10-03 11:00:54 +00:00

Tests with race time out, could there be a deadlock?

Cheched locally, it's ok.

> Tests with race time out, could there be a deadlock? Cheched locally, it's ok.

fyrchik reviewed 2023-10-03 11:44:25 +00:00

pkg/local_object_storage/blobovnicza/move.go Outdated

					
				@ -0,0 +59,4 @@

							return nil

						}

						if err := bucket.Delete(key); err != nil {

fyrchik commented

2023-10-03 10:35:22 +00:00

Maybe also delete the bucket if it is empty?

dstepanov-yadro commented

2023-10-03 14:28:49 +00:00

fixed

fyrchik marked this conversation as resolved

pkg/local_object_storage/blobovnicza/move.go Outdated

					
				@ -0,0 +83,4 @@

						return bucket.ForEach(func(k, v []byte) error {

							var addr oid.Address

							storageID := make([]byte, len(v))

							if e := addressFromKey(&addr, k); e != nil {

fyrchik commented

2023-10-03 10:36:48 +00:00

Why not err?

Why not `err`?

dstepanov-yadro commented

2023-10-03 14:30:03 +00:00

fixed

fyrchik marked this conversation as resolved

pkg/local_object_storage/blobstor/blobovniczatree/active.go Outdated

					
				@ -23,3 +23,2 @@

				func (db *activeDB) Path() string {

					return db.shDB.Path()

				func (db *activeDB) SystemPath() string {

fyrchik commented

2023-10-03 10:37:03 +00:00

Why this change?

dstepanov-yadro commented

2023-10-03 14:27:47 +00:00

blobovniczatree uses concept path as internal path inside tree (like 0/0/0.db), but activeDB returns system path (like /storage/blobovnicza/0/0/0.db). So i think SystemPath() is more understandable.

`blobovniczatree` uses concept `path` as internal path inside tree (like `0/0/0.db`), but activeDB returns system path (like `/storage/blobovnicza/0/0/0.db`). So i think `SystemPath()` is more understandable.

pkg/local_object_storage/blobstor/blobovniczatree/cache.go Outdated

					
				@ -60,2 +62,4 @@

				}

				func (c *dbCache) EvictAndMarkNonCached(path string) {

					c.pathLock.Lock(path) //order important

fyrchik commented

2023-10-03 11:09:11 +00:00

Can we move ordering comments to the field definitions?
inline comments could easily be missed when writing new code.

Can we move ordering comments to the field definitions? inline comments could easily be missed when writing new code.

dstepanov-yadro commented

2023-10-03 14:33:55 +00:00

fixed

pkg/local_object_storage/blobstor/blobovniczatree/control.go Outdated

					
				@ -38,0 +53,4 @@

					eg, egCtx := errgroup.WithContext(ctx)

					eg.SetLimit(b.blzInitWorkersCount)

					visited := make(map[string]bool)

fyrchik commented

2023-10-03 10:25:55 +00:00

Why not map[string]struct{}?

Why not `map[string]struct{}`?

dstepanov-yadro commented

2023-10-03 14:38:05 +00:00

fixed

pkg/local_object_storage/blobstor/blobovniczatree/control.go Outdated

					
				@ -40,0 +94,4 @@

							}

							defer shBlz.Close()

							b.log.Debug(logs.BlobovniczatreeBlobovniczaSuccessfullyInitializedClosing, zap.String("id", p))

fyrchik commented

2023-10-03 10:20:06 +00:00

SuccessfullyInitialized, but we only call Open without Init, is this expected?

`SuccessfullyInitialized`, but we only call `Open` without `Init`, is this expected?

dstepanov-yadro commented

2023-10-03 14:41:48 +00:00

shBlz is wrapper over blobovnicza, so shBlz.Open() calls blobovnicza's Open() and Init().

`shBlz` is wrapper over `blobovnicza`, so `shBlz.Open()` calls blobovnicza's `Open()` and `Init()`.

pkg/local_object_storage/blobstor/blobovniczatree/control.go Outdated

					
				@ -67,0 +145,4 @@

							if err := b.addDBExtensionToDBs(filepath.Join(path, entry.Name()), depth+1); err != nil {

								return err

							}

						} else {

fyrchik commented

2023-10-03 10:17:52 +00:00

I like continue + smaller indentation for else better, but don't insist.

I like `continue` + smaller indentation for `else` better, but don't insist.

dstepanov-yadro commented

2023-10-03 14:45:52 +00:00

fixed

pkg/local_object_storage/blobstor/blobovniczatree/delete.go Outdated

					
				@ -42,12 +45,22 @@ func (b *Blobovniczas) Delete(ctx context.Context, prm common.DeletePrm) (res co

						return common.DeleteRes{}, common.ErrReadOnly

					}

					if b.rebuildGuard.TryRLock() {

fyrchik commented

2023-10-03 10:40:06 +00:00

TryRLock is not recommended, I would like to avoid it if possible. Can we do the same with atomics?

`TryRLock` is not recommended, I would like to avoid it if possible. Can we do the same with atomics?

dstepanov-yadro commented

2023-10-03 15:47:12 +00:00

I don't know how to correct do the same with atomics. Now Delete call can perform concurrently , but can't perform concurrently with Rebuild.
So i believe that this is exactly the case when the use of TryLock is justified.

I don't know how to correct do the same with atomics. Now `Delete` call can perform concurrently , but can't perform concurrently with Rebuild. So i believe that this is exactly the case when the use of TryLock is justified.

pkg/local_object_storage/blobstor/blobovniczatree/iterate.go Outdated

					
				@ -159,3 +169,1 @@

				// iterator over the paths of Blobovniczas in random order.

				func (b *Blobovniczas) iterateLeaves(ctx context.Context, f func(string) (bool, error)) error {

					return b.iterateSortedLeaves(ctx, nil, f)

				// iterateExistedDBPaths iterates over the paths of Blobovniczas without any order.

fyrchik commented

2023-10-03 10:06:54 +00:00

Existed or Existing?

`Existed` or `Existing`?

dstepanov-yadro commented

2023-10-03 15:06:06 +00:00

fixed

fyrchik marked this conversation as resolved

pkg/local_object_storage/blobstor/blobovniczatree/iterate.go Outdated

					
				@ -162,0 +175,4 @@

				}

				func (b *Blobovniczas) iterateExistedDBPathsDFS(ctx context.Context, path string, f func(string) (bool, error)) (bool, error) {

					if path == "" {

fyrchik commented

2023-10-03 10:07:56 +00:00

Why not move this locking to the caller?

dstepanov-yadro commented

2023-10-03 15:11:36 +00:00

Fixed

pkg/local_object_storage/blobstor/blobovniczatree/iterate.go Outdated

					
				@ -162,0 +182,4 @@

					sysPath := filepath.Join(b.rootPath, path)

					entries, err := os.ReadDir(sysPath)

					if os.IsNotExist(err) && b.readOnly && path == "" { //non initialized tree in read only mode

fyrchik commented

2023-10-03 10:10:15 +00:00

Why do we NOT return error in this case? Was it the case in the previous implementation?

dstepanov-yadro commented

2023-10-03 15:14:49 +00:00

if b.readOnly {

DB files and dirs may be missing in case of read only mode.

https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/627b3027456fc87605d9a82490088cd2c767dbe4/pkg/local_object_storage/blobstor/blobovniczatree/control.go#L24 DB files and dirs may be missing in case of read only mode.

pkg/local_object_storage/blobstor/blobovniczatree/iterate.go Outdated

					
				@ -162,0 +215,4 @@

					return false, nil

				}

				func (b *Blobovniczas) iterateSortedDBPaths(ctx context.Context, addr oid.Address, f func(string) (bool, error)) error {

fyrchik commented

2023-10-03 10:15:36 +00:00

iterateSortedDBPaths iterates over ALL databases
iterateSortedLeaves iterates over databases according to config

Am I correct?

`iterateSortedDBPaths` iterates over ALL databases `iterateSortedLeaves` iterates over databases according to config Am I correct?

dstepanov-yadro commented

2023-10-03 15:15:03 +00:00

Yes

pkg/local_object_storage/blobstor/blobovniczatree/iterate.go Outdated

					
				@ -162,0 +220,4 @@

					return err

				}

				func (b *Blobovniczas) iterateSorderDBPathsInternal(ctx context.Context, path string, addr oid.Address, f func(string) (bool, error)) (bool, error) {

fyrchik commented

2023-10-03 10:14:03 +00:00

s/Sorder/Sorted/ ?

Actually, Sorted looks like a nice abbreviation for "sorted order" :)

s/Sorder/Sorted/ ? Actually, `Sorted` looks like a nice abbreviation for "sorted order" :)

dstepanov-yadro commented

2023-10-03 15:18:35 +00:00

fixed

pkg/local_object_storage/blobstor/blobovniczatree/iterate.go Outdated

					
				@ -162,0 +237,4 @@

					var dbIdxs []uint64

					var dirIdxs []uint64

					for _, entry := range entries {

fyrchik commented

2023-10-03 10:12:32 +00:00

If we iterate during rebuild the object could be iterated twice.
It doesn't seem like a big problem, but IMO worth mentioning somewhere in docs.

If we iterate during rebuild the object could be iterated twice. It doesn't seem like a big problem, but IMO worth mentioning somewhere in docs.

pkg/local_object_storage/blobstor/blobovniczatree/manager.go Outdated

					
				@ -98,32 +106,108 @@ func (b *sharedDB) Close() {

					}

					b.refCount--

					if b.refCount == 1 {

fyrchik commented

2023-10-03 11:29:16 +00:00

What scenario does this branch corresponds to?

dstepanov-yadro commented

2023-10-03 15:22:13 +00:00

See

func (b *sharedDB) CloseAndRemoveFile() error {
	b.cond.L.Lock()
	if b.refCount > 1 {
		b.cond.Wait()
	}
	defer b.cond.L.Unlock()

This is where the caller waits on condition variable until there is only one reference left.

See ``` func (b *sharedDB) CloseAndRemoveFile() error { b.cond.L.Lock() if b.refCount > 1 { b.cond.Wait() } defer b.cond.L.Unlock() ``` This is where the caller waits on condition variable until there is only one reference left.

pkg/local_object_storage/blobstor/blobovniczatree/manager.go Outdated

					
				@ -127,0 +202,4 @@

				}

				func (m *levelDbManager) hasAnyDB() bool {

					m.dbMtx.Lock()

fyrchik commented

2023-10-03 11:29:50 +00:00

Why is it Lock and not RLock?

Why is it `Lock` and not `RLock`?

dstepanov-yadro commented

2023-10-03 15:25:00 +00:00

fixed

pkg/local_object_storage/blobstor/blobovniczatree/rebuild.go Outdated

					
				@ -0,0 +417,4 @@

				}

				type addressMap struct {

					data  map[oid.Address]bool

fyrchik commented

2023-10-03 11:06:13 +00:00

Again, if we only store true here, what about map[oid.Address]struct{}?

Again, if we only store `true` here, what about `map[oid.Address]struct{}`?

dstepanov-yadro commented

2023-10-03 15:27:28 +00:00

fixed. It just shortens the code.

pkg/local_object_storage/engine/rebuild_limiter.go Outdated

					
				@ -0,0 +6,4 @@

					semaphore chan struct{}

				}

				func newRebuildLimiter(workersCount uint32) *rebuildLimiter {

fyrchik commented

2023-10-03 11:04:32 +00:00

We already have errgroup and ants.Pool which allow limiting concurrent routines, why do we create yet another limiter here?

dstepanov-yadro commented

2023-10-03 15:33:33 +00:00

In general blobovniczatree.Rebuild requires from caller some limiter, but not pool to run tasks.
errgroup or ants.Pool doen't look like something that is suitable for this case: it will be difficult to cancel operations, and rebuilds from different shards should not depend on each other's errors.

In general `blobovniczatree.Rebuild` requires from caller some limiter, but not pool to run tasks. `errgroup` or `ants.Pool` doen't look like something that is suitable for this case: it will be difficult to cancel operations, and rebuilds from different shards should not depend on each other's errors.

pkg/local_object_storage/shard/delete.go Outdated

					
				@ -67,3 +64,1 @@

						s.deleteObjectFromWriteCacheSafe(ctx, addr)

						s.deleteFromBlobstorSafe(ctx, addr)

						if err := s.deleteFromBlobstor(ctx, addr); err != nil {

fyrchik commented

2023-10-03 10:31:17 +00:00

Why do we no longer delete from the write-cache? It is not obvious from the commit message.

dstepanov-yadro commented

2023-10-03 15:39:49 +00:00

updated commit message with explanation, that objects will be deleted from writecache after flush to blobstore.

pkg/local_object_storage/shard/rebuilder.go Outdated

					
				@ -0,0 +30,4 @@

				}

				func (r *rebuilder) Start(ctx context.Context, bs *blobstor.BlobStor, mb *meta.DB, log *logger.Logger) {

					r.mtx.Lock()

fyrchik commented

2023-10-03 10:01:16 +00:00

Is it possible to call this concurrently?
Anyway, why do we block here and not return immediately with "already started"?
This could (?) lead to Stop() stopping only the first started instance.

Is it possible to call this concurrently? Anyway, why do we block here and not return immediately with "already started"? This could (?) lead to `Stop()` stopping only the first started instance.

dstepanov-yadro commented

2023-10-03 15:44:36 +00:00

Is it possible to call this concurrently?

Actually no, mutes is just foolproof.

Anyway, why do we block here and not return immediately with "already started"?

I don't know what to do with such error. The idea was just to maintain the correct state.

This could (?) lead to Stop() stopping only the first started instance.

Stop() stores nil to cancel.

> Is it possible to call this concurrently? Actually no, mutes is just foolproof. > Anyway, why do we block here and not return immediately with "already started"? I don't know what to do with such error. The idea was just to maintain the correct state. > This could (?) lead to Stop() stopping only the first started instance. Stop() stores nil to cancel.

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from f999a36880 to c9fc6ec535

2023-10-03 14:28:46 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from c9fc6ec535 to e0349eb50b

2023-10-03 14:30:07 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from e0349eb50b to db1e129883

2023-10-03 14:33:55 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from db1e129883 to d6b3af013d

2023-10-03 14:37:57 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from d6b3af013d to 0179b6ebe8

2023-10-03 14:45:48 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 0179b6ebe8 to 5c3084163b

2023-10-03 15:06:05 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 5c3084163b to 7641f1cacc

2023-10-03 15:11:35 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 7641f1cacc to 425a6141ef

2023-10-03 15:17:46 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 425a6141ef to 9f77e8311f

2023-10-03 15:25:05 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 9f77e8311f to 9c3c4e2cb7

2023-10-03 15:27:18 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 9c3c4e2cb7 to 18041273e0

2023-10-03 15:39:04 +00:00

Compare

requested reviews from storage-core-committers, storage-core-developers

2023-10-03 15:48:25 +00:00

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 21d6669161 to 5943df1efe

2023-10-10 15:25:10 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from e72dc0b286 to dc56e70587

2023-10-11 07:20:39 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 5c1b09153a to fd4c91d96e

2023-10-11 14:16:19 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from fd4c91d96e to 589dddb6d4

2023-10-12 06:51:18 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 32334a52cd to b2215fe640

2023-10-12 14:29:24 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 07909edb8d to 2552db1eb4

2023-10-13 09:17:34 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 2552db1eb4 to 660e8a8002

2023-10-13 09:25:11 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 660e8a8002 to 9de673c37a

2023-10-13 13:29:59 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 9de673c37a to d875842aeb

2023-10-30 10:48:51 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from d875842aeb to 7569671c66

2023-11-09 10:46:07 +00:00

Compare

fyrchik added the

blocked

label 2023-11-09 13:03:03 +00:00

dstepanov-yadro referenced this pull request

2023-11-16 09:00:40 +00:00

Blobovnicza tree rebuild #812

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 7569671c66 to 9769b2e4ec

2023-11-23 13:02:07 +00:00

Compare

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 9769b2e4ec to 60df32379f

2023-11-23 13:25:28 +00:00

Compare

fyrchik referenced this pull request

2023-11-23 13:43:03 +00:00

Make initialization great again #826

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from c056bbb8e5 to 13a2b3b8b5

2023-11-28 16:54:29 +00:00

Compare

JuliaKovshova approved these changes 2023-12-07 08:19:51 +00:00

dstepanov-yadro force-pushed feat/shard_migrator_support_v0.37 from 13a2b3b8b5 to 02450a9a16

2023-12-07 12:36:29 +00:00

Compare

fyrchik merged commit 02450a9a16 into support/v0.37

2023-12-07 13:15:47 +00:00

Sign in to join this conversation.

No reviewers

TrueCloudLab/storage-core-committers

TrueCloudLab/storage-core-developers

No labels

No milestone

No project

No assignees

4 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#703

Rows
Columns