Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza #1166

dstepanov-yadro · 2024-06-06T15:11:39Z

dstepanov-yadro commented

2024-06-06 15:11:39 +00:00

The engine no longer changes the shard mode on `no space left` errors when an error threshold is defined.

dstepanov-yadro force-pushed fix/out_of_space_dro from 118da6a174 to e916b5765c

2024-06-06 15:13:09 +00:00

fyrchik reviewed 2024-06-06 15:20:52 +00:00

pkg/local_object_storage/blobstor/blobovniczatree/delete.go Outdated

					
				@ -80,2 +75,4 @@

						res, err = b.deleteObjectFromLevel(ctx, bPrm, p)

						if err != nil {

							if isErrNoSpaceLeft(err) {

								return false, common.ErrNoSpace // stop iteration if no space left

fyrchik commented

2024-06-06 15:19:22 +00:00

It is tricky: one DB may have no space because it wanted to do a remap, while another can have a large freelist with already allocated memory. If we exit prematurely, we can make it harder to free space, e.g. there would be nowhere to put a tombstone.

dstepanov-yadro commented

2024-06-06 15:26:45 +00:00

No longer relevant.

pkg/local_object_storage/blobstor/blobovniczatree/put.go Outdated

					
				@ -84,3 +82,4 @@

							i.B.log.Debug(logs.BlobovniczatreeCouldNotGetActiveBlobovnicza,

								zap.String("error", err.Error()),

								zap.String("trace_id", tracingPkg.GetTraceID(ctx)))

						} else if isErrNoSpaceLeft(err) { // stop iteration if no space left

fyrchik commented

2024-06-06 15:20:41 +00:00

Why did you decide to add these handlers at the `blobovniczatree` level and not in `blobovnicza`? It seems easier to miss something here, whereas in `blobovnicza` we can easily ensure that each `Update` or `Batch` is wrapped, for example.

dstepanov-yadro commented

2024-06-06 15:29:13 +00:00

To stop iteration over databases as soon as possible.

dstepanov-yadro force-pushed fix/out_of_space_dro from e916b5765c to 447741ca7f

2024-06-06 15:21:25 +00:00

dstepanov-yadro force-pushed fix/out_of_space_dro from 447741ca7f to c5c22c632e

2024-06-06 15:23:36 +00:00

dstepanov-yadro force-pushed fix/out_of_space_dro from c5c22c632e to f991f4d4fb

2024-06-06 15:25:39 +00:00

dstepanov-yadro changed title from ~~Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza~~ to WIP: Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza

2024-06-06 15:32:35 +00:00

dstepanov-yadro force-pushed fix/out_of_space_dro from f991f4d4fb to 84b8e0bd41

2024-06-06 16:02:44 +00:00

dstepanov-yadro reviewed 2024-06-06 16:04:10 +00:00

pkg/local_object_storage/blobovnicza/errors.go Outdated

					
				@ -0,0 +3,4 @@

				import "git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/util/logicerr"

				// ErrNoSpace returned if blobovnicza failed to perform an operation because of syscall.ENOSPC.

				var ErrNoSpace = logicerr.New("no space left on device with blobovnicza")

dstepanov-yadro commented

2024-06-06 16:04:09 +00:00

This avoids using blobstor's ErrNoSpace: blobstor should depend on blobovnicza, not vice versa.

dstepanov-yadro changed title from ~~WIP: Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza~~ to Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza

2024-06-06 16:55:07 +00:00

requested reviews from storage-core-committers, storage-core-developers

2024-06-06 16:55:14 +00:00

acid-ant approved these changes 2024-06-07 06:58:21 +00:00

achuprov approved these changes 2024-06-07 08:37:43 +00:00

~~achuprov referenced this pull request 2024-06-07 10:54:41 +00:00~~

adm/morph: Fix set-config parameter validation #1167

fyrchik reviewed 2024-06-07 12:10:16 +00:00

pkg/local_object_storage/blobovnicza/put.go Outdated

					
				@ -95,6 +97,8 @@ func (b *Blobovnicza) Put(ctx context.Context, prm PutPrm) (PutRes, error) {

					})

					if err == nil {

						b.itemAdded(recordSize)

					} else if errors.Is(err, syscall.ENOSPC) {

fyrchik commented

2024-06-07 12:10:16 +00:00

Any modifying method can allocate new pages, even delete.

dstepanov-yadro commented

2024-06-07 13:41:05 +00:00

I thought about it, but then found this comment from a contributor: https://github.com/etcd-io/bbolt/issues/288#issuecomment-919971605

Anyway, OK, I will fix it.

dstepanov-yadro commented

2024-06-07 14:16:28 +00:00

Done

fyrchik commented

2024-06-10 07:54:26 +00:00

That comment has a different context (shrinking the DB). We have already experienced situations where deletions led to a DB remap, which in turn led to a deadlock (with the write-cache).

fyrchik marked this conversation as resolved

dstepanov-yadro force-pushed fix/out_of_space_dro from 84b8e0bd41 to 815e87df74

2024-06-07 14:12:15 +00:00

dstepanov-yadro force-pushed fix/out_of_space_dro from 815e87df74 to 6cf512e574

2024-06-07 14:15:59 +00:00

fyrchik reviewed 2024-06-10 07:57:09 +00:00

pkg/local_object_storage/blobstor/blobovniczatree/put.go

					
				@ -110,2 +111,3 @@

						}

						if errors.Is(err, blobovnicza.ErrNoSpace) {

							i.AllFull = true

fyrchik commented

2024-06-10 07:57:08 +00:00

Again, do we exit if we received this error from at least one blobovnicza? Until we have vacuum, I don't think this optimization is worth having, as other blobovniczas may still have free pages.

dstepanov-yadro commented

2024-06-10 08:23:45 +00:00

No, blobstor will try all databases:

	i.AllFull = false // AllFull is reset here

	_, err = active.Blobovnicza().Put(ctx, i.PutPrm)
	if err != nil {
		if !isLogical(err) {
			i.B.reportError(logs.BlobovniczatreeCouldNotPutObjectToActiveBlobovnicza, err)
		} else {
			i.B.log.Debug(logs.BlobovniczatreeCouldNotPutObjectToActiveBlobovnicza,
				zap.String("path", active.SystemPath()),
				zap.String("error", err.Error()),
				zap.String("trace_id", tracingPkg.GetTraceID(ctx)))
		}

		if errors.Is(err, blobovnicza.ErrNoSpace) {
			i.AllFull = true


fyrchik commented

2024-06-10 10:07:26 +00:00

I don't understand, why do we need this change in this PR? Is something wrong without it?

dstepanov-yadro commented

2024-06-10 10:38:04 +00:00

Without this change the blobovnicza tree returns a non-logical error (https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/a0c588263bd550bb131a8ed2f1a5ad318811018c/pkg/local_object_storage/blobstor/blobovniczatree/put.go#L64):

	return common.PutRes{}, errPutFailed

So the shard will increase its error counter.

fyrchik commented

2024-06-10 11:28:25 +00:00

Hm, but why does `iterateDeepest` return a non-nil error?

dstepanov-yadro commented

2024-06-10 11:43:09 +00:00

By design.