Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza #1166

Merged
fyrchik merged 1 commit from dstepanov-yadro/frostfs-node:fix/out_of_space_dro into master 2024-09-04 19:51:09 +00:00

Now engine doesn't change shard mode in case of no space left errors and error threshold defined.

Now engine doesn't change shard mode in case of `no space left` errors and error threshold defined.
dstepanov-yadro force-pushed fix/out_of_space_dro from 118da6a174 to e916b5765c 2024-06-06 15:13:09 +00:00 Compare
fyrchik reviewed 2024-06-06 15:20:52 +00:00
@ -80,2 +75,4 @@
res, err = b.deleteObjectFromLevel(ctx, bPrm, p)
if err != nil {
if isErrNoSpaceLeft(err) {
return false, common.ErrNoSpace // stop iteration if no space left
Owner

It is tricky: one db may have no space because it wanted to do a remap, another one can have a large freelist with already allocated memory.
If we exit prematurely, we can make it harder to free space, e.g. there would be no place to put tombstone into.

It is tricky: one db may have no space because it wanted to do a remap, another one can have a large freelist with already allocated memory. If we exit prematurely, we can make it harder to free space, e.g. there would be no place to put tombstone into.
Author
Member

Not actual.

Not actual.
@ -84,3 +82,4 @@
i.B.log.Debug(logs.BlobovniczatreeCouldNotGetActiveBlobovnicza,
zap.String("error", err.Error()),
zap.String("trace_id", tracingPkg.GetTraceID(ctx)))
} else if isErrNoSpaceLeft(err) { // stop iteration if no space left
Owner

Why have you decided to add this handlers on the blobovniczatree level and not on the blobovnicza?
It seems easier to miss sth here, and in blobovnicza we can easily ensure that each Update or Batch is wrapped, for example.

Why have you decided to add this handlers on the `blobovniczatree` level and not on the `blobovnicza`? It seems easier to miss sth here, and in blobovnicza we can easily ensure that each `Update` or `Batch` is wrapped, for example.
Author
Member

To stop iteration over databases as soon as possible.

To stop iteration over databases as soon as possible.
dstepanov-yadro force-pushed fix/out_of_space_dro from e916b5765c to 447741ca7f 2024-06-06 15:21:25 +00:00 Compare
dstepanov-yadro force-pushed fix/out_of_space_dro from 447741ca7f to c5c22c632e 2024-06-06 15:23:36 +00:00 Compare
dstepanov-yadro force-pushed fix/out_of_space_dro from c5c22c632e to f991f4d4fb 2024-06-06 15:25:39 +00:00 Compare
dstepanov-yadro changed title from Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza to WIP: Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza 2024-06-06 15:32:35 +00:00
dstepanov-yadro force-pushed fix/out_of_space_dro from f991f4d4fb to 84b8e0bd41 2024-06-06 16:02:44 +00:00 Compare
dstepanov-yadro reviewed 2024-06-06 16:04:10 +00:00
@ -0,0 +3,4 @@
import "git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/util/logicerr"
// ErrNoSpace returned if blobovnicza failed to perform an operation because of syscall.ENOSPC.
var ErrNoSpace = logicerr.New("no space left on device with blobovnicza")
Author
Member

To not to use blobstor's ErrNoSpace: blobstor should depend on blobovnicza, not vice versa.

To not to use blobstor's ErrNoSpace: blobstor should depend on blobovnicza, not vice versa.
dstepanov-yadro changed title from WIP: Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza to Do not change shard mode to DEGRADED_READ_ONLY in case of no space left from blobovnicza 2024-06-06 16:55:07 +00:00
dstepanov-yadro requested review from storage-core-committers 2024-06-06 16:55:13 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-06-06 16:55:14 +00:00
acid-ant approved these changes 2024-06-07 06:58:21 +00:00
achuprov approved these changes 2024-06-07 08:37:43 +00:00
fyrchik reviewed 2024-06-07 12:10:16 +00:00
@ -95,6 +97,8 @@ func (b *Blobovnicza) Put(ctx context.Context, prm PutPrm) (PutRes, error) {
})
if err == nil {
b.itemAdded(recordSize)
} else if errors.Is(err, syscall.ENOSPC) {
Owner

Any modifying method can allocate new pages, even delete.

Any modifying method can allocate new pages, even delete.
Author
Member

I thought about it, but have found this comment of contributor: https://github.com/etcd-io/bbolt/issues/288#issuecomment-919971605

Anyway, ok, I will fix it.

I thought about it, but have found this comment of contributor: https://github.com/etcd-io/bbolt/issues/288#issuecomment-919971605 Anyway, ok, I will fix it.
Author
Member

Done

Done
Owner

The comment has different context (shrinking the DB), we already have experienced situations where deletions lead to db remap leading to a deadlock (with the write-cache)

The comment has different context (shrinking the DB), we already have experienced situations where deletions lead to db remap leading to a deadlock (with the write-cache)
fyrchik marked this conversation as resolved
dstepanov-yadro force-pushed fix/out_of_space_dro from 84b8e0bd41 to 815e87df74 2024-06-07 14:12:15 +00:00 Compare
dstepanov-yadro force-pushed fix/out_of_space_dro from 815e87df74 to 6cf512e574 2024-06-07 14:15:59 +00:00 Compare
fyrchik reviewed 2024-06-10 07:57:09 +00:00
@ -110,2 +111,3 @@
}
if errors.Is(err, blobovnicza.ErrNoSpace) {
i.AllFull = true
Owner

Again, do we exit if we received this error from at least 1 blobovnicza? Until we have vacuum I think it is not worth having this optimization, as others blobovniczas may still have free pages.

Again, do we exit if we received this error from at least 1 blobovnicza? Until we have vacuum I think it is not worth having this optimization, as others blobovniczas may still have free pages.
Author
Member

No, blobstor will try all databases:

	i.AllFull = false <------- here AllFull resets

	_, err = active.Blobovnicza().Put(ctx, i.PutPrm)
	if err != nil {
		if !isLogical(err) {
			i.B.reportError(logs.BlobovniczatreeCouldNotPutObjectToActiveBlobovnicza, err)
		} else {
			i.B.log.Debug(logs.BlobovniczatreeCouldNotPutObjectToActiveBlobovnicza,
				zap.String("path", active.SystemPath()),
				zap.String("error", err.Error()),
				zap.String("trace_id", tracingPkg.GetTraceID(ctx)))
		}

		if errors.Is(err, blobovnicza.ErrNoSpace) {
			i.AllFull = true
No, blobstor will try all databases: ``` i.AllFull = false <------- here AllFull resets _, err = active.Blobovnicza().Put(ctx, i.PutPrm) if err != nil { if !isLogical(err) { i.B.reportError(logs.BlobovniczatreeCouldNotPutObjectToActiveBlobovnicza, err) } else { i.B.log.Debug(logs.BlobovniczatreeCouldNotPutObjectToActiveBlobovnicza, zap.String("path", active.SystemPath()), zap.String("error", err.Error()), zap.String("trace_id", tracingPkg.GetTraceID(ctx))) } if errors.Is(err, blobovnicza.ErrNoSpace) { i.AllFull = true ```
Owner

I don't understand, why do we need this change in this PR? Is something wrong without it?

I don't understand, why do we need this change in this PR? Is something wrong without it?
Author
Member

Without this change blobovnicza tree will return non logical error:

return common.PutRes{}, errPutFailed


So shard will increase error counter.

Without this change blobovnicza tree will return non logical error: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/a0c588263bd550bb131a8ed2f1a5ad318811018c/pkg/local_object_storage/blobstor/blobovniczatree/put.go#L64 So shard will increase error counter.
Owner

Hm, but why iterateDeepest return non-nil error?

Hm, but why `iterateDeepest` return non-nil error?
Author
Member

By design.

By design.
fyrchik approved these changes 2024-06-10 11:49:15 +00:00
fyrchik merged commit 6cf512e574 into master 2024-06-10 11:51:33 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1166
No description provided.