New epoch event affects stream of object.Delete operations #1433
Reference: TrueCloudLab/frostfs-node#1433
Expected Behavior
Consecutive object.Delete operations are not affected by a new epoch event.
Current Behavior
Some object.Delete operations fail with:
remove object via client localhost:8080: delete object on client: status: code = 1024 message = incomplete object PUT by placement: could not write header: (writer.LocalTarget) could not put object to local storage: could not put object to any shard
Investigation
The issue was found in an AIO environment with a single storage node. Node behaviour in a dev-env environment with various container placement policies may be different.
This is still a hypothesis, so take it with a grain of salt.
The issue appears when a new epoch triggers the collectExpiredTombstones routine. This routine may execute between these two operations:
(1) storage.Delete()
(2) the subsequent put of the object to local storage
Between (1) and (2), GC may kick in and collect all available records from the graveyard bucket in the metabase. GC then checks whether each object should remain or be removed by applying GCMark.
To make this decision, GC searches for object expiration data in the cache and in the network. When (1) has happened and (2) has not, both checks fail, so GC decides to mark the tombstone ID with GCMark.
Then, during step (2), the storage engine calls the Exists method of the metabase. Because the tombstone ID is already marked with GCMark, the objectStatus check returns an error, and the put operation eventually fails.
Possible Solution
For this exact issue, the solution may be quite simple: swap the order of the two operations in the local target writer.
Maybe there is a reason to keep the current order, but this change does not break any tests and fixes the issue.
Steps to Reproduce (for bugs)
Generate a stream of object.Delete operations.
I wrote a simple main.go.txt to create this load, but one can use k6 for the same purpose.
Context
This issue was found during minio warp runs. At the end of the benchmark, warp removes a large number of objects, and this removal sometimes fails.
Regression
No
Your Environment
frostfs-aio v1.6.1
frostfs-node v0.42.15 / v0.44.0-rc.5-14-g90f36693 (master)
I have reproduced it at the shard level. For posterity, here are some tricks to make it work:
1. The GC remover interval is 100ms, so we need either to increase it or to keep the removeGarbage() routine from running (otherwise the GC mark from the description is removed, and the object can be put again).
2. collectExpiredTombstones needs to be set to a shard method, and the tombstone source's IsTombstoneAvailable() needs to return false (which may be the culprit: it will be false when we put the first tombstone copy).
#1434