engine: Lock operation has a power to resurrect expired regular objects #1527

Closed
opened 2024-12-02 09:58:00 +00:00 by a-savchuk · 5 comments
Member

The method (*StorageEngine).Lock allows locking expired regular objects.

Furthermore, once an expired object is locked, it becomes available for retrieval because of a tricky check in the meta.objectStatus method. This method is indirectly used by both meta.(*DB).Get and meta.(*DB).Exists:

func objectStatus(tx *bbolt.Tx, addr oid.Address, currEpoch uint64) (uint8, error) {
	// locked object could not be removed/marked with GC/expired
	if objectLocked(tx, addr.Container(), addr.Object()) {
		return 0, nil
	}

Expected Behavior

The engine shouldn't lock the expired object and should return some error.

Current Behavior

The engine locks the expired object.

Steps to Reproduce (for bugs)

Consider the following unit test which fails:

$ go test -run TestLockExpired ./pkg/local_object_storage/engine/ | grep --invert-match "logger"
--- FAIL: TestLockExpired (0.04s)
    --- FAIL: TestLockExpired/lock_expired_object (0.01s)
        lock_test.go:363: 
                Error Trace:    /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:363
                Error:          An error is expected but got nil.
                Test:           TestLockExpired/lock_expired_object
    --- FAIL: TestLockExpired/check_object_presence_if_it's_expired_after_lock_attempt (0.00s)
        lock_test.go:368: 
                Error Trace:    /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:368
                Error:          An error is expected but got nil.
                Test:           TestLockExpired/check_object_presence_if_it's_expired_after_lock_attempt
        lock_test.go:369: 
                Error Trace:    /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:369
                Error:          Should be true
                Test:           TestLockExpired/check_object_presence_if_it's_expired_after_lock_attempt
    --- FAIL: TestLockExpired/get_expired_object_after_lock_attempt (0.00s)
        lock_test.go:374: 
                Error Trace:    /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:374
                Error:          An error is expected but got nil.
                Test:           TestLockExpired/get_expired_object_after_lock_attempt
        lock_test.go:375: 
                Error Trace:    /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:375
                Error:          Should be true
                Test:           TestLockExpired/get_expired_object_after_lock_attempt
    --- FAIL: TestLockExpired/inhume_expired_object_after_lock_attempt (0.01s)
        lock_test.go:380: 
                Error Trace:    /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:380
                Error:          Received unexpected error:
                                metabase inhume: status: code = 2050 message = object is locked
                Test:           TestLockExpired/inhume_expired_object_after_lock_attempt
FAIL
FAIL    git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/engine      0.049s
FAIL

Context

It prevents the garbage collector from handling expired regular objects as expected, because the GC uses the shard.(*Shard).Inhume method to delete them (the same method used in the test above).

The method `(*StorageEngine).Lock` allows locking expired regular objects. **Furthermore**, once an expired object is locked, it becomes available for retrieval because of a tricky check in the `meta.objectStatus` method. This method is indirectly used by both `meta.(*DB).Get` and `meta.(*DB).Exists`: ```go func objectStatus(tx *bbolt.Tx, addr oid.Address, currEpoch uint64) (uint8, error) { // locked object could not be removed/marked with GC/expired if objectLocked(tx, addr.Container(), addr.Object()) { return 0, nil } ``` ## Expected Behavior The engine shouldn't lock the expired object and should return some error. ## Current Behavior The engine locks the expired object. ## Steps to Reproduce (for bugs) Consider the following [unit test](https://git.frostfs.info/a-savchuk/frostfs-node/src/commit/435803b00d9ca674fded0fce393ec3097f6484fc/pkg/local_object_storage/engine/lock_test.go#L300-L382) which fails: ```console $ go test -run TestLockExpired ./pkg/local_object_storage/engine/ | grep --invert-match "logger" --- FAIL: TestLockExpired (0.04s) --- FAIL: TestLockExpired/lock_expired_object (0.01s) lock_test.go:363: Error Trace: /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:363 Error: An error is expected but got nil. Test: TestLockExpired/lock_expired_object --- FAIL: TestLockExpired/check_object_presence_if_it's_expired_after_lock_attempt (0.00s) lock_test.go:368: Error Trace: /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:368 Error: An error is expected but got nil. Test: TestLockExpired/check_object_presence_if_it's_expired_after_lock_attempt lock_test.go:369: Error Trace: /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:369 Error: Should be true Test: TestLockExpired/check_object_presence_if_it's_expired_after_lock_attempt --- FAIL: TestLockExpired/get_expired_object_after_lock_attempt (0.00s) lock_test.go:374: Error Trace: /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:374 Error: An error is expected but got nil. Test: TestLockExpired/get_expired_object_after_lock_attempt lock_test.go:375: Error Trace: /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:375 Error: Should be true Test: TestLockExpired/get_expired_object_after_lock_attempt --- FAIL: TestLockExpired/inhume_expired_object_after_lock_attempt (0.01s) lock_test.go:380: Error Trace: /home/a_savchuk/repos/frostfs/node/pkg/local_object_storage/engine/lock_test.go:380 Error: Received unexpected error: metabase inhume: status: code = 2050 message = object is locked Test: TestLockExpired/inhume_expired_object_after_lock_attempt FAIL FAIL git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/engine 0.049s FAIL ``` ## Context It prevents the garbage collector from handling expired regular objects as expected, because the GC uses the `shard.(*Shard).Inhume` method to delete them (the same method used in the test above).
a-savchuk added the
bug
frostfs-node
P3
triage
labels 2024-12-02 09:58:00 +00:00
Owner

The engine shouldn't lock the expired object and should return some error.

It is like this by design, locked objects must be able to be retrieved until the last lock is removed.

>The engine shouldn't lock the expired object and should return some error. It is like this by design, locked objects must be able to be retrieved until the last lock is removed.
fyrchik removed the
P3
triage
labels 2024-12-02 11:27:49 +00:00
Author
Member

It is like this by design, locked objects must be able to be retrieved until the last lock is removed.

I agree, but should it be allowed to lock already expired object?

I mean the following order:

  1. the object expires
  2. someone tries to lock.

and not this order (in this case, the object is available after its expiration):

  1. someone locks
  2. the object expires
> It is like this by design, locked objects must be able to be retrieved until the last lock is removed. I agree, but should it be allowed to lock already expired object? I mean the following order: 1) the object expires 2) someone tries to lock. and not this order (in this case, the object is available after its expiration): 1) someone locks 2) the object expires
Owner

In general, we don't know the order.
E.g. what to do with this case:

  1. Someone locks.
  2. Object expires.
  3. Node A loses the LOCK (or even haven't received it in (1)).
  4. Node B replicates the LOCK on Node A <- ???
In general, we don't know the order. E.g. what to do with this case: 1. Someone locks. 2. Object expires. 3. Node A loses the LOCK (or even haven't received it in (1)). 4. Node B replicates the LOCK on Node A <- ???
Author
Member

Should I add a test* to document this behavior?

*Not the test mentioned above, but another one

Should I add a test* to document this behavior? \*Not the test mentioned above, but another one
Owner

Yes, a test would be nice, if we do not have any.
Metabase resync is similar to the scenario I described btw.

Yes, a test would be nice, if we do not have any. Metabase resync is similar to the scenario I described btw.
a-savchuk self-assigned this 2024-12-02 13:28:40 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1527
No description provided.