Fix big object deletion #896

Merged
fyrchik merged 5 commits from dstepanov-yadro/frostfs-node:fix/test_big_object_delete_is_flaky into master 2024-01-11 12:32:11 +00:00
  1. There was a bug: when GC deletes virtual (parent) object of complex object, it drops GC mark, but does nothing with parent object, so split info returns for Get/Exists requests.

  2. Also changed test logger. Now test related logs grouped after test. There are two differences from the previous implementation:

  • log records start with logger.go 139: prefix, it is testing package behaviour
  • log records don't colored by log levels
    But the ability to see logs for a specific test is more important, in my opinion.
  1. Fixed some tests, when storage engine's GC continues to work after test completed (caught by test -race after logger change)

Closes #895

1. There was a bug: when GC deletes virtual (parent) object of complex object, it drops GC mark, but does nothing with parent object, so split info returns for Get/Exists requests. 2. Also changed test logger. Now test related logs grouped after test. There are two differences from the previous implementation: - log records start with `logger.go 139: ` prefix, it is `testing` package behaviour - log records don't colored by log levels But the ability to see logs for a specific test is more important, in my opinion. 3. Fixed some tests, when storage engine's GC continues to work after test completed (caught by `test -race` after logger change) Closes #895
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from e2debba928 to 8ec5eb85f1 2024-01-09 13:27:09 +00:00 Compare
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 4521ed1090 to ad47b177b5 2024-01-09 14:28:17 +00:00 Compare
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from d84d846af9 to 9e78baa3da 2024-01-10 09:16:14 +00:00 Compare
dstepanov-yadro reviewed 2024-01-10 09:26:53 +00:00
@ -83,3 +83,1 @@
t.Cleanup(func() {
e.Close(context.Background())
})
defer func() {
Author
Member
https://github.com/golang/go/issues/40908
Owner

This is exactly why we got rid of zaptest and used zap.L(), these problems occured all over the tests (and writing t.Cleanup in the constructor is much easier that to remember writing it everywhere else.

This is exactly why we got rid of `zaptest` and used `zap.L()`, these problems occured all over the tests (and writing `t.Cleanup` in the constructor is much easier that to remember writing it everywhere else.
Owner
https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/621
Author
Member

There was only one place in the entire project that needed to be fixed. But clear logs make sense for all of tests.

There was only one place in the entire project that needed to be fixed. But clear logs make sense for all of tests.
Owner

Debatable: we do not need logs at all and when debugging tests usually a single test can be run.
Actually, I see lots of Cleanup with Close inside, they could trigger race detector later at some point.

I don't like returning the behaviour which clearly has problems and which we have intentionally fixed at some point.

Debatable: we do not need logs at all and when debugging tests usually a single test can be run. Actually, I see lots of `Cleanup` with `Close` inside, they could trigger race detector later at some point. I don't like returning the behaviour which clearly has problems and which we have intentionally fixed at some point.
Author
Member

I disagree with the statement: we do not need logs at all. For several tasks already, I needed normal logs of falling tests.

I disagree with the statement: `we do not need logs at all`. For several tasks already, I needed normal logs of falling tests.
Owner

Ok, but reverting a fix to a real problem is not the right approach here.

Ok, but reverting a fix to a real problem is not the right approach here.
Author
Member

Now there is no problem: race condition for t.Cleanup is actual only for engine after Init().
Looks like it was an inappropriate fix.

Now there is no problem: race condition for `t.Cleanup` is actual only for engine after `Init()`. Looks like it was an inappropriate fix.
Author
Member

Also see this comment (testing.go: 1580):

		// Do not lock t.done to allow race detector to detect race in case
		// the user does not appropriately synchronize a goroutine.

As far as I understand Cleanup requires all test background goroutines must be stopped. So using Cleanup for engine.Close is invalid usage.

Also see this comment (testing.go: 1580): ``` // Do not lock t.done to allow race detector to detect race in case // the user does not appropriately synchronize a goroutine. ``` As far as I understand `Cleanup` requires all test background goroutines must be stopped. So using `Cleanup` for `engine.Close` is invalid usage.
Owner

I am not sure the problems is gone now. The problem is that

  1. In logger we read done field cc85462b3d/src/testing/testing.go (L1017)
  2. done is written to intentionally without a mutex in cc85462b3d/src/testing/testing.go (L1580)

If you do rg 'Cleanup\(' -A4 over the codebase, there are multiple calls to releaseShard in Cleanup (and to writecache etc.), because we currently use Cleanup() in tests. Are you sure there are no goroutines in Shard which can log and run until Close() is called? In the writecache?

Or here, is it different from the list_test.go situation: cbc78a8efb/pkg/local_object_storage/engine/shards_test.go (L16) ?

I would rather see a discussion first.

I am not sure the problems is gone now. The problem is that 1. In logger we read `done` field https://github.com/golang/go/blob/cc85462b3d23193e4861813ea85e254cfe372403/src/testing/testing.go#L1017 2. `done` is written to intentionally without a mutex in https://github.com/golang/go/blob/cc85462b3d23193e4861813ea85e254cfe372403/src/testing/testing.go#L1580 If you do `rg 'Cleanup\(' -A4` over the codebase, there are multiple calls to `releaseShard` in `Cleanup` (and to writecache etc.), because we currently use `Cleanup()` in tests. Are you _sure_ there are no goroutines in `Shard` which can log and run until `Close()` is called? In the writecache? Or here, is it different from the `list_test.go` situation: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/cbc78a8efb72c40c7e39cccdc0aed4dc387fb053/pkg/local_object_storage/engine/shards_test.go#L16 ? I would rather see a discussion first.
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 9e78baa3da to f658b085e7 2024-01-10 09:27:57 +00:00 Compare
dstepanov-yadro changed title from WIP: Fix big object delete test to Fix big object deletion 2024-01-10 09:28:19 +00:00
dstepanov-yadro requested review from storage-core-committers 2024-01-10 09:28:25 +00:00
acid-ant approved these changes 2024-01-10 10:03:04 +00:00
@ -330,0 +333,4 @@
garbageBKT := tx.Bucket(garbageBucketName)
key := make([]byte, addressKeySize)
addrKey := addressKey(object.AddressOf(obj), key)
if garbageBKT != nil {
Member

Why not to check it earlier?

Why not to check it earlier?
Author
Member

Fixed

Fixed
acid-ant marked this conversation as resolved
dstepanov-yadro requested review from storage-core-developers 2024-01-10 10:16:27 +00:00
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from f658b085e7 to 1dc53e0120 2024-01-10 10:24:43 +00:00 Compare
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 1dc53e0120 to cbc78a8efb 2024-01-10 10:32:40 +00:00 Compare
aarifullin approved these changes 2024-01-10 10:39:00 +00:00
aarifullin left a comment
Member

LGTM

LGTM
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from cbc78a8efb to 9494abdb69 2024-01-11 07:22:40 +00:00 Compare
dstepanov-yadro reviewed 2024-01-11 07:24:04 +00:00
@ -18,4 +17,0 @@
t.Cleanup(func() {
blz.Close()
os.RemoveAll(filename)
Author
Member

t.TempDir will be removed by testing engine.

`t.TempDir` will be removed by testing engine.
dstepanov-yadro reviewed 2024-01-11 07:26:13 +00:00
@ -25,10 +25,6 @@ func (s storage) open(b *testing.B) common.Storage {
require.NoError(b, st.Open(false))
require.NoError(b, st.Init())
b.Cleanup(func() {
Author
Member

Close call moved to caller.

`Close` call moved to caller.
fyrchik approved these changes 2024-01-11 07:28:04 +00:00
Owner

Tests with -race fail, disregard approval.

Tests with `-race` fail, disregard approval.
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 9494abdb69 to 284ef903ac 2024-01-11 07:42:00 +00:00 Compare
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 284ef903ac to 0cfe98b9be 2024-01-11 08:09:59 +00:00 Compare
dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 0cfe98b9be to 50e648e3b1 2024-01-11 08:10:52 +00:00 Compare
Author
Member

Tests with -race fail, disregard approval.

Fixed

> Tests with `-race` fail, disregard approval. Fixed
fyrchik approved these changes 2024-01-11 12:32:03 +00:00
fyrchik merged commit a2ab373a0a into master 2024-01-11 12:32:11 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#896
No description provided.