Fix big object deletion #896

dstepanov-yadro · 2024-01-09T09:42:21Z

dstepanov-yadro commented

2024-01-09 09:42:21 +00:00

1. There was a bug: when GC deleted the virtual (parent) object of a complex object, it dropped the GC mark but did nothing with the parent object itself, so split info was still returned for Get/Exists requests.
2. Also changed the test logger. Test-related logs are now grouped after the test. There are two differences from the previous implementation:
   - log records start with the `logger.go 139: ` prefix, which is `testing` package behaviour
   - log records are not colored by log level

   But the ability to see logs for a specific test is more important, in my opinion.
3. Fixed some tests where the storage engine's GC continued to work after the test completed (caught by `test -race` after the logger change).

Closes #895
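The per-test logging described in item 2 comes from routing log output through the test's own `Log` method, so each record is attributed to the test that produced it. A minimal stdlib-only sketch of that idea follows; `tbWriter`, `logSink`, `newTestLogger` and `recorder` are hypothetical names used for illustration, and the project's own logger setup is not reproduced here.

```go
package main

import (
	"fmt"
	"log"
	"strings"
)

// logSink is the subset of testing.TB this sketch needs; *testing.T satisfies it.
type logSink interface {
	Log(args ...any)
}

// tbWriter (hypothetical name) forwards every write to the test's Log method.
type tbWriter struct {
	sink logSink
}

func (w tbWriter) Write(p []byte) (int, error) {
	// Trim the trailing newline: Log adds its own.
	w.sink.Log(strings.TrimRight(string(p), "\n"))
	return len(p), nil
}

// newTestLogger returns a standard-library logger bound to a single test.
func newTestLogger(sink logSink) *log.Logger {
	return log.New(tbWriter{sink: sink}, "", 0)
}

// recorder stands in for *testing.T so the sketch runs outside `go test`.
type recorder struct {
	lines []string
}

func (r *recorder) Log(args ...any) {
	r.lines = append(r.lines, fmt.Sprint(args...))
}

func main() {
	rec := &recorder{}
	l := newTestLogger(rec)
	l.Println("shard opened")
	l.Printf("objects deleted: %d", 3)
	fmt.Println(rec.lines)
}
```

With a real `*testing.T` as the sink, the file and line that `Log` prints point at the adapter's own call site, which is the same effect as the `logger.go 139: ` prefix mentioned above.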

dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from e2debba928 to 8ec5eb85f1

2024-01-09 13:27:09 +00:00


dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 4521ed1090 to ad47b177b5

2024-01-09 14:28:17 +00:00


dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from d84d846af9 to 9e78baa3da

2024-01-10 09:16:14 +00:00


dstepanov-yadro reviewed 2024-01-10 09:26:53 +00:00

pkg/local_object_storage/engine/list_test.go Outdated

					
    @ -83,3 +83,1 @@
        t.Cleanup(func() {
            e.Close(context.Background())
        })
        defer func() {

dstepanov-yadro commented

2024-01-10 09:26:53 +00:00

https://github.com/golang/go/issues/40908

fyrchik commented

2024-01-10 10:03:56 +00:00

This is exactly why we got rid of `zaptest` and used `zap.L()`: these problems occurred all over the tests (and writing `t.Cleanup` in the constructor is much easier than remembering to write it everywhere else).

fyrchik commented

2024-01-10 10:07:05 +00:00

#621: https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/621

dstepanov-yadro commented

2024-01-10 10:31:12 +00:00

There was only one place in the entire project that needed to be fixed. But clear logs make sense for all tests.

fyrchik commented

2024-01-10 12:30:20 +00:00

Debatable: we do not need logs at all, and when debugging tests usually a single test can be run.
Actually, I see lots of `Cleanup` with `Close` inside; they could trigger the race detector at some later point.

I don't like bringing back behaviour which clearly has problems and which we intentionally fixed at some point.

dstepanov-yadro commented

2024-01-10 13:04:44 +00:00

I disagree with the statement `we do not need logs at all`. For several tasks already, I have needed proper logs of failing tests.

fyrchik commented

2024-01-10 15:28:47 +00:00

Ok, but reverting a fix to a real problem is not the right approach here.

dstepanov-yadro commented

2024-01-10 15:48:30 +00:00

Now there is no problem: the race condition for `t.Cleanup` is relevant only for an engine after `Init()`.
It looks like it was an inappropriate fix.

dstepanov-yadro commented

2024-01-10 16:10:05 +00:00

Also see this comment (testing.go: 1580):

```
// Do not lock t.done to allow race detector to detect race in case
// the user does not appropriately synchronize a goroutine.
```

As far as I understand, `Cleanup` requires that all of the test's background goroutines have already been stopped. So using `Cleanup` for `engine.Close` is invalid usage.
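The point about `Cleanup` can be illustrated without the engine: a component that logs from a background goroutine has to be stopped, and waited for, before the test function returns. Below is a minimal sketch under stated assumptions: `worker`, `startWorker` and `runAndClose` are hypothetical names, the worker stands in for an engine whose GC keeps running after `Init()`, and a plain function stands in for `t.Logf`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// worker (hypothetical) logs from a background goroutine until Close is called.
type worker struct {
	stop chan struct{}
	wg   sync.WaitGroup
}

// startWorker launches the goroutine; logf stands in for t.Logf.
func startWorker(logf func(format string, args ...any)) *worker {
	w := &worker{stop: make(chan struct{})}
	w.wg.Add(1)
	go func() {
		defer w.wg.Done()
		ticker := time.NewTicker(time.Millisecond)
		defer ticker.Stop()
		for {
			select {
			case <-w.stop:
				return
			case <-ticker.C:
				logf("gc tick")
			}
		}
	}()
	return w
}

// Close signals the goroutine and blocks until it has exited,
// so no log call can happen after Close returns.
func (w *worker) Close() {
	close(w.stop)
	w.wg.Wait()
}

// runAndClose starts a worker, lets it run briefly, closes it, and reports
// how many log calls were made by the time Close returned and some time later.
func runAndClose() (atClose, later int64) {
	var calls atomic.Int64
	w := startWorker(func(string, ...any) { calls.Add(1) })
	time.Sleep(20 * time.Millisecond)
	w.Close() // in a test body this would be: defer w.Close()
	atClose = calls.Load()
	time.Sleep(20 * time.Millisecond)
	return atClose, calls.Load()
}

func main() {
	atClose, later := runAndClose()
	fmt.Println(atClose == later)
}
```

In a test this would read `w := startWorker(t.Logf)` followed by `defer w.Close()`, so the goroutine is gone before the test function returns. That appears to be the arrangement the `list_test.go` hunk above moves to by replacing `t.Cleanup` with `defer`.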

fyrchik commented

2024-01-10 16:33:42 +00:00

I am not sure the problem is gone now. The problem is that:

1. In the logger we read the `done` field: https://github.com/golang/go/blob/cc85462b3d23193e4861813ea85e254cfe372403/src/testing/testing.go#L1017
2. `done` is intentionally written to without a mutex in https://github.com/golang/go/blob/cc85462b3d23193e4861813ea85e254cfe372403/src/testing/testing.go#L1580

If you run `rg 'Cleanup\(' -A4` over the codebase, there are multiple calls to `releaseShard` in `Cleanup` (and to writecache etc.), because we currently use `Cleanup()` in tests. Are you _sure_ there are no goroutines in `Shard` which can log and run until `Close()` is called? In the writecache?

Or here, is it different from the `list_test.go` situation: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/cbc78a8efb72c40c7e39cccdc0aed4dc387fb053/pkg/local_object_storage/engine/shards_test.go#L16 ?

I would rather see a discussion first.

dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 9e78baa3da to f658b085e7

2024-01-10 09:27:57 +00:00


dstepanov-yadro changed title from ~~WIP: Fix big object delete test~~ to Fix big object deletion

2024-01-10 09:28:19 +00:00

dstepanov-yadro requested review from storage-core-committers 2024-01-10 09:28:25 +00:00

acid-ant approved these changes 2024-01-10 10:03:04 +00:00

pkg/local_object_storage/metabase/delete.go Outdated

					
    @ -330,0 +333,4 @@
        garbageBKT := tx.Bucket(garbageBucketName)
        key := make([]byte, addressKeySize)
        addrKey := addressKey(object.AddressOf(obj), key)
        if garbageBKT != nil {

acid-ant commented

2024-01-10 10:02:27 +00:00

Why not check it earlier?

dstepanov-yadro commented

2024-01-10 10:25:35 +00:00

Fixed

acid-ant marked this conversation as resolved

dstepanov-yadro requested review from storage-core-developers 2024-01-10 10:16:27 +00:00

dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from f658b085e7 to 1dc53e0120

2024-01-10 10:24:43 +00:00


dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from 1dc53e0120 to cbc78a8efb

2024-01-10 10:32:40 +00:00


aarifullin approved these changes 2024-01-10 10:39:00 +00:00

aarifullin left a comment

LGTM

fyrchik referenced this pull request

2024-01-11 07:20:28 +00:00

Fix invalid session token type for container creation #900

dstepanov-yadro force-pushed fix/test_big_object_delete_is_flaky from cbc78a8efb to 9494abdb69

2024-01-11 07:22:40 +00:00


dstepanov-yadro reviewed 2024-01-11 07:24:04 +00:00

pkg/local_object_storage/blobovnicza/get_test.go Outdated

					
    @ -18,4 +17,0 @@
        t.Cleanup(func() {
            blz.Close()
            os.RemoveAll(filename)

dstepanov-yadro commented

2024-01-11 07:24:04 +00:00

The `t.TempDir` directory will be removed by the testing package.

dstepanov-yadro reviewed 2024-01-11 07:26:13 +00:00

pkg/local_object_storage/blobstor/perf_test.go Outdated

					
				@ -25,10 +25,6 @@ func (s storage) open(b *testing.B) common.Storage {

					require.NoError(b, st.Open(false))

					require.NoError(b, st.Init())

					b.Cleanup(func() {