TestGCDropsObjectInhumedFromWritecache is flaky #1320

Closed
opened 2024-08-19 13:41:47 +00:00 by dstepanov-yadro · 2 comments

Expected Behavior

Tests pass.

Current Behavior

--- FAIL: TestGCDropsObjectInhumedFromWritecache (0.00s)
    --- FAIL: TestGCDropsObjectInhumedFromWritecache/flush_write-cache_before_inhume (0.10s)
        logger.go:146: 2024-08-19T13:28:15.844Z	DEBUG	blobstor/control.go:15	opening...	{"component": "BlobStor"}
        logger.go:146: 2024-08-19T13:28:15.862Z	DEBUG	blobstor/control.go:54	initializing...	{"component": "BlobStor"}
        logger.go:146: 2024-08-19T13:28:15.862Z	DEBUG	blobovniczatree/control.go:26	initializing Blobovnicza's
        logger.go:146: 2024-08-19T13:28:15.871Z	INFO	shard/rebuilder.go:49	blobstore rebuild started
        logger.go:146: 2024-08-19T13:28:15.871Z	DEBUG	blobovniczatree/rebuild.go:48	completing previous rebuild if failed...
        logger.go:146: 2024-08-19T13:28:15.878Z	DEBUG	blobovniczatree/rebuild.go:56	previous rebuild completed successfully
        logger.go:146: 2024-08-19T13:28:15.878Z	DEBUG	blobovniczatree/rebuild.go:58	collecting blobovniczas to rebuild...
        logger.go:146: 2024-08-19T13:28:15.895Z	DEBUG	blobovnicza/control.go:26	creating directory for BoltDB	{"component": "Blobovnicza", "path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza/0/0.db", "ro": false}
        logger.go:146: 2024-08-19T13:28:15.895Z	DEBUG	blobovnicza/control.go:40	opening BoltDB	{"component": "Blobovnicza", "path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza/0/0.db", "permissions": "-rwxrwxrwx"}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:66	initializing...	{"component": "Blobovnicza", "object size limit": 1048576, "storage size limit": 1073741824}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:84	creating bucket for size range	{"component": "Blobovnicza", "range": "[0:32768]"}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:84	creating bucket for size range	{"component": "Blobovnicza", "range": "[32769:65536]"}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:84	creating bucket for size range	{"component": "Blobovnicza", "range": "[65537:131072]"}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:84	creating bucket for size range	{"component": "Blobovnicza", "range": "[131073:262144]"}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:84	creating bucket for size range	{"component": "Blobovnicza", "range": "[262145:524288]"}
        logger.go:146: 2024-08-19T13:28:15.896Z	DEBUG	blobovnicza/control.go:84	creating bucket for size range	{"component": "Blobovnicza", "range": "[524289:1048576]"}
        logger.go:146: 2024-08-19T13:28:15.898Z	DEBUG	blobovnicza/control.go:134	saving counters to blobovnicza's meta...	{"component": "Blobovnicza", "size": 0, "items": 0}
        logger.go:146: 2024-08-19T13:28:15.898Z	DEBUG	blobovnicza/control.go:144	saving counters to blobovnicza's meta completed successfully	{"component": "Blobovnicza", "size": 0, "items": 0}
        logger.go:146: 2024-08-19T13:28:15.911Z	DEBUG	log/log.go:11	local object storage operation	{"component": "BlobStor", "address": "3qSmXCpGnFvbdiSCULcZ5fjowbNGZkRrCHAC7tE2PvUQ/GrB4gafsTLbFm5q3wTw8WWdoQZKz85CU8GfHW33DxrGp", "op": "PUT", "type": "blobovnicza", "storage_id": "0/0"}
        logger.go:146: 2024-08-19T13:28:15.923Z	DEBUG	shard/gc.go:289	garbage remove started
        logger.go:146: 2024-08-19T13:28:15.923Z	DEBUG	shard/delete.go:133	can't remove object from blobStor	{"object_address": "3qSmXCpGnFvbdiSCULcZ5fjowbNGZkRrCHAC7tE2PvUQ/GrB4gafsTLbFm5q3wTw8WWdoQZKz85CU8GfHW33DxrGp", "error": "rebuild is in progress, the operation cannot be performed", "trace_id": ""}
        logger.go:146: 2024-08-19T13:28:15.923Z	DEBUG	shard/gc.go:342	garbage remove completed
        gc_test.go:249: 
            	Error Trace:	/workspace/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard/gc_test.go:249
            	            				/workspace/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard/gc_test.go:179
            	Error:      	Not equal: 
            	            	expected: 0x1
            	            	actual  : 0x0
            	Test:       	TestGCDropsObjectInhumedFromWritecache/flush_write-cache_before_inhume
        logger.go:146: 2024-08-19T13:28:15.939Z	WARN	blobovniczatree/rebuild.go:61	collecting blobovniczas to rebuild failed	{"error": "context canceled"}
        logger.go:146: 2024-08-19T13:28:15.939Z	ERROR	blobstor/rebuild.go:32	failed to rebuild storages	{"component": "BlobStor", "failed_storage_path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza", "failed_storage_type": "blobovnicza", "error": "context canceled"}
        logger.go:146: 2024-08-19T13:28:15.939Z	INFO	blobstor/rebuild.go:40	storages rebuild completed	{"component": "BlobStor", "success": false, "total_files_removed": 0, "total_objects_moved": 0}
        logger.go:146: 2024-08-19T13:28:15.939Z	WARN	shard/rebuilder.go:51	failed to rebuild blobstore	{"error": "context canceled"}
        logger.go:146: 2024-08-19T13:28:15.939Z	INFO	shard/rebuilder.go:73	blobstore rebuild stopped
        logger.go:146: 2024-08-19T13:28:15.940Z	DEBUG	blobstor/control.go:71	closing...	{"component": "BlobStor"}
        logger.go:146: 2024-08-19T13:28:15.940Z	DEBUG	blobovnicza/control.go:165	closing BoltDB	{"component": "Blobovnicza", "path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza/0/0.db"}
        logger.go:146: 2024-08-19T13:28:15.940Z	DEBUG	shard/gc.go:225	GC is stopped
        logger.go:146: 2024-08-19T13:28:15.940Z	INFO	shard/gc.go:266	waiting for GC workers to stop...
        logger.go:146: 2024-08-19T13:28:15.940Z	WARN	shard/gc.go:164	stop event listener by closed `stop` channel
FAIL
FAIL	git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard	0.948s
## Expected Behavior Tests pass. ## Current Behavior ``` --- FAIL: TestGCDropsObjectInhumedFromWritecache (0.00s) --- FAIL: TestGCDropsObjectInhumedFromWritecache/flush_write-cache_before_inhume (0.10s) logger.go:146: 2024-08-19T13:28:15.844Z DEBUG blobstor/control.go:15 opening... {"component": "BlobStor"} logger.go:146: 2024-08-19T13:28:15.862Z DEBUG blobstor/control.go:54 initializing... {"component": "BlobStor"} logger.go:146: 2024-08-19T13:28:15.862Z DEBUG blobovniczatree/control.go:26 initializing Blobovnicza's logger.go:146: 2024-08-19T13:28:15.871Z INFO shard/rebuilder.go:49 blobstore rebuild started logger.go:146: 2024-08-19T13:28:15.871Z DEBUG blobovniczatree/rebuild.go:48 completing previous rebuild if failed... logger.go:146: 2024-08-19T13:28:15.878Z DEBUG blobovniczatree/rebuild.go:56 previous rebuild completed successfully logger.go:146: 2024-08-19T13:28:15.878Z DEBUG blobovniczatree/rebuild.go:58 collecting blobovniczas to rebuild... logger.go:146: 2024-08-19T13:28:15.895Z DEBUG blobovnicza/control.go:26 creating directory for BoltDB {"component": "Blobovnicza", "path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza/0/0.db", "ro": false} logger.go:146: 2024-08-19T13:28:15.895Z DEBUG blobovnicza/control.go:40 opening BoltDB {"component": "Blobovnicza", "path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza/0/0.db", "permissions": "-rwxrwxrwx"} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:66 initializing... {"component": "Blobovnicza", "object size limit": 1048576, "storage size limit": 1073741824} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:84 creating bucket for size range {"component": "Blobovnicza", "range": "[0:32768]"} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:84 creating bucket for size range {"component": "Blobovnicza", "range": "[32769:65536]"} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:84 creating bucket for size range {"component": "Blobovnicza", "range": "[65537:131072]"} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:84 creating bucket for size range {"component": "Blobovnicza", "range": "[131073:262144]"} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:84 creating bucket for size range {"component": "Blobovnicza", "range": "[262145:524288]"} logger.go:146: 2024-08-19T13:28:15.896Z DEBUG blobovnicza/control.go:84 creating bucket for size range {"component": "Blobovnicza", "range": "[524289:1048576]"} logger.go:146: 2024-08-19T13:28:15.898Z DEBUG blobovnicza/control.go:134 saving counters to blobovnicza's meta... {"component": "Blobovnicza", "size": 0, "items": 0} logger.go:146: 2024-08-19T13:28:15.898Z DEBUG blobovnicza/control.go:144 saving counters to blobovnicza's meta completed successfully {"component": "Blobovnicza", "size": 0, "items": 0} logger.go:146: 2024-08-19T13:28:15.911Z DEBUG log/log.go:11 local object storage operation {"component": "BlobStor", "address": "3qSmXCpGnFvbdiSCULcZ5fjowbNGZkRrCHAC7tE2PvUQ/GrB4gafsTLbFm5q3wTw8WWdoQZKz85CU8GfHW33DxrGp", "op": "PUT", "type": "blobovnicza", "storage_id": "0/0"} logger.go:146: 2024-08-19T13:28:15.923Z DEBUG shard/gc.go:289 garbage remove started logger.go:146: 2024-08-19T13:28:15.923Z DEBUG shard/delete.go:133 can't remove object from blobStor {"object_address": "3qSmXCpGnFvbdiSCULcZ5fjowbNGZkRrCHAC7tE2PvUQ/GrB4gafsTLbFm5q3wTw8WWdoQZKz85CU8GfHW33DxrGp", "error": "rebuild is in progress, the operation cannot be performed", "trace_id": ""} logger.go:146: 2024-08-19T13:28:15.923Z DEBUG shard/gc.go:342 garbage remove completed gc_test.go:249: Error Trace: /workspace/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard/gc_test.go:249 /workspace/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard/gc_test.go:179 Error: Not equal: expected: 0x1 actual : 0x0 Test: TestGCDropsObjectInhumedFromWritecache/flush_write-cache_before_inhume logger.go:146: 2024-08-19T13:28:15.939Z WARN blobovniczatree/rebuild.go:61 collecting blobovniczas to rebuild failed {"error": "context canceled"} logger.go:146: 2024-08-19T13:28:15.939Z ERROR blobstor/rebuild.go:32 failed to rebuild storages {"component": "BlobStor", "failed_storage_path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza", "failed_storage_type": "blobovnicza", "error": "context canceled"} logger.go:146: 2024-08-19T13:28:15.939Z INFO blobstor/rebuild.go:40 storages rebuild completed {"component": "BlobStor", "success": false, "total_files_removed": 0, "total_objects_moved": 0} logger.go:146: 2024-08-19T13:28:15.939Z WARN shard/rebuilder.go:51 failed to rebuild blobstore {"error": "context canceled"} logger.go:146: 2024-08-19T13:28:15.939Z INFO shard/rebuilder.go:73 blobstore rebuild stopped logger.go:146: 2024-08-19T13:28:15.940Z DEBUG blobstor/control.go:71 closing... {"component": "BlobStor"} logger.go:146: 2024-08-19T13:28:15.940Z DEBUG blobovnicza/control.go:165 closing BoltDB {"component": "Blobovnicza", "path": "/tmp/TestGCDropsObjectInhumedFromWritecacheflush_write-cache_before_inhume3572293868/001/blob/blobovnicza/0/0.db"} logger.go:146: 2024-08-19T13:28:15.940Z DEBUG shard/gc.go:225 GC is stopped logger.go:146: 2024-08-19T13:28:15.940Z INFO shard/gc.go:266 waiting for GC workers to stop... logger.go:146: 2024-08-19T13:28:15.940Z WARN shard/gc.go:164 stop event listener by closed `stop` channel FAIL FAIL git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/shard 0.948s ```
dstepanov-yadro added the
bug
triage
labels 2024-08-19 13:41:47 +00:00
fyrchik changed title from Fix unit tests to TestGCDropsObjectInhumedFromWritecache is flaky 2024-08-19 14:36:40 +00:00
fyrchik added
frostfs-node
internal
and removed
triage
labels 2024-08-19 14:36:55 +00:00
a-savchuk was assigned by fyrchik 2024-08-23 07:32:49 +00:00
Member

When a test shard is created, the rebuild process starts and runs asynchronously. However, if the GC begins before the rebuild is complete, it can lead to test failures.

     [pkg/local_object_storage/shard/shard_test.go]
      44 func newCustomShard(t testing.TB, enableWriteCache bool, o shardOptions) *Shard {
     ...
     110 	require.NoError(t, sh.Init(context.Background()))
     ...
     113 }

     [pkg/local_object_storage/shard/control.go]
     101 func (s *Shard) Init(ctx context.Context) error {
     ...
     177 	s.rb = newRebuilder(s.rebuildLimiter)
     178 	if !m.NoMetabase() {
     179 		s.rb.Start(ctx, s.blobStor, s.metaBase, s.log)
     180 	}
     182 }

     [pkg/local_object_storage/shard/rebuilder.go]
     32 func (r *rebuilder) Start(ctx context.Context, bs *blobstor.BlobStor, mb *meta.DB, log *logger.Logger) {
     ...
     36 	r.start(ctx, bs, mb, log)
     37 }
     ...
     39 func (r *rebuilder) start(ctx context.Context, bs *blobstor.BlobStor, mb *meta.DB, log *logger.Logger) {
     ...
     45 	r.wg.Add(1)
     46 	go func() {
     47 		defer r.wg.Done()
     48 
     49 		log.Info(logs.BlobstoreRebuildStarted)
     50 		if err := bs.Rebuild(ctx, &mbStorageIDUpdate{mb: mb}, r.limiter); err != nil {
     51 			log.Warn(logs.FailedToRebuildBlobstore, zap.Error(err))
     52 		} else {
     53 			log.Info(logs.BlobstoreRebuildCompletedSuccessfully)
     54 		}
     55 	}() <-- Fire and forget
     56 }

@fyrchik suggested either disabling the rebuild for this test or adding an option to make the rebuild process synchronous. @dstepanov-yadro, could you advise on the best approach?

When a test shard is created, the rebuild process starts and runs asynchronously. However, if the GC begins before the rebuild is complete, it can lead to test failures. ``` [pkg/local_object_storage/shard/shard_test.go] 44 func newCustomShard(t testing.TB, enableWriteCache bool, o shardOptions) *Shard { ... 110 require.NoError(t, sh.Init(context.Background())) ... 113 } [pkg/local_object_storage/shard/control.go] 101 func (s *Shard) Init(ctx context.Context) error { ... 177 s.rb = newRebuilder(s.rebuildLimiter) 178 if !m.NoMetabase() { 179 s.rb.Start(ctx, s.blobStor, s.metaBase, s.log) 180 } 182 } [pkg/local_object_storage/shard/rebuilder.go] 32 func (r *rebuilder) Start(ctx context.Context, bs *blobstor.BlobStor, mb *meta.DB, log *logger.Logger) { ... 36 r.start(ctx, bs, mb, log) 37 } ... 39 func (r *rebuilder) start(ctx context.Context, bs *blobstor.BlobStor, mb *meta.DB, log *logger.Logger) { ... 45 r.wg.Add(1) 46 go func() { 47 defer r.wg.Done() 48 49 log.Info(logs.BlobstoreRebuildStarted) 50 if err := bs.Rebuild(ctx, &mbStorageIDUpdate{mb: mb}, r.limiter); err != nil { 51 log.Warn(logs.FailedToRebuildBlobstore, zap.Error(err)) 52 } else { 53 log.Info(logs.BlobstoreRebuildCompletedSuccessfully) 54 } 55 }() <-- Fire and forget 56 } ``` @fyrchik suggested either disabling the rebuild for this test or adding an option to make the rebuild process synchronous. @dstepanov-yadro, could you advise on the best approach?
Author
Member

My advise is adding an option to disabling the rebuild

My advise is `adding an option to disabling the rebuild`
fyrchik added this to the v0.43.0 milestone 2024-09-02 07:45:30 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1320
No description provided.