Flaky test in CI: testWaitForEvacuationCompleted #1705

Closed
opened 2025-04-04 10:20:32 +00:00 by potyarkin · 1 comment
Member

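I've noticed another flaky test in Jenkins: [one](https://ci.frostfs.info/job/gerrit/job/frostfs-node/296/pipeline-console/?selected-node=17), [two](https://ci.frostfs.info/job/gerrit/job/frostfs-node/295/pipeline-console/?selected-node=19), [three](https://ci.frostfs.info/job/gerrit/job/frostfs-node/281/pipeline-console/?selected-node=21), [four](https://ci.frostfs.info/job/gerrit/job/frostfs-node/279/pipeline-console/?selected-node=21). In all cases reruns were successful.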

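The failing tests are TestEvacuateShardObjects, TestEvacuateShardObjectsRepOneOnly, and TestEvacuateTreesLocal, but they all fail in the same place (https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/634de975094b8f83b5b9eeff15b7fd2bfdd6f1fc/pkg/local_object_storage/engine/evacuate_test.go#L204-L212):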

func testWaitForEvacuationCompleted(t *testing.T, e *StorageEngine) *EvacuationState {
	var st *EvacuationState
	var err error
	// Poll every 10ms for up to 3s until the evacuation reports completion.
	require.Eventually(t, func() bool {
		st, err = e.GetEvacuationState(context.Background())
		require.NoError(t, err)
		return st.ProcessingStatus() == EvacuateProcessStateCompleted
	}, 3*time.Second, 10*time.Millisecond)
	return st
}

Error message:

evacuate_test.go:207: 
  Error Trace: .../pkg/local_object_storage/engine/evacuate_test.go:207
                 .../pkg/local_object_storage/engine/evacuate_test.go:757
  Error:       Condition never satisfied
  Test:        TestEvacuateShardObjectsRepOneOnly

I suspect the failure is caused by the CI runner being slow while it executes other jobs in parallel: the test code just does not reach the desired state fast enough. If this seems sensible, let's increase the timeout: https://review.frostfs.info/c/TrueCloudLab/frostfs-node/+/100

I also welcome anyone coming up with a better (not time-based) solution.

potyarkin added the bug and triage labels 2025-04-04 10:20:32 +00:00
potyarkin removed the triage label 2025-04-04 10:22:00 +00:00
Author
Member

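Increasing the timeout from 3 to 6 seconds did not help: [build 361](https://ci.frostfs.info/job/gerrit/job/frostfs-node/361/pipeline-console/?selected-node=19), [build 363](https://ci.frostfs.info/job/gerrit/job/frostfs-node/363/pipeline-console/?selected-node=21) and [build 402](https://ci.frostfs.info/job/gerrit/job/frostfs-node/402/pipeline-console/?selected-node=17) have failed even with the increased timeout.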

It is possible that 6 seconds is simply not enough and we should increase the timeout further. I'm not familiar enough with the codebase to make that judgement, so I'll stop my guesswork here.

potyarkin added the help wanted label 2025-04-08 07:18:53 +00:00
dstepanov-yadro was assigned by fyrchik 2025-04-08 10:36:17 +00:00
fyrchik added the frostfs-node and internal labels 2025-04-08 10:36:57 +00:00
Reference: TrueCloudLab/frostfs-node#1705