Flaky test: engine_test.go (TestLockExpiration, TestLockUserScenario, ...) #1700
I've noticed that one of our tests is flaky: example A, example B. The crash happens here; this method is called from several test cases: `TestLockExpiration`, `TestLockUserScenario`, and probably others.

```go
func (t *testQoSLimiter) Close() {
	require.Equal(t.t, int64(0), t.read.Load(), "read requests count after limiter close must be 0")
	require.Equal(t.t, int64(0), t.write.Load(), "write requests count after limiter close must be 0")
}
```
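For context on why this assertion can fire at all, here is a minimal sketch of how such a counting limiter might be wired. The `ReadRequest`/`WriteRequest` names, the release-callback pattern, and the `atomic.Int64` counters are all assumptions for illustration, not the actual frostfs-node helper:

```go
package example

import (
	"sync/atomic"
	"testing"

	"github.com/stretchr/testify/require"
)

// Hypothetical counting limiter for tests: every admitted request bumps a
// counter and decrements it again when it is released. Close then asserts
// that nothing is still in flight.
type testQoSLimiter struct {
	t     *testing.T
	read  atomic.Int64
	write atomic.Int64
}

func (t *testQoSLimiter) ReadRequest() (release func()) {
	t.read.Add(1)
	return func() { t.read.Add(-1) }
}

func (t *testQoSLimiter) WriteRequest() (release func()) {
	t.write.Add(1)
	return func() { t.write.Add(-1) }
}

// Same assertions as quoted above: any request that has not called its
// release function by the time the engine closes leaves a non-zero counter.
func (t *testQoSLimiter) Close() {
	require.Equal(t.t, int64(0), t.read.Load(), "read requests count after limiter close must be 0")
	require.Equal(t.t, int64(0), t.write.Load(), "write requests count after limiter close must be 0")
}
```

If that picture is roughly right, the failure means some read or write request is still in flight (or was never released) at the moment `Close` runs.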
The test failure looks like this:
This is probably related to (or is a regression of) #788.
My intuition says that the flakiness might be caused by running tests in parallel:
```go
func TestLockExpiration(t *testing.T) {
	t.Parallel()
	// ...
}

func TestLockUserScenario(t *testing.T) {
	t.Parallel()
	// ...
}
```
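For context, `t.Parallel()` marks a test as safe to run concurrently with other parallel tests in the same package: the test is paused at that call and resumed together with the other parallel tests once the sequential ones have finished. A minimal, self-contained illustration (unrelated to frostfs-node code):

```go
package example

import "testing"

// Both tests are paused at t.Parallel() and then resumed together, so their
// bodies execute concurrently within the same test binary and process.
func TestFirst(t *testing.T) {
	t.Parallel()
	t.Log("runs concurrently with TestSecond")
}

func TestSecond(t *testing.T) {
	t.Parallel()
	t.Log("runs concurrently with TestFirst")
}
```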
To confirm my intuition I ran 1,000 repetitions of `TestLockExpiration` and 1,000 repetitions of the full test suite. Without parallel tests `TestLockExpiration` didn't fail; with parallel tests the expected error occurred long before 1,000 iterations (the same error, although in another test case, `TestLockUserScenario`). Here are both runs (beware: the second log is 600 MB+, don't click "View full" in Firefox; download it to a local file instead). This is by no means exhaustive evidence, but I think this hunch warrants further investigation.

I'm not sure whether these particular test cases simply should not be parallelized or whether it's a sign of some deeper underlying problem, so I'm bringing it up here.
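For reference, repetition runs like the ones described above can be driven with plain `go test`; the package path below is only an assumption about where `engine_test.go` lives, and these are not necessarily the exact commands that were used:

```sh
# 1,000 repetitions of the single test; with nothing else selected,
# no other parallel test can run alongside it.
go test ./pkg/local_object_storage/engine/ -run 'TestLockExpiration' -count=1000

# 1,000 repetitions of the whole package, so parallel tests overlap.
go test ./pkg/local_object_storage/engine/ -count=1000
```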
That's interesting. Each test uses its own engine instance, so they shouldn't affect each other, but your experiment says otherwise.
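One way separately constructed engines could still interact is through state shared at the package or test-helper level (a pool, a registry, a limiter reused across tests). Below is a hypothetical illustration of how that produces exactly this kind of flake; this is not frostfs-node code, just a sketch of the failure mode:

```go
package example

import (
	"sync"
	"sync/atomic"
	"testing"
)

// inFlight stands in for any state accidentally shared between tests,
// even though each test builds its own "engine".
var inFlight atomic.Int64

// doRequest simulates an asynchronous request tracked by the shared counter.
func doRequest(wg *sync.WaitGroup) {
	wg.Add(1)
	inFlight.Add(1)
	go func() {
		defer wg.Done()
		defer inFlight.Add(-1) // runs before wg.Done() (LIFO defer order)
	}()
}

func TestOne(t *testing.T) {
	t.Parallel()
	var wg sync.WaitGroup
	doRequest(&wg)
	wg.Wait() // waits only for this test's own request

	// TestTwo may still have a request in flight here, so this check can
	// fail sporadically even though each test is correct in isolation.
	if n := inFlight.Load(); n != 0 {
		t.Fatalf("in-flight requests at close time: %d", n)
	}
}

func TestTwo(t *testing.T) {
	t.Parallel()
	var wg sync.WaitGroup
	doRequest(&wg)
	wg.Wait()
}
```

Whether anything like that shared state actually exists in the engine tests is the open question; if nothing is shared, the other plausible explanation is a timing dependency that parallel load merely exposes.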
My experiment is circumstantial evidence at best. There is a non-negligible possibility that my intuition is wrong.