[#118] node: add unit concurrent tests for blobstor #233

Merged
fyrchik merged 1 commit from aarifullin/frostfs-node:feature/blobstor_concurrent_tests into master 2023-04-19 10:22:52 +00:00
Member

Signed-off-by: Airat Arifullin a.arifullin@yadro.com

Signed-off-by: Airat Arifullin a.arifullin@yadro.com
aarifullin reviewed 2023-04-10 18:50:07 +00:00
@ -176,0 +236,4 @@
})
}
func TestConcurrentDelete(t *testing.T) {
Author
Member

No more good useful scenario came to my mind apart from the two ones. Would be glad to listen to your ideas

No more good useful scenario came to my mind apart from the two ones. Would be glad to listen to your ideas
aarifullin reviewed 2023-04-10 18:53:20 +00:00
@ -176,0 +182,4 @@
smallSizeLimit = 512
// concurrentPutCount is fstree implementation specific
concurrentPutCount = 5
Author
Member

@fyrchik
I've got this from this constant.
It looks like the constant sets the restriction for the concurent putting. Should we get this documented somehow?

@fyrchik I've got this from this [constant](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/blobstor/fstree/fstree.go#L262). It looks like the constant sets the restriction for the concurent putting. Should we get this documented somehow?
aarifullin requested review from storage-core-committers 2023-04-10 18:53:32 +00:00
aarifullin requested review from storage-core-developers 2023-04-10 18:53:32 +00:00
Member

I'm a bit skeptical about the value of these tests. A few questions:

  1. What kind of bugs are they supposed to catch?
  2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky?
  3. Are they supposed to run as regular tests or specifically under the race detector?

I guess they will still be useful to catch obvious/basic concurrency issues, just wondering whether we can or should do better.

I'm a bit skeptical about the value of these tests. A few questions: 1. What kind of bugs are they supposed to catch? 2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky? 3. Are they supposed to run as regular tests or specifically under the race detector? I guess they will still be useful to catch obvious/basic concurrency issues, just wondering whether we can or should do better.
dstepanov-yadro reviewed 2023-04-11 08:47:32 +00:00
@ -176,0 +264,4 @@
prm.Address = object.AddressOf(obj)
if _, err := b.Delete(prm); err != nil {
require.ErrorContains(t, err, "object not found")
}

Maybe add check that object is not exists?

Maybe add check that object is not exists?
Author
Member

There are many ways to check if an object exists. But for me it's more logically correct to check the error from deletion because I check concurrent deletes

There are many ways to check if an object exists. But for me it's more logically correct to check the error from deletion because I check concurrent deletes

to check the error from deletion because I check concurrent deletes

Yes, it's true. I'm talking about the fact that after deletion there should be no object exist. Maybe delete does not return an error, but the object is not deleted.

> to check the error from deletion because I check concurrent deletes Yes, it's true. I'm talking about the fact that after deletion there should be no object exist. Maybe `delete` does not return an error, but the object is not deleted.
Author
Member

Maybe delete does not return an error, but the object is not deleted

You're right. That must be checked. I've added check about object's existence

> Maybe delete does not return an error, but the object is not deleted You're right. That must be checked. I've added check about object's existence
dstepanov-yadro marked this conversation as resolved

I think at least we should add -race flag for go test.

I think at least we should add `-race` flag for `go test`.
aarifullin force-pushed feature/blobstor_concurrent_tests from a3a54534af to 91213fede3 2023-04-11 15:53:41 +00:00 Compare
Author
Member
  1. What kind of bugs are they supposed to catch?

As far as I understood the issue, here we should check logical correctness and data consistency (in the case of concurrent writes)

  1. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky?

Yes, I did. See the test "put the same big object with error"

  1. Are they supposed to run as regular tests or specifically under the race detector?

From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable -race option for go test

> 1. What kind of bugs are they supposed to catch? As far as I understood [the issue](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/118), here we should check logical correctness and data consistency (in the case of concurrent writes) > 2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky? Yes, I did. See the test `"put the same big object with error"` > 3. Are they supposed to run as regular tests or specifically under the race detector? From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable `-race` option for `go test`
aarifullin force-pushed feature/blobstor_concurrent_tests from 91213fede3 to ea6ad1e13b 2023-04-11 16:01:34 +00:00 Compare
carpawell reviewed 2023-04-11 19:24:16 +00:00
@ -176,0 +182,4 @@
smallSizeLimit = 512
// concurrentPutCount is fstree implementation specific
concurrentPutCount = 5
Contributor

kinda magic number to me: the const from the FSTree may vary independently

@fyrchik, what problem cases are expected to be tested by that PR?

kinda magic number to me: the const from the `FSTree` may vary independently @fyrchik, what problem cases are expected to be tested by that PR?
Author
Member

kinda magic number to me: the const from the FSTree may vary independently

Agree. That constant in Tree affects the concurrent write. If concurrent writers are > concurrentPutCount, then errors occur

> kinda magic number to me: the const from the FSTree may vary independently Agree. That constant in Tree affects the concurrent write. If concurrent writers are `> concurrentPutCount`, then errors occur
@ -176,0 +185,4 @@
concurrentPutCount = 5
)
newBlobStor := func(t *testing.T, compress bool) *BlobStor {
Contributor

not used var here and below?

not used var here and below?
carpawell marked this conversation as resolved
@ -176,0 +202,4 @@
}
testPut := func(t *testing.T, wg *sync.WaitGroup, b *BlobStor, obj *objectSDK.Object) {
defer wg.Done()
Contributor

why defer here? looks useless without multiple return

same in testPutFileExistsError

why `defer` here? looks useless without multiple `return` same in `testPutFileExistsError`
Author
Member

wg is no longer used in the such methods

wg is no longer used in the such methods
carpawell marked this conversation as resolved
@ -176,0 +235,4 @@
bigObj := testObject(smallSizeLimit * 2)
wg := &sync.WaitGroup{}
for range [concurrentPutCount + 1]int{} {
Contributor

have never seen such a [0; n) range -- live and learn :)

but may the for i := 0; ...; i++ be more expected by a regular GO reader? also, your way should allocate more memory for nothing

have never seen such a [0; n) range -- live and learn :) but may the `for i := 0; ...; i++` be more expected by a regular GO reader? also, your way should allocate more memory for nothing
carpawell marked this conversation as resolved
@ -176,0 +304,4 @@
wg := &sync.WaitGroup{}
for range [2]int{} {
wg.Add(1)
go testDelete(t, wg, blobStor, bigObj)
Contributor

how about:

go func() {
    wg.Add(1)
    testDelete(...)
    wg.Done()
}()

and do not pass wg to the testing funcs?

how about: ``` go func() { wg.Add(1) testDelete(...) wg.Done() }() ``` and do not pass `wg` to the testing funcs?
Author
Member

Sounds reasonble!
But with one fix:

wg.Add(1)
go func() {
    testDelete(...)
    wg.Done()
}()

Otherwise wg.Wait() can be passed meanwhile goroutine has not been run yet (no wg.Add)

Sounds reasonble! But with one fix: ``` wg.Add(1) go func() { testDelete(...) wg.Done() }() ``` Otherwise `wg.Wait()` can be passed meanwhile goroutine has not been run yet (no `wg.Add`)
Contributor

But with one fix

oh, yeah, for sure. fast browser programming may lead to stupid bugs

> But with one fix oh, yeah, for sure. fast browser programming may lead to stupid bugs
carpawell marked this conversation as resolved
aarifullin force-pushed feature/blobstor_concurrent_tests from ea6ad1e13b to 1dafa8fb07 2023-04-12 08:09:35 +00:00 Compare
Member
  1. What kind of bugs are they supposed to catch?

As far as I understood the issue, here we should check logical correctness and data consistency (in the case of concurrent writes)

The issue only suggests to test the behavior when get/put is done concurrently, but doesn't suggest how to do it exactly or what properties the author (who's the author, btw?) had in mind. That's why I'm asking.

  1. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky?

Yes, I did. See the test "put the same big object with error"

This is not what I meant. What I meant is, whether you checked whether introducing a relevant bug in the codebase itself causes this verbatim test to catch it, and how often, if at all.

  1. Are they supposed to run as regular tests or specifically under the race detector?

From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable -race option for go test

I don't quite get the "check logical error but not incorrect resource sharing". Can you elaborate? It would seem like "incorrect resource sharing" (i.e. a concurrency bug) is a potential source of "logical error"s, no?

My concern is simply to evaluate the utility of these tests somehow. Using the race detector is a good builtin tool to use random scheduling to try to surface some of the possible bugs, but it's probably not the only way.

> > 1. What kind of bugs are they supposed to catch? > > As far as I understood [the issue](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/118), here we should check logical correctness and data consistency (in the case of concurrent writes) > The issue only suggests to test the behavior when get/put is done concurrently, but doesn't suggest how to do it exactly or what properties the author (who's the author, btw?) had in mind. That's why I'm asking. > > 2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky? > > Yes, I did. See the test `"put the same big object with error"` > This is not what I meant. What I meant is, whether you checked whether introducing a relevant bug in the codebase itself causes this verbatim test to catch it, and how often, if at all. > > 3. Are they supposed to run as regular tests or specifically under the race detector? > > From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable `-race` option for `go test` > I don't quite get the "check logical error but not incorrect resource sharing". Can you elaborate? It would seem like "incorrect resource sharing" (i.e. a concurrency bug) is a potential source of "logical error"s, no? My concern is simply to evaluate the utility of these tests somehow. Using the race detector is a good builtin tool to use random scheduling to try to surface some of the possible bugs, but it's probably not the only way.
ale64bit reviewed 2023-04-13 08:10:35 +00:00
@ -176,0 +215,4 @@
bigObj := testObject(smallSizeLimit * 2)
wg := &sync.WaitGroup{}
for range [concurrentPutCount]int{} {
Member

cute loop. But why? Just because it's slightly shorter to type or is there another reason?

cute loop. But why? Just because it's slightly shorter to type or is there another reason?
Author
Member

It shouldn't be here. I just wanted to try if this expression would be valid and it worked out :)
I'll replace it by for i:=0...

It shouldn't be here. I just wanted to try if this expression would be valid and it worked out :) I'll replace it by `for i:=0...`
Member

It certainly works. It's probably useful for code-golfing :) first time I see it, so TIL.

It certainly works. It's probably useful for code-golfing :) first time I see it, so TIL.
ale64bit marked this conversation as resolved
ale64bit reviewed 2023-04-13 08:11:20 +00:00
@ -176,0 +214,4 @@
t.Run("put the same big object", func(t *testing.T) {
bigObj := testObject(smallSizeLimit * 2)
wg := &sync.WaitGroup{}
Member

the zero WaitGroup is valid:

var wg sync.WaitGroup
the zero `WaitGroup` is valid: ``` var wg sync.WaitGroup ```
Author
Member

Fixed 👍

Fixed 👍
ale64bit marked this conversation as resolved
Owner

@ale64bit we would like to test the behaviour when different operations are being done on the same object: there should be no data corruption and consistent results for both.

@ale64bit we would like to test the behaviour when different operations are being done on the same object: there should be no data corruption and consistent results for both.

@fyrchik can we enable race flag for go test then?

@fyrchik can we enable `race` flag for `go test` then?
dstepanov-yadro approved these changes 2023-04-14 12:10:36 +00:00
aarifullin force-pushed feature/blobstor_concurrent_tests from 1dafa8fb07 to 1478cf631a 2023-04-16 18:32:15 +00:00 Compare
aarifullin requested review from dstepanov-yadro 2023-04-16 18:32:40 +00:00
dstepanov-yadro approved these changes 2023-04-18 15:56:06 +00:00
carpawell approved these changes 2023-04-18 16:04:00 +00:00
fyrchik approved these changes 2023-04-19 10:22:43 +00:00
fyrchik merged commit 13c8afcb02 into master 2023-04-19 10:22:52 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
5 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#233
No description provided.