[#118] node: add unit concurrent tests for blobstor #233

aarifullin · 2023-04-10T18:47:56Z

aarifullin commented

2023-04-10 18:47:56 +00:00

Signed-off-by: Airat Arifullin a.arifullin@yadro.com

aarifullin reviewed 2023-04-10 18:50:07 +00:00

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +236,4 @@

					})

				}

				func TestConcurrentDelete(t *testing.T) {

aarifullin commented

2023-04-10 18:50:07 +00:00

No more good useful scenario came to my mind apart from the two ones. Would be glad to listen to your ideas

aarifullin reviewed 2023-04-10 18:53:20 +00:00

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +182,4 @@

						smallSizeLimit = 512

						// concurrentPutCount is fstree implementation specific

						concurrentPutCount = 5

aarifullin commented

2023-04-10 18:53:20 +00:00

@fyrchik
I've got this from this constant.
It looks like the constant sets the restriction for the concurent putting. Should we get this documented somehow?

@fyrchik I've got this from this [constant](https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/pkg/local_object_storage/blobstor/fstree/fstree.go#L262). It looks like the constant sets the restriction for the concurent putting. Should we get this documented somehow?

requested reviews from storage-core-committers, storage-core-developers

2023-04-10 18:53:32 +00:00

ale64bit commented

2023-04-11 08:13:13 +00:00

I'm a bit skeptical about the value of these tests. A few questions:

What kind of bugs are they supposed to catch?
Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky?
Are they supposed to run as regular tests or specifically under the race detector?

I guess they will still be useful to catch obvious/basic concurrency issues, just wondering whether we can or should do better.

I'm a bit skeptical about the value of these tests. A few questions: 1. What kind of bugs are they supposed to catch? 2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky? 3. Are they supposed to run as regular tests or specifically under the race detector? I guess they will still be useful to catch obvious/basic concurrency issues, just wondering whether we can or should do better.

dstepanov-yadro reviewed 2023-04-11 08:47:32 +00:00

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +264,4 @@

						prm.Address = object.AddressOf(obj)

						if _, err := b.Delete(prm); err != nil {

							require.ErrorContains(t, err, "object not found")

						}

dstepanov-yadro commented

2023-04-11 08:47:28 +00:00

Maybe add check that object is not exists?

aarifullin commented

2023-04-11 11:08:37 +00:00

There are many ways to check if an object exists. But for me it's more logically correct to check the error from deletion because I check concurrent deletes

dstepanov-yadro commented

2023-04-11 11:53:05 +00:00

to check the error from deletion because I check concurrent deletes

Yes, it's true. I'm talking about the fact that after deletion there should be no object exist. Maybe delete does not return an error, but the object is not deleted.

> to check the error from deletion because I check concurrent deletes Yes, it's true. I'm talking about the fact that after deletion there should be no object exist. Maybe `delete` does not return an error, but the object is not deleted.

aarifullin commented

2023-04-11 16:02:30 +00:00

Maybe delete does not return an error, but the object is not deleted

You're right. That must be checked. I've added check about object's existence

> Maybe delete does not return an error, but the object is not deleted You're right. That must be checked. I've added check about object's existence

dstepanov-yadro marked this conversation as resolved

dstepanov-yadro commented

2023-04-11 08:48:40 +00:00

I think at least we should add -race flag for go test.

I think at least we should add `-race` flag for `go test`.

👍 1

aarifullin force-pushed feature/blobstor_concurrent_tests from a3a54534af to 91213fede3

2023-04-11 15:53:41 +00:00

Compare

aarifullin commented

2023-04-11 15:56:26 +00:00

What kind of bugs are they supposed to catch?

As far as I understood the issue, here we should check logical correctness and data consistency (in the case of concurrent writes)

Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky?

Yes, I did. See the test "put the same big object with error"

Are they supposed to run as regular tests or specifically under the race detector?

From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable -race option for go test

> 1. What kind of bugs are they supposed to catch? As far as I understood [the issue](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/118), here we should check logical correctness and data consistency (in the case of concurrent writes) > 2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky? Yes, I did. See the test `"put the same big object with error"` > 3. Are they supposed to run as regular tests or specifically under the race detector? From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable `-race` option for `go test`

aarifullin force-pushed feature/blobstor_concurrent_tests from 91213fede3 to ea6ad1e13b

2023-04-11 16:01:34 +00:00

Compare

carpawell reviewed 2023-04-11 19:24:16 +00:00

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +182,4 @@

						smallSizeLimit = 512

						// concurrentPutCount is fstree implementation specific

						concurrentPutCount = 5

carpawell commented

2023-04-11 19:19:10 +00:00

kinda magic number to me: the const from the FSTree may vary independently

@fyrchik, what problem cases are expected to be tested by that PR?

kinda magic number to me: the const from the `FSTree` may vary independently @fyrchik, what problem cases are expected to be tested by that PR?

aarifullin commented

2023-04-12 07:55:45 +00:00

kinda magic number to me: the const from the FSTree may vary independently

Agree. That constant in Tree affects the concurrent write. If concurrent writers are > concurrentPutCount, then errors occur

> kinda magic number to me: the const from the FSTree may vary independently Agree. That constant in Tree affects the concurrent write. If concurrent writers are `> concurrentPutCount`, then errors occur

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +185,4 @@

						concurrentPutCount = 5

					)

					newBlobStor := func(t *testing.T, compress bool) *BlobStor {

carpawell commented

2023-04-11 18:59:52 +00:00

not used var here and below?

👍 1

carpawell marked this conversation as resolved

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +202,4 @@

					}

					testPut := func(t *testing.T, wg *sync.WaitGroup, b *BlobStor, obj *objectSDK.Object) {

						defer wg.Done()

carpawell commented

2023-04-11 19:00:56 +00:00

why defer here? looks useless without multiple return

same in testPutFileExistsError

why `defer` here? looks useless without multiple `return` same in `testPutFileExistsError`

aarifullin commented

2023-04-12 08:12:02 +00:00

wg is no longer used in the such methods

carpawell marked this conversation as resolved

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +235,4 @@

						bigObj := testObject(smallSizeLimit * 2)

						wg := &sync.WaitGroup{}

						for range [concurrentPutCount + 1]int{} {

carpawell commented

2023-04-11 19:14:54 +00:00

have never seen such a [0; n) range -- live and learn :)

but may the for i := 0; ...; i++ be more expected by a regular GO reader? also, your way should allocate more memory for nothing

have never seen such a [0; n) range -- live and learn :) but may the `for i := 0; ...; i++` be more expected by a regular GO reader? also, your way should allocate more memory for nothing

👍 1

carpawell marked this conversation as resolved

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +304,4 @@

						wg := &sync.WaitGroup{}

						for range [2]int{} {

							wg.Add(1)

							go testDelete(t, wg, blobStor, bigObj)

carpawell commented

2023-04-11 19:24:12 +00:00

how about:

go func() {
    wg.Add(1)
    testDelete(...)
    wg.Done()
}()

and do not pass wg to the testing funcs?

how about: ``` go func() { wg.Add(1) testDelete(...) wg.Done() }() ``` and do not pass `wg` to the testing funcs?

aarifullin commented

2023-04-12 08:06:11 +00:00

Sounds reasonble!
But with one fix:

wg.Add(1)
go func() {
    testDelete(...)
    wg.Done()
}()

Otherwise wg.Wait() can be passed meanwhile goroutine has not been run yet (no wg.Add)

Sounds reasonble! But with one fix: ``` wg.Add(1) go func() { testDelete(...) wg.Done() }() ``` Otherwise `wg.Wait()` can be passed meanwhile goroutine has not been run yet (no `wg.Add`)

carpawell commented

2023-04-18 15:58:08 +00:00

But with one fix

oh, yeah, for sure. fast browser programming may lead to stupid bugs

> But with one fix oh, yeah, for sure. fast browser programming may lead to stupid bugs

carpawell marked this conversation as resolved

aarifullin force-pushed feature/blobstor_concurrent_tests from ea6ad1e13b to 1dafa8fb07

2023-04-12 08:09:35 +00:00

Compare

ale64bit commented

2023-04-13 08:09:14 +00:00

What kind of bugs are they supposed to catch?

As far as I understood the issue, here we should check logical correctness and data consistency (in the case of concurrent writes)

The issue only suggests to test the behavior when get/put is done concurrently, but doesn't suggest how to do it exactly or what properties the author (who's the author, btw?) had in mind. That's why I'm asking.

Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky?

Yes, I did. See the test "put the same big object with error"

This is not what I meant. What I meant is, whether you checked whether introducing a relevant bug in the codebase itself causes this verbatim test to catch it, and how often, if at all.

Are they supposed to run as regular tests or specifically under the race detector?

From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable -race option for go test

I don't quite get the "check logical error but not incorrect resource sharing". Can you elaborate? It would seem like "incorrect resource sharing" (i.e. a concurrency bug) is a potential source of "logical error"s, no?

My concern is simply to evaluate the utility of these tests somehow. Using the race detector is a good builtin tool to use random scheduling to try to surface some of the possible bugs, but it's probably not the only way.

> > 1. What kind of bugs are they supposed to catch? > > As far as I understood [the issue](https://git.frostfs.info/TrueCloudLab/frostfs-node/issues/118), here we should check logical correctness and data consistency (in the case of concurrent writes) > The issue only suggests to test the behavior when get/put is done concurrently, but doesn't suggest how to do it exactly or what properties the author (who's the author, btw?) had in mind. That's why I'm asking. > > 2. Did you try introducing such a bug intentionally to verify that they indeed catch it? And how likely it is to catch it vs appearing just flaky? > > Yes, I did. See the test `"put the same big object with error"` > This is not what I meant. What I meant is, whether you checked whether introducing a relevant bug in the codebase itself causes this verbatim test to catch it, and how often, if at all. > > 3. Are they supposed to run as regular tests or specifically under the race detector? > > From my point of view they should be run as regular tests because we rather try to check logical error but not incorrect resource sharing, but I don't mind if enable `-race` option for `go test` > I don't quite get the "check logical error but not incorrect resource sharing". Can you elaborate? It would seem like "incorrect resource sharing" (i.e. a concurrency bug) is a potential source of "logical error"s, no? My concern is simply to evaluate the utility of these tests somehow. Using the race detector is a good builtin tool to use random scheduling to try to surface some of the possible bugs, but it's probably not the only way.

ale64bit reviewed 2023-04-13 08:10:35 +00:00

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +215,4 @@

						bigObj := testObject(smallSizeLimit * 2)

						wg := &sync.WaitGroup{}

						for range [concurrentPutCount]int{} {

ale64bit commented

2023-04-13 08:10:35 +00:00

cute loop. But why? Just because it's slightly shorter to type or is there another reason?

aarifullin commented

2023-04-13 12:25:38 +00:00

It shouldn't be here. I just wanted to try if this expression would be valid and it worked out :)
I'll replace it by for i:=0...

It shouldn't be here. I just wanted to try if this expression would be valid and it worked out :) I'll replace it by `for i:=0...`

ale64bit commented

2023-04-13 12:36:53 +00:00

It certainly works. It's probably useful for code-golfing :) first time I see it, so TIL.

ale64bit marked this conversation as resolved

ale64bit reviewed 2023-04-13 08:11:20 +00:00

pkg/local_object_storage/blobstor/blobstor_test.go Outdated

					
				@ -176,0 +214,4 @@

					t.Run("put the same big object", func(t *testing.T) {

						bigObj := testObject(smallSizeLimit * 2)

						wg := &sync.WaitGroup{}

ale64bit commented

2023-04-13 08:11:20 +00:00

the zero WaitGroup is valid:

var wg sync.WaitGroup

the zero `WaitGroup` is valid: ``` var wg sync.WaitGroup ```

👍 2

aarifullin commented

2023-04-16 18:32:36 +00:00

Fixed 👍

ale64bit marked this conversation as resolved

fyrchik commented

2023-04-14 06:11:51 +00:00

@ale64bit we would like to test the behaviour when different operations are being done on the same object: there should be no data corruption and consistent results for both.