[#19] node: Make policier read shards concurrently #257

Merged
fyrchik merged 1 commit from aarifullin/frostfs-node:feature/19-list_mul_cursor into master 2023-04-27 08:16:43 +00:00
Collaborator
  • Introduce ListWithMultiCursor that simultaneously reads objects
    from different shards

Signed-off-by: Airat Arifullin a.arifullin@yadro.com

aarifullin added the frostfs-node label 2023-04-16 14:08:57 +00:00
aarifullin changed title from [#19] node: Make policier read shards concurrently to WIP: [#19] node: Make policier read shards concurrently 2023-04-16 14:09:00 +00:00
aarifullin changed title from WIP: [#19] node: Make policier read shards concurrently to [#19] node: Make policier read shards concurrently 2023-04-16 14:10:33 +00:00
aarifullin requested review from storage-core-committers 2023-04-17 06:32:06 +00:00
aarifullin requested review from storage-core-developers 2023-04-17 06:32:06 +00:00
dstepanov-yadro reviewed 2023-04-17 06:59:11 +00:00
@ -128,2 +170,4 @@
}, nil
}
func (e *StorageEngine) ListWithMultiCursor(prm ListWithMultiCursorPrm) (ListWithMultiCursorRes, error) {

Does the `funlen` linter mark this function as a function with too many lines of code?
I propose to divide this function into `getShardIDs`, `createMultiCursor`, `listWithMultiCursor`, and so on.
Poster
Collaborator

Ok, fair point
Poster
Collaborator

Added `getShardIDs` and `createMultiCursor` but left `listWithMultiCursor` as is, because the function became uglier with that split :) But `funlen` is fine with this function anyway.
dstepanov-yadro marked this conversation as resolved
@ -130,0 +204,4 @@
}
for i := range multiCursor.cursors {
if multiCursor.cursors[i] == nil {

Can it be not nil?
Poster
Collaborator

Yes, it can. You can check the `ListWithCursor` implementation, which has the same logic.
Also, you can check `shardPolicyWorker` to understand the principle of cursor usage.
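For readers unfamiliar with that principle, a cursor-driven listing loop is typically consumed like this (a generic sketch, not code from this PR; `listFunc`, `process` and `batchSize` are placeholders):

```go
var cursor *engine.Cursor
for {
    addrs, next, err := listFunc(cursor, batchSize)
    if errors.Is(err, engine.ErrEndOfListing) {
        break // every shard has been listed completely
    }
    if err != nil {
        return err
    }
    process(addrs) // e.g. hand the addresses to the policer's worker pool
    cursor = next  // resume from where the previous batch stopped
}
return nil
```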

How can it be not nil?

```go
multiCursor := prm.multiCursor
if multiCursor == nil {
    multiCursor = &MultiCursor{
        cursors: make([]*Cursor, cursorCount),
    }
    for i := range multiCursor.cursors {
        if multiCursor.cursors[i] == nil {
            ...
        }
    }
}
```
Poster
Collaborator

Sorry. Now I've got it. I'll fix it, thanks!
Poster
Collaborator

Fixed
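For context, the fix presumably drops the redundant check: a freshly allocated slice of pointers contains only nil elements, so every entry can be initialized unconditionally (a sketch with field names assumed from the snippet above):

```go
multiCursor := prm.multiCursor
if multiCursor == nil {
    multiCursor = &MultiCursor{
        cursors: make([]*Cursor, cursorCount),
    }
    // make([]*Cursor, n) yields nil pointers only, so no nil check is needed.
    for i := range multiCursor.cursors {
        multiCursor.cursors[i] = new(Cursor)
    }
}
```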
dstepanov-yadro marked this conversation as resolved
fyrchik approved these changes 2023-04-17 07:27:52 +00:00
@ -127,3 +169,121 @@ func (e *StorageEngine) ListWithCursor(prm ListWithCursorPrm) (ListWithCursorRes
cursor: cursor,
}, nil
}

Why did you decide to implement a separate engine method instead of a simple wrapper in the policer package?
Poster
Collaborator

Briefly: no way.

I tried many ways to manage reading shards from the policer, but nothing succeeded. If `ListWithCursor` is used, then shards are re-read a few times. We don't need that.

Ok, could you describe the behaviour in the doc-comment?
I am specifically interested in (1) what we do if a shard is removed/added, and (2) error handling (one of the shards went degraded -> we get an error on list).
Poster
Collaborator

Please check the new comments for the method. If anything is missing, I'll add it.
fyrchik marked this conversation as resolved
@ -130,0 +189,4 @@
// Set initial cursors
multiCursor := prm.multiCursor
if multiCursor == nil {
var cursorCount uint32 = defaultCursorsNum

What is the meaning of this param? If I say `count=5`, will it iterate over 5 random shards?
Poster
Collaborator

> If I say `count=5`, will it iterate over 5 random shards?

Yes, if `len(shards) == 5`. But if `count == 5` and `len(shards) == 4`, then `count` becomes `4`. One cursor maps to one shard.

As you can see, `count` can be adjusted:

  • by shard number
  • by batch size (we cannot split `1` between several cursors, it makes no sense)

If there is one shard or the batch size is 1, then the multicursor behaves the same as a `Cursor`.

> If there is one shard or the batch size is 1, then the multicursor behaves the same as a `Cursor`

Why did you choose to add a method instead of extending `ListWithCursor` then?
Poster
Collaborator

> Why did you choose to add a method instead of extending `ListWithCursor` then?

`ListWithCursor` tells the client that shards won't be read concurrently, whereas `ListWithMultiCursor` tells them they will.
But your point sounds reasonable anyway. I'll merge these implementations.
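The adjustment described above can be sketched as a small helper (the name and exact clamping rules are assumptions based on this reply, not the merged code):

```go
// cursorCount returns how many shard cursors to create: never more than the
// number of shards and never more than the batch size, so that every cursor
// gets at least one object per batch.
func cursorCount(defaultCursors, shardNum, batchSize uint32) uint32 {
    n := defaultCursors
    if shardNum < n {
        n = shardNum
    }
    if batchSize < n {
        n = batchSize
    }
    if n == 0 {
        n = 1
    }
    return n
}
```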
fyrchik marked this conversation as resolved
@ -24,2 +24,4 @@
return res.AddressList(), res.Cursor(), nil
}
func (q *jobQueue) SelectWithMultiCursor(multiCursor *engine.MultiCursor, count uint32) ([]objectcore.AddressWithType, *engine.MultiCursor, error) {

Is `Select` still in use? If not, why not just replace it?
fyrchik marked this conversation as resolved
fyrchik requested review from fyrchik 2023-04-17 07:27:57 +00:00
aarifullin force-pushed feature/19-list_mul_cursor from d99b94340b to 4ae6ada5f3 2023-04-17 10:30:55 +00:00 Compare
aarifullin force-pushed feature/19-list_mul_cursor from 4ae6ada5f3 to e8f74261ac 2023-04-17 10:44:56 +00:00 Compare
aarifullin force-pushed feature/19-list_mul_cursor from e8f74261ac to b4014aec7b 2023-04-17 11:12:07 +00:00 Compare
dstepanov-yadro approved these changes 2023-04-17 11:22:59 +00:00
fyrchik reviewed 2023-04-18 08:04:47 +00:00
@ -59,1 +70,4 @@
//
// If count parameter (i.e. batch size) and shard number are big enough,
// then few (for defaultCursorsNum) shard cursors read objects concurrently.
// In this case sorted shard ids are divided into groups by shardIndex % cursorIndex

I think `modulo` is not useful here, because HRW sort is not something we can usually predict and use.
Poster
Collaborator

I don't agree that it's not useful. Imagine this case:

```
[shard #1][shard #2][shard #3][shard #4][shard #5]
     ^         ^        ^         ^
     |         |        |         |
    cu1       cu2      cu3       cu4
```

The second cursor finished processing its shard and took the free one:

```
[shard #1][shard #2][shard #3][shard #4][shard #5]
     ^                  ^         ^         ^
     |                  |         |         |
    cu1                cu3       cu4       cu2
```

The first cursor takes the already processed shard:

```
[shard #1][shard #2][shard #3][shard #4][shard #5]
              ^         ^         ^         ^
              |         |         |         |
             cu1       cu3       cu4       cu2
```

If we tell the first cursor to take shards `1, 5, 9...`, then it's fine.

If you don't see the point of this approach, I can try to use `visited map[string]bool`, but that requires changing the old way of listing. I'll try.

I mean that it does not necessarily distribute the work in the best possible way, unless all shards have the same number of objects.

If we precalculate a list of shard IDs during creation, the logic will be simple: if the iteration is done, we just pick the next available ID and increase it.

Speaking from the user's POV: the order is not part of the function contract; it should just be able to iterate over all objects in the engine. _How_ this is done (modulo vs. not modulo) is not important. My comment was more about the necessity to include this in the documentation.
Poster
Collaborator

I also took into account [this comment](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/257#issuecomment-6574), made `shardCursorWrapper` iterate over its `shardIDs` (which are initialized only once, at the cursor's initialization), and fixed the comments:

  1. Shards are no longer sorted at each invocation
  2. `ListWithCursor` no longer manages which shards a cursor iterates over
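For reference, the index-modulo assignment discussed in this thread can be sketched like this (a generic illustration, not the merged code):

```go
// partitionShardIDs splits the sorted shard IDs into cursorNum groups so that
// cursor j walks shards j, j+cursorNum, j+2*cursorNum, ... and no two cursors
// ever visit the same shard.
func partitionShardIDs(shardIDs []string, cursorNum int) [][]string {
    groups := make([][]string, cursorNum)
    for i, id := range shardIDs {
        groups[i%cursorNum] = append(groups[i%cursorNum], id)
    }
    return groups
}
```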
@ -70,3 +85,1 @@
}
e.mtx.RUnlock()
shardIDs := getSortedShardIDs(e)

Doing this each time we call the function seems inefficient. Can we do it only when we really need a new shard?

What about creating _all_ cursors at the start of this operation (first call)?
Then we just take the next shard in order.
Poster
Collaborator

I looked for a compromise between the old implementation and the new one. That's why I kept this logic with shard sorting.

I don't mind keeping `shardIDs` from the creation of the cursor and just storing them in its state, like an iterator.

I think this way we can achieve more predictable behaviour:

  1. We fix shard IDs after the new call.
  2. New shards won't be processed.
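A rough sketch of the resulting iterator-style state (field and method names are assumptions; the real `shardCursorWrapper` also carries the intra-shard cursor):

```go
// shardCursorWrapper keeps the shard IDs it is responsible for, fixed at
// cursor creation time, and walks through them like an iterator. Shards added
// to the engine later are simply not visible to this cursor.
type shardCursorWrapper struct {
    shardIDs []string // captured once, when the cursor is created
    idx      int      // position of the shard currently being listed
}

// nextShard advances to the next assigned shard, reporting false when the
// wrapper has listed everything it was given.
func (w *shardCursorWrapper) nextShard() (string, bool) {
    if w.idx >= len(w.shardIDs) {
        return "", false
    }
    id := w.shardIDs[w.idx]
    w.idx++
    return id, true
}
```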
fyrchik marked this conversation as resolved
@ -112,3 +96,1 @@
if err != nil {
continue
}
batchPerCursor := int(prm.count) / len(cursor.shardCursorWrappers)

What if it is == 0?
Poster
Collaborator

Nothing wrong, but let's return `ListWithCursorRes{}, ErrEndOfListing`.

Also, `len(cursor.shardCursorWrappers) <= int(prm.count)`. See above how it's adjusted.
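In code, the guard under discussion would look roughly like this (a sketch continuing the diff context above, with the surrounding names assumed):

```go
batchPerCursor := int(prm.count) / len(cursor.shardCursorWrappers)
if batchPerCursor == 0 {
    // The number of wrappers is adjusted so that it never exceeds prm.count,
    // so this is a defensive branch rather than a reachable one.
    return ListWithCursorRes{}, ErrEndOfListing
}
```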
fyrchik marked this conversation as resolved
aarifullin requested review from storage-core-developers 2023-04-18 09:53:54 +00:00
aarifullin requested review from storage-core-committers 2023-04-18 09:53:54 +00:00
aarifullin force-pushed feature/19-list_mul_cursor from b4014aec7b to 1b355459a0 2023-04-18 12:19:03 +00:00 Compare
aarifullin requested review from dstepanov-yadro 2023-04-18 12:32:14 +00:00
dstepanov-yadro approved these changes 2023-04-18 15:51:20 +00:00
carpawell reviewed 2023-04-18 16:54:56 +00:00
@ -52,72 +78,91 @@ func (l ListWithCursorRes) Cursor() *Cursor {
return l.cursor
}
const (
Collaborator

redundant parentheses? also some comment is expected, otherwise the `const` is totally useless
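The suggested form, sketched (the value and comment text here are illustrative, not taken from the merged code):

```go
// defaultCursorsNum is the maximum number of shard cursors that are read
// concurrently by ListWithCursor.
const defaultCursorsNum = 4
```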
carpawell marked this conversation as resolved
@ -60,0 +93,4 @@
//
// Adding a shard between ListWithCursor does not invalidate the cursor but new shard
// won't be listed.
// Removing a shard between ListWithCursor leads to the undefined behavior
Collaborator

is that planned to be solved?

(also, dot at the end of the sentence)
Poster
Collaborator

> is that planned to be solved?

I don't think so. Anyway, it's rather a warning for the user.

From my POV, it's fine for an iterator to get invalidated because the data has changed.
carpawell marked this conversation as resolved
@ -116,3 +119,1 @@
result = append(result, res.AddressList()...)
cursor.shardCursor = res.Cursor()
cursor.shardID = shardIDs[i]
wg := &sync.WaitGroup{}
Collaborator

usually we do just `var wg sync.WaitGroup`
Poster
Collaborator

I should get used to it :) Fixed!
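The suggested idiom, shown in a short sketch of the concurrent per-cursor listing (the loop body and surrounding names are assumptions):

```go
var wg sync.WaitGroup
for i := range cursor.shardCursorWrappers {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        // list the next batch from the i-th shard cursor
    }(i)
}
wg.Wait()
```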
carpawell marked this conversation as resolved
@ -130,0 +180,4 @@
shardIDs = append(shardIDs, id)
}
e.mtx.RUnlock()
sort.Slice(shardIDs, func(i, j int) bool {
Collaborator

does `sort.Strings` do the work?
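The simplification being suggested, sketched against the diff context above (iterating `e.shards` here is an assumption):

```go
shardIDs := make([]string, 0, len(e.shards))
for id := range e.shards {
    shardIDs = append(shardIDs, id)
}
e.mtx.RUnlock()
// sort.Strings produces the same lexicographic order as sort.Slice with a
// less-than closure, with less code.
sort.Strings(shardIDs)
```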
carpawell marked this conversation as resolved
@ -130,0 +187,4 @@
}
func newCursor(count uint32, shardIDs []string) *Cursor {
var cursorCount uint32 = defaultCursorsNum
Collaborator

is that var required?
Poster
Collaborator

It's an alternative to casting: `cursorCount := uint32(defaultCursorsNum)`. I prefer not to use casting because, to me, that is not a declarative way, but I can fix this if needed.
carpawell marked this conversation as resolved
@ -130,0 +194,4 @@
cursor := &Cursor{
shardCursorWrappers: make([]*shardCursorWrapper, cursorCount),
}
if len(shardIDs) < len(cursor.shardCursorWrappers) {
Collaborator

can we calculate the slice size before `make` and not allocate more than we need?
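The pre-sizing asked for here could look like this (a sketch with names taken from the diff context; the clamping rule is an assumption):

```go
cursorNum := int(defaultCursorsNum)
if len(shardIDs) < cursorNum {
    cursorNum = len(shardIDs)
}
cursor := &Cursor{
    shardCursorWrappers: make([]*shardCursorWrapper, cursorNum),
}
```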
carpawell marked this conversation as resolved
aarifullin force-pushed feature/19-list_mul_cursor from 1b355459a0 to d4f33faf12 2023-04-18 17:53:21 +00:00 Compare
fyrchik reviewed 2023-04-19 10:21:42 +00:00
@ -21,0 +33,4 @@
return
}
func (s *shardCursorWrapper) setShardCursor(shardCursor *shard.Cursor) {

From your POV, do these methods really improve readability?
Poster
Collaborator

Outdated. Removed this method
fyrchik marked this conversation as resolved
@ -21,0 +44,4 @@
// that point to one or few different shards. This allows to read objects from different
// shards concurrently.
type Cursor struct {
shardCursorWrappers []*shardCursorWrapper

So we have a list of `shardIDs` in each element of this slice. Can we share both them and `curr`?
Poster
Collaborator

Outdated. Removed this method
@ -128,2 +172,4 @@
}, nil
}
func getSortedShardIDs(e *StorageEngine) []string {

We already have similar methods like `sortedShards`; is it possible/easy to reuse/rewrite them somehow?
Poster
Collaborator

Sorry, but I have not found anything with the name `sortedShards`.
@ -79,0 +65,4 @@
wantCursorsNum: 1,
},
{
name: "many shards, many objects, small batch size",

What is `many` in this context? I mean, what is the difference between 3 and 6?
Poster
Collaborator

Removed extra cases
fyrchik marked this conversation as resolved
aarifullin force-pushed feature/19-list_mul_cursor from d4f33faf12 to fc48f8d152 2023-04-24 10:26:09 +00:00 Compare
aarifullin force-pushed feature/19-list_mul_cursor from fc48f8d152 to 6effd06d51 2023-04-24 10:27:07 +00:00 Compare
aarifullin force-pushed feature/19-list_mul_cursor from 6effd06d51 to dad5f3d092 2023-04-24 10:28:11 +00:00 Compare
aarifullin force-pushed feature/19-list_mul_cursor from dad5f3d092 to 62c8af9f9c 2023-04-24 11:27:03 +00:00 Compare
aarifullin requested review from dstepanov-yadro 2023-04-24 11:29:38 +00:00
aarifullin force-pushed feature/19-list_mul_cursor from 62c8af9f9c to c75a340309 2023-04-24 15:45:13 +00:00 Compare
fyrchik approved these changes 2023-04-25 10:27:16 +00:00
dstepanov-yadro approved these changes 2023-04-25 15:49:21 +00:00
carpawell approved these changes 2023-04-27 07:32:15 +00:00
aarifullin force-pushed feature/19-list_mul_cursor from c75a340309 to ada081dfd5 2023-04-27 08:12:13 +00:00 Compare
fyrchik merged commit ada081dfd5 into master 2023-04-27 08:16:43 +00:00
Reference: TrueCloudLab/frostfs-node#257