[#19] node: Make policier read shards concurrently #257
No reviewers
Labels
No labels
P0
P1
P2
P3
badger
frostfs-adm
frostfs-cli
frostfs-ir
frostfs-lens
frostfs-node
good first issue
triage
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-node#257
Loading…
Add table
Reference in a new issue
No description provided.
Delete branch "aarifullin/frostfs-node:feature/19-list_mul_cursor"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
from different shards
Signed-off-by: Airat Arifullin a.arifullin@yadro.com
[#19] node: Make policier read shards concurrentlyto WIP: [#19] node: Make policier read shards concurrentlyWIP: [#19] node: Make policier read shards concurrentlyto [#19] node: Make policier read shards concurrently@ -128,2 +170,4 @@
}, nil
}
func (e *StorageEngine) ListWithMultiCursor(prm ListWithMultiCursorPrm) (ListWithMultiCursorRes, error) {
Does
funlen
linter marks this function as function with too many lines of code?I propose to divide this function:
getShardIDs
,createMultiCursor
,listWithMultiCursor
and so on.Ok, fare point
Added
getShardIDs
andcreateMultiCursor
but leftlistWithMultiCursor
because the function became uglier with that :) Butfunlen
is fine with this function anyway@ -130,0 +204,4 @@
}
for i := range multiCursor.cursors {
if multiCursor.cursors[i] == nil {
Can it be not nil?
Yes, it can. You can check
ListWithCursor
implementation that has the same logic.Also, you can check
shardPolicyWorker
to understand the principle of cursor usageHow it can be not nil?
Sorry. Now I've got it. I'll fix it, thanks!
Fixed
@ -127,3 +169,121 @@ func (e *StorageEngine) ListWithCursor(prm ListWithCursorPrm) (ListWithCursorRes
cursor: cursor,
}, nil
}
Why did you decide to implement a separate engine method instead of a simple wrapper in the policer package?
Briefly: no way
I tried many ways to manage reading shards from the policer but nothing succeeded. If
ListWithCursor
is used, then shards are re-read few times. We don't need itOk, could you describe the behaviour in the doc-comment?
I am specifically interested in (1) what do we do if the shard is removed/added and in (2) error handling (one of the shards went degraded -> we get error on list).
Please, check new comments for the method. If it lacks something, I'll add
@ -130,0 +189,4 @@
// Set initial cursors
multiCursor := prm.multiCursor
if multiCursor == nil {
var cursorCount uint32 = defaultCursorsNum
What is the meaning of this param? If I say
count=5
, will it iterate over 5 random shards?Yes, if
len(shards) == 5
. But ifcount == 5
andlen(shards) == 4
, thencount
gets4
. One cursor matches to one shardAs you can see
count
can be adjusted:1
for few cursors - non-sense)If there is one shard or batch size is 1, then multicursor behave itself the same like a
Cursor
Why did you choose to add a method instead of extending a
ListWithCursor
then?ListWithCursor
tells the client that shards won't be read concurrently whereverListWithMultiCursor
tells they will.But anyway your point sounds reasonable. I merge these implementations
@ -24,2 +24,4 @@
return res.AddressList(), res.Cursor(), nil
}
func (q *jobQueue) SelectWithMultiCursor(multiCursor *engine.MultiCursor, count uint32) ([]objectcore.AddressWithType, *engine.MultiCursor, error) {
Is
Select
still in use? If no, why not just replace it?d99b94340b
to4ae6ada5f3
4ae6ada5f3
toe8f74261ac
e8f74261ac
tob4014aec7b
@ -59,1 +70,4 @@
//
// If count parameter (i.e. batch size) and shard number are big enough,
// then few (for defaultCursorsNum) shard cursors read objects concurrently.
// In this case sorted shard ids are divided into groups by shardIndex % cursorIndex
I think
modulo
is not useful here, because HRW sort is not something we can usually predict and use.I don't agree it's not useful. Imagine this case
The second cursor finished shard processing and got the free one
The first cursor takes the already processed shard
If we say to the first cursor to take shards
1, 5, 9...
then it's fineIf you don't see the point in this way, I can to try use
visited map[string]bool
but it requires to change the old way to list. I'll tryI mean that it is not necessarily distribute the work in the best possible way, unless all shards have the same amount of objects.
If we precalculate a list of shards id during creation, the logic will be simple: if the iteration is done, we just pick the next available ID and increase it.
Speaking from the user POV: the order is not a part of the function contract, it should just be able to iterate over all objects in the engine. How this is done (modulo vs not modulo) is not important. My comment was more about the necessity to include this in the documentation.
I also took in account this comment and make
shardCursorWrapper
iterate over itsshardIDs
that are initialized only one time (at the cursor's initialization) and fix commentsListWithCursor
no longer manages which shards a cursor iterates for@ -70,3 +85,1 @@
}
e.mtx.RUnlock()
shardIDs := getSortedShardIDs(e)
Doing this each time we call the function seems inefficient. Can we do it only when we really need a new shard?
What about creating all cursors at the start of this operation (first call)?
Then we just take the next shard in order.
I have looked for a compromise between old implementation and new one. That's why I have kept this logic with shard sorting
I don't mind to keep shardIDs at the creation of the cursor and just keep them in its state like an iterator
I think this way we can achieve more predictable behaviour:
@ -112,3 +96,1 @@
if err != nil {
continue
}
batchPerCursor := int(prm.count) / len(cursor.shardCursorWrappers)
What if it is == 0?
Nothing wrong, but let's return
ListWithCursorRes{}, ErrEndOfListing
.Also,
len(cursor.shardCursorWrappers) <= int(prm.count)
. See above how it's adjustedb4014aec7b
to1b355459a0
@ -52,72 +78,91 @@ func (l ListWithCursorRes) Cursor() *Cursor {
return l.cursor
}
const (
redundant parentheses? also some comment is expected, otherwise,
const
is totally useless@ -60,0 +93,4 @@
//
// Adding a shard between ListWithCursor does not invalidate the cursor but new shard
// won't be listed.
// Removing a shard between ListWithCursor leads to the undefined behavior
is that planned to be solved?
(also, dot at the end of the sentence)
I don't think so. Anyway, it's rather warning for user.
From my POV, it's fine when an iterator gets invalidated because data has been changed
@ -116,3 +119,1 @@
result = append(result, res.AddressList()...)
cursor.shardCursor = res.Cursor()
cursor.shardID = shardIDs[i]
wg := &sync.WaitGroup{}
usually we do just
var wg sync.WaitGroup
I should get used to it :) Fixed!
@ -130,0 +180,4 @@
shardIDs = append(shardIDs, id)
}
e.mtx.RUnlock()
sort.Slice(shardIDs, func(i, j int) bool {
does
sort.Strings
do the work?@ -130,0 +187,4 @@
}
func newCursor(count uint32, shardIDs []string) *Cursor {
var cursorCount uint32 = defaultCursorsNum
is that var required?
It's alternative for casting
cursorCount := uint32(defaultCursorsNum)
. I prefer to not use casting becasue for me this is not declarative way, but I can fix this if it's needed@ -130,0 +194,4 @@
cursor := &Cursor{
shardCursorWrappers: make([]*shardCursorWrapper, cursorCount),
}
if len(shardIDs) < len(cursor.shardCursorWrappers) {
can we calculate slice size before
make
and do not allocate more that we need?1b355459a0
tod4f33faf12
@ -21,0 +33,4 @@
return
}
func (s *shardCursorWrapper) setShardCursor(shardCursor *shard.Cursor) {
From your POV, do this methods really improve readability?
Outdated. Removed this method
@ -21,0 +44,4 @@
// that point to one or few different shards. This allows to read objects from different
// shards concurrently.
type Cursor struct {
shardCursorWrappers []*shardCursorWrapper
So we have a list of
shardIDs
in each element of this slice. Can we share both them andcurr
?Outdated. Removed this method
@ -128,2 +172,4 @@
}, nil
}
func getSortedShardIDs(e *StorageEngine) []string {
We already have similar methods like
sortedShards
, it is possible/easy to reuse/rewrite them somehow?Sorry, but I have not found anything with name
sortedShards
@ -79,0 +65,4 @@
wantCursorsNum: 1,
},
{
name: "many shards, many objects, small batch size",
What is
many
in this context? I mean what is the difference between 3 and 6?Removed extra cases
d4f33faf12
tofc48f8d152
fc48f8d152
to6effd06d51
6effd06d51
todad5f3d092
dad5f3d092
to62c8af9f9c
62c8af9f9c
toc75a340309
c75a340309
toada081dfd5
Policer
#19