[#166] node: Parallelize background tree service sync by operation batching #235
No reviewers
Labels
No labels
P0
P1
P2
P3
badger
frostfs-adm
frostfs-cli
frostfs-ir
frostfs-lens
frostfs-node
good first issue
triage
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-node#235
Loading…
Reference in a new issue
No description provided.
Delete branch "aarifullin/frostfs-node:feature/166-batch_tree_apply"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Signed-off-by: Airat Arifullin a.arifullin@yadro.com
@ -234,0 +242,4 @@
return nil
})
}
if errGroupErr := errGroup.Wait(); errGroupErr != nil {
I am sure the error check must NOT be ignored here: we'll never figure out if
TreeApply
is failed inGo
@ -231,3 +231,1 @@
} else {
newHeight++
}
errGroup.Go(func() error {
@fyrchik Should we add some comments here? The batching happens implicitly, so...
No, I think it is clear what we are doing.
[#166] node: Parallelize background tree service sync by operation batchingto WIP: [#166] node: Parallelize background tree service sync by operation batching41b7ef50ad
to81f91b379f
WIP: [#166] node: Parallelize background tree service sync by operation batchingto [#166] node: Parallelize background tree service sync by operation batching@ -196,2 +196,4 @@
cid.Encode(rawCID)
errGroup, egCtx := errgroup.WithContext(ctx)
errGroup.SetLimit(1024)
magic number?
Have made it as local named constant
By the way, it was constant in the hotfix, we can make it a parameter here: the difference between 1024 and 16k is significant, we may want to increase this parameter on high-end servers.
81f91b379f
to296ae711fd
@ -231,3 +232,1 @@
} else {
newHeight++
}
errGroup.Go(func() error {
But can we do better in this case? I mean we already parallelize operations by nodes, it would be nice not to try applying the same operation from different nodes.
170fdea621
toe509366ff4
e509366ff4
todc058f6cd1
dc058f6cd1
to4c64d149b4
@ -128,0 +131,4 @@
ms := make([]*pilorama.Move, len(streams))
for i := range streams {
m, ok := <-streams[i]
Closed channel will just return default value, do we need
, ok
andif
here?My bad.
ok, if
are unnecessaryFixed
@ -128,0 +152,4 @@
for {
minTimeMove := &pilorama.Move{}
minTimeMove.Time = math.MaxInt64
minTimeMoveIndex := -1
Can we just use
[0]
element as the first minimum and storeindex
only?Here
-1
is like C++'sstd::find_if(ms.begin(), ms.end(), []{...} == ms.end())
(sorry, only this idea comes to my mind).You asked here if we check for
-1
when all streams are closed - it's correct, I need this logic@ -128,0 +160,4 @@
}
}
if minTimeMoveIndex == -1 {
This is possible only if we have no streams or all of them are empty, right?
Yes, you're right
@ -128,0 +177,4 @@
}
func (s *Service) applyOperationStream(ctx context.Context, cid cid.ID, treeID string,
operationStream <-chan *pilorama.Move, notApplied chan<- *pilorama.Move) {
Just a question: we use
notApplied
only to maintain some height.Wouldn't it be easier to return this height from the function and avoid providing a channel as an argument? (locally we could use either mutex or channel, mutex looks simpler to me).
You're right. That's really possible and will make my code easier
Fixed
@ -128,0 +182,4 @@
const workersCount = 1024
errGroup.SetLimit(workersCount)
var prev *pilorama.Move
Do we need
prev
or onlyprev.Time
?Only its
.Time
. Do you think I should introduceuint64
variable?I would use
uint64
, but nothing wrong with your approach.@ -128,0 +196,4 @@
errGroup.Go(func() error {
if err := s.forest.TreeApply(ctx, cid, treeID, m, true); err != nil {
notApplied <- m
How can I easily be sure that this operation would not block forever?
No longer actual
4c64d149b4
to526086292d
@ -128,0 +191,4 @@
// skip already applied op
if prev != nil {
if prev.Time == m.Time {
Can you merge it in one if?
Done
526086292d
to5d20745056
@ -128,0 +146,4 @@
var minStreamedLastHeight uint64 = math.MaxUint64
for {
minTimeMove := &pilorama.Move{}
Do we still need it?
@ -140,0 +273,4 @@
nodeOperationStreams[i] = make(chan *pilorama.Move)
}
merged := make(chan *pilorama.Move)
minStreamedLastHeight := make(chan uint64)
We already use
errgroup
, do we have any benefits when using a channel instead of a simple variable?Fair point, fixed
5d20745056
to155f151523