[#166] node: Parallelize background tree service sync by operation batching #235

Merged
fyrchik merged 2 commits from aarifullin/frostfs-node:feature/166-batch_tree_apply into master 2023-04-26 10:17:59 +00:00
Member
  • Concurrently dispatch TreeApply operations for batching in forest

Signed-off-by: Airat Arifullin a.arifullin@yadro.com
aarifullin added the
frostfs-node
label 2023-04-11 15:10:25 +00:00
aarifullin reviewed 2023-04-11 15:12:40 +00:00
@ -234,0 +242,4 @@
return nil
})
}
if errGroupErr := errGroup.Wait(); errGroupErr != nil {
Author
Member

I am sure the error check must NOT be ignored here: otherwise we would never find out that `TreeApply` has failed inside `Go`
aarifullin reviewed 2023-04-11 15:13:43 +00:00
@ -231,3 +231,1 @@
} else {
newHeight++
}
errGroup.Go(func() error {
Author
Member

@fyrchik Should we add some comments here? The batching happens implicitly, so...
Owner

No, I think it is clear what we are doing.
fyrchik marked this conversation as resolved
aarifullin requested review from fyrchik 2023-04-11 15:13:58 +00:00
aarifullin requested review from storage-core-committers 2023-04-11 15:13:58 +00:00
aarifullin requested review from storage-core-developers 2023-04-11 15:13:58 +00:00
aarifullin changed title from [#166] node: Parallelize background tree service sync by operation batching to WIP: [#166] node: Parallelize background tree service sync by operation batching 2023-04-11 15:23:14 +00:00
aarifullin force-pushed feature/166-batch_tree_apply from 41b7ef50ad to 81f91b379f 2023-04-11 15:25:07 +00:00 Compare
aarifullin changed title from WIP: [#166] node: Parallelize background tree service sync by operation batching to [#166] node: Parallelize background tree service sync by operation batching 2023-04-11 15:25:12 +00:00
carpawell reviewed 2023-04-11 19:41:13 +00:00
@ -196,2 +196,4 @@
cid.Encode(rawCID)
errGroup, egCtx := errgroup.WithContext(ctx)
errGroup.SetLimit(1024)
Contributor

magic number?
Author
Member

I have made it a *local* named constant
Owner

By the way, it was a constant in the hotfix; we can make it a parameter here: the difference between 1024 and 16k is significant, and we may want to increase this parameter on high-end servers.
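The suggestion above — replacing the magic `1024` with a named default that can be overridden from configuration — could be sketched as follows. The names (`treeSyncConfig`, `applyWorkersCount`, `defaultApplyWorkersCount`) are hypothetical, not from the actual codebase.

```go
package main

import "fmt"

// defaultApplyWorkersCount is the limit previously hardcoded as 1024.
const defaultApplyWorkersCount = 1024

// treeSyncConfig is a hypothetical config struct; a zero value for
// applyWorkersCount means "use the default".
type treeSyncConfig struct {
	applyWorkersCount int
}

func (c treeSyncConfig) workers() int {
	if c.applyWorkersCount <= 0 {
		return defaultApplyWorkersCount
	}
	return c.applyWorkersCount
}

func main() {
	fmt.Println(treeSyncConfig{}.workers())                         // default
	fmt.Println(treeSyncConfig{applyWorkersCount: 16384}.workers()) // high-end server
}
```

The returned value would then be passed to `errGroup.SetLimit` instead of the literal.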
aarifullin force-pushed feature/166-batch_tree_apply from 81f91b379f to 296ae711fd 2023-04-11 21:13:35 +00:00 Compare
dstepanov-yadro approved these changes 2023-04-12 06:52:16 +00:00
fyrchik reviewed 2023-04-13 04:59:19 +00:00
@ -231,3 +232,1 @@
} else {
newHeight++
}
errGroup.Go(func() error {
Owner

But can we do better in this case? I mean we already parallelize operations by nodes, it would be nice not to try applying the same operation from different nodes.
fyrchik marked this conversation as resolved
acid-ant approved these changes 2023-04-13 13:29:01 +00:00
aarifullin force-pushed feature/166-batch_tree_apply from 170fdea621 to e509366ff4 2023-04-20 13:02:22 +00:00 Compare
aarifullin force-pushed feature/166-batch_tree_apply from e509366ff4 to dc058f6cd1 2023-04-20 13:06:28 +00:00 Compare
aarifullin requested review from storage-core-committers 2023-04-20 15:50:35 +00:00
aarifullin requested review from storage-core-developers 2023-04-20 15:50:36 +00:00
aarifullin force-pushed feature/166-batch_tree_apply from dc058f6cd1 to 4c64d149b4 2023-04-20 16:08:59 +00:00 Compare
fyrchik approved these changes 2023-04-21 09:53:42 +00:00
@ -128,0 +131,4 @@
ms := make([]*pilorama.Move, len(streams))
for i := range streams {
m, ok := <-streams[i]
Owner

A closed channel will just return the default value; do we need `, ok` and `if` here?
Author
Member

My bad. The `, ok` and `if` are unnecessary
Author
Member

Fixed
fyrchik marked this conversation as resolved
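A minimal demonstration of the channel semantics behind this thread: once a buffered channel of pointers is closed and drained, further receives return the zero value (`nil`) without blocking, so a nil check on the received value can replace the `, ok` form. This is a standalone sketch, not the PR's code.

```go
package main

import "fmt"

func main() {
	ch := make(chan *int, 1)
	v := 42
	ch <- &v
	close(ch)

	// Buffered values are still delivered after close.
	fmt.Println(*<-ch)

	// Once drained, a receive returns the zero value for the element
	// type (nil for a pointer) immediately, without blocking.
	m := <-ch
	fmt.Println(m == nil)
}
```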
@ -128,0 +152,4 @@
for {
minTimeMove := &pilorama.Move{}
minTimeMove.Time = math.MaxInt64
minTimeMoveIndex := -1
Owner

Can we just use the `[0]` element as the first minimum and store only the `index`?
Author
Member

Here `-1` is like C++'s `std::find_if(ms.begin(), ms.end(), []{...}) == ms.end()` (sorry, only this idea comes to my mind). You asked [here](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/235/files#issuecomment-6981) if we check for `-1` when all streams are closed - that's correct, I need this logic
fyrchik marked this conversation as resolved
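The `-1` sentinel being discussed can be shown with a small sketch of the min-selection step of the k-way merge: scan the current head of every stream, pick the one with the smallest `Time`, and return `-1` only when every stream is exhausted. `move` and `minIndex` are hypothetical stand-ins for `pilorama.Move` and the PR's merge loop.

```go
package main

import (
	"fmt"
	"math"
)

// move is a hypothetical stand-in for pilorama.Move.
type move struct{ Time uint64 }

// minIndex returns the index of the head with the smallest Time;
// -1 means "all streams are exhausted", the sentinel discussed above.
func minIndex(heads []*move) int {
	minTime := uint64(math.MaxUint64)
	minIdx := -1
	for i, m := range heads {
		if m != nil && m.Time < minTime {
			minTime = m.Time
			minIdx = i
		}
	}
	return minIdx
}

func main() {
	heads := []*move{{Time: 7}, nil, {Time: 3}}
	fmt.Println(minIndex(heads))             // stream 2 holds the minimum
	fmt.Println(minIndex([]*move{nil, nil})) // all exhausted: -1
}
```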
@ -128,0 +160,4 @@
}
}
if minTimeMoveIndex == -1 {
Owner

This is possible only if we have no streams or all of them are empty, right?
Author
Member

Yes, you're right
fyrchik marked this conversation as resolved
@ -128,0 +177,4 @@
}
func (s *Service) applyOperationStream(ctx context.Context, cid cid.ID, treeID string,
operationStream <-chan *pilorama.Move, notApplied chan<- *pilorama.Move) {
Owner

Just a question: we use `notApplied` only to maintain some height. Wouldn't it be easier to return this height from the function and avoid providing a channel as an argument? (Locally we could use either a mutex or a channel; a mutex looks simpler to me.)
Author
Member

You're right. That's really possible and will make my code simpler
Author
Member

Fixed
fyrchik marked this conversation as resolved
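The mutex-based alternative suggested above — tracking the minimal not-applied height in a shared variable instead of funnelling it through a channel — might look like the following sketch. `heightTracker` is a hypothetical helper, not the PR's actual implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// heightTracker keeps the minimal not-applied height under a mutex,
// replacing the notApplied channel from the earlier revision.
type heightTracker struct {
	mu  sync.Mutex
	min uint64
}

func (h *heightTracker) report(height uint64) {
	h.mu.Lock()
	defer h.mu.Unlock()
	if height < h.min {
		h.min = height
	}
}

func main() {
	t := heightTracker{min: ^uint64(0)} // start at MaxUint64
	var wg sync.WaitGroup
	for _, h := range []uint64{10, 4, 7} {
		h := h
		wg.Add(1)
		go func() {
			defer wg.Done()
			t.report(h) // workers report failed heights concurrently
		}()
	}
	wg.Wait()
	fmt.Println(t.min)
}
```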
@ -128,0 +182,4 @@
const workersCount = 1024
errGroup.SetLimit(workersCount)
var prev *pilorama.Move
Owner

Do we need `prev`, or only `prev.Time`?
Author
Member

Only its `.Time`. Do you think I should introduce a `uint64` variable?
Owner

I would use `uint64`, but there is nothing wrong with your approach.
fyrchik marked this conversation as resolved
@ -128,0 +196,4 @@
errGroup.Go(func() error {
if err := s.forest.TreeApply(ctx, cid, treeID, m, true); err != nil {
notApplied <- m
Owner

How can I easily be sure that this operation will not block forever?
Author
Member

No longer relevant
fyrchik marked this conversation as resolved
fyrchik requested review from fyrchik 2023-04-21 09:53:49 +00:00
aarifullin force-pushed feature/166-batch_tree_apply from 4c64d149b4 to 526086292d 2023-04-23 12:23:35 +00:00 Compare
acid-ant reviewed 2023-04-24 08:36:15 +00:00
@ -128,0 +191,4 @@
// skip already applied op
if prev != nil {
if prev.Time == m.Time {
Member

Can you merge these into one `if`?
Author
Member

Done
acid-ant marked this conversation as resolved
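The merged condition from this thread — collapsing the nested `if prev != nil { if prev.Time == m.Time {` into a single check that skips an already-applied operation — could be sketched like this. `op` and `dedup` are hypothetical names for illustration only.

```go
package main

import "fmt"

// op is a hypothetical stand-in for pilorama.Move.
type op struct{ Time uint64 }

// dedup drops consecutive operations with equal Time, using the single
// combined condition suggested in the review.
func dedup(ops []op) []op {
	var out []op
	var prevTime uint64
	havePrev := false
	for _, m := range ops {
		// skip already applied op: one condition instead of two nested ifs
		if havePrev && prevTime == m.Time {
			continue
		}
		prevTime, havePrev = m.Time, true
		out = append(out, m)
	}
	return out
}

func main() {
	fmt.Println(len(dedup([]op{{1}, {1}, {2}, {3}, {3}})))
}
```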
aarifullin requested review from acid-ant 2023-04-24 12:02:29 +00:00
aarifullin requested review from dstepanov-yadro 2023-04-24 12:02:33 +00:00
aarifullin force-pushed feature/166-batch_tree_apply from 526086292d to 5d20745056 2023-04-24 12:04:36 +00:00 Compare
fyrchik approved these changes 2023-04-24 14:45:55 +00:00
@ -128,0 +146,4 @@
var minStreamedLastHeight uint64 = math.MaxUint64
for {
minTimeMove := &pilorama.Move{}
Owner

Do we still need it?
@ -140,0 +273,4 @@
nodeOperationStreams[i] = make(chan *pilorama.Move)
}
merged := make(chan *pilorama.Move)
minStreamedLastHeight := make(chan uint64)
Owner

We already use `errgroup`, do we have any benefits when using a channel instead of a simple variable?
Author
Member

Fair point, fixed
fyrchik marked this conversation as resolved
dstepanov-yadro refused to review 2023-04-25 07:11:51 +00:00
aarifullin force-pushed feature/166-batch_tree_apply from 5d20745056 to 155f151523 2023-04-25 07:22:38 +00:00 Compare
dstepanov-yadro approved these changes 2023-04-25 07:29:11 +00:00
aarifullin requested review from fyrchik 2023-04-25 08:54:16 +00:00
aarifullin requested review from dstepanov-yadro 2023-04-25 08:54:17 +00:00
fyrchik approved these changes 2023-04-25 10:23:45 +00:00
dstepanov-yadro approved these changes 2023-04-25 15:49:49 +00:00
dstepanov-yadro approved these changes 2023-04-25 15:49:52 +00:00
fyrchik merged commit 9d01029733 into master 2023-04-26 10:17:59 +00:00
5 participants
Reference: TrueCloudLab/frostfs-node#235