gc: Skip received event if previous still processing #1228
Reference: TrueCloudLab/frostfs-node#1228
No description provided.
Branch: acid-ant/frostfs-node:bugfix/gc-unblock-evnt-handler
Unblocking the send in the notification channel was fixed by #1238, but we still have an issue with handling multiple events.
Signed-off-by: Anton Nikiforov an.nikiforov@yadro.com
force-pushed from 67133a3766 to cb113bd046
changed title from "gc: Skip received event if previous still processing" to "WIP: gc: Skip received event if previous still processing"
force-pushed from cb113bd046 to 0b95f40cba
force-pushed from 0b95f40cba to c524dfb80c
changed title from "WIP: gc: Skip received event if previous still processing" to "gc: Skip received event if previous still processing"

@ -178,19 +178,14 @@ func (gc *gc) listenEvents(ctx context.Context) {
}
func (gc *gc) handleEvent(ctx context.Context, event Event) {
if !gc.processEvnt.CompareAndSwap(false, true) {
Here is a compare-and-swap false -> true, but where is the true -> false swap?

Thanks, lost it after refactoring. Added a routine which will wait for the result of processing.
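For readers following the thread, here is a minimal sketch of the gate being discussed, with both swaps in place; the type and names (gcEventGate, processing) are illustrative, not the actual frostfs-node code:

package main

import (
	"fmt"
	"sync/atomic"
)

type gcEventGate struct {
	processing atomic.Bool // stands in for gc.processEvnt
}

// handle runs fn only if no previous event is still being processed.
func (g *gcEventGate) handle(fn func()) {
	// false -> true: try to take the gate; skip the event if it is busy.
	if !g.processing.CompareAndSwap(false, true) {
		fmt.Println("event skipped: previous still processing")
		return
	}
	// true -> false: the swap the reviewer asked about; without it every
	// subsequent event would be skipped forever.
	defer g.processing.Store(false)
	fn()
}

func main() {
	var g gcEventGate
	g.handle(func() { fmt.Println("event processed") })
}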
changed title from "gc: Skip received event if previous still processing" to "WIP: gc: Skip received event if previous still processing"
force-pushed from c524dfb80c to 1f63926ab2
force-pushed from 1f63926ab2 to 70148790c3
changed title from "WIP: gc: Skip received event if previous still processing" to "gc: Skip received event if previous still processing"

Other than the comments, LGTM.
@ -102,6 +101,7 @@ type gc struct {
onceStop sync.Once
stopChannel chan struct{}
wg sync.WaitGroup
processEvnt atomic.Bool
Typo in processEvnt: should it be processEvent?

If you don't want other lines to take part in the diff, you could add an empty line before the new field declaration.

Thanks, I'll take this into account in the future.
@ -181,1 +182,4 @@
gc.log.Warn(logs.ShardEventProcessingInProgress)
return
}
v, ok := gc.mEventHandler[event.typ()]
Just to clarify: do we have only one event type?

Right now, yes: only for the new epoch.
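For context, a sketch of what the mEventHandler lookup in the hunk above roughly looks like with a single new-epoch event type; all names besides the lookup-by-type shape are assumptions:

package main

import "fmt"

type eventType int

const eventNewEpoch eventType = iota // currently the only event type

type event struct {
	typ   eventType
	epoch uint64
}

func main() {
	// Hypothetical shape of the gc.mEventHandler map: handlers are
	// looked up by the event's type, as in v, ok := gc.mEventHandler[event.typ()].
	handlers := map[eventType]func(event){
		eventNewEpoch: func(e event) { fmt.Println("handling new epoch", e.epoch) },
	}
	e := event{typ: eventNewEpoch, epoch: 42}
	if h, ok := handlers[e.typ]; ok {
		h(e)
	}
}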
@ -212,2 +206,4 @@
v.wg.Done()
}
}
go func() {
It is not obvious why this code should not be executed if we exit prematurely.
Could you elaborate a bit on why this is the right behaviour?

I don't like this approach either. It was a compromise, because we are handling only one type of event.
Refactored a bit to avoid leaking the goroutines.

@ -212,2 +207,4 @@
}
}
go func() {
gc.log.Debug(logs.ShardEventProcessingWait)
Also, it seems this goroutine can leak: who waits until it is finished?
@ -213,1 +208,4 @@
}
go func() {
gc.log.Debug(logs.ShardEventProcessingWait)
v.wg.Wait()
Why is the waiting performed by a separate goroutine?

Refactored this part. Please review.
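A sketch of one way to keep the waiter goroutine from leaking, assuming (as the diff context suggests) that gc already has a WaitGroup joined on shutdown; this is an illustration, not necessarily the refactoring that landed:

package main

import (
	"fmt"
	"sync"
)

type gcShard struct {
	wg sync.WaitGroup // joined on shutdown, like gc.wg in the hunks above
}

// waitAndRelease waits for all event handlers and then reopens the gate,
// but registers itself in the shard-level WaitGroup so shutdown joins it
// and the goroutine cannot outlive the shard.
func (g *gcShard) waitAndRelease(handlers *sync.WaitGroup, release func()) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		handlers.Wait() // wait for the submitted handlers to finish
		release()       // true -> false on the processing flag
	}()
}

func main() {
	g := &gcShard{}
	var handlers sync.WaitGroup
	handlers.Add(1)
	go func() {
		defer handlers.Done()
		fmt.Println("handler finished")
	}()
	g.waitAndRelease(&handlers, func() { fmt.Println("gate released") })
	g.wg.Wait() // shutdown: the waiter goroutine cannot leak past this
}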
force-pushed from 70148790c3 to d0ac7897e1
@ -203,2 +204,2 @@
defer v.prevGroup.Done()
h(runCtx, event)
h(ctx, event)
n := v.runningHandlers.Add(-1)
Let v.runningHandlers = 1 before this assignment. When it's decremented, it becomes 0, and the next incoming event is able to perform the CompareAndSwap above. As soon as the next event passes through this gate, it starts iterating over v.handlers and increments the wait group's counter with v.wg.Add(1). Can't this cause a hang in v.wg.Done() within a submitted handler that's not finished yet? v.wg.Done() is equal to v.wg.Add(-1), and according to the source code it is an action upon an atomic.

As I understand it, you are talking about the situation when we are able to submit to the pool, but there are no ready workers to process the submitted action, because the previously submitted action unblocked processing but is still in the working queue.
I don't think it is a problem.
Sorry, I quickly glanced at the code and mistook Done for Wait.

Yes, in this case I believe we have no problem.
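To make the scenario being traced concrete, here is a sketch of the counter scheme: runningHandlers starts at the number of handlers, each finished handler decrements it, and whichever handler reaches zero reopens the gate. Names are assumed from the diff, not copied from the merged code:

package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	var (
		processing      atomic.Bool  // the CompareAndSwap gate
		runningHandlers atomic.Int32 // one count per submitted handler
		done            sync.WaitGroup
	)
	handlers := []func(){
		func() { fmt.Println("handler 1") },
		func() { fmt.Println("handler 2") },
	}

	processing.Store(true) // gate taken for this event
	runningHandlers.Store(int32(len(handlers)))
	for _, h := range handlers {
		done.Add(1)
		go func(h func()) {
			defer done.Done()
			h()
			// The decrement discussed above: the handler that brings the
			// counter to zero reopens the gate for the next event.
			if n := runningHandlers.Add(-1); n == 0 {
				processing.Store(false)
			}
		}(h)
	}
	done.Wait()
	fmt.Println("gate open again:", !processing.Load())
}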
@ -195,2 +192,4 @@
select {
case <-ctx.Done():
n := v.runningHandlers.Add(int32(i - len(v.handlers)))
if n == 0 {
We have this Add(-1) + log in 3 different places; that's 12 lines for one debug log. Can it be made simpler?

You are right, wrapped this part into a function.
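A sketch of what the factored-out helper might look like; the name, the delta parameter, and the log message are assumptions, not the merged code:

package main

import (
	"fmt"
	"sync/atomic"
)

// handlerState stands in for the PR's per-event bookkeeping.
type handlerState struct {
	runningHandlers atomic.Int32
	processing      atomic.Bool
}

// done is the single home for the repeated "Add(delta) + zero check +
// debug log" block that previously appeared in three places.
func (v *handlerState) done(delta int32) {
	if n := v.runningHandlers.Add(delta); n == 0 {
		fmt.Println("debug: event processing finished") // stands in for gc.log.Debug
		v.processing.Store(false) // reopen the gate
	}
}

func main() {
	v := &handlerState{}
	v.processing.Store(true)
	v.runningHandlers.Store(3)
	v.done(-1) // a handler finished
	v.done(-2) // e.g. the ctx.Done() branch skipping the remaining two
}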
Anyway, how is this solution different from a non-blocking send in the notification channel in the epoch handler?

Are you talking about a buffered channel?

It doesn't matter: our handler hangs, and with a non-blocking send it won't.

It hangs only while submitting into the pool. Do you think we should reduce this time too?

@fyrchik, created #1238.

Submitting to the pool stops because all workers in it are busy. All workers are busy because they block on a channel send. #1238 solves this problem completely; do we need this PR?

Looks like we can close it: we have only one event type, and it is not a big deal for us to wait for completion, but it would be for a new notification.
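For comparison, a minimal sketch of the non-blocking send approach from #1238 as described in this thread; the channel type and capacity are assumptions:

package main

import "fmt"

func main() {
	notifications := make(chan uint64, 1) // one pending epoch is enough

	// Non-blocking send: a busy consumer no longer blocks the pool worker
	// delivering the epoch event, so the workers cannot all wedge on this send.
	notify := func(epoch uint64) {
		select {
		case notifications <- epoch:
		default:
			fmt.Println("dropping epoch notification", epoch)
		}
	}

	notify(1)
	notify(2) // dropped: the buffer already holds epoch 1
	fmt.Println("received epoch", <-notifications)
}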
force-pushed from d0ac7897e1 to 2614f972fd
force-pushed from 2614f972fd to f6247edd24
force-pushed from f6247edd24 to aeafa97f5a
Pull request closed