gc: Skip received event if previous still processing #1228

Closed
acid-ant wants to merge 1 commit from acid-ant/frostfs-node:bugfix/gc-unblock-evnt-handler into master
Member

The blocking send in the notification channel was fixed by #1238 (the send is now non-blocking).
But we still have an issue with handling multiple events:

  • Event #1 is processed.
  • Event #2 is stuck waiting for the processing of Event #1 to finish.
  • Event #3 and the following events will be skipped (illustrated in the sketch below).

Signed-off-by: Anton Nikiforov an.nikiforov@yadro.com
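
To make the failure mode concrete, here is a minimal, self-contained Go sketch (names, buffer size, and timings are illustrative, not the shard GC code): with a non-blocking send into a single-slot notification channel, Event #1 is picked up and processed, Event #2 sits in the buffer waiting, and Event #3 and later events are silently dropped.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	notifications := make(chan int, 1) // notification channel with a single-slot buffer

	// Consumer: stands in for the GC event loop; each event takes a long time to process.
	go func() {
		for e := range notifications {
			fmt.Println("processing event", e)
			time.Sleep(time.Second) // Event #1 is still being processed here
		}
	}()

	for i := 1; i <= 3; i++ {
		select {
		case notifications <- i: // Event #2 lands in the buffer and waits for Event #1
		default:
			fmt.Println("skipped event", i) // Event #3 (and later ones) are dropped
		}
		time.Sleep(50 * time.Millisecond) // give the consumer a chance to pick up Event #1
	}

	time.Sleep(2 * time.Second) // let the demo finish before exiting
}
```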

acid-ant force-pushed bugfix/gc-unblock-evnt-handler from 67133a3766 to cb113bd046 2024-07-04 14:51:25 +00:00 Compare
acid-ant changed title from gc: Skip received event if previous still processing to WIP: gc: Skip received event if previous still processing 2024-07-05 06:54:23 +00:00
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from cb113bd046 to 0b95f40cba 2024-07-05 10:26:19 +00:00 Compare
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from 0b95f40cba to c524dfb80c 2024-07-05 10:40:49 +00:00 Compare
acid-ant requested review from storage-core-committers 2024-07-05 10:56:00 +00:00
acid-ant requested review from storage-core-developers 2024-07-05 10:56:01 +00:00
acid-ant changed title from WIP: gc: Skip received event if previous still processing to gc: Skip received event if previous still processing 2024-07-05 10:56:08 +00:00
dstepanov-yadro reviewed 2024-07-05 11:57:56 +00:00
@ -178,19 +178,14 @@ func (gc *gc) listenEvents(ctx context.Context) {
}
func (gc *gc) handleEvent(ctx context.Context, event Event) {
if !gc.processEvnt.CompareAndSwap(false, true) {

Here is compare and swap `false` -> `true`, but where is the `true` -> `false` swap?
Author
Member

Thanks, lost it after refactoring. Added a routine which will wait for the result of processing.
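
A minimal sketch of the guard being discussed here, assuming plain goroutines instead of the project's worker pool and simplified names: `CompareAndSwap(false, true)` takes the guard and skips the event when processing is already in progress, and a separate routine resets it to `false` once all handlers of the current event have finished (the `true` -> `false` swap asked about above).

```go
package main

import (
	"context"
	"sync"
	"sync/atomic"
)

// gc is a reduced sketch of the shard GC state relevant to this discussion:
// a guard flag plus a wait group for the handlers of the current event.
type gc struct {
	processEvent atomic.Bool
	handlersWG   sync.WaitGroup
}

func (g *gc) handleEvent(ctx context.Context, handlers []func(context.Context)) {
	// false -> true: take the guard; if an event is already being processed, skip this one.
	if !g.processEvent.CompareAndSwap(false, true) {
		return
	}

	for _, h := range handlers {
		g.handlersWG.Add(1)
		h := h
		go func() { // the real code submits to a worker pool; plain goroutines keep the sketch small
			defer g.handlersWG.Done()
			h(ctx)
		}()
	}

	// true -> false: release the guard once every handler of this event has finished.
	go func() {
		g.handlersWG.Wait()
		g.processEvent.Store(false)
	}()
}

func main() {
	g := &gc{}
	g.handleEvent(context.Background(), []func(context.Context){
		func(context.Context) { /* e.g. collect expired objects */ },
	})
	g.handlersWG.Wait()
}
```
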
acid-ant changed title from gc: Skip received event if previous still processing to WIP: gc: Skip received event if previous still processing 2024-07-07 18:13:32 +00:00
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from c524dfb80c to 1f63926ab2 2024-07-08 06:53:31 +00:00 Compare
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from 1f63926ab2 to 70148790c3 2024-07-08 07:52:25 +00:00 Compare
acid-ant changed title from WIP: gc: Skip received event if previous still processing to gc: Skip received event if previous still processing 2024-07-08 08:14:30 +00:00
fyrchik reviewed 2024-07-08 12:16:36 +00:00
fyrchik left a comment
Owner

Other than the comments, LGTM
@ -102,6 +101,7 @@ type gc struct {
onceStop sync.Once
stopChannel chan struct{}
wg sync.WaitGroup
processEvnt atomic.Bool
Owner

typo in `processEvent`?
If you don't want other lines to become part of the diff, you could add an empty line before the new field declaration.
Author
Member

Thanks, I'll take this into account in the future.
fyrchik marked this conversation as resolved
@ -181,1 +182,4 @@
gc.log.Warn(logs.ShardEventProcessingInProgress)
return
}
v, ok := gc.mEventHandler[event.typ()]
Owner

Just to clarify: do we have only 1 event type?
Author
Member

Right now, yes, only for the new epoch.
fyrchik marked this conversation as resolved
@ -212,2 +206,4 @@
v.wg.Done()
}
}
go func() {
Owner

It is not obvious why this code should not be executed if we exit prematurely.
Could you elaborate a bit on why this is the right behaviour?
Author
Member

I don't like this approach either. It was a compromise, because we are handling only one type of event.
Refactored a bit to avoid leaking the goroutines.
fyrchik marked this conversation as resolved
fyrchik reviewed 2024-07-08 12:18:06 +00:00
@ -212,2 +207,4 @@
}
}
go func() {
gc.log.Debug(logs.ShardEventProcessingWait)
Owner

Also, it seems this goroutine can leak: who waits until it is finished?
dstepanov-yadro reviewed 2024-07-09 07:40:05 +00:00
@ -213,1 +208,4 @@
}
go func() {
gc.log.Debug(logs.ShardEventProcessingWait)
v.wg.Wait()

Why is the waiting performed by a separate goroutine?
Author
Member

Refactored this part. Please review.
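
On the leak concern: a common way to keep such a wait routine from escaping the shard's lifetime is to register it with the gc's own WaitGroup (the `wg` field visible in the diff above), so that shutdown waits for it. A rough sketch under that assumption, with hypothetical names (`gcWG`, `releaseGuardWhenDone`), not the PR's final code:

```go
package shard

import (
	"sync"
	"sync/atomic"
)

type gc struct {
	gcWG         sync.WaitGroup // hypothetical: tracks background goroutines owned by gc
	processEvent atomic.Bool    // "event in progress" guard
}

// releaseGuardWhenDone resets the guard after all handlers of the current event
// finish. Registering the waiter with gcWG means a Stop() that calls gcWG.Wait()
// also waits for this goroutine, so it cannot leak past shutdown.
func (g *gc) releaseGuardWhenDone(handlers *sync.WaitGroup) {
	g.gcWG.Add(1)
	go func() {
		defer g.gcWG.Done()
		handlers.Wait()
		g.processEvent.Store(false)
	}()
}
```
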
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from 70148790c3 to d0ac7897e1 2024-07-09 08:02:06 +00:00 Compare
dstepanov-yadro approved these changes 2024-07-09 08:19:45 +00:00
aarifullin reviewed 2024-07-09 09:24:20 +00:00
@ -203,2 +204,2 @@
defer v.prevGroup.Done()
h(runCtx, event)
h(ctx, event)
n := v.runningHandlers.Add(-1)
Member

Let `v.runningHandlers = 1` before this assignment. When it's decremented, it becomes `0` and the next incoming event is able to perform the `CompareAndSwap` above. As soon as the next event passes through this gate, it starts iterating over `v.handlers` and incrementing the wait group's counter with `v.wg.Add(1)`. Can't this cause a hang in `v.wg.Done()` within a submitted handler that hasn't finished yet?
Author
Member

`v.wg.Done()` is equal to `v.wg.Add(-1)`, and according to the source code it is an operation on an `atomic`.
As I understand it, you are talking about a situation where we are able to submit to the pool, but there are no ready workers to process the submitted action, because the previously submitted action unblocked processing but is still in the work queue.
I don't think it is a problem.
Member

Sorry, I quickly glanced at the code and mistook `Done` for `Wait`.
Yes, in this case I believe we have no problem.
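
As a side note on the exchange above: in the standard library, `sync.WaitGroup.Done` is literally `Add(-1)` and never blocks; only `Wait` blocks until the counter reaches zero. A trivial standalone illustration:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	wg.Add(1)

	go func() {
		defer wg.Done() // equivalent to wg.Add(-1); decrements the counter, never blocks
		fmt.Println("handler finished")
	}()

	wg.Wait() // only Wait blocks, until the counter reaches zero
	fmt.Println("all handlers done")
}
```
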
aarifullin marked this conversation as resolved
aarifullin approved these changes 2024-07-09 10:51:59 +00:00
achuprov approved these changes 2024-07-09 10:56:36 +00:00
fyrchik requested changes 2024-07-09 12:50:04 +00:00
@ -195,2 +192,4 @@
select {
case <-ctx.Done():
n := v.runningHandlers.Add(int32(i - len(v.handlers)))
if n == 0 {
Owner

We have this `Add(-1) + log` in 3 different places, that's 12 lines for one debug log. Can it be made simpler?
Author
Member

You are right, wrapped this part into a function.
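
The wrapper mentioned here could look roughly like the following sketch; the helper name, the log message, and the use of `log/slog` are assumptions for illustration, not the PR's actual code:

```go
package main

import (
	"log/slog"
	"os"
	"sync/atomic"
)

// decrementAndLog is a hypothetical helper: it adds delta (normally -1) to the
// running-handlers counter and logs once when no handlers remain, so the
// Add(-1)+log pair lives in a single place instead of three.
func decrementAndLog(counter *atomic.Int32, delta int32, log *slog.Logger) {
	if n := counter.Add(delta); n == 0 {
		log.Debug("all event handlers finished")
	}
}

func main() {
	log := slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{Level: slog.LevelDebug}))

	var running atomic.Int32
	running.Store(3)
	for i := 0; i < 3; i++ {
		decrementAndLog(&running, -1, log) // only the last call emits the debug line
	}
}
```
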
Owner

Anyway, how is this solution different from a non-blocking send in the notification channel in the epoch handler?
Author
Member

Are you talking about a buffered channel?
Owner

It doesn't matter: our handler hangs, and with a non-blocking send it won't.
Author
Member

It hangs only while submitting into the pool. Do you think we should reduce this time too?
Author
Member

@fyrchik, created https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/1238.
Owner

Submitting to the pool stops because all workers in it are busy.
All workers are busy because they block on the channel send.

#1238 solves this problem completely, so do we need this PR?
Author
Member

Looks like we can close it: we have only one event type, and it is not a big deal for us to wait for the previous event to complete, as long as we don't block on a new notification.
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from d0ac7897e1 to 2614f972fd 2024-07-09 14:32:00 +00:00 Compare
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from 2614f972fd to f6247edd24 2024-07-10 12:57:17 +00:00 Compare
acid-ant force-pushed bugfix/gc-unblock-evnt-handler from f6247edd24 to aeafa97f5a 2024-07-10 13:42:26 +00:00 Compare
acid-ant closed this pull request 2024-07-10 14:11:11 +00:00
All checks were successful:

  • DCO action / DCO (pull_request): Successful in 3m27s
  • Vulncheck / Vulncheck (pull_request): Successful in 4m20s
  • Build / Build Components (1.22) (pull_request): Successful in 5m33s
  • Build / Build Components (1.21) (pull_request): Successful in 5m40s
  • Pre-commit hooks / Pre-commit (pull_request): Successful in 7m0s
  • Tests and linters / gopls check (pull_request): Successful in 7m41s
  • Tests and linters / Staticcheck (pull_request): Successful in 7m45s
  • Tests and linters / Lint (pull_request): Successful in 8m22s
  • Tests and linters / Tests (1.21) (pull_request): Successful in 12m25s
  • Tests and linters / Tests with -race (pull_request): Successful in 12m26s
  • Tests and linters / Tests (1.22) (pull_request): Successful in 12m51s

Pull request closed

Reference: TrueCloudLab/frostfs-node#1228