Prevent process from being killed by systemd when shutting down #34
No description provided.
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
what if initialization is taking too long but no OS interruption signals are listened for yet? however, that may not be critical at all
/cc @fyrchik
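For illustration, a minimal sketch of that concern, assuming a hypothetical `initStorageEngine`: subscribe to termination signals first, then run the potentially slow initialization under the resulting context, so a SIGTERM arriving mid-startup is not lost:

```go
package main

import (
	"context"
	"os/signal"
	"syscall"
)

// initStorageEngine is a hypothetical stand-in for the real,
// potentially slow, initialization work.
func initStorageEngine(ctx context.Context) error {
	return nil
}

func main() {
	// Subscribe to termination signals before any long-running setup;
	// ctx is cancelled as soon as SIGTERM/SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	// Initialization now runs under a cancellable context.
	if err := initStorageEngine(ctx); err != nil {
		return
	}

	<-ctx.Done() // run until a termination signal arrives
}
```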
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
why do we listen to `ctx` here? i would close cancel func with `terminateWatcher` and not share it via `cfg` at all. also, it would save us from double `c.setHealthStatus(control.HealthStatus_SHUTTING_DOWN)`, i guess
is it possible to just close the chan here? do not remember why that was done that way, but now closing looks better to me
@carpawell I see your point. In this case, setting the health status should be under the mutex, and we need to exit once the signal is trapped and the status is not equal to READY, right?
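For reference on the close-vs-send question, a small self-contained sketch: closing a channel is a broadcast that unblocks every receiver at once, while a plain send wakes exactly one:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	done := make(chan struct{})
	var wg sync.WaitGroup

	// Several independent waiters, all of which must observe shutdown.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			fmt.Println("waiter", n, "stopped")
		}(i)
	}

	// close is a broadcast: every receiver is unblocked at once.
	// A send (done <- struct{}{}) would wake only one waiter,
	// and closing an already-closed channel panics.
	close(done)
	wg.Wait()
}
```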
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
@carpawell we listen to the context here because we may want to exit on an internal error.
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
yes, and my question is: would it be better to just move listening for internal errors here too (in the same goroutine)? it would be the only code that can cancel the ctx; all the others would just wait for it to close. a more straightforward scheme IMO (if it is possible and i have not missed something)
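A sketch of the single-goroutine scheme proposed here (the `internalErr` channel and exact wiring are assumptions, not the PR's actual code): one goroutine owns cancellation, and every other component only waits on `ctx.Done()`:

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

// terminateWatcher is the only place that calls cancel: it reacts both
// to OS signals and to internal errors, so no other code needs to
// cancel ctx or share the cancel func.
func terminateWatcher(cancel context.CancelFunc, internalErr <-chan error) {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, syscall.SIGINT)

	select {
	case <-sig:
		// termination signal trapped
	case <-internalErr:
		// some component failed; trigger the same shutdown path
	}
	cancel()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	internalErr := make(chan error, 1)

	go terminateWatcher(cancel, internalErr)

	<-ctx.Done() // everything else just waits for cancellation
}
```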
yes, two things bother me:
@carpawell right, it is impossible to cover that with a context when the health state is undefined or starting. I think here we need to decide on a set of services for graceful shutdown; local storage is the first candidate, try to close them and exit. @fyrchik your thoughts on this?
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
Looks better, moved listening for internal errors here too.
Sure, replaced with close.
i think we can just file an issue for that and merge this PR
@@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
Why not just `nil`? Also, do we need it at all?
Why did you decide to have a separate function?
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
So can we remove `ctx.Done` now?
The problem I see is that this is not a real "worker": it doesn't use the context at all, but instead manages it. Why not have this `select` executed synchronously in `wait`?
Could you elaborate, what is the problem, exactly?
A simple mutex in `setHealthStatus` seems easy to implement if needed.
Do we need this context?
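A sketch of what a mutex-guarded `setHealthStatus` could look like; the type and helper names here are illustrative stand-ins for the real `control.HealthStatus` API, not the node's actual code:

```go
package main

import "sync"

// HealthStatus stands in for the real control.HealthStatus enum.
type HealthStatus int32

const (
	HealthStatusStarting HealthStatus = iota
	HealthStatusReady
	HealthStatusShuttingDown
)

type cfg struct {
	mu     sync.Mutex
	status HealthStatus
}

// setHealthStatus serializes all status transitions under the mutex.
func (c *cfg) setHealthStatus(s HealthStatus) {
	c.mu.Lock()
	c.status = s
	c.mu.Unlock()
}

// compareAndSwapHealthStatus transitions only from an expected state,
// so a trapped signal starts a graceful shutdown only when the node is
// READY and can exit immediately otherwise.
func (c *cfg) compareAndSwapHealthStatus(old, next HealthStatus) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.status != old {
		return false
	}
	c.status = next
	return true
}

func main() {
	c := &cfg{status: HealthStatusStarting}
	c.setHealthStatus(HealthStatusReady)
	_ = c.compareAndSwapHealthStatus(HealthStatusReady, HealthStatusShuttingDown)
}
```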
@@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
No, we don't need this line at all, removed.
In the current implementation it is redundant, removed.
@@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
Now I see that it is not necessary, thanks. Merged the two functions into one.
What should we do if we get stuck at local storage initialization or at service startup: should we try to shut down normally, or do we need to close a limited set of workers and perform `os.Exit(1)`?
Let's wait for now -- it is a feature; this task is about shutdown.
Ideally we would accept a context for all operations with the storage engine.
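To illustrate that idea, a sketch with a hypothetical `Put`: every engine operation takes a context and aborts once shutdown cancels it, so in-flight requests can be interrupted:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// StorageEngine and Put are hypothetical; they only show the shape of a
// context-aware engine operation.
type StorageEngine struct{}

func (e *StorageEngine) Put(ctx context.Context, payload []byte) error {
	// Fail fast if shutdown has already been requested.
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
	}
	// ... the actual write would go here, checking ctx at blocking points
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	e := &StorageEngine{}

	_ = e.Put(ctx, []byte("payload")) // succeeds while the node is running

	cancel() // shutdown begins
	if err := e.Put(ctx, nil); errors.Is(err, context.Canceled) {
		fmt.Println("operation aborted by shutdown")
	}
}
```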
Moved to the top of `initApp()`. Created new issue #49 to cover this case.
@@ -927,1 +929,4 @@
c.log.Info("termination signal processing is complete")
return
}
Actually, it can be problematic to handle `SIGHUP` and `SIGTERM` in one thread, because SIGHUP handling can take a long time. However, there are mutexes inside the engine anyway, so we may postpone this until #49.
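A sketch of the separation that would avoid this (`reload` is a hypothetical, possibly slow, reconfiguration handler; the engine locking mentioned above still applies): handle SIGHUP in its own goroutine so SIGTERM is never queued behind a slow reload:

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// reload is a hypothetical, possibly slow, reconfiguration handler.
func reload() {}

func main() {
	hup := make(chan os.Signal, 1)
	term := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	signal.Notify(term, syscall.SIGTERM, syscall.SIGINT)

	// SIGHUP handling may take a long time, so it gets its own
	// goroutine and cannot delay termination.
	go func() {
		for range hup {
			reload()
		}
	}()

	<-term // SIGTERM/SIGINT are observed independently of any reload
}
```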
@@ -59,1 +59,3 @@
- c.onShutdown(cli.Close)
+ c.onShutdown(func() {
+ 	c.log.Info("closing morph components...")
+ 	cli.Close()
not related to this commit
@@ -59,1 +59,3 @@
- c.onShutdown(cli.Close)
+ c.onShutdown(func() {
+ 	c.log.Info("closing morph components...")
+ 	cli.Close()
Thanks, moved it to another commit.