Prevent process from killing by systemd when shutting down #34

Merged
acid-ant merged 2 commits from bugfix/1465-fix-node-shutdown into master 2023-02-17 09:13:01 +00:00
acid-ant commented 2023-01-27 15:13:34 +00:00 (Migrated from github.com)

Signed-off-by: Anton Nikiforov an.nikiforov@yadro.com

Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
carpawell (Migrated from github.com) reviewed 2023-01-27 15:13:34 +00:00
carpawell (Migrated from github.com) reviewed 2023-02-01 19:55:38 +00:00
carpawell (Migrated from github.com) commented 2023-02-01 19:51:37 +00:00

what if initialization is taking too long but no os interuption signals are listened yet? however, that mb not critical at all
/cc @fyrchik

what if initialization is taking too long but no os interuption signals are listened yet? however, that mb not critical at all /cc @fyrchik
carpawell (Migrated from github.com) reviewed 2023-02-01 19:59:41 +00:00
@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
carpawell (Migrated from github.com) commented 2023-02-01 19:58:20 +00:00

why do we listen to ctx here? i would close cancel func with terminateWatcher and do not share it via cfg at all

also, it would save us from double c.setHealthStatus(control.HealthStatus_SHUTTING_DOWN) i guess

why do we listen to `ctx` here? i would close cancel func with `terminateWatcher` and do not share it via `cfg` at all also, it would save us from double `c.setHealthStatus(control.HealthStatus_SHUTTING_DOWN)` i guess
carpawell (Migrated from github.com) commented 2023-02-01 19:59:37 +00:00

is it possible to just close the chan here? do not remember why that was done that way but now closing looks better to me

is it possible to just close the chan here? do not remember why that was done that way but now closing looks better to me
acid-ant (Migrated from github.com) reviewed 2023-02-02 06:37:35 +00:00
acid-ant (Migrated from github.com) commented 2023-02-02 06:37:35 +00:00

@carpawell see your point. in this case setting of the health status should be under the mutex and we need to exit once the signal trapped and status not equal READY, right?

@carpawell see your point. in this case setting of the health status should be under the mutex and we need to exit once the signal trapped and status not equal READY, right?
acid-ant (Migrated from github.com) reviewed 2023-02-02 06:42:18 +00:00
@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
acid-ant (Migrated from github.com) commented 2023-02-02 06:42:17 +00:00

@carpawell we listen here for context because we may want to exit by internal error.

@carpawell we listen here for context because we may want to exit by internal error.
carpawell (Migrated from github.com) reviewed 2023-02-02 07:40:41 +00:00
@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
carpawell (Migrated from github.com) commented 2023-02-02 07:40:41 +00:00

yes, and my question is: would it be better to just move listening internal error here too (in the same goroutine)? it would be the only code that is possible to cancel ctx, all the others would just wait for its closing. more straight scheme IMO (if it is possible and i have not missed something)

yes, and my question is: would it be better to just move listening internal error here too (in the same goroutine)? it would be the only code that is possible to cancel ctx, all the others would just wait for its closing. more straight scheme IMO (if it is possible and i have not missed something)
carpawell (Migrated from github.com) reviewed 2023-02-02 07:40:43 +00:00
carpawell (Migrated from github.com) commented 2023-02-02 07:40:43 +00:00

yes, two things bother me:

  1. signal listening moved in the end of the init stage in that PR;
  2. component init could take time and context cancel does not seem to handle that
yes, two things bother me: 1. signal listening moved in the end of the init stage in that PR; 2. component init could take time and context cancel does not seem to handle that
acid-ant (Migrated from github.com) reviewed 2023-02-02 08:12:17 +00:00
acid-ant (Migrated from github.com) commented 2023-02-02 08:12:17 +00:00

@carpawell right, it is impossible to cover that with context, when health state is undefined or starting. I think here we need to decide set o services for graceful shutdown, local storage is the first candidate, try close them and exit. @fyrchik your thoughts on this?

@carpawell right, it is impossible to cover that with context, when health state is undefined or starting. I think here we need to decide set o services for graceful shutdown, local storage is the first candidate, try close them and exit. @fyrchik your thoughts on this?
acid-ant (Migrated from github.com) reviewed 2023-02-02 12:25:01 +00:00
@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
acid-ant (Migrated from github.com) commented 2023-02-02 12:25:01 +00:00

Looks better, moved listening for internal errors here too.

Looks better, moved listening for internal errors here too.
acid-ant (Migrated from github.com) reviewed 2023-02-02 12:38:55 +00:00
acid-ant (Migrated from github.com) commented 2023-02-02 12:38:55 +00:00

Sure, replaced with close.

Sure, replaced with close.
carpawell (Migrated from github.com) reviewed 2023-02-06 17:20:53 +00:00
carpawell (Migrated from github.com) commented 2023-02-06 17:20:53 +00:00

i think we can just leave an issue for that and merge that PR

i think we can just leave an issue for that and merge that PR
carpawell (Migrated from github.com) reviewed 2023-02-06 17:20:58 +00:00
fyrchik (Migrated from github.com) reviewed 2023-02-06 18:36:58 +00:00
@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
fyrchik (Migrated from github.com) commented 2023-02-06 13:46:29 +00:00

Why not just nil? Also, do we need it at all?

Why not just `nil`? Also, do we need it at all?
fyrchik (Migrated from github.com) commented 2023-02-06 18:28:21 +00:00

Why did you decide to have a separate function?

Why did you decide to have a separate function?
@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
fyrchik (Migrated from github.com) commented 2023-02-06 18:36:56 +00:00

So can we remove ctx.Done now? The problem I see it that this is not a real "worker": it doesn't use context at all, but instead manages. Why not have this select executed synchronously in wait?

So can we remove `ctx.Done` now? The problem I see it that this is not a real "worker": it doesn't use context at all, but instead _manages_. Why not have this `select` executed synchronously in `wait`?
fyrchik (Migrated from github.com) commented 2023-02-06 18:27:53 +00:00

Could you elaborate, what is the problem, exactly?
Simple mutex in setHealthStatus seems easy to implement if needed.

Could you elaborate, what is the problem, exactly? Simple mutex in `setHealthStatus` seems easy to implement if needed.
fyrchik (Migrated from github.com) commented 2023-02-06 18:16:14 +00:00

Do we need this context?

Do we _need_ this context?
acid-ant (Migrated from github.com) reviewed 2023-02-07 08:20:29 +00:00
@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
acid-ant (Migrated from github.com) commented 2023-02-07 08:20:29 +00:00

No, we don't need this line at all, removed.

No, we don't need this line at all, removed.
acid-ant (Migrated from github.com) reviewed 2023-02-07 08:20:50 +00:00
acid-ant (Migrated from github.com) commented 2023-02-07 08:20:49 +00:00

In current implementation it is redundant, removed.

In current implementation it is redundant, removed.
acid-ant (Migrated from github.com) reviewed 2023-02-07 08:22:18 +00:00
@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
acid-ant (Migrated from github.com) commented 2023-02-07 08:22:18 +00:00

Now I see that it is not necessary, thanks. Merged two functions in one.

Now I see that it is not necessary, thanks. Merged two functions in one.
acid-ant (Migrated from github.com) reviewed 2023-02-07 08:26:59 +00:00
acid-ant (Migrated from github.com) commented 2023-02-07 08:26:59 +00:00

What should we do if we stuck at local storage initialization or at service startup - should we try to shutdown normally or we need to try to close limited set of workers and perform os.Exit(1)?

What should we do if we stuck at local storage initialization or at service startup - should we try to shutdown normally or we need to try to close limited set of workers and perform `os.Exit(1)`?
fyrchik (Migrated from github.com) reviewed 2023-02-09 11:54:39 +00:00
fyrchik (Migrated from github.com) commented 2023-02-09 11:54:38 +00:00

Let's wait for now -- it is a feature, this task is about shutdown.
Ideally we would accept context for all operations with storage engine.

Let's wait for now -- it is a feature, this task is about shutdown. Ideally we would accept context for all operations with storage engine.
acid-ant (Migrated from github.com) reviewed 2023-02-09 16:18:33 +00:00
acid-ant (Migrated from github.com) commented 2023-02-09 16:18:32 +00:00

Moved to the top of the initApp(). Created new issue #49 to cover this case.

Moved to the top of the `initApp()`. Created new issue #49 to cover this case.
fyrchik (Migrated from github.com) reviewed 2023-02-10 05:15:14 +00:00
@ -927,1 +929,4 @@
c.log.Info("termination signal processing is complete")
return
}
fyrchik (Migrated from github.com) commented 2023-02-10 05:13:44 +00:00

Actually, it can be problematic handling SIGHUP and SIGTERM in one thread, because SIGHUP handling can take a long time. However, there are mutexes inside of an engine anyway, so we may postpone this until #49.

Actually, it can be problematic handling `SIGHUP` and `SIGTERM` in one thread, because SIGHUP handling can take a long time. However, there are mutexes inside of an engine anyway, so we may postpone this until #49.
@ -59,1 +59,3 @@
c.onShutdown(cli.Close)
c.onShutdown(func() {
c.log.Info("closing morph components...")
cli.Close()
fyrchik (Migrated from github.com) commented 2023-02-10 05:14:58 +00:00

not related to this commit

not related to this commit
acid-ant (Migrated from github.com) reviewed 2023-02-10 06:23:43 +00:00
@ -59,1 +59,3 @@
c.onShutdown(cli.Close)
c.onShutdown(func() {
c.log.Info("closing morph components...")
cli.Close()
acid-ant (Migrated from github.com) commented 2023-02-10 06:23:43 +00:00

Thanks, moved to another.

Thanks, moved to another.
fyrchik (Migrated from github.com) reviewed 2023-02-16 06:38:45 +00:00
fyrchik (Migrated from github.com) approved these changes 2023-02-17 09:12:53 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#34
No description provided.