Prevent process from being killed by systemd when shutting down #34
No description provided.
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
what if initialization is taking too long but no OS interruption signals are listened for yet? however, that may not be critical at all
/cc @fyrchik
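For illustration, a minimal sketch of that concern, assuming a hypothetical `initStorageEngine`: subscribe to termination signals first, then run the potentially slow initialization under the resulting context, so a SIGTERM arriving mid-startup is not lost:

```go
package main

import (
	"context"
	"os/signal"
	"syscall"
)

// initStorageEngine is a hypothetical stand-in for the real,
// potentially slow, initialization work.
func initStorageEngine(ctx context.Context) error {
	return nil
}

func main() {
	// Subscribe to termination signals before any long-running setup;
	// ctx is cancelled as soon as SIGTERM/SIGINT arrives.
	ctx, stop := signal.NotifyContext(context.Background(), syscall.SIGTERM, syscall.SIGINT)
	defer stop()

	// Initialization now runs under a cancellable context.
	if err := initStorageEngine(ctx); err != nil {
		return
	}

	<-ctx.Done() // run until a termination signal arrives
}
```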
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
why do we listen to `ctx` here? i would close cancel func with `terminateWatcher` and not share it via `cfg` at all. also, it would save us from double `c.setHealthStatus(control.HealthStatus_SHUTTING_DOWN)`, i guess
is it possible to just close the chan here? do not remember why that was done that way, but now closing looks better to me
@carpawell I see your point. In this case, setting the health status should be under the mutex, and we need to exit once the signal is trapped and the status is not equal to READY, right?
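For reference on the close-vs-send question, a small self-contained sketch: closing a channel is a broadcast that unblocks every receiver at once, while a plain send wakes exactly one:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	done := make(chan struct{})
	var wg sync.WaitGroup

	// Several independent waiters, all of which must observe shutdown.
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			<-done // a receive on a closed channel returns immediately
			fmt.Println("waiter", n, "stopped")
		}(i)
	}

	// close is a broadcast: every receiver is unblocked at once.
	// A send (done <- struct{}{}) would wake only one waiter,
	// and closing an already-closed channel panics.
	close(done)
	wg.Wait()
}
```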
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
@carpawell we listen to the context here because we may want to exit on an internal error.
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
yes, and my question is: would it be better to just move listening for internal errors here too (in the same goroutine)? it would be the only code that can cancel the ctx; all the others would just wait for it to close. a more straightforward scheme IMO (if it is possible and i have not missed something)
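A sketch of the single-goroutine scheme proposed here (the `internalErr` channel and exact wiring are assumptions, not the PR's actual code): one goroutine owns cancellation, and every other component only waits on `ctx.Done()`:

```go
package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"
)

// terminateWatcher is the only place that calls cancel: it reacts both
// to OS signals and to internal errors, so no other code needs to
// cancel ctx or share the cancel func.
func terminateWatcher(cancel context.CancelFunc, internalErr <-chan error) {
	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM, syscall.SIGINT)

	select {
	case <-sig:
		// termination signal trapped
	case <-internalErr:
		// some component failed; trigger the same shutdown path
	}
	cancel()
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	internalErr := make(chan error, 1)

	go terminateWatcher(cancel, internalErr)

	<-ctx.Done() // everything else just waits for cancellation
}
```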
yes, two things bother me:
@carpawell right, it is impossible to cover that with a context when the health state is undefined or starting. I think here we need to decide on a set of services for graceful shutdown; local storage is the first candidate, try to close them and exit. @fyrchik your thoughts on this?
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
Looks better, moved listening for internal errors here too.
Sure, replaced with close.
i think we can just file an issue for that and merge this PR
@@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
Why not just `nil`? Also, do we need it at all?
Why did you decide to have a separate function?
@@ -967,1 +968,3 @@
return
// Storage Engine
var rcfg engine.ReConfiguration
So can we remove `ctx.Done` now?
The problem I see is that this is not a real "worker": it doesn't use the context at all, but instead manages it. Why not have this `select` executed synchronously in `wait`?
Could you elaborate, what is the problem, exactly?
A simple mutex in `setHealthStatus` seems easy to implement if needed.
Do we need this context?
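A sketch of what a mutex-guarded `setHealthStatus` could look like; the type and helper names here are illustrative stand-ins for the real `control.HealthStatus` API, not the node's actual code:

```go
package main

import "sync"

// HealthStatus stands in for the real control.HealthStatus enum.
type HealthStatus int32

const (
	HealthStatusStarting HealthStatus = iota
	HealthStatusReady
	HealthStatusShuttingDown
)

type cfg struct {
	mu     sync.Mutex
	status HealthStatus
}

// setHealthStatus serializes all status transitions under the mutex.
func (c *cfg) setHealthStatus(s HealthStatus) {
	c.mu.Lock()
	c.status = s
	c.mu.Unlock()
}

// compareAndSwapHealthStatus transitions only from an expected state,
// so a trapped signal starts a graceful shutdown only when the node is
// READY and can exit immediately otherwise.
func (c *cfg) compareAndSwapHealthStatus(old, next HealthStatus) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.status != old {
		return false
	}
	c.status = next
	return true
}

func main() {
	c := &cfg{status: HealthStatusStarting}
	c.setHealthStatus(HealthStatusReady)
	_ = c.compareAndSwapHealthStatus(HealthStatusReady, HealthStatusShuttingDown)
}
```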
@@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
No, we don't need this line at all, removed.
In the current implementation it is redundant, removed.
@@ -962,3 +966,2 @@
}
}
components = append(components, dCfg{name: "logger", cfg: logPrm})
Now I see that it is not necessary, thanks. Merged the two functions into one.
What should we do if we get stuck at local storage initialization or at service startup: should we try to shut down normally, or do we need to close a limited set of workers and perform `os.Exit(1)`?
Let's wait for now -- it is a feature; this task is about shutdown.
Ideally we would accept a context for all operations with the storage engine.
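To illustrate that idea, a sketch with a hypothetical `Put`: every engine operation takes a context and aborts once shutdown cancels it, so in-flight requests can be interrupted:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// StorageEngine and Put are hypothetical; they only show the shape of a
// context-aware engine operation.
type StorageEngine struct{}

func (e *StorageEngine) Put(ctx context.Context, payload []byte) error {
	// Fail fast if shutdown has already been requested.
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
	}
	// ... the actual write would go here, checking ctx at blocking points
	return nil
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	e := &StorageEngine{}

	_ = e.Put(ctx, []byte("payload")) // succeeds while the node is running

	cancel() // shutdown begins
	if err := e.Put(ctx, nil); errors.Is(err, context.Canceled) {
		fmt.Println("operation aborted by shutdown")
	}
}
```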
Moved to the top of `initApp()`. Created new issue #49 to cover this case.
@@ -927,1 +929,4 @@
c.log.Info("termination signal processing is complete")
return
}
Actually, it can be problematic to handle `SIGHUP` and `SIGTERM` in one thread, because SIGHUP handling can take a long time. However, there are mutexes inside the engine anyway, so we may postpone this until #49.
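A sketch of the separation that would avoid this (`reload` is a hypothetical, possibly slow, reconfiguration handler; the engine locking mentioned above still applies): handle SIGHUP in its own goroutine so SIGTERM is never queued behind a slow reload:

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

// reload is a hypothetical, possibly slow, reconfiguration handler.
func reload() {}

func main() {
	hup := make(chan os.Signal, 1)
	term := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	signal.Notify(term, syscall.SIGTERM, syscall.SIGINT)

	// SIGHUP handling may take a long time, so it gets its own
	// goroutine and cannot delay termination.
	go func() {
		for range hup {
			reload()
		}
	}()

	<-term // SIGTERM/SIGINT are observed independently of any reload
}
```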
@@ -59,1 +59,3 @@
- c.onShutdown(cli.Close)
+ c.onShutdown(func() {
+ 	c.log.Info("closing morph components...")
+ 	cli.Close()
not related to this commit
@@ -59,1 +59,3 @@
- c.onShutdown(cli.Close)
+ c.onShutdown(func() {
+ 	c.log.Info("closing morph components...")
+ 	cli.Close()
Thanks, moved it to another commit.