systemd notify protocol #810
No reviewers
Labels
No labels
P0
P1
P2
P3
badger
frostfs-adm
frostfs-cli
frostfs-ir
frostfs-lens
frostfs-node
good first issue
triage
Infrastructure
blocked
bug
config
discussion
documentation
duplicate
enhancement
go
help wanted
internal
invalid
kludge
observability
perfomance
question
refactoring
wontfix
No milestone
No project
No assignees
5 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference: TrueCloudLab/frostfs-node#810
Loading…
Add table
Reference in a new issue
No description provided.
Delete branch "elebedeva/frostfs-node:sd-notify"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Close #552
example of displayed status in
frostfs-storage.service
:example of displayed status in
frostfs-ir.service
:Signed-off-by: Ekaterina Lebedeva ekaterina.lebedeva@yadro.com
LGTM, 2 questions:
systemctl status frostfs-storage
?@ -82,12 +84,20 @@ func (c *cfg) NetmapStatus() control.NetmapStatus {
}
func (c *cfg) setHealthStatus(st control.HealthStatus) {
err := sysdnotify.Status(fmt.Sprintf("%v, %v", st, control.HealthStatus_name[int32(st)]))
Can we move
fmt.Sprintf
to a separate function which acceptsst
? It can later be extended to accept any string and there is less opportunities to make an error somewhere.Logging can also be done inside this function.
done
@ -88,3 +94,4 @@
func (c *cfg) compareAndSwapHealthStatus(oldSt, newSt control.HealthStatus) (swapped bool) {
if swapped = c.healthStatus.CompareAndSwap(int32(oldSt), int32(newSt)); swapped {
err := sysdnotify.Status(fmt.Sprintf("%v, %v", newSt, control.HealthStatus_name[int32(newSt)]))
What happens here is
sdnotify
is disabled?sysdnotify.Status()
returns an errorconnection refused
, and this error will be logged.a0509535bd
to8c7a181a82
8c7a181a82
toe6188cbaa1
@ -106,0 +114,4 @@
status := fmt.Sprintf("%v, %v", st.Number(), st)
err := sdnotify.Status(status)
if err == nil {
c.log.Info(fmt.Sprintf("reported STATUS=\"%v\" to systemd", status))
It is debug log? I believe this is not worth logging especially with
info
.removed
@ -0,0 +18,4 @@
func Send(state string) error {
if socket == nil {
socket = &net.UnixAddr{
Name: os.Getenv("NOTIFY_SOCKET"),
We should take it from config, not from env
How about
systemdnotify.socket
?systemd
can only pass socket to service through an environmental variable3644466a86
to9cfc69cd3c
9cfc69cd3c
to1f14ca473e
WIP: systemd notify protocolto systemd notify protocol@ -354,2 +355,4 @@
// is node under maintenance
isMaintenance atomic.Bool
sdNotify atomic.Bool
Why
atomic.Bool
needed here?it is not needed, replaced with just
bool
@ -106,0 +113,4 @@
if c.sdNotify.Load() {
status := fmt.Sprintf("%v", st)
err := sdnotify.Status(status)
if err != nil {
How about merging above three lines in one?
@ -58,6 +58,7 @@ type (
netmapClient *nmClient.Client
persistate *state.PersistentStorage
containerClient *container.Client
sdNotify atomic.Bool
Why
atomic.Bool
needed here?it is not needed, replaced with just
bool
@ -0,0 +11,4 @@
// Initializes socket with provided name of
// environment variable.
func InitSocket() error {
notifySocket := os.Getenv("NOTIFY_SOCKET")
Why socket name configured only via env?
This is how systemd communicates socket path to our service.
@ -0,0 +40,4 @@
func Send(state string) error {
if socket == nil {
socket = &net.UnixAddr{
Name: os.Getenv("NOTIFY_SOCKET"),
Why you are skipping check for
NOTIFY_SOCKET
here?it was done for testing purposes only
replaced with an error
@ -176,0 +181,4 @@
status := fmt.Sprintf("%v", st)
err := sdnotify.Status(status)
if err != nil {
s.log.Error(logs.FailedToReportStatusToSystemd, zap.String("error", err.Error()))
zap.String("error", err.Error())
->zap.Error(err)
fixed
@ -176,0 +178,4 @@
func (s *Server) notifySystemd(st control.HealthStatus) {
if s.sdNotify.Load() {
status := fmt.Sprintf("%v", st)
Why do we use strings here? Maybe move the switch from
sdnotify.Status
to here? This way it would be immediately obvious what do we send.fixed
@ -0,0 +26,4 @@
func Status(status string) error {
switch status {
case "READY":
_ = Send("READY=1")
Why is the error ignored?
moved this switch to the frostfs-node control.go and to frostfs-ir state.go, where this error may be logged
It seems the switch is still here.
fixed
@ -0,0 +32,4 @@
case "RECONFIGURING":
_ = Send("RELOADING=1")
}
return Send(fmt.Sprintf("STATUS=%s", status))
We do 2 sends in this function, can they be combined in a single message?
Messages like "STATUS=%s" allow us to see them in
systemctl status
output, while messages like "READY=1" are actual notifications to systemd. If we don't need to see status throughsystemctl status
, then second status report can be removed.From here https://www.man7.org/linux/man-pages/man3/sd_notify.3.html :
So will
Send(fmt.Sprintf("RELOADING=1\nSTATUS=%s", status))
work? It is not a big problem, but it seems both variables are better changed together.added a function that combines two notifications
8b1925c195
tob113b206c2
b113b206c2
to8d3811f7cf
@ -106,0 +114,4 @@
var err error
switch st {
case control.HealthStatus_READY:
err = sdnotify.Send("READY=1")
My point is optional, but how about to introduce constants like
and replace these strings by the constants?
added constants
@ -636,2 +640,4 @@
}
func initSdNotify(appCfg *config.Config) bool {
enabled := false
Do we need this variable?
no, removed
@ -638,0 +644,4 @@
if config.BoolSafe(appCfg.Sub("systemdnotify"), "enabled") {
enabled = true
err := sdnotify.InitSocket()
fatalOnErr(err)
Can you merge above two lines?
@ -407,0 +417,4 @@
if enabled {
err = sdnotify.InitSocket()
}
return enabled, err
How about this:
fixed
8d3811f7cf
to5ff4dfec76
5ff4dfec76
to2f3a5534ec
@ -105,1 +110,4 @@
}
func (c *cfg) notifySystemd(st control.HealthStatus) {
if c.sdNotify {
This way allows to drop indent:
Not required to fix, just FYI.
2f3a5534ec
todaa6ddbf1a
daa6ddbf1a
to2d9fbbd008
LGTM
2d9fbbd008
to2d4c0a0f4a