systemd notify protocol #810

Merged
fyrchik merged 5 commits from elebedeva/frostfs-node:sd-notify into master 2024-09-04 19:51:04 +00:00
Member

Close #552

example of displayed status in frostfs-storage.service:

service@elebedeva-node3:~$ systemctl status frostfs-storage
● frostfs-storage.service - FrostFS Storage node
     Loaded: loaded (/lib/systemd/system/frostfs-storage.service; enabled; preset: enabled)
     Active: active (running) since Tue 2023-12-05 10:26:56 UTC; 12min ago
   Main PID: 398257 (frostfs-node)
     Status: "READY"
      Tasks: 14 (limit: 9492)
     Memory: 46.8M
        CPU: 6.692s
     CGroup: /system.slice/frostfs-storage.service
             └─398257 /usr/bin/frostfs-node --config /etc/frostfs/storage/config.yml --config-dir /etc/frostfs/storage/conf.d

example of displayed status in frostfs-ir.service:

service@elebedeva-node3:~$ systemctl status frostfs-ir
● frostfs-ir.service - frostfs InnerRing node
     Loaded: loaded (/lib/systemd/system/frostfs-ir.service; enabled; preset: enabled)
     Active: active (running) since Tue 2023-12-05 14:05:09 UTC; 4s ago
   Main PID: 400903 (frostfs-ir)
     Status: "READY"
      Tasks: 9 (limit: 9492)
     Memory: 25.5M
        CPU: 405ms
     CGroup: /system.slice/frostfs-ir.service
             └─400903 /usr/bin/frostfs-ir --config /etc/frostfs/ir/config.yml -config-dir /etc/frostfs/ir/conf.d

Signed-off-by: Ekaterina Lebedeva ekaterina.lebedeva@yadro.com

Close #552 example of displayed status in ```frostfs-storage.service```: ``` service@elebedeva-node3:~$ systemctl status frostfs-storage ● frostfs-storage.service - FrostFS Storage node Loaded: loaded (/lib/systemd/system/frostfs-storage.service; enabled; preset: enabled) Active: active (running) since Tue 2023-12-05 10:26:56 UTC; 12min ago Main PID: 398257 (frostfs-node) Status: "READY" Tasks: 14 (limit: 9492) Memory: 46.8M CPU: 6.692s CGroup: /system.slice/frostfs-storage.service └─398257 /usr/bin/frostfs-node --config /etc/frostfs/storage/config.yml --config-dir /etc/frostfs/storage/conf.d ``` example of displayed status in ```frostfs-ir.service```: ``` service@elebedeva-node3:~$ systemctl status frostfs-ir ● frostfs-ir.service - frostfs InnerRing node Loaded: loaded (/lib/systemd/system/frostfs-ir.service; enabled; preset: enabled) Active: active (running) since Tue 2023-12-05 14:05:09 UTC; 4s ago Main PID: 400903 (frostfs-ir) Status: "READY" Tasks: 9 (limit: 9492) Memory: 25.5M CPU: 405ms CGroup: /system.slice/frostfs-ir.service └─400903 /usr/bin/frostfs-ir --config /etc/frostfs/ir/config.yml -config-dir /etc/frostfs/ir/conf.d ``` Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
fyrchik reviewed 2023-11-16 08:56:11 +00:00
fyrchik left a comment
Owner

LGTM, 2 questions:

  1. Can you provide an example output from systemctl status frostfs-storage?
  2. What testing was done here?
LGTM, 2 questions: 1. Can you provide an example output from `systemctl status frostfs-storage`? 2. What testing was done here?
@ -82,12 +84,20 @@ func (c *cfg) NetmapStatus() control.NetmapStatus {
}
func (c *cfg) setHealthStatus(st control.HealthStatus) {
err := sysdnotify.Status(fmt.Sprintf("%v, %v", st, control.HealthStatus_name[int32(st)]))
Owner

Can we move fmt.Sprintf to a separate function which accepts st? It can later be extended to accept any string and there is less opportunities to make an error somewhere.

Can we move `fmt.Sprintf` to a separate function which accepts `st`? It can later be extended to accept any string and there is less opportunities to make an error somewhere.
Owner

Logging can also be done inside this function.

Logging can also be done inside this function.
Author
Member

done

done
fyrchik marked this conversation as resolved
@ -88,3 +94,4 @@
func (c *cfg) compareAndSwapHealthStatus(oldSt, newSt control.HealthStatus) (swapped bool) {
if swapped = c.healthStatus.CompareAndSwap(int32(oldSt), int32(newSt)); swapped {
err := sysdnotify.Status(fmt.Sprintf("%v, %v", newSt, control.HealthStatus_name[int32(newSt)]))
Owner

What happens here is sdnotify is disabled?

What happens here is `sdnotify` is disabled?
Author
Member

sysdnotify.Status() returns an error connection refused, and this error will be logged.

```sysdnotify.Status()``` returns an error ```connection refused```, and this error will be logged.
elebedeva force-pushed sd-notify from a0509535bd to 8c7a181a82 2023-11-20 15:48:45 +00:00 Compare
elebedeva force-pushed sd-notify from 8c7a181a82 to e6188cbaa1 2023-11-20 16:10:38 +00:00 Compare
fyrchik reviewed 2023-11-23 08:59:33 +00:00
@ -106,0 +114,4 @@
status := fmt.Sprintf("%v, %v", st.Number(), st)
err := sdnotify.Status(status)
if err == nil {
c.log.Info(fmt.Sprintf("reported STATUS=\"%v\" to systemd", status))
Owner

It is debug log? I believe this is not worth logging especially with info.

It is debug log? I believe this is not worth logging especially with `info`.
Author
Member

removed

removed
fyrchik marked this conversation as resolved
@ -0,0 +18,4 @@
func Send(state string) error {
if socket == nil {
socket = &net.UnixAddr{
Name: os.Getenv("NOTIFY_SOCKET"),
Owner

We should take it from config, not from env
How about systemdnotify.socket?

We should take it from config, not from env How about `systemdnotify.socket`?
Author
Member

systemd can only pass socket to service through an environmental variable

```systemd``` can only pass socket to service through an environmental variable
fyrchik marked this conversation as resolved
elebedeva force-pushed sd-notify from 3644466a86 to 9cfc69cd3c 2023-12-04 16:15:15 +00:00 Compare
elebedeva force-pushed sd-notify from 9cfc69cd3c to 1f14ca473e 2023-12-05 10:44:29 +00:00 Compare
elebedeva changed title from WIP: systemd notify protocol to systemd notify protocol 2023-12-05 14:08:39 +00:00
requested reviews from storage-core-committers, storage-core-developers 2023-12-05 14:09:09 +00:00
acid-ant reviewed 2023-12-06 07:33:31 +00:00
@ -354,2 +355,4 @@
// is node under maintenance
isMaintenance atomic.Bool
sdNotify atomic.Bool
Member

Why atomic.Bool needed here?

Why `atomic.Bool` needed here?
Author
Member

it is not needed, replaced with just bool

it is not needed, replaced with just ```bool```
@ -106,0 +113,4 @@
if c.sdNotify.Load() {
status := fmt.Sprintf("%v", st)
err := sdnotify.Status(status)
if err != nil {
Member

How about merging above three lines in one?

How about merging above three lines in one?
@ -58,6 +58,7 @@ type (
netmapClient *nmClient.Client
persistate *state.PersistentStorage
containerClient *container.Client
sdNotify atomic.Bool
Member

Why atomic.Bool needed here?

Why `atomic.Bool` needed here?
Author
Member

it is not needed, replaced with just bool

it is not needed, replaced with just ```bool```
@ -0,0 +11,4 @@
// Initializes socket with provided name of
// environment variable.
func InitSocket() error {
notifySocket := os.Getenv("NOTIFY_SOCKET")
Member

Why socket name configured only via env?

Why socket name configured only via env?
Owner

This is how systemd communicates socket path to our service.

This is how systemd communicates socket path to our service.
@ -0,0 +40,4 @@
func Send(state string) error {
if socket == nil {
socket = &net.UnixAddr{
Name: os.Getenv("NOTIFY_SOCKET"),
Member

Why you are skipping check for NOTIFY_SOCKET here?

Why you are skipping check for `NOTIFY_SOCKET` here?
Author
Member

it was done for testing purposes only
replaced with an error

it was done for testing purposes only replaced with an error
dstepanov-yadro requested changes 2023-12-06 09:24:59 +00:00
@ -176,0 +181,4 @@
status := fmt.Sprintf("%v", st)
err := sdnotify.Status(status)
if err != nil {
s.log.Error(logs.FailedToReportStatusToSystemd, zap.String("error", err.Error()))

zap.String("error", err.Error()) -> zap.Error(err)

`zap.String("error", err.Error())` -> `zap.Error(err)`
Author
Member

fixed

fixed
dstepanov-yadro marked this conversation as resolved
fyrchik reviewed 2023-12-06 12:33:42 +00:00
@ -176,0 +178,4 @@
func (s *Server) notifySystemd(st control.HealthStatus) {
if s.sdNotify.Load() {
status := fmt.Sprintf("%v", st)
Owner

Why do we use strings here? Maybe move the switch from sdnotify.Status to here? This way it would be immediately obvious what do we send.

Why do we use strings here? Maybe move the switch from `sdnotify.Status` to here? This way it would be immediately obvious what do we send.
Author
Member

fixed

fixed
fyrchik marked this conversation as resolved
@ -0,0 +26,4 @@
func Status(status string) error {
switch status {
case "READY":
_ = Send("READY=1")
Owner

Why is the error ignored?

Why is the error ignored?
Author
Member

moved this switch to the frostfs-node control.go and to frostfs-ir state.go, where this error may be logged

moved this switch to the frostfs-node control.go and to frostfs-ir state.go, where this error may be logged
Owner

It seems the switch is still here.

It seems the switch is still here.
Author
Member

fixed

fixed
@ -0,0 +32,4 @@
case "RECONFIGURING":
_ = Send("RELOADING=1")
}
return Send(fmt.Sprintf("STATUS=%s", status))
Owner

We do 2 sends in this function, can they be combined in a single message?

We do 2 sends in this function, can they be combined in a single message?
Author
Member

Messages like "STATUS=%s" allow us to see them in systemctl status output, while messages like "READY=1" are actual notifications to systemd. If we don't need to see status through systemctl status, then second status report can be removed.

Messages like "STATUS=%s" allow us to see them in ```systemctl status``` output, while messages like "READY=1" are actual notifications to systemd. If we don't need to see status through ```systemctl status```, then second status report can be removed.
Owner

From here https://www.man7.org/linux/man-pages/man3/sd_notify.3.html :

The state parameter should contain a newline-separated list of variable assignments, similar in style to an environment block.

So will Send(fmt.Sprintf("RELOADING=1\nSTATUS=%s", status)) work? It is not a big problem, but it seems both variables are better changed together.

From here https://www.man7.org/linux/man-pages/man3/sd_notify.3.html : >The state parameter should contain a newline-separated list of variable assignments, similar in style to an environment block. So will `Send(fmt.Sprintf("RELOADING=1\nSTATUS=%s", status))` work? It is not a big problem, but it seems both variables are better changed together.
Author
Member

added a function that combines two notifications

added a function that combines two notifications
elebedeva force-pushed sd-notify from 8b1925c195 to b113b206c2 2023-12-11 15:48:54 +00:00 Compare
elebedeva force-pushed sd-notify from b113b206c2 to 8d3811f7cf 2023-12-11 16:01:04 +00:00 Compare
aarifullin reviewed 2023-12-12 10:14:47 +00:00
@ -106,0 +114,4 @@
var err error
switch st {
case control.HealthStatus_READY:
err = sdnotify.Send("READY=1")
Member

My point is optional, but how about to introduce constants like

const (
   SystemdReadyEnabled      = "READY=1"
   SystemdStoppingEnabled   = "STOPPING=1"
   SystemdReloadingEnabled  = "RELOADING=1"
)

and replace these strings by the constants?

My point is optional, but how about to introduce constants like ```go const ( SystemdReadyEnabled = "READY=1" SystemdStoppingEnabled = "STOPPING=1" SystemdReloadingEnabled = "RELOADING=1" ) ``` and replace these strings by the constants?
Author
Member

added constants

added constants
aarifullin marked this conversation as resolved
acid-ant approved these changes 2023-12-12 10:46:14 +00:00
@ -636,2 +640,4 @@
}
func initSdNotify(appCfg *config.Config) bool {
enabled := false
Member

Do we need this variable?

Do we need this variable?
Author
Member

no, removed

no, removed
acid-ant marked this conversation as resolved
@ -638,0 +644,4 @@
if config.BoolSafe(appCfg.Sub("systemdnotify"), "enabled") {
enabled = true
err := sdnotify.InitSocket()
fatalOnErr(err)
Member

Can you merge above two lines?

Can you merge above two lines?
acid-ant marked this conversation as resolved
@ -407,0 +417,4 @@
if enabled {
err = sdnotify.InitSocket()
}
return enabled, err
Member

How about this:

	if cfg.GetBool("systemdnotify.enabled") {
		return true, sdnotify.InitSocket()
	}
	return false, nil
How about this: ``` if cfg.GetBool("systemdnotify.enabled") { return true, sdnotify.InitSocket() } return false, nil ```
Author
Member

fixed

fixed
acid-ant marked this conversation as resolved
elebedeva force-pushed sd-notify from 8d3811f7cf to 5ff4dfec76 2023-12-12 12:42:41 +00:00 Compare
elebedeva force-pushed sd-notify from 5ff4dfec76 to 2f3a5534ec 2023-12-12 12:54:40 +00:00 Compare
requested reviews from dstepanov-yadro, acid-ant 2023-12-12 12:57:09 +00:00
dstepanov-yadro reviewed 2023-12-12 15:13:41 +00:00
@ -105,1 +110,4 @@
}
func (c *cfg) notifySystemd(st control.HealthStatus) {
if c.sdNotify {

This way allows to drop indent:

if !c.sdNotify {
     return
}
var err error

Not required to fix, just FYI.

This way allows to drop indent: ``` if !c.sdNotify { return } var err error ``` Not required to fix, just FYI.
dstepanov-yadro approved these changes 2023-12-12 15:18:46 +00:00
elebedeva force-pushed sd-notify from 2f3a5534ec to daa6ddbf1a 2023-12-13 11:08:00 +00:00 Compare
elebedeva force-pushed sd-notify from daa6ddbf1a to 2d9fbbd008 2023-12-13 12:03:01 +00:00 Compare
requested review from dstepanov-yadro 2023-12-13 12:24:16 +00:00
dstepanov-yadro approved these changes 2023-12-13 13:05:40 +00:00
aarifullin approved these changes 2023-12-13 14:39:11 +00:00
aarifullin left a comment
Member

LGTM

LGTM
elebedeva force-pushed sd-notify from 2d9fbbd008 to 2d4c0a0f4a 2023-12-13 14:52:29 +00:00 Compare
dstepanov-yadro approved these changes 2023-12-13 15:03:58 +00:00
fyrchik approved these changes 2023-12-13 15:55:37 +00:00
fyrchik merged commit 2d4c0a0f4a into master 2023-12-13 15:55:45 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
5 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#810
No description provided.