Reload operation fails with timeout #1135

Closed
opened 2024-05-15 09:26:29 +00:00 by an-nikitin · 5 comments

Expected Behavior

Reload operation notifies systemd upon completion.

Current Behavior

Reload operation completes but does not notify systemd, so systemd waits until the reload timeout expires and then reports the reload as failed:

systemd[1]: frostfs-storage.service: Reload operation timed out. Killing reload process.
systemd[1]: Reload failed for FrostFS Storage node.

Possible Solution

As a QA engineer, I cannot suggest a fix; the solution is up to the developers.

Steps to Reproduce (for bugs)

  1. Change frostfs-node service type to notify-reload and add systemdnotify.enabled: true to its configuration.
  2. Run systemctl reload frostfs-storage.

Context

This problem prevents using "notify-reload" as the service type.

Regression

No.

Your Environment

Tatlin.Object cluster node, version v1.6.0-4 (CYP v4.1.0).

an-nikitin added the bug and triage labels 2024-05-15 09:26:29 +00:00
Owner

Sending an update directly doesn't work either:

$ sudo NOTIFY_SOCKET=/run/systemd/notify systemd-notify --ready --status="1234" --uid=frostfs-storage --pid=410548
$ systemctl status frostfs-storage
↻ frostfs-storage.service - FrostFS Storage node
     Loaded: loaded (/usr/lib/systemd/system/frostfs-storage.service; enabled; preset: disabled)
     Active: reloading (reload-signal) since Tue 2024-05-14 14:16:41 UTC; 19h ago
   Main PID: 410548 (frostfs-node)
     Status: "1234"
      Tasks: 9 (limit: 19200)
     Memory: 44.4M
        CPU: 26min 19.551s
     CGroup: /system.slice/frostfs-storage.service
             └─410548 /usr/bin/frostfs-node --config /etc/frostfs/storage/config.yml --config-dir /etc/frostfs/storage/conf.d

I have also checked that the metric was updated, and we send the message to systemd before we update the metric.

So it is reasonable to suspect a bug in systemd itself.

@elebedeva on a side note, I haven't found status updates for configuration reload in frostfs-ir. Have we implemented them?

elebedeva was assigned by fyrchik 2024-05-15 09:30:45 +00:00
fyrchik added this to the v0.40.0 milestone 2024-05-15 09:30:47 +00:00
Owner
$ systemctl --version
systemd 254 (254)
+PAM -AUDIT -SELINUX -APPARMOR -IMA -SMACK -SECCOMP +GCRYPT -GNUTLS -OPENSSL -ACL +BLKID +CURL -ELFUTILS -FIDO2 -IDN2 -IDN -IPTC +KMOD -LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE -TPM2 -BZIP2 -LZ4 -XZ -ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified
Owner

If I perform the complete --reloading and --ready transition manually with systemd-notify, everything works.
If the reload is triggered with systemctl reload, readiness is updated neither via systemd-notify nor via frostfs-storage.

fyrchik added the frostfs-node label and removed the triage label 2024-05-15 09:35:44 +00:00
Owner

The problem may be that we do not send MONOTONIC_USEC:
https://www.man7.org/linux/man-pages/man3/sd_notify.3.html

Upstream systemd-notify does send it:
https://github.com/systemd/systemd/blob/main/src/notify/notify.c#L351
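
To illustrate the hypothesis: per the sd_notify man page linked above, MONOTONIC_USEC carries the CLOCK_MONOTONIC timestamp in microseconds and is meant to accompany RELOADING=1, followed by READY=1 once the reload is done. Below is a minimal sketch of such a sequence in Go. It is not the frostfs-node implementation: the names monotonicUsec, reloadPayload and notify are hypothetical, and reading the clock through a raw Linux syscall is an assumption made only to keep the example free of external dependencies.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"syscall"
	"unsafe"
)

// clockMonotonic is the Linux clock ID of CLOCK_MONOTONIC.
const clockMonotonic = 1

// monotonicUsec reads CLOCK_MONOTONIC and returns it in microseconds.
// Assumption: Linux only; a raw syscall is used to avoid extra dependencies.
func monotonicUsec() (uint64, error) {
	var ts syscall.Timespec
	_, _, errno := syscall.Syscall(syscall.SYS_CLOCK_GETTIME,
		clockMonotonic, uintptr(unsafe.Pointer(&ts)), 0)
	if errno != 0 {
		return 0, errno
	}
	return uint64(ts.Sec)*1_000_000 + uint64(ts.Nsec)/1_000, nil
}

// reloadPayload builds the message announcing the start of a reload,
// including the timestamp suspected to be missing in this issue.
func reloadPayload(usec uint64) string {
	return fmt.Sprintf("RELOADING=1\nMONOTONIC_USEC=%d", usec)
}

// notify sends one datagram to the notification socket.
// An empty socket path means "not running under systemd" and is a no-op.
func notify(socketPath, state string) error {
	if socketPath == "" {
		return nil
	}
	addr := &net.UnixAddr{Name: socketPath, Net: "unixgram"}
	conn, err := net.DialUnix("unixgram", nil, addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	_, err = conn.Write([]byte(state))
	return err
}

func main() {
	socket := os.Getenv("NOTIFY_SOCKET")

	usec, err := monotonicUsec()
	if err != nil {
		fmt.Fprintln(os.Stderr, "clock_gettime:", err)
		os.Exit(1)
	}
	// Announce that the reload has started.
	if err := notify(socket, reloadPayload(usec)); err != nil {
		fmt.Fprintln(os.Stderr, "notify reloading:", err)
		os.Exit(1)
	}

	// ... the configuration would be re-read here ...

	// Announce that the reload has finished.
	if err := notify(socket, "READY=1"); err != nil {
		fmt.Fprintln(os.Stderr, "notify ready:", err)
		os.Exit(1)
	}
}
```

In a real service these messages would come from the main process's SIGHUP handler rather than from a separate program; the sketch only shows the message contents and their order.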

fyrchik self-assigned this 2024-05-15 09:46:35 +00:00
elebedeva was unassigned by fyrchik 2024-05-15 09:46:40 +00:00
Owner

Refs #552

Reference: TrueCloudLab/frostfs-node#1135