Node sends a bootstrap request on every reboot even when it is already ONLINE #691

Open
opened 2023-09-15 13:03:57 +00:00 by anikeev-yadro · 7 comments

Related to #516

Expected Behavior

Do not bootstrap if we are already ONLINE.

Current Behavior

The node sends a bootstrap request on every restart, even when it is already ONLINE.

Steps to Reproduce (for bugs)

  1. Restart the node service multiple times:
root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:45:24 UTC 2023

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:45:40 UTC 2023

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:46:10 UTC 2023 
  2. Observe a bootstrap request after every restart:
Sep 15 12:45:35 tatlin-object-failover-node3 frostfs-node[1441817]: 2023-09-15T12:45:35.387Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}
Sep 15 12:45:51 tatlin-object-failover-node3 frostfs-node[1441854]: 2023-09-15T12:45:51.183Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}
Sep 15 12:46:15 tatlin-object-failover-node3 frostfs-node[1441914]: 2023-09-15T12:46:15.380Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}

Regression

No

Version

FrostFS Storage node
Version: v0.37.0-rc.1-1-g3889e829
GoVersion: go1.20.5

Your Environment

Cloud

anikeev-yadro added the bug and triage labels 2023-09-15 13:03:57 +00:00
fyrchik added this to the v0.37.0 milestone 2023-09-15 13:55:53 +00:00
fyrchik self-assigned this 2023-09-15 14:30:01 +00:00

It is a complete duplicate of #516, no?

fyrchik added the frostfs-node label 2023-09-15 14:34:12 +00:00

No, because #516 is an enhancement and I cannot reopen it, but this is a bug.

fyrchik added reference support/v0.37 2023-09-15 14:51:01 +00:00
fyrchik reopened this issue 2023-10-06 14:18:09 +00:00
fyrchik modified the milestone from v0.37.0 to v0.38.0 2023-10-06 14:18:14 +00:00
fyrchik removed the triage label 2024-01-16 12:55:44 +00:00
fyrchik removed their assignment 2024-01-16 12:55:51 +00:00

The initial optimization proved to be too error-prone:

  1. We must send a bootstrap request if the node attributes have changed.
  2. If the service starts and sees the MAINTENANCE state, what should it do? What if it sees OFFLINE?
  2. From my point of view, we need to keep the current state: if we change the state to MAINTENANCE/OFFLINE and the node suddenly restarts, we expect the state not to change.

Well, the problem is that "OFFLINE" is actually just a graceful removal, so on start we must send a bootstrap request.


On start, why don't we keep the OFFLINE state and not send a bootstrap request?
Then the customer needs to change the status from OFFLINE to ONLINE manually.

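The cases raised so far can be collected into one decision rule for a restarting node. The Go sketch below is one possible reading, not a settled design: skip the bootstrap request when the node is already ONLINE and its attributes are unchanged, keep MAINTENANCE across restarts as suggested above, and make OFFLINE a switch between the two options under discussion (re-join automatically, or wait for an operator). All names here (`NodeState`, `OfflinePolicy`, `needBootstrap`, the `Region` attribute) are hypothetical and are not the frostfs-node API; how the node learns its previous state and published attributes from the network map is assumed, not shown.

```go
package main

import "fmt"

// NodeState is the node's state as last seen in the network map.
// Hypothetical type for illustration; not the frostfs-node API.
type NodeState int

const (
	StateUnknown NodeState = iota // no previous record, e.g. first start
	StateOnline
	StateOffline
	StateMaintenance
)

// OfflinePolicy picks one of the two OFFLINE behaviours debated in this issue.
type OfflinePolicy int

const (
	OfflineRejoin OfflinePolicy = iota // OFFLINE is a graceful removal: re-join on start
	OfflineKeep                        // stay OFFLINE until an operator changes the status
)

// sameAttributes reports whether both maps hold exactly the same key/value pairs.
func sameAttributes(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// needBootstrap decides whether a restarting node should send a bootstrap request.
// configured holds the attributes from the local config; published holds the
// attributes currently registered in the network map (both assumed inputs).
func needBootstrap(prev NodeState, configured, published map[string]string, policy OfflinePolicy) bool {
	switch prev {
	case StateOnline:
		// Already ONLINE: re-bootstrap only if the attributes changed.
		return !sameAttributes(configured, published)
	case StateMaintenance:
		// Keep MAINTENANCE across restarts. Attribute changes made while in
		// MAINTENANCE are not handled by this sketch.
		return false
	case StateOffline:
		return policy == OfflineRejoin
	default:
		// Unknown previous state: bootstrap, as the node currently does on every start.
		return true
	}
}

func main() {
	attrs := map[string]string{"Region": "eu"} // illustrative attribute only
	moved := map[string]string{"Region": "us"}

	fmt.Println(needBootstrap(StateOnline, attrs, attrs, OfflineRejoin))    // false
	fmt.Println(needBootstrap(StateOnline, moved, attrs, OfflineRejoin))    // true
	fmt.Println(needBootstrap(StateOffline, attrs, attrs, OfflineRejoin))   // true
	fmt.Println(needBootstrap(StateOffline, attrs, attrs, OfflineKeep))     // false
	fmt.Println(needBootstrap(StateMaintenance, attrs, attrs, OfflineKeep)) // false
}
```

Making the OFFLINE case an explicit policy keeps both positions from the thread visible in one place instead of hard-coding either.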

> Then the customer needs to change the status from OFFLINE to ONLINE manually.

Do we agree that this is problematic behaviour? The node should start and "just work".

The initial solution was meant to address GAS problems. I believe we could solve THAT first in some other way (for example, by saving the last bootstrap epoch in some persistent state?).


Reference: TrueCloudLab/frostfs-node#691