Node sends a bootstrap request on every reboot even when it is already ONLINE #691

Open
opened 2023-09-15 13:03:57 +00:00 by anikeev-yadro · 7 comments
Member

Related to #516

Expected Behavior

Do not bootstrap if we are already ONLINE.

Current Behavior

The node sends a bootstrap request on every reboot even when it is already ONLINE.

Steps to Reproduce (for bugs)

  1. Reboot the node multiple times
root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:45:24 UTC 2023

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:45:40 UTC 2023

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:46:10 UTC 2023 
  2. See a bootstrap request on every reboot
Sep 15 12:45:35 tatlin-object-failover-node3 frostfs-node[1441817]: 2023-09-15T12:45:35.387Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}
Sep 15 12:45:51 tatlin-object-failover-node3 frostfs-node[1441854]: 2023-09-15T12:45:51.183Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}
Sep 15 12:46:15 tatlin-object-failover-node3 frostfs-node[1441914]: 2023-09-15T12:46:15.380Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}

Regression

No

Version

FrostFS Storage node
Version: v0.37.0-rc.1-1-g3889e829
GoVersion: go1.20.5

Your Environment

Cloud

anikeev-yadro added the bug, triage labels 2023-09-15 13:03:57 +00:00
fyrchik added this to the v0.37.0 milestone 2023-09-15 13:55:53 +00:00
fyrchik self-assigned this 2023-09-15 14:30:01 +00:00
Owner

It is a complete duplicate of #516, no?

fyrchik added the frostfs-node label 2023-09-15 14:34:12 +00:00
Author
Member

No, because #516 is an enhancement and I cannot re-open it, but this is a bug.

fyrchik added reference support/v0.37 2023-09-15 14:51:01 +00:00
fyrchik reopened this issue 2023-10-06 14:18:09 +00:00
fyrchik modified the milestone from v0.37.0 to v0.38.0 2023-10-06 14:18:14 +00:00
fyrchik removed the triage label 2024-01-16 12:55:44 +00:00
fyrchik removed their assignment 2024-01-16 12:55:51 +00:00
Owner

The initial optimization proved to be too error-prone:

  1. We must send bootstrap if node attributes have changed.
  2. If we start the service and we see MAINTENANCE state, what do we need to do? If we see OFFLINE?
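
The two points above can be sketched as a single start-up decision. This is only an illustration, not frostfs-node code: `NodeState`, `needBootstrap` and `sameAttributes` are hypothetical names, and treating every state other than ONLINE as "send bootstrap" is an assumption that simply falls back to the current behaviour, since the thread leaves MAINTENANCE and OFFLINE undecided.

```go
package main

// NodeState is a hypothetical stand-in for the netmap states named in this thread.
type NodeState string

const (
	StateOnline      NodeState = "ONLINE"
	StateOffline     NodeState = "OFFLINE"
	StateMaintenance NodeState = "MAINTENANCE"
)

// needBootstrap reports whether a bootstrap request should be sent on start.
// prev is the state seen in the network map, registered holds the attributes
// recorded there, and local holds the attributes from the node's own config.
func needBootstrap(prev NodeState, registered, local map[string]string) bool {
	if prev != StateOnline {
		// Assumption: for OFFLINE and MAINTENANCE keep today's behaviour
		// and bootstrap, because the right handling is still an open question.
		return true
	}
	// Already ONLINE: bootstrap only if the attributes have changed.
	return !sameAttributes(registered, local)
}

// sameAttributes compares two attribute sets key by key.
func sameAttributes(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}
```

With this shape, a restart in the ONLINE state with unchanged attributes skips the request, while any attribute change still triggers it.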
Author
Member
  2. From my POV we need to keep the current state: if we change the state to MAINTENANCE/OFFLINE and the node is suddenly restarted, we expect that the state will not change.
Owner

Well, the problem is that "OFFLINE" is actually just a graceful removal, so when we start we must send a bootstrap request.

Author
Member

When we start, why don't we keep the OFFLINE state and not send a bootstrap request?
Then the customer needs to manually change the status from OFFLINE to ONLINE.

Owner

> Then the customer needs to manually change the status from OFFLINE to ONLINE.

Do we agree this is a problematic behaviour? The node should start and "just work".

The initial solution was for GAS problems, I believe we could solve THAT first via some other way (like saving last bootstrap epoch in some persistent state?)
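
A minimal sketch of the "last bootstrap epoch in persistent state" idea, assuming the epoch is kept in a small text file and that a restart within the same epoch may skip the request. The file format, the function names and that skip rule are all assumptions for illustration; none of them come from frostfs-node.

```go
package main

import (
	"errors"
	"os"
	"strconv"
	"strings"
)

// saveLastBootstrapEpoch records the epoch in which a bootstrap request was sent.
func saveLastBootstrapEpoch(path string, epoch uint64) error {
	return os.WriteFile(path, []byte(strconv.FormatUint(epoch, 10)), 0o600)
}

// loadLastBootstrapEpoch returns the recorded epoch; ok is false if nothing
// has been recorded yet (the file does not exist).
func loadLastBootstrapEpoch(path string) (uint64, bool, error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return 0, false, nil
	}
	if err != nil {
		return 0, false, err
	}
	epoch, err := strconv.ParseUint(strings.TrimSpace(string(data)), 10, 64)
	if err != nil {
		return 0, false, err
	}
	return epoch, true, nil
}

// alreadyBootstrapped reports whether a bootstrap was recorded for the
// current epoch, in which case a restart would not need to pay for another one.
func alreadyBootstrapped(path string, current uint64) (bool, error) {
	last, ok, err := loadLastBootstrapEpoch(path)
	if err != nil || !ok {
		return false, err
	}
	return last == current, nil
}
```

Under these assumptions, repeated restarts inside one epoch cost a single request, and the node still bootstraps on its own once the epoch changes, so it keeps the "start and just work" property.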

Reference: TrueCloudLab/frostfs-node#691