TrueCloudLab/frostfs-node

Node sends a bootstrap request on every reboot even when it is already ONLINE #691

Open

opened 2023-09-15 13:03:57 +00:00 by anikeev-yadro · 7 comments

anikeev-yadro commented

2023-09-15 13:03:57 +00:00

Member

Related to #516

Expected Behavior

Do not bootstrap if we are already ONLINE.

Current Behavior

The node sends a bootstrap request on every reboot even when it is already ONLINE.

Steps to Reproduce (for bugs)

Reboot the node multiple times

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:45:24 UTC 2023

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:45:40 UTC 2023

root@tatlin-object-failover-node3:~# date;systemctl restart frostfs-storage
Fri Sep 15 12:46:10 UTC 2023

See a bootstrap request on every reboot

Sep 15 12:45:35 tatlin-object-failover-node3 frostfs-node[1441817]: 2023-09-15T12:45:35.387Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}
Sep 15 12:45:51 tatlin-object-failover-node3 frostfs-node[1441854]: 2023-09-15T12:45:51.183Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}
Sep 15 12:46:15 tatlin-object-failover-node3 frostfs-node[1441914]: 2023-09-15T12:46:15.380Z        info        frostfs-node/config.go:1018        bootstrapping with online state        {"previous": "ONLINE"}

Regression

No

Version

FrostFS Storage node
Version: v0.37.0-rc.1-1-g3889e829
GoVersion: go1.20.5

Your Environment

Cloud

anikeev-yadro added the

bug

triage

labels 2023-09-15 13:03:57 +00:00

fyrchik added this to the v0.37.0 milestone 2023-09-15 13:55:53 +00:00

fyrchik self-assigned this 2023-09-15 14:30:01 +00:00

fyrchik referenced this issue from a pull request that will close it,

2023-09-15 14:30:30 +00:00

SUPPORT node: Compare node info during initial bootstrap properly #693

fyrchik commented

2023-09-15 14:32:16 +00:00

Owner

It is a complete duplicate of #516, no?

fyrchik added the

frostfs-node

label 2023-09-15 14:34:12 +00:00

anikeev-yadro commented

2023-09-15 14:44:05 +00:00

Author

Member

No, because #516 is an enhancement and I cannot re-open it, but this is a bug.

fyrchik added reference support/v0.37

2023-09-15 14:51:01 +00:00

dstepanov-yadro closed this issue

2023-09-18 07:30:17 +00:00

dstepanov-yadro referenced this issue from a commit

2023-09-18 07:30:18 +00:00

[#691] node: Compare node info during initial bootstrap properly

fyrchik reopened this issue

2023-10-06 14:18:09 +00:00

fyrchik modified the milestone from v0.37.0 to v0.38.0

2023-10-06 14:18:14 +00:00

fyrchik removed the

triage

label 2024-01-16 12:55:44 +00:00

fyrchik removed their assignment 2024-01-16 12:55:51 +00:00

fyrchik commented

2024-01-16 12:58:21 +00:00

Owner

The initial optimization proved to be too error-prone:

1. We must send bootstrap if node attributes have changed.
2. If we start the service and see the MAINTENANCE state, what do we need to do? What if we see OFFLINE?
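The two concerns above can be made concrete with a small decision function. The following Go sketch is illustrative only: `NodeState`, `NodeInfo`, `sameAttributes`, and `needsBootstrap` are hypothetical names, not frostfs-node APIs, and the MAINTENANCE branch encodes just one possible answer to the open question (keep the state and skip bootstrap).

```go
package main

import "fmt"

// NodeState is a hypothetical stand-in for the node state stored in the network map.
type NodeState int

const (
	StateUnknown NodeState = iota // node is absent from the network map
	StateOnline
	StateOffline
	StateMaintenance
)

// NodeInfo is a hypothetical, simplified view of a node's network map record.
type NodeInfo struct {
	State      NodeState
	Attributes map[string]string
}

// sameAttributes reports whether both attribute sets hold identical key/value pairs.
func sameAttributes(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

// needsBootstrap decides at startup whether a bootstrap request should be sent,
// given the locally configured info and the record currently in the network map.
func needsBootstrap(local, registered NodeInfo) bool {
	switch registered.State {
	case StateOnline:
		// Already ONLINE: bootstrap only if attributes have changed.
		return !sameAttributes(local.Attributes, registered.Attributes)
	case StateMaintenance:
		// Assumption: leave MAINTENANCE untouched. This is an open question in the thread.
		return false
	default:
		// OFFLINE (graceful removal) or unknown: the node must register again.
		return true
	}
}

func main() {
	attrs := map[string]string{"Price": "10"}
	local := NodeInfo{Attributes: attrs}

	fmt.Println(needsBootstrap(local, NodeInfo{State: StateOnline, Attributes: attrs}))
	fmt.Println(needsBootstrap(local, NodeInfo{State: StateOffline, Attributes: attrs}))
}
```

Even in this reduced form, each state needs its own policy, which is where the error-proneness comes from.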

anikeev-yadro commented

2024-01-17 06:17:51 +00:00

Author

Member

2. From my POV, we need to keep the current state: if we change the state to MAINTENANCE/OFFLINE and the node is suddenly restarted, we expect the state not to change.

fyrchik commented

2024-01-17 06:35:48 +00:00

Owner

Well, the problem is that "OFFLINE" is actually just a graceful removal, so when we start we must send bootstrap request.

anikeev-yadro commented

2024-01-17 06:40:53 +00:00

Author

Member

When we start, why don't we keep the OFFLINE state and not send a bootstrap request?
Then the customer needs to manually change the status from OFFLINE to ONLINE.

fyrchik commented

2024-01-17 08:20:09 +00:00

Owner

> Then customer need to manually change the status from OFFLINE to ONLINE.

Do we agree this is problematic behaviour? The node should start and "just work".

The initial solution was for GAS problems; I believe we could solve THAT first in some other way (like saving the last bootstrap epoch in some persistent state?)