pool: Do not reconnect to storage node in maintenance mode #278

Closed
opened 2024-10-04 08:25:33 +00:00 by alexvanin · 0 comments
Owner

Found by @dkirillov

A storage node in maintenance mode returns errors on object storage operations. The Pool component eventually switches the connection to a different storage node.

Expected Behavior

During connection re-balance, a storage node in maintenance mode should not be marked as a healthy node.

Current Behavior

A storage node in maintenance mode is marked as healthy, and the connection is re-established with this node.

Possible Solution

As far as I understand, maintenance mode is explicitly declared in the node info status. If so, we can check it during the healthcheck.

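// Query the node's endpoint info as part of the healthcheck.
ei, err := cl.EndpointInfo(ctx, sdkClient.PrmEndpointInfo{})
// Treat a node in maintenance mode the same way as an unreachable one.
if err != nil || ei.NodeInfo().Status().IsMaintenance() {
	c.setUnhealthy()
	return wasHealthy, err
}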

Steps to Reproduce (for bugs)

  1. Put a node into maintenance mode
  2. Send object.Put requests with the SDK Pool component until the connection is marked as unhealthy
  3. Wait for the rebalance interval

Context

Maintenance mode is a completely valid state for a node, and it should not dramatically affect performance when the SDK Pool tries to work with a node in this state.

Regression

No

Your Environment

SDK Pool from frostfs-s3-gw v0.31.0-rc.2 (1b67ab960848)

alexvanin added the bug label 2024-10-04 08:25:33 +00:00
alexvanin self-assigned this 2024-10-04 08:25:33 +00:00
alexvanin added the pool label 2024-10-04 08:25:40 +00:00
dkirillov self-assigned this 2024-10-16 12:23:32 +00:00
alexvanin was unassigned by dkirillov 2024-10-16 14:31:26 +00:00
Reference: TrueCloudLab/frostfs-sdk-go#278