System behaviour when evacuating REP 1 objects #115

Closed
opened 2023-03-08 23:34:43 +00:00 by snegurochka · 2 comments
Member

Original issue: https://github.com/nspcc-dev/neofs-node/issues/1812

We want to behave correctly when evacuating objects stored only on a single node.
Here are 2 situations:

  1. Move node to maintenance -> evacuate objects
  2. Evacuate objects -> move node to maintenance

In (1) the objects could be unavailable between the first and the second step.
In (2) the objects could become unavailable if the policer on another node removes them right after the evacuation.

So we need to take special care here. The proposal is as follows:

  1. Allow a node to answer some requests in maintenance mode if possible.
  2. Change its state in the netmap.
  3. Make sure the policer takes the netmap into account even if a node answers requests: if the only replica of an object is stored on a node that is under maintenance, make sure the object is replicated.
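
To illustrate point 3, here is a minimal sketch of the check a policer could make before dropping a redundant local copy. It is not the frostfs-node policer: `NodeState`, `canRemoveLocal` and the `netmapAware` flag are hypothetical names invented for this example, and the only assumption is that the policer can see each holder's netmap state.

```go
package main

import "fmt"

// NodeState is a hypothetical stand-in for a node's status in the netmap.
type NodeState int

const (
	Online NodeState = iota
	Maintenance
)

// canRemoveLocal reports whether a node holding a redundant copy may delete it.
// required is the replica count from the policy (1 for REP 1).
// holders lists the netmap states of the other nodes that confirmed they
// store the object.
// With netmapAware == false, every holder that answered is counted.
// With netmapAware == true, only holders that are Online are counted.
func canRemoveLocal(required int, holders []NodeState, netmapAware bool) bool {
	counted := 0
	for _, state := range holders {
		if netmapAware && state != Online {
			// The node answers requests but is under maintenance,
			// so its copy must not be relied upon.
			continue
		}
		counted++
	}
	return counted >= required
}

func main() {
	// REP 1, and the only other holder is the node being evacuated.
	holders := []NodeState{Maintenance}

	fmt.Println("naive policer removes copy:", canRemoveLocal(1, holders, false))
	fmt.Println("netmap-aware policer removes copy:", canRemoveLocal(1, holders, true))
}
```

In this sketch the naive check lets the evacuated copy be deleted because the maintenance node still answers, while the netmap-aware check keeps it.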
Owner

Decided to leave it as is for now; REP 1 is already a non-recommended policy.


Scenario:

  1. I want to transfer all objects from node 1 to other nodes. The objects are stored with the REP 1 policy.
  2. I switch all the shards on node 1 to read-only mode and start the evacuation. The evacuation knows that it cannot transfer data to any of the available shards and begins to transfer it to other nodes.
  3. On the other nodes, the policer is triggered: it takes an object that was moved by the evacuation, finds that the object is also stored on node 1, and deletes its own copy.
  4. The evacuation comes to an end, I am fully confident that everything is OK, and I switch off node 1.
  5. Node 1 is switched off, the policer has deleted the object on the other nodes, and the object is lost. Profit!
Reference: TrueCloudLab/frostfs-node#115