Helper object for newbie nodes/shards #51

Open
opened 2023-02-13 09:36:12 +00:00 by carpawell · 1 comment
carpawell commented 2023-02-13 09:36:12 +00:00 (Migrated from github.com)

Context

We rely heavily on our sorting algorithm in several subsystems: for example, we try to place objects in the most appropriate shards and nodes.
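As an illustration of the kind of sorting described above, here is a minimal rendezvous-hashing sketch in Go. The hash function (FNV) and all names (`hrwScore`, `sortShards`, the shard IDs) are assumptions for the example, not the actual frostfs-node implementation:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// hrwScore computes a rendezvous-hashing score for an (object, shard) pair.
// FNV is used here only for brevity; the real code uses a different hash.
func hrwScore(objectID, shardID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(objectID))
	h.Write([]byte(shardID))
	return h.Sum64()
}

// sortShards returns shard IDs ordered from most to least preferred for the
// given object: the head of the slice is where the object (and any LOCK or
// tombstone info for it) is expected to live.
func sortShards(shards []string, objectID string) []string {
	out := append([]string(nil), shards...)
	sort.Slice(out, func(i, j int) bool {
		return hrwScore(objectID, out[i]) > hrwScore(objectID, out[j])
	})
	return out
}

func main() {
	shards := []string{"shard-1", "shard-2", "shard-3"}
	fmt.Println(sortShards(shards, "object-A"))
}
```

The key property is that the preferred shard is a pure function of the object ID and the current shard set, which is exactly why changing the shard set can silently change the answer.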

Problem

I can imagine scenarios where adding a new node/shard changes the sorting order (which is not bad by itself) and leads to objects being unexpectedly removed from, or kept in, storage:

  1. A LOCK object was broadcast while a container node was not yet expected to store the locked object, so the node wrote the locking info to the shard it thought should store that object. If the container node and shard order later change, the object (and a potential TS for it) may be written to another shard -> unexpected removal.
  2. A shard was in read-only (RO) mode -> locking/TS info landed in an unexpected shard and will not be handled correctly when the object arrives in that shard -> unexpected removal.
  3. A node was added after the TS/LOCK objects were created -> during replication, that node could accept the TS/LOCK objects before the objects they refer to and, therefore, could place them in a different shard -> unexpected removal.
  4. A metabase resync, when the real TS/LOCK objects are not stored, will erase that info.
  5. ...

Ideas

  • I have always thought (and still do) that storing the real object, not only its meta relations, on all participating nodes/shards is not a critical overhead, and it could save us from hard-to-predict bugs.
  • Maybe it is time to implement the MoveTo operation.
  • All the shards could store meta relations of a container.
  • ...
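To make the MoveTo idea concrete, here is a minimal in-memory sketch of what such an operation might look like: the object and its meta record travel together, and the source copy is removed only after the destination write succeeds. The `Shard`/`MoveTo` API below is entirely hypothetical and does not reflect any existing frostfs-node interface:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Shard is a minimal in-memory stand-in for a storage shard:
// it keeps objects and their meta records together.
type Shard struct {
	mu      sync.Mutex
	ID      string
	objects map[string][]byte
	meta    map[string]string // e.g. LOCK/TS relations keyed by object ID
}

func NewShard(id string) *Shard {
	return &Shard{ID: id, objects: map[string][]byte{}, meta: map[string]string{}}
}

// MoveTo copies the object and its meta record to dst and removes them from
// src only after the destination write, so the object is never lost mid-move
// (it may briefly exist in both shards, which is safe).
func MoveTo(src, dst *Shard, objID string) error {
	src.mu.Lock()
	defer src.mu.Unlock()
	payload, ok := src.objects[objID]
	if !ok {
		return errors.New("object not found in source shard")
	}
	dst.mu.Lock()
	dst.objects[objID] = payload
	dst.meta[objID] = src.meta[objID]
	dst.mu.Unlock()
	delete(src.objects, objID)
	delete(src.meta, objID)
	return nil
}

func main() {
	a, b := NewShard("shard-a"), NewShard("shard-b")
	a.objects["obj-1"] = []byte("data")
	a.meta["obj-1"] = "LOCK:lock-1"
	if err := MoveTo(a, b, "obj-1"); err != nil {
		panic(err)
	}
	fmt.Println(b.meta["obj-1"])
}
```

With such an operation, a shard-order change could be repaired explicitly (re-sort, then move misplaced objects together with their LOCK/TS relations) instead of leaving meta info stranded in the wrong shard.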
carpawell commented 2023-02-13 09:36:46 +00:00 (Migrated from github.com)

See some discussion details in the original [issue](https://github.com/nspcc-dev/neofs-node/issues/2015).
fyrchik added this to the vNext milestone 2023-05-18 08:49:30 +00:00
fyrchik added the frostfs-node label and removed the triage label 2023-11-09 12:31:25 +00:00
Reference: TrueCloudLab/frostfs-node#51