Optimize new ID generation in pilorama #593

Open
opened 2023-08-10 12:21:55 +00:00 by fyrchik · 3 comments
Owner

Here is an idea: instead of picking a random ID in `Move()` and checking whether it already exists, we could simply use a timestamp.
It would save us some time (and memory).

The database format remains compatible, but we could run into problems if this algorithm is used together with the old implementation (some nodes are updated, some are not). Storing the choice inside the DB does not help much: we can always lose an SSD and start fresh.
Putting it in the configuration seems wrong: it is not a local parameter, it affects the whole network.
We should consider a network setting or some failsafe mechanism.

fyrchik added the frostfs-node and triage labels 2023-08-10 12:21:55 +00:00
aarifullin self-assigned this 2023-08-10 15:49:50 +00:00
aarifullin was unassigned by fyrchik 2023-08-11 07:35:52 +00:00
fyrchik added the discussion label 2023-08-11 07:35:58 +00:00
Author
Owner

Forget about it, it's bad.
We could support "correct-only" use cases, where `Move` is performed on already existing items. In reality, however, this is not part of the API contract, and we cannot prevent "incorrect" moves from being applied.

Author
Owner

We still have a possible problem: random IDs generated on different nodes can be equal (this is not a hash, so the probability of a collision is non-negligible if the system runs for years). Let's address it in this task. The suggestion is to make IDs depend on the epoch: this way it works under all possible conditions. However, this in turn restricts the number of epochs, so we need concrete numbers here (for example, the first 4 bytes of the ID are the epoch and the last 4 bytes are the payload).
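The example layout (epoch in the first 4 bytes, payload in the last 4) can be sketched as bit packing into a `uint64`. This assumes 8-byte IDs and reads "first 4 bytes" as the high 32 bits; `makeEpochID` and `splitEpochID` are hypothetical helpers, and what goes into the payload (counter, random value) is left open, as in the discussion.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// makeEpochID packs the epoch into the high 32 bits and a per-epoch payload
// (for example a counter or a random value) into the low 32 bits.
func makeEpochID(epoch uint64, payload uint32) (uint64, error) {
	if epoch > math.MaxUint32 {
		// Epochs beyond 2^32-1 cannot be represented in this layout.
		return 0, errors.New("epoch does not fit into 32 bits")
	}
	return epoch<<32 | uint64(payload), nil
}

// splitEpochID recovers the epoch and payload from an ID built above.
func splitEpochID(id uint64) (epoch uint64, payload uint32) {
	return id >> 32, uint32(id)
}

func main() {
	id, err := makeEpochID(1234, 42)
	if err != nil {
		panic(err)
	}
	epoch, payload := splitEpochID(id)
	fmt.Printf("id=%#016x epoch=%d payload=%d\n", id, epoch, payload)
}
```

The overflow check is the restriction mentioned above: the layout caps the network at 2^32 epochs and each epoch at 2^32 payload values; how long that lasts in practice depends on the epoch duration.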

There are 2 ways to fight this: more bits for the ID and proper snapshots.

fyrchik reopened this issue 2023-08-28 08:20:28 +00:00
fyrchik added this to the vNext milestone 2023-08-28 15:47:13 +00:00
Author
Owner

To be clear, this is a hard task because protocol compatibility and storage format compatibility need to be taken into account.

Reference: TrueCloudLab/frostfs-node#593