Optimize new ID generation in pilorama

fyrchik commented

2023-08-10 12:21:55 +00:00

Owner

Here is an idea: instead of picking a random ID in `Move()` and checking whether it already exists, we could just use the timestamp.
This would save us some time (and memory).

The database format remains compatible, but we could have problems if this algorithm is used together with the old implementation (some nodes are updated, some are not). Storing the choice inside the DB does not help much: we can always lose an SSD and start fresh.
Putting it in the configuration seems wrong: it is not a local parameter, it affects the whole network.
We should think about a network setting or some failsafe mechanism.


fyrchik added the

frostfs-node

triage

labels 2023-08-10 12:21:55 +00:00

aarifullin self-assigned this 2023-08-10 15:49:50 +00:00

aarifullin was unassigned by fyrchik

2023-08-11 07:35:52 +00:00

fyrchik added the

discussion

label 2023-08-11 07:35:58 +00:00

fyrchik commented

2023-08-11 07:37:22 +00:00

Author

Owner

Forget about it, it's bad.
We can support "correct-only" use cases, where `Move` is done on already existing items. But in reality this is not a part of the API, and we cannot prevent "incorrect" moves from being applied.


fyrchik closed this issue

2023-08-11 07:37:28 +00:00

fyrchik commented

2023-08-28 08:20:28 +00:00

Author

Owner

We still have a possible problem: random numbers on different nodes can be equal (this is not a hash, so the probability is non-negligible if the system works for years). Let's think about it in this task. The suggestion is to make the IDs depend on the epoch, so that this works under all possible conditions. However, this in turn restricts the number of epochs: we need to get some numbers here (as an example, the first 4 bytes of the ID are the epoch and the last 4 bytes are the payload).

There are 2 ways to fight this: more bits for the ID and proper snapshots.
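As a rough estimate of how likely equal random IDs are, the standard birthday approximation can be used. It assumes the IDs are independent and uniformly distributed over $N$ values. Taking $N = 2^{64}$ is an assumption inferred from the 4 + 4 byte example above.

$$
p(n) \approx 1 - \exp\left(-\frac{n(n-1)}{2N}\right)
$$

With $N = 2^{64}$, about $n = 2^{32}$ IDs in the same ID space give $p \approx 1 - e^{-1/2} \approx 0.39$. If only a 32-bit payload is random within one epoch, then $N = 2^{32}$ and the same probability is reached at about $n = 2^{16}$ IDs per epoch, which is one reason to consider more bits for the ID.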


fyrchik reopened this issue

2023-08-28 08:20:28 +00:00

fyrchik added this to the vNext milestone 2023-08-28 15:47:13 +00:00

fyrchik commented

2023-08-28 15:48:10 +00:00

Author

Owner

To be clear, this is a _hard_ task because protocol compatibility and storage format compatibility need to be taken into account.

Optimize new ID generation in pilorama #593