Writecache potential consistency issues #634

Closed
opened 2023-08-22 06:41:33 +00:00 by ale64bit · 1 comment
Collaborator

I've been thinking about a few cases where the current writecache implementations seem to be racy, although I don't have a concrete test that demonstrates the issue yet. So I'm opening this issue to discuss and collect such cases.

  1. The Delete method of writecache deletes the object from the cache but never checks the backing storage. For example, if Delete is called while a flush worker is Putting the same object, couldn't Get still return the object after Delete returns because of that race? This would be unfortunate, because the system would make no further progress towards resolving the inconsistency.
  2. If the same key is written twice, first with a payload smaller than the cache's maxObjectSize and then with a larger one, the second write races with the flush of the first in a way the client should not be able to observe, no? Namely, the sequence Put(k1, v_small), Put(k1, v_large), Get(k1) could return either v_small or v_large, depending on the flush behavior.
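
To make the interleaving in item 1 concrete, here is a minimal sketch on a toy model. Everything in it is an assumption for illustration: `toyCache`, `flushRead`, `flushWrite` and `deleteDuringFlush` are hypothetical names, and the model (a map in front of another map, with a flush split into a read step and a write step) is not the frostfs-node writecache. The steps are replayed sequentially so the outcome is deterministic.

```go
package main

import "fmt"

// toyCache is a hypothetical stand-in for a write cache: an in-memory map
// in front of a backing store. It is NOT the frostfs-node implementation.
type toyCache struct {
	cache   map[string]string
	backing map[string]string
}

func newToyCache() *toyCache {
	return &toyCache{cache: map[string]string{}, backing: map[string]string{}}
}

// Put stores the object in the cache only; a flush moves it later.
func (c *toyCache) Put(k, v string) { c.cache[k] = v }

// Delete removes the object from the cache and never touches the backing
// store, mirroring the behaviour described in item 1.
func (c *toyCache) Delete(k string) { delete(c.cache, k) }

// Get checks the cache first and falls back to the backing store.
func (c *toyCache) Get(k string) (string, bool) {
	if v, ok := c.cache[k]; ok {
		return v, true
	}
	v, ok := c.backing[k]
	return v, ok
}

// flushRead is the first half of a flush: the worker reads the object.
func (c *toyCache) flushRead(k string) (string, bool) {
	v, ok := c.cache[k]
	return v, ok
}

// flushWrite is the second half: the worker writes the object to the
// backing store and drops the cache entry.
func (c *toyCache) flushWrite(k, v string) {
	c.backing[k] = v
	delete(c.cache, k)
}

// deleteDuringFlush replays the interleaving step by step and reports
// whether the object is still visible after Delete has returned.
func deleteDuringFlush() bool {
	c := newToyCache()
	c.Put("k1", "v1")
	v, _ := c.flushRead("k1") // flush worker picks up the object
	c.Delete("k1")            // client deletes while the flush is in flight
	c.flushWrite("k1", v)     // flush worker completes its Put
	_, found := c.Get("k1")
	return found
}

func main() {
	fmt.Println("object visible after Delete:", deleteDuringFlush())
}
```

In this model the object stays in the backing store indefinitely, since nothing ever retries the delete. Whether the real code can reach this interleaving depends on locking that the toy model deliberately leaves out.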

Generally speaking, I feel there are many edge cases like this and I would strongly recommend taking a critical look at our consistency model and writecache invariants.
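
One invariant that could be checked is read-your-writes for the Put, Put, Get sequence from item 2. The sketch below is a toy model under loud assumptions: `sizedCache`, `Flush` and `putPutGet` are hypothetical, payloads at or below `maxObjectSize` go to the cache while larger ones bypass it, and a large Put does not invalidate a cached entry. None of this is confirmed to match the actual writecache, and the sequence is only reachable if the same key can carry two different payloads.

```go
package main

import "fmt"

// sizedCache is a hypothetical model: small payloads are cached, large
// payloads bypass the cache. It is NOT the frostfs-node implementation.
type sizedCache struct {
	maxObjectSize int
	cache         map[string]string
	backing       map[string]string
}

// Put routes by payload size. Assumption: a large Put does not evict a
// cached entry stored earlier under the same key.
func (c *sizedCache) Put(k, v string) {
	if len(v) <= c.maxObjectSize {
		c.cache[k] = v
		return
	}
	c.backing[k] = v
}

// Flush moves every cached object to the backing store, overwriting
// whatever is already stored under the same key.
func (c *sizedCache) Flush() {
	for k, v := range c.cache {
		c.backing[k] = v
		delete(c.cache, k)
	}
}

// Get checks the cache first and falls back to the backing store.
func (c *sizedCache) Get(k string) string {
	if v, ok := c.cache[k]; ok {
		return v
	}
	return c.backing[k]
}

// putPutGet runs Put(k1, small), Put(k1, large), Get(k1). The flag
// decides whether a flush happens to run between the two Puts.
func putPutGet(flushBetween bool) string {
	c := &sizedCache{
		maxObjectSize: 8,
		cache:         map[string]string{},
		backing:       map[string]string{},
	}
	c.Put("k1", "small")
	if flushBetween {
		c.Flush()
	}
	c.Put("k1", "LARGE-PAYLOAD")
	return c.Get("k1")
}

func main() {
	fmt.Println("flush between the Puts:", putPutGet(true))
	fmt.Println("no flush yet:          ", putPutGet(false))
}
```

In this model the client reads the latest write only when the flush happens to run between the two Puts; otherwise the stale cached value shadows the newer one, which is the kind of timing dependence the invariant would rule out.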

Related: #610

ale64bit added the bug, triage, discussion labels 2023-08-22 06:41:41 +00:00
  2. The key is defined by the value (the key is a hash of the value), so I think this case is undefined behaviour :)
fyrchik added this to the v0.38.0 milestone 2023-08-23 10:40:37 +00:00
fyrchik modified the milestone from v0.38.0 to v0.39.0 2024-02-28 19:27:08 +00:00
dstepanov-yadro self-assigned this 2024-03-06 09:09:41 +00:00
Reference: TrueCloudLab/frostfs-node#634