Guarantee consistency when handling expired tombstones and lock objects #1445

Open
opened 2024-10-23 08:35:37 +00:00 by a-savchuk · 0 comments

The task is based on several bugs that have appeared recently.

Problem

Suppose a locked object and its lock are stored on different shards. While handling expired lock objects, the GC deletes the expired lock object but doesn't unlock the associated object if that object is on a read-only shard. As a result, the original object remains locked indefinitely on that node.

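Since the GC needs to access all shards when handling expired locks, it uses callbacks provided by the storage engine. The GC then processes each shard separately rather than atomically: a failure on one shard (for example, a read-only one) doesn't stop the deletion on the others. This leads to the problem described above.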

The same problem may occur when the shard holding the lock is detached and then re-attached after the GC has already handled the expired lock object.

Tombstones and graves are handled similarly and may have the same problems.
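The lock-object scenario above can be modeled with a minimal Go sketch. Everything in it is hypothetical: the `shard` type, its `readOnly` flag, and `collectExpiredLockObject` are simplified stand-ins, not frostfs-node APIs. The point it illustrates is that each shard is visited independently, so an error on a read-only shard is only logged while the lock object is still deleted elsewhere.

```go
package main

import (
	"errors"
	"fmt"
)

// errReadOnly is returned by a shard that rejects modifications (hypothetical).
var errReadOnly = errors.New("shard is read-only")

// shard is a hypothetical, simplified shard: it stores lock objects and
// lock records (locked object ID -> lock object ID).
type shard struct {
	name        string
	readOnly    bool
	lockObjects map[string]bool
	locks       map[string]string
}

// unlockBy removes every lock record created by the given lock object.
func (s *shard) unlockBy(lockObj string) error {
	if s.readOnly {
		return errReadOnly
	}
	for obj, l := range s.locks {
		if l == lockObj {
			delete(s.locks, obj)
		}
	}
	return nil
}

// deleteLockObject removes the lock object itself if this shard stores it.
func (s *shard) deleteLockObject(lockObj string) error {
	if s.readOnly {
		return errReadOnly
	}
	delete(s.lockObjects, lockObj)
	return nil
}

// collectExpiredLockObject mimics the per-shard behavior: every shard is
// handled separately and errors are only reported, never rolled back.
func collectExpiredLockObject(shards []*shard, lockObj string) {
	for _, s := range shards {
		if err := s.unlockBy(lockObj); err != nil {
			fmt.Printf("%s: unlock failed: %v\n", s.name, err)
		}
		if err := s.deleteLockObject(lockObj); err != nil {
			fmt.Printf("%s: delete failed: %v\n", s.name, err)
		}
	}
}

func main() {
	a := &shard{name: "A", lockObjects: map[string]bool{"L1": true}, locks: map[string]string{}}
	b := &shard{name: "B", readOnly: true, lockObjects: map[string]bool{}, locks: map[string]string{"obj1": "L1"}}
	collectExpiredLockObject([]*shard{a, b}, "L1")
	// The lock object is gone from shard A, but obj1 stays locked on shard B.
	fmt.Println(len(a.lockObjects), b.locks["obj1"]) // 0 L1
}
```

Once the lock object is gone, nothing on shard B records that the lock has expired, so the lock can never be cleaned up from there.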

Different behavior of the garbage collector

The garbage collector handles expired tombstones and graves differently from expired lock objects and locks:

  • Collect graves -> for each shard: delete tombstone and graves (atomic deletion after #1493)
---
title: Expired tombstones handling
---

flowchart LR
    TG[Take grave] --> SFT["`Search for
tombstone`"]
    SFT --> ITA{"`Is tombstone
available?`"}
    
    ITA --> |True| STOP
    ITA --> |False| DG["`Walk all shards,
delete tombstone
and related graves
on each shard`"]
    
    DG --> STOP((Stop))
  • Collect lock objects -> for each shard: delete locks, then delete lock object.
---
title: Expired lock objects handling
---

flowchart LR
    TL["`Take lock
object`"] --> IE{Is expired?}
    IE --> |True| DL["`Walk all shards,
delete lock object
and related locks
on each shard`"]

    DL --> STOP((Stop))  
    IE --> |False| STOP

As a result:

  • Orphan locks are more likely to occur than orphan graves, because there's no way to determine whether a lock needs to be deleted once its related lock object is gone.
  • Orphan tombstones (not graves) can occur, but they are easy to identify as expired since each tombstone is stored with its expiration epoch. However, the GC currently doesn't handle this case.
  • A grave may be deleted before the associated tombstone expires. For example, if the shard holding the tombstone is detached, the GC doesn't find the tombstone and deletes the grave. This is currently unlikely because the GC searches for the tombstone across the entire FrostFS network. However, we'd like the GC to perform all operations locally, because it shouldn't fail due to network operations.
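The premature grave deletion from the last point can be sketched in Go as follows. This is a simplified model under stated assumptions: the `shard` type, `tombstoneAvailable`, and `collectGrave` are hypothetical, and the network-wide tombstone search is replaced by a lookup over the currently attached shards only.

```go
package main

import "fmt"

// shard is a hypothetical shard holding tombstones (ID -> expiration epoch)
// and graves (buried object ID -> tombstone ID).
type shard struct {
	tombstones map[string]uint64
	graves     map[string]string
}

// tombstoneAvailable stands in for the tombstone search: here it can only
// see shards that are currently attached.
func tombstoneAvailable(attached []*shard, tomb string) bool {
	for _, s := range attached {
		if _, ok := s.tombstones[tomb]; ok {
			return true
		}
	}
	return false
}

// collectGrave mirrors the existing flow: if the tombstone cannot be found,
// the tombstone and all graves referencing it are deleted on every shard.
func collectGrave(attached []*shard, tomb string) {
	if tombstoneAvailable(attached, tomb) {
		return
	}
	for _, s := range attached {
		delete(s.tombstones, tomb)
		for obj, t := range s.graves {
			if t == tomb {
				delete(s.graves, obj)
			}
		}
	}
}

func main() {
	// Tombstone T1 expires at epoch 100 but lives on a detached shard.
	detached := &shard{tombstones: map[string]uint64{"T1": 100}, graves: map[string]string{}}
	attached := &shard{tombstones: map[string]uint64{}, graves: map[string]string{"obj1": "T1"}}

	collectGrave([]*shard{attached}, "T1")
	_, stillBuried := attached.graves["obj1"]
	// The grave is gone although T1 still exists and has not expired.
	fmt.Println(stillBuried, len(detached.tombstones)) // false 1
}
```

The decision to delete depends on data the GC cannot see, which is exactly what storing the expiration epoch next to the grave is meant to avoid.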

Proposed solution

Append expiration epoch to graves and locks

From now on, each grave or lock must include an expiration epoch. Inhume and Lock operations will require an expiration epoch to be passed along with other arguments and will store the expiration epoch together with the grave or lock.
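A minimal Go sketch of what the extended records and operations could look like. The `grave` and `lockRecord` types, the in-memory `metabase`, and the `Inhume`/`Lock` signatures shown here are hypothetical illustrations, not the actual frostfs-node metabase API. Two further assumptions are labeled in the comments: an epoch of 0 means "old format, unknown", and a record counts as expired once the current epoch is strictly greater than its expiration epoch.

```go
package main

import "fmt"

// grave is a hypothetical metabase record marking an object as removed by a
// tombstone. ExpEpoch is the new field; 0 is assumed to mean "old format".
type grave struct {
	Tombstone string
	ExpEpoch  uint64
}

// lockRecord is the analogous hypothetical record for a lock.
type lockRecord struct {
	LockObject string
	ExpEpoch   uint64
}

// metabase is a toy in-memory stand-in for a shard's metabase.
type metabase struct {
	graves map[string]grave
	locks  map[string][]lockRecord
}

// Inhume (hypothetical signature) stores the tombstone's expiration epoch
// together with the grave.
func (m *metabase) Inhume(obj, tomb string, expEpoch uint64) {
	m.graves[obj] = grave{Tombstone: tomb, ExpEpoch: expEpoch}
}

// Lock (hypothetical signature) stores the lock object's expiration epoch
// with every lock record it creates.
func (m *metabase) Lock(lockObj string, expEpoch uint64, objs ...string) {
	for _, o := range objs {
		m.locks[o] = append(m.locks[o], lockRecord{LockObject: lockObj, ExpEpoch: expEpoch})
	}
}

// expired decides locally, without looking up the tombstone or lock object.
// Assumed boundary: expired once the current epoch exceeds expEpoch.
func expired(expEpoch, currentEpoch uint64) bool {
	return expEpoch != 0 && expEpoch < currentEpoch
}

func main() {
	mb := &metabase{graves: map[string]grave{}, locks: map[string][]lockRecord{}}
	mb.Inhume("obj1", "T1", 100)
	mb.Lock("L1", 50, "obj2", "obj3")
	fmt.Println(expired(mb.graves["obj1"].ExpEpoch, 101)) // true
	fmt.Println(expired(mb.locks["obj2"][0].ExpEpoch, 50)) // false
}
```

With the epoch stored in the record itself, expiration becomes a pure function of data on the same shard.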

Modify behavior of garbage collector

If every grave and lock had an expiration epoch, the GC would be very simple. It would have three routines: one for deleting expired objects of all types, one for deleting expired graves, and one for deleting expired locks. However, until the metabase migration (described below) is applied, the GC must handle graves and locks in both the old and the new formats. The following GC behavior is proposed:
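The "very simple" target state can be sketched in Go, assuming every record already carries an expiration epoch. The `shardMeta` type and the `dropExpired`/`runGC` helpers are hypothetical; in a real node the three routines would more likely be separate workers, here they are three sequential calls for brevity.

```go
package main

import "fmt"

// shardMeta is a hypothetical per-shard metabase in which every object,
// grave and lock already carries an expiration epoch.
type shardMeta struct {
	objects map[string]uint64 // object ID -> expiration epoch (all types)
	graves  map[string]uint64 // buried object ID -> tombstone expiration epoch
	locks   map[string]uint64 // locked object ID -> lock object expiration epoch
}

// dropExpired deletes every entry whose expiration epoch has passed.
// It only touches this shard, so no cross-shard coordination is needed.
func dropExpired(m map[string]uint64, current uint64) {
	for id, exp := range m {
		if exp < current {
			delete(m, id)
		}
	}
}

// runGC runs the three independent routines from the proposal.
func runGC(s *shardMeta, current uint64) {
	dropExpired(s.objects, current) // 1: expired objects of all types
	dropExpired(s.graves, current)  // 2: expired graves
	dropExpired(s.locks, current)   // 3: expired locks
}

func main() {
	s := &shardMeta{
		objects: map[string]uint64{"T1": 10, "L1": 20},
		graves:  map[string]uint64{"obj1": 10},
		locks:   map[string]uint64{"obj2": 20},
	}
	runGC(s, 15)
	fmt.Println(len(s.objects), len(s.graves), len(s.locks)) // 1 0 1
}
```

Because every decision uses only data on the same shard, a read-only or detached shard merely postpones its own cleanup instead of leaving orphans behind.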

---
title: Expired tombstones handling
---

flowchart LR
    TG[Take grave] --> HEE{"`Has expiration
epoch?`"}

    HEE --> |True| IE{Is expired?}
    IE --> |True| DG1["`Delete grave
on current shard`"]
    IE --> |False| STOP
    
    HEE -->|False| SFT["`Search for
tombstone`"]
    SFT --> ITA{"`Is tombstone
available?`"}
    
    ITA --> |True| STOP
    ITA --> |False| DG2["`Walk all shards,
delete tombstone
and related graves
on each shard`"]
    
    DG1 --> STOP((Stop))
    DG2 --> STOP  


    _TG[Take tombstone] --> _IE{Is expired?}
    
    _IE -->|True| _DT["`Delete tombstone
on current shard`"]
    _IE -->|False| _STOP

    _DT --> _STOP((Stop))
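The tombstone flowchart above can be read as the following Go sketch. The types and functions are hypothetical; `hasEpoch` marks new-format graves, and `searchTombstone` is a caller-supplied stand-in for the existing (possibly remote) tombstone lookup.

```go
package main

import "fmt"

// grave is a hypothetical record; hasEpoch is false for old-format graves.
type grave struct {
	tomb     string
	expEpoch uint64
	hasEpoch bool
}

type shard struct {
	graves     map[string]grave  // buried object ID -> grave
	tombstones map[string]uint64 // tombstone ID -> expiration epoch
}

// handleGrave follows the "Take grave" branch of the flowchart.
func handleGrave(s *shard, all []*shard, obj string, current uint64,
	searchTombstone func(tomb string) bool) {
	g, ok := s.graves[obj]
	if !ok {
		return
	}
	if g.hasEpoch {
		if g.expEpoch < current {
			delete(s.graves, obj) // new format: local decision, local deletion
		}
		return
	}
	if searchTombstone(g.tomb) {
		return // old format: tombstone still exists, keep the grave
	}
	for _, sh := range all { // old format fallback: walk all shards
		delete(sh.tombstones, g.tomb)
		for o, gr := range sh.graves {
			if gr.tomb == g.tomb {
				delete(sh.graves, o)
			}
		}
	}
}

// handleTombstone follows the "Take tombstone" branch: an expired
// tombstone is deleted on the current shard only.
func handleTombstone(s *shard, tomb string, current uint64) {
	if exp, ok := s.tombstones[tomb]; ok && exp < current {
		delete(s.tombstones, tomb)
	}
}

func main() {
	s := &shard{
		graves: map[string]grave{
			"obj1": {tomb: "T1", expEpoch: 10, hasEpoch: true},
			"obj2": {tomb: "T2"}, // old format, no epoch
		},
		tombstones: map[string]uint64{"T1": 10},
	}
	notFound := func(string) bool { return false }
	handleGrave(s, []*shard{s}, "obj1", 11, notFound)
	handleGrave(s, []*shard{s}, "obj2", 11, notFound)
	handleTombstone(s, "T1", 11)
	fmt.Println(len(s.graves), len(s.tombstones)) // 0 0
}
```

Only old-format graves still reach the cross-shard fallback; graves with an epoch never leave their own shard.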
---
title: Expired lock object handling
---

flowchart LR
    TG["`Take lock
object`"] --> IE{Is expired?}
    IE --> |True| DL["`Walk all shards,
delete lock object
and related locks
on each shard`"]

    DL --> STOP((Stop))  
    IE --> |False| STOP


    _TL[Take lock] --> _HEE{"`Has expiration
epoch?`"}

    _HEE --> |True| _IE{Is expired?}
    _IE --> |True| _DL["`Delete lock
on current shard`"]
    _DL --> _STOP((Stop))  
    _IE --> |False| _STOP

    _HEE --> |False| _STOP
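A Go sketch of the two lock routines in the flowchart above, using hypothetical types. The example in `main` replays the detached-shard case from the Problem section: shard `b` is not visited while the lock object is handled, yet its lock still gets removed later because it carries its own expiration epoch.

```go
package main

import "fmt"

// lockRecord is hypothetical; hasEpoch is false for old-format locks.
type lockRecord struct {
	lockObj  string
	expEpoch uint64
	hasEpoch bool
}

type shard struct {
	lockObjects map[string]uint64     // lock object ID -> expiration epoch
	locks       map[string]lockRecord // locked object ID -> lock record
}

// handleLockObject: an expired lock object is deleted together with the
// related locks on every reachable shard.
func handleLockObject(s *shard, all []*shard, lockObj string, current uint64) {
	exp, ok := s.lockObjects[lockObj]
	if !ok || exp >= current {
		return
	}
	for _, sh := range all {
		delete(sh.lockObjects, lockObj)
		for obj, l := range sh.locks {
			if l.lockObj == lockObj {
				delete(sh.locks, obj)
			}
		}
	}
}

// handleLock: locks with an epoch expire on their own shard;
// old-format locks are left alone.
func handleLock(s *shard, obj string, current uint64) {
	l, ok := s.locks[obj]
	if !ok || !l.hasEpoch {
		return
	}
	if l.expEpoch < current {
		delete(s.locks, obj)
	}
}

func main() {
	a := &shard{lockObjects: map[string]uint64{"L1": 10}, locks: map[string]lockRecord{}}
	b := &shard{lockObjects: map[string]uint64{}, locks: map[string]lockRecord{
		"obj1": {lockObj: "L1", expEpoch: 10, hasEpoch: true},
	}}
	// Shard b is detached while the lock object is handled.
	handleLockObject(a, []*shard{a}, "L1", 11)
	// After b is attached again, its own lock routine cleans up.
	handleLock(b, "obj1", 11)
	fmt.Println(len(a.lockObjects), len(b.locks)) // 0 0
}
```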

Additionally, handling expired objects of different types can be combined into a single routine.
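One way to picture a single routine for all types is a worker that scans expired objects once and dispatches each to a type-specific handler. This Go sketch is an assumption about the shape, not the design from #1481: `objectType`, `object`, and `collectExpired` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

type objectType int

const (
	typeRegular objectType = iota
	typeTombstone
	typeLock
)

// object is a hypothetical metabase entry with a type and expiration epoch.
type object struct {
	typ      objectType
	expEpoch uint64
}

// collectExpired is one worker for all types: it scans the objects once and
// hands each expired object to the handler registered for its type.
func collectExpired(objs map[string]object, current uint64, handlers map[objectType]func(id string)) {
	for id, o := range objs {
		if o.expEpoch >= current {
			continue // not expired yet
		}
		if h, ok := handlers[o.typ]; ok {
			h(id)
		}
	}
}

func main() {
	objs := map[string]object{
		"O1": {typ: typeRegular, expEpoch: 5},
		"T1": {typ: typeTombstone, expEpoch: 5},
		"L1": {typ: typeLock, expEpoch: 5},
		"O2": {typ: typeRegular, expEpoch: 50},
	}
	var handled []string
	record := func(id string) { handled = append(handled, id) }
	collectExpired(objs, 10, map[objectType]func(string){
		typeRegular: record, typeTombstone: record, typeLock: record,
	})
	sort.Strings(handled) // map iteration order is random; sort for output
	fmt.Println(handled)  // [L1 O1 T1]
}
```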

Apply metabase migration

Appending expiration epochs to graves and locks requires a metabase migration. The following approach is suggested.

Asynchronous migration with policer

Since determining the expiration epoch of a grave or lock may require searching for a tombstone or lock object on other nodes, a synchronous migration would be neither simple nor fast enough. We propose that the policer find tombstones and lock objects and append the missing expiration epochs to graves and locks.

However, it's not guaranteed that every tombstone or lock object can be found. Some may have been lost.
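A minimal Go sketch of one policer migration pass over graves (locks would be handled the same way). `migrateGraves` and the `lookup` callback are hypothetical; `lookup` stands in for whatever search, local or remote, finds the tombstone's expiration epoch. Graves whose tombstone cannot be found are left unchanged and reported, reflecting the caveat above.

```go
package main

import "fmt"

// grave is a hypothetical record; hasEpoch is false for old-format graves.
type grave struct {
	tomb     string
	expEpoch uint64
	hasEpoch bool
}

// migrateGraves fills in missing expiration epochs. Graves whose tombstone
// cannot be found are left unchanged and returned as unresolved.
func migrateGraves(graves map[string]grave, lookup func(tomb string) (uint64, bool)) (unresolved []string) {
	for obj, g := range graves {
		if g.hasEpoch {
			continue // already in the new format
		}
		exp, ok := lookup(g.tomb)
		if !ok {
			unresolved = append(unresolved, obj) // tombstone may be lost
			continue
		}
		g.expEpoch, g.hasEpoch = exp, true
		graves[obj] = g // updating an existing key during range is allowed
	}
	return unresolved
}

func main() {
	graves := map[string]grave{
		"obj1": {tomb: "T1"},
		"obj2": {tomb: "T2"}, // T2 has been lost
	}
	known := map[string]uint64{"T1": 100}
	lookup := func(t string) (uint64, bool) {
		e, ok := known[t]
		return e, ok
	}
	unresolved := migrateGraves(graves, lookup)
	fmt.Println(unresolved, graves["obj1"].expEpoch) // [obj2] 100
}
```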

Synchronous migration in the future

After a sufficient period, when most graves and locks have already received their expiration epochs, a synchronous migration can be applied.

Handling newly created metabases differently

Newly created metabases will store all graves and locks with expiration epochs from the start. Therefore, the policer can skip migration for these metabases. Additionally, the GC behavior will be much simpler: it can delete graves and tombstones separately, as well as locks and lock objects.
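A small Go sketch of how a node might pick its behavior per metabase. The `epochsFromStart` flag is purely hypothetical (it could, for example, be derived from some marker written when the metabase is created); the issue does not specify how new metabases are recognized.

```go
package main

import "fmt"

// metabaseInfo is hypothetical: epochsFromStart would be set when the
// metabase is created by a version that always stores expiration epochs.
type metabaseInfo struct {
	name            string
	epochsFromStart bool
}

// gcPlan describes what a node would do with one metabase.
type gcPlan struct {
	runMigration bool // policer fills in missing epochs
	legacyGC     bool // GC must also handle old-format graves and locks
}

// planFor skips the migration and the legacy GC paths for new metabases.
func planFor(m metabaseInfo) gcPlan {
	if m.epochsFromStart {
		return gcPlan{runMigration: false, legacyGC: false}
	}
	return gcPlan{runMigration: true, legacyGC: true}
}

func main() {
	for _, m := range []metabaseInfo{
		{name: "old", epochsFromStart: false},
		{name: "new", epochsFromStart: true},
	} {
		p := planFor(m)
		fmt.Println(m.name, p.runMigration, p.legacyGC)
	}
	// Output:
	// old true true
	// new false false
}
```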

Progress

  • [x] Append expiration epoch to graves (#1481)
  • [ ] Append expiration epoch to locks
  • [ ] Modify behavior of garbage collector
    • [x] Delete expired objects of all types in one worker (#1481)
    • [x] Delete graves with expiration epoch separately (#1481)
    • [ ] Delete locks with expiration epoch separately
  • [ ] Handle newly created metabases differently
  • [ ] Asynchronous migration with policer
  • [ ] Synchronous migration in the future
a-savchuk added the discussion, frostfs-node and triage labels 2024-10-23 08:35:37 +00:00
a-savchuk self-assigned this 2024-10-23 08:35:37 +00:00
a-savchuk was unassigned by fyrchik 2024-10-28 06:51:22 +00:00
a-savchuk changed title from Make GC handle expired lockers/tombstones correctly while some shards are read-only to Guarantee consistency when handling expired tombstones and lock objects 2024-12-20 10:00:59 +00:00
Reference: TrueCloudLab/frostfs-node#1445