Object won't be deleted if shards are out of free space #1599

Open
opened 2025-01-14 10:20:19 +00:00 by aarifullin · 0 comments

Expected Behavior

An object can be removed even when the shard is out of space.

Current Behavior

When shards are full, an object cannot be removed, because the Put of the tombstone object fails:

Jan 14 12:55:51 NB-2635 s01[305]: debug        delete/delete.go:39        operation finished with error        {"component": "objectSDK.Delete service", "request": "DELETE", "address": "3MZawNcrpYsR4YuRnYCWJWF5EtemXZAGRqy75GiLXZGc/jh3Xn6HeDzydDTExHSeko3h6phawHDbj25di7EYysWJ", "local": false, "with session": true, "with bearer": false, "error": "save tombstone: (*putsvc.Streamer) could not close object target: could not write to next target: incomplete object PUT by placement: could not write header: (*writer.remoteWriter) could not put single object to [/dns4/s04.frostfs.devenv/tcp/8082/tls /dns4/s04.frostfs.devenv/tcp/8080]: put single object via client: status: code = 1024 message = incomplete object PUT by placement: (writer.LocalTarget) could not put object to local storage: could not put object to any shard", "trace_id": "3009d1e98a7e17e8ac9d51d8fc4c0fd5"}
Jan 14 12:55:51 NB-2635 s04[305]: warn        engine/engine.go:151        could not get active blobovnicza        {"shard_id": "DTLvhAM7LThQAirjidxZJh", "error count": 29, "error": "open blobovnicza /storage/blobovnicza0/1/3/0.db: mkdir /storage/blobovnicza0/1: no space left on device", "trace_id": "3009d1e98a7e17e8ac9d51d8fc4c0fd5"}

Possible Solution

Make a hard reservation of disk space for tombstone objects.

Context

Regression

Not a regression: the problem appears never to have been solved.

Your Environment

The bug can be easily reproduced in frostfs-dev-env

Steps to Reproduce (for bugs)

To reproduce the bug in frostfs-dev-env, we need to "shrink" the Docker volumes used by the storage nodes:

  1. Create disk images and format them as ext4 (a single disk image may be enough, but I haven't checked this)
dd if=/dev/zero of=disk1.img bs=1M count=128; mkfs.ext4 disk1.img
dd if=/dev/zero of=disk2.img bs=1M count=128; mkfs.ext4 disk2.img
dd if=/dev/zero of=disk3.img bs=1M count=128; mkfs.ext4 disk3.img
dd if=/dev/zero of=disk4.img bs=1M count=128; mkfs.ext4 disk4.img
  2. Mount the virtual disks and change permissions
sudo mkdir /mnt/disk1; sudo mount -o loop disk1.img /mnt/disk1
...
sudo mkdir /mnt/disk4; sudo mount -o loop disk4.img /mnt/disk4
sudo chown -R $USER:$USER /mnt/disk1; sudo chown -R $USER:$USER /mnt/disk2; ...
  3. Slightly change the docker-compose file (https://git.frostfs.info/TrueCloudLab/frostfs-dev-env/src/branch/master/services/storage/docker-compose.yml#L175-L179)
volumes:
  storage_s01:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: /mnt/disk1
  storage_s02:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: /mnt/disk2
  storage_s03:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: /mnt/disk3
  storage_s04:
    driver: local
    driver_opts:
      o: bind
      type: none
      device: /mnt/disk4
  4. Launch dev-env and create a container with the placement policy 'REP 2 CBF 1 SELECT 2 FROM *'
  5. Upload files until df -h /mnt/disk1 shows 100% usage
  6. Try to delete any object: the delete fails with the error shown above
aarifullin added the bug, discussion, triage labels 2025-01-14 10:20:19 +00:00
Reference: TrueCloudLab/frostfs-node#1599