Completed shard evacuation reports fewer evacuated objects than the total #684

Closed
opened 2023-09-12 08:50:22 +00:00 by an-nikitin · 4 comments

Expected Behavior

Once the evacuation is in the completed state, the value of the Evacuated field in the GetShardEvacuationStatusResponse_Body structure is equal to Total.

Current Behavior

```
root@annikitin-node1:/home/service# frostfs-cli --config /etc/frostfs/storage/control.yml control shards set-mode --mode read-only --id "3ALSd4o2ioKDDzywwheErA"
Shard mode update request successfully sent.
root@annikitin-node1:/home/service# frostfs-cli --config /etc/frostfs/storage/control.yml control shards evacuation start --await --id 3ALSd4o2ioKDDzywwheErA --no-errors
Shard evacuation has been successfully started.
Progress will be reported every 5 seconds.
Shard IDs: 3ALSd4o2ioKDDzywwheErA. Status: running. Evacuated 0 object out of 24, failed to evacuate 0 objects. Started at: 2023-09-12T08:32:23Z UTC. Duration: 00:00:04.
Shard IDs: 3ALSd4o2ioKDDzywwheErA. Status: running. Evacuated 0 object out of 24, failed to evacuate 0 objects. Started at: 2023-09-12T08:32:23Z UTC. Duration: 00:00:09.
Shard evacuation has been completed.
Shard IDs: 3ALSd4o2ioKDDzywwheErA. Evacuated 2 object out of 24, failed to evacuate 0 objects. Started at: 2023-09-12T08:32:23Z UTC. Duration: 00:00:13.
```

Possible Solution

A fix cannot be suggested by a QA engineer; the solution is up to the developers.

Steps to Reproduce (for bugs)

  1. Create some objects on the node.
  2. Put a shard to read-only mode.
  3. Start evacuation of the shard.
  4. Wait for the evacuation to complete.
  5. Check the Evacuated value in the evacuation status.
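
The steps above can be sketched as a small script; the commands and the `control.yml` path are taken verbatim from this report, while the function name and the `CLI` override are only for illustration:

```shell
# Sketch of the reproduction, assuming a running storage node. CLI can be
# overridden (e.g. CLI=echo for a dry run); the default matches the report.
reproduce_evacuation() {
    shard_id="$1"
    cli="${CLI:-frostfs-cli --config /etc/frostfs/storage/control.yml}"
    # Step 2: the shard must be read-only before it can be evacuated.
    $cli control shards set-mode --mode read-only --id "$shard_id"
    # Steps 3-4: start the evacuation and wait for completion; the final
    # status line carries the Evacuated counter that should equal Total.
    $cli control shards evacuation start --await --id "$shard_id" --no-errors
}
```

With `CLI=echo` the function just prints the two frostfs-cli invocations, which shows the exact command sequence without touching a node.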

Context

The problem is in the statistics returned via the RPC calls, not in frostfs-cli, so it affects any tool that uses the RPC interface to perform shard evacuations.
The behavior is confusing to the user because it gives the impression that some (or most) of the objects were not evacuated.
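
A consumer of the status RPC can make the expectation explicit with a consistency check; the arithmetic mirrors the Evacuated, failed-to-evacuate, and Total figures printed above, and the helper name is hypothetical:

```shell
# Completed-state invariant: every object should be either evacuated or
# failed, so Evacuated + Failed == Total. The report shows 2 + 0 != 24.
evacuation_consistent() {
    evacuated="$1"; failed="$2"; total="$3"
    [ $((evacuated + failed)) -eq "$total" ]
}

if evacuation_consistent 2 0 24; then
    echo "status consistent"
else
    echo "status inconsistent: evacuated + failed != total"
fi
```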

Regression

No.

Your Environment

T.O v1.3.0-137 in Sbercloud.

an-nikitin added the
bug
triage
labels 2023-09-12 08:50:22 +00:00
fyrchik added the
frostfs-node
label 2023-09-12 08:51:59 +00:00
Owner

Could be related to big object handling.

fyrchik added this to the v0.38.0 milestone 2023-09-12 08:52:25 +00:00
Author

> Could be related to big object handling.

Might very well be, because I put some 1GB objects there.

achuprov was assigned by dstepanov-yadro 2023-09-20 14:42:16 +00:00
Member

I cannot reproduce the bug. I am using a 1GB file.

Create container and put objects:

```bash
frostfs-cli container create -r localhost:8080 --wallet /etc/frostfs/http/wallet.json --policy "REP 1 IN X CBF 1 SELECT 1 FROM * AS X" --basic-acl public-read --await

frostfs-cli object put -r localhost:8080 --wallet /etc/frostfs/http/wallet.json --file 1gb --cid $cid
```
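
To get a shard with enough data to evacuate, the put command above can be repeated in a loop; the function name is illustrative, and `$cid` is the container ID from the create step:

```shell
# Put N copies of the 1GB file into the container; the endpoint, wallet
# path, and file name are the ones used in the commands above.
put_objects() {
    count="$1"; cid="$2"
    cli="${CLI:-frostfs-cli}"
    i=0
    while [ "$i" -lt "$count" ]; do
        $cli object put -r localhost:8080 --wallet /etc/frostfs/http/wallet.json --file 1gb --cid "$cid"
        i=$((i + 1))
    done
}
```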

Evacuation:

I chose the shard whose disk is the most filled (64%) on another node.

```bash
frostfs-cli --config /etc/frostfs/storage/control.yml control shards list

frostfs-cli --config /etc/frostfs/storage/control.yml control shards set-mode --mode read-only --id E6kboFMNVs6KbfkXJCHCJn

frostfs-cli --config /etc/frostfs/storage/control.yml control shards evacuation start --await --id E6kboFMNVs6KbfkXJCHCJn --no-errors
```

Output:

```
Shard evacuation has been successfully started.
Progress will be reported every 5 seconds.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 5 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:04. Estimated time left: 2 minutes.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 10 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:09. Estimated time left: 3 minutes.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 14 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:14. Estimated time left: 3 minutes.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 19 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:19. Estimated time left: 3 minutes.
...
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 220 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:04:24. Estimated time left: 0 minutes.
Shard evacuation has been completed.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Evacuated 225 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:04:28.
```

Environment: sbercloud v1.3.0-178

dstepanov-yadro was assigned by fyrchik 2023-11-01 12:55:46 +00:00
achuprov was unassigned by fyrchik 2023-11-01 12:55:47 +00:00

Added skipped counter to verify total count.

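
If the fix indeed adds a skipped counter, the reconciliation presumably becomes Evacuated + Failed + Skipped == Total; the sketch below only illustrates that bookkeeping and is not the actual frostfs-node code:

```shell
# Hypothetical accounting once a "skipped" bucket exists: every object in
# the shard lands in exactly one of the three buckets, so the counters
# must add up to Total when the evacuation is completed.
counters_reconcile() {
    evacuated="$1"; failed="$2"; skipped="$3"; total="$4"
    [ $((evacuated + failed + skipped)) -eq "$total" ]
}
```

Under that scheme, a completed run reporting 2 evacuated out of 24 would have to account for the other 22 objects as failed or skipped rather than leaving them unexplained.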
fyrchik removed the
triage
label 2023-11-09 07:15:52 +00:00
Reference: TrueCloudLab/frostfs-node#684