Completed shard evacuation reports fewer evacuated objects than the total #684

Closed
opened 2023-09-12 08:50:22 +00:00 by an-nikitin · 4 comments

Expected Behavior

Once the evacuation is in the completed state, the value of the Evacuated field in the GetShardEvacuationStatusResponse_Body structure is equal to Total.

Current Behavior

```
root@annikitin-node1:/home/service# frostfs-cli --config /etc/frostfs/storage/control.yml control shards set-mode --mode read-only --id "3ALSd4o2ioKDDzywwheErA"
Shard mode update request successfully sent.
root@annikitin-node1:/home/service# frostfs-cli --config /etc/frostfs/storage/control.yml control shards evacuation start --await --id 3ALSd4o2ioKDDzywwheErA --no-errors
Shard evacuation has been successfully started.
Progress will be reported every 5 seconds.
Shard IDs: 3ALSd4o2ioKDDzywwheErA. Status: running. Evacuated 0 object out of 24, failed to evacuate 0 objects. Started at: 2023-09-12T08:32:23Z UTC. Duration: 00:00:04.
Shard IDs: 3ALSd4o2ioKDDzywwheErA. Status: running. Evacuated 0 object out of 24, failed to evacuate 0 objects. Started at: 2023-09-12T08:32:23Z UTC. Duration: 00:00:09.
Shard evacuation has been completed.
Shard IDs: 3ALSd4o2ioKDDzywwheErA. Evacuated 2 object out of 24, failed to evacuate 0 objects. Started at: 2023-09-12T08:32:23Z UTC. Duration: 00:00:13.
```

Possible Solution

A fix cannot be suggested by a QA engineer; the solution is up to the developers.

Steps to Reproduce (for bugs)

  1. Create some objects on the node.
  2. Put a shard to read-only mode.
  3. Start evacuation of the shard.
  4. Wait for the evacuation to complete.
  5. Check the Evacuated value in the evacuation status.
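
The steps above can be sketched as a small script; the commands and the `control.yml` path are taken verbatim from this report, while the function name and the `CLI` override are only for illustration:

```shell
# Sketch of the reproduction, assuming a running storage node. CLI can be
# overridden (e.g. CLI=echo for a dry run); the default matches the report.
reproduce_evacuation() {
    shard_id="$1"
    cli="${CLI:-frostfs-cli --config /etc/frostfs/storage/control.yml}"
    # Step 2: the shard must be read-only before it can be evacuated.
    $cli control shards set-mode --mode read-only --id "$shard_id"
    # Steps 3-4: start the evacuation and wait for completion; the final
    # status line carries the Evacuated counter that should equal Total.
    $cli control shards evacuation start --await --id "$shard_id" --no-errors
}
```

With `CLI=echo` the function just prints the two frostfs-cli invocations, which shows the exact command sequence without touching a node.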

Context

The problem is in the statistics returned via the RPC calls, not in frostfs-cli, so it affects any tool that uses the RPC interface to perform shard evacuations.
The behavior is confusing to the user because it gives the impression that some (or most) of the objects were not evacuated.
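
A consumer of the status RPC can make the expectation explicit with a consistency check; the arithmetic mirrors the Evacuated, failed-to-evacuate, and Total figures printed above, and the helper name is hypothetical:

```shell
# Completed-state invariant: every object should be either evacuated or
# failed, so Evacuated + Failed == Total. The report shows 2 + 0 != 24.
evacuation_consistent() {
    evacuated="$1"; failed="$2"; total="$3"
    [ $((evacuated + failed)) -eq "$total" ]
}

if evacuation_consistent 2 0 24; then
    echo "status consistent"
else
    echo "status inconsistent: evacuated + failed != total"
fi
```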

Regression

No.

Your Environment

T.O v1.3.0-137 in Sbercloud.

an-nikitin added the
bug
triage
labels 2023-09-12 08:50:22 +00:00
fyrchik added the
frostfs-node
label 2023-09-12 08:51:59 +00:00
Owner

Could be related to big object handling.

fyrchik added this to the v0.38.0 milestone 2023-09-12 08:52:25 +00:00
Author

> Could be related to big object handling.

Might very well be, because I put some 1GB objects there.

achuprov was assigned by dstepanov-yadro 2023-09-20 14:42:16 +00:00
Member

I cannot reproduce the bug. I am using a 1GB file.

Create container and put objects:

```bash
frostfs-cli container create -r localhost:8080 --wallet /etc/frostfs/http/wallet.json --policy "REP 1 IN X CBF 1 SELECT 1 FROM * AS X" --basic-acl public-read --await

frostfs-cli object put -r localhost:8080 --wallet /etc/frostfs/http/wallet.json --file 1gb --cid $cid
```
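
To get a shard with enough data to evacuate, the put command above can be repeated in a loop; the function name is illustrative, and `$cid` is the container ID from the create step:

```shell
# Put N copies of the 1GB file into the container; the endpoint, wallet
# path, and file name are the ones used in the commands above.
put_objects() {
    count="$1"; cid="$2"
    cli="${CLI:-frostfs-cli}"
    i=0
    while [ "$i" -lt "$count" ]; do
        $cli object put -r localhost:8080 --wallet /etc/frostfs/http/wallet.json --file 1gb --cid "$cid"
        i=$((i + 1))
    done
}
```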

Evacuation:

I chose the shard whose disk is the most filled (64%) on another node.

```bash
frostfs-cli --config /etc/frostfs/storage/control.yml control shards list

frostfs-cli --config /etc/frostfs/storage/control.yml control shards set-mode --mode read-only --id E6kboFMNVs6KbfkXJCHCJn

frostfs-cli --config /etc/frostfs/storage/control.yml control shards evacuation start --await --id E6kboFMNVs6KbfkXJCHCJn --no-errors
```

Output:

```
Shard evacuation has been successfully started.
Progress will be reported every 5 seconds.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 5 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:04. Estimated time left: 2 minutes.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 10 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:09. Estimated time left: 3 minutes.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 14 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:14. Estimated time left: 3 minutes.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 19 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:00:19. Estimated time left: 3 minutes.
...
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Status: running. Evacuated 220 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:04:24. Estimated time left: 0 minutes.
Shard evacuation has been completed.
Shard IDs: E6kboFMNVs6KbfkXJCHCJn. Evacuated 225 object out of 225, failed to evacuate 0 objects. Started at: 2023-09-29T13:45:58Z UTC. Duration: 00:04:28.
```

Environment: sbercloud v1.3.0-178

dstepanov-yadro was assigned by fyrchik 2023-11-01 12:55:46 +00:00
achuprov was unassigned by fyrchik 2023-11-01 12:55:47 +00:00

Added skipped counter to verify total count.

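
If the fix indeed adds a skipped counter, the reconciliation presumably becomes Evacuated + Failed + Skipped == Total; the sketch below only illustrates that bookkeeping and is not the actual frostfs-node code:

```shell
# Hypothetical accounting once a "skipped" bucket exists: every object in
# the shard lands in exactly one of the three buckets, so the counters
# must add up to Total when the evacuation is completed.
counters_reconcile() {
    evacuated="$1"; failed="$2"; skipped="$3"; total="$4"
    [ $((evacuated + failed + skipped)) -eq "$total" ]
}
```

Under that scheme, a completed run reporting 2 evacuated out of 24 would have to account for the other 22 objects as failed or skipped rather than leaving them unexplained.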
fyrchik removed the
triage
label 2023-11-09 07:15:52 +00:00
Reference: TrueCloudLab/frostfs-node#684