Detach shard with `frostfs-cli control shards detach` command #945

Merged
fyrchik merged 4 commits from dstepanov-yadro/frostfs-node:feat/disable_shard into master 2024-02-06 12:12:21 +00:00

Relates #917

Added a new command: `frostfs-cli control shards detach`.
It temporarily detaches the specified shards: their IDs are removed from the engine, and the metabase, write-cache, blobstor and pilorama are closed.
Limitation: a `SIGHUP` or a node restart re-attaches the detached shards.

The documentation was also updated, and the developer wallet `dev/wallet.json` was added as an allowed key in the VSCode debug config example.

Example:

```shell
developer:computer:~$ ./frostfs-cli control shards list --wallet ./dev/wallet.json --endpoint localhost:8081
Enter password > 
Shard SekrSf7eqnjr9C8cMz53pC:
Mode: read-write
Metabase: /home/developer/frostfs-node/.cache/storage/meta1
Blobstor:
        Path 0: /home/developer/frostfs-node/.cache/storage/blobovnicza1
        Type 0: blobovnicza
        Path 1: /home/developer/frostfs-node/.cache/storage/fstree1
        Type 1: fstree
Write-cache: /home/developer/frostfs-node/.cache/storage/wc1
Pilorama: /home/developer/frostfs-node/.cache/storage/pilorama1
Error count: 0
Shard VzQknC8PwYVMHFUyQnt2EE:
Mode: read-write
Metabase: /home/developer/frostfs-node/.cache/storage/meta0
Blobstor:
        Path 0: /home/developer/frostfs-node/.cache/storage/blobovnicza0
        Type 0: blobovnicza
        Path 1: /home/developer/frostfs-node/.cache/storage/fstree0
        Type 1: fstree
Write-cache: /home/developer/frostfs-node/.cache/storage/wc0
Pilorama: /home/developer/frostfs-node/.cache/storage/pilorama0
Error count: 0

developer:computer:~$ ./frostfs-cli control shards detach --id SekrSf7eqnjr9C8cMz53pC,VzQknC8PwYVMHFUyQnt2EE --wallet ./dev/wallet.json --endpoint localhost:8081
Enter password > 
rpc error: rpc error: code = InvalidArgument desc = could not delete all the shards

developer:computer:~$ ./frostfs-cli control shards detach --id VzQknC8PwYVMHFUyQnt2EB --wallet ./dev/wallet.json --endpoint localhost:8081
Enter password > 
rpc error: rpc error: code = InvalidArgument desc = shard not found

developer:computer:~$ ./frostfs-cli control shards detach --id VzQknC8PwYVMHFUyQnt2EE --wallet ./dev/wallet.json --endpoint localhost:8081
Enter password > 
Shard mode update request successfully sent.

developer:computer:~$ ./frostfs-cli control shards list --wallet ./dev/wallet.json --endpoint localhost:8081
Enter password > 
Shard SekrSf7eqnjr9C8cMz53pC:
Mode: read-write
Metabase: /home/developer/frostfs-node/.cache/storage/meta1
Blobstor:
        Path 0: /home/developer/frostfs-node/.cache/storage/blobovnicza1
        Type 0: blobovnicza
        Path 1: /home/developer/frostfs-node/.cache/storage/fstree1
        Type 1: fstree
Write-cache: /home/developer/frostfs-node/.cache/storage/wc1
Pilorama: /home/developer/frostfs-node/.cache/storage/pilorama1
Error count: 0
```

Also checked the `SIGHUP` scenario on dev-env.

dstepanov-yadro changed title from Add `frostfs-cli control shards set-mode disabled` command to WIP: Add `frostfs-cli control shards set-mode disabled` command 2024-01-30 15:35:08 +00:00
dstepanov-yadro force-pushed feat/disable_shard from bd6da4d41c to c03134a287 2024-01-31 07:02:10 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from c03134a287 to c48144ccf1 2024-01-31 09:58:15 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from 7aa0ff295b to 01bcf5ff42 2024-01-31 11:02:13 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from 01bcf5ff42 to d743a4d9ed 2024-01-31 11:04:25 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from d743a4d9ed to e2f2c95143 2024-01-31 12:23:12 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from e2f2c95143 to f3754479eb 2024-01-31 12:42:45 +00:00 Compare
dstepanov-yadro changed title from WIP: Add `frostfs-cli control shards set-mode disabled` command to Disable shard with `frostfs-cli control shards set-mode --mode disabled` command 2024-01-31 12:55:35 +00:00
dstepanov-yadro requested review from storage-core-committers 2024-01-31 12:55:59 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-01-31 12:55:59 +00:00
aarifullin approved these changes 2024-01-31 14:14:12 +00:00
fyrchik reviewed 2024-01-31 14:28:45 +00:00
@ -344,6 +345,87 @@ func (e *StorageEngine) HandleNewEpoch(ctx context.Context, epoch uint64) {
}
}
func (e *StorageEngine) DisableShards(ids []*shard.ID) error {

Why is it a separate method and not a `SetMode` extension?
dstepanov-yadro (Poster, Collaborator):

Because this `DISABLED` mode is not an actual shard mode like `READ-ONLY` or the others. It has a different meaning: detach the shard (release all resources and close it).
Currently `engine.SetShardMode` with `mode.Disabled` does not detach the shard; it just moves the shard to `DISABLED` mode, so the shard keeps holding its resources but does not allow reading or writing any objects.
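
For illustration only, a minimal sketch of that distinction with stand-in types (`Engine`, `Shard`, `DetachShard` and the mode constants here are hypothetical names, not the frostfs-node API): setting a mode keeps the shard registered with its resources open, while detaching removes the shard ID from the engine and closes it.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Mode int

const (
	ModeReadWrite Mode = iota
	ModeReadOnly
	ModeDisabled
)

// Shard is a stand-in for a real shard with closable components.
type Shard struct {
	ID   string
	mode Mode
}

func (s *Shard) SetMode(m Mode) { s.mode = m }

// Close stands for closing metabase, write-cache, blobstor and pilorama.
func (s *Shard) Close() error { return nil }

type Engine struct {
	mtx    sync.Mutex
	shards map[string]*Shard
}

// SetShardMode keeps the shard registered; it only flips the mode,
// so the shard still holds its resources.
func (e *Engine) SetShardMode(id string, m Mode) error {
	e.mtx.Lock()
	defer e.mtx.Unlock()
	sh, ok := e.shards[id]
	if !ok {
		return errors.New("shard not found")
	}
	sh.SetMode(m)
	return nil
}

// DetachShard removes the shard from the engine and closes it,
// releasing its resources until a SIGHUP/restart re-adds it.
func (e *Engine) DetachShard(id string) error {
	e.mtx.Lock()
	sh, ok := e.shards[id]
	if ok {
		delete(e.shards, id)
	}
	e.mtx.Unlock()
	if !ok {
		return errors.New("shard not found")
	}
	return sh.Close()
}

func main() {
	e := &Engine{shards: map[string]*Shard{"A": {ID: "A"}}}
	fmt.Println(e.SetShardMode("A", ModeDisabled)) // <nil>: shard stays registered
	fmt.Println(e.DetachShard("A"))                // <nil>: shard removed and closed
}
```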
@ -347,0 +350,4 @@
return logicerr.New("ids must be non-empty")
}
deletedShards, err := e.deleteShards(ids)

If we first delete, then resources could leak, because after an error in `closeShards()` we will have untraceable but still existing shards.
dstepanov-yadro (Poster, Collaborator):

There are two points here:

  1. The `SIGHUP` handler removes shards the same way: https://git.frostfs.info/TrueCloudLab/frostfs-node/src/commit/483a67b17064990d69e39e23d7b3fd6c72d9bfb3/pkg/local_object_storage/engine/shards.go#L192
  2. It's unclear what to do with shards that failed to close. Such shards may be in an undefined state, and if the engine keeps references to them, it must be able to do something with them.

Also, the engine acquires a lock to delete shards, but closing a shard can take a lot of time.

Maybe in such a case (shard detached from the engine but failed to close) the node should panic?

> The `SIGHUP` handler removes shards the same way

OK, let's leave it like this.

> Maybe in such a case (shard detached from the engine but failed to close) the node should panic?

Dangerous; I think we had better postpone this until we support `disabled` mode, so that the shard is retained and alerts can be thrown.
fyrchik marked this conversation as resolved
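
A rough sketch of the delete-then-close order discussed above, reusing the hypothetical `Engine`/`Shard` stand-ins from the earlier sketch (again, not the real frostfs-node code): shards are removed from the engine map under the lock and closed outside it, with close errors joined, so a close failure leaves a shard that still holds resources but is no longer referenced by the engine.

```go
// DetachShards removes the given shards from the engine first and closes
// them afterwards. Close errors are joined into a single error; a failed
// Close means an open shard the engine no longer tracks (the leak concern).
func (e *Engine) DetachShards(ids []string) error {
	e.mtx.Lock()
	detached := make([]*Shard, 0, len(ids))
	for _, id := range ids {
		sh, ok := e.shards[id]
		if !ok {
			e.mtx.Unlock()
			return fmt.Errorf("could not delete all the shards: shard %s not found", id)
		}
		detached = append(detached, sh)
	}
	for _, sh := range detached {
		delete(e.shards, sh.ID)
	}
	// Closing can be slow, so it happens outside the engine lock.
	e.mtx.Unlock()

	var multiErr error
	for _, sh := range detached {
		if err := sh.Close(); err != nil {
			multiErr = errors.Join(multiErr, fmt.Errorf("close shard %s: %w", sh.ID, err))
		}
	}
	return multiErr
}
```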
@ -347,0 +362,4 @@
// Returns single error with joined shard errors.
func (e *StorageEngine) closeShards(deletedShards []hashedShard) error {
var multiErr error
for _, sh := range deletedShards {

Do we have any reason not to do it in parallel (besides simplicity)?
dstepanov-yadro (Poster, Collaborator):

Right, fixed.
fyrchik marked this conversation as resolved
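
A possible shape of the parallel close, again using the hypothetical `Shard` stand-in from the sketches above (the real implementation may differ): each shard is closed in its own goroutine and errors are accumulated under a mutex into one joined error.

```go
// closeShards closes already-detached shards concurrently and returns the
// joined error of all failed closes.
func closeShards(detached []*Shard) error {
	var (
		wg            sync.WaitGroup
		multiErrGuard sync.Mutex
		multiErr      error
	)
	for _, sh := range detached {
		sh := sh // capture loop variable (needed before Go 1.22)
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := sh.Close(); err != nil {
				multiErrGuard.Lock()
				multiErr = errors.Join(multiErr, fmt.Errorf("could not close shard %s: %w", sh.ID, err))
				multiErrGuard.Unlock()
			}
		}()
	}
	wg.Wait()
	return multiErr
}
```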
@ -45,0 +60,4 @@
require.NoError(t, e.DisableShards([]*shard.ID{ids[0]}))
require.Equal(t, 1, len(e.shards))

Can we also add a subsequent "re-addition" of the removed shard here? (corresponds to `SIGHUP`)
dstepanov-yadro (Poster, Collaborator):

It is too hard to reproduce `SIGHUP`.
fyrchik marked this conversation as resolved
dstepanov-yadro force-pushed feat/disable_shard from f3754479eb to 104292d43a 2024-01-31 15:02:18 +00:00 Compare
dstepanov-yadro (Poster, Collaborator):

Discussed with @fyrchik: it's better to add a separate `frostfs-cli control shards detach` command.
dstepanov-yadro changed title from Disable shard with `frostfs-cli control shards set-mode --mode disabled` command to WIP: Disable shard with `frostfs-cli control shards set-mode --mode disabled` command 2024-01-31 15:24:41 +00:00
dstepanov-yadro force-pushed feat/disable_shard from 104292d43a to 5036a39509 2024-01-31 16:50:02 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from 5036a39509 to d7d0c1905f 2024-02-01 07:00:46 +00:00 Compare
dstepanov-yadro force-pushed feat/disable_shard from d7d0c1905f to 5e2cd54565 2024-02-01 07:12:28 +00:00 Compare
dstepanov-yadro requested review from aarifullin 2024-02-01 08:42:59 +00:00
dstepanov-yadro changed title from WIP: Disable shard with `frostfs-cli control shards set-mode --mode disabled` command to Detach shard with `frostfs-cli control shards detach` command 2024-02-01 09:34:19 +00:00
dstepanov-yadro requested review from storage-core-developers 2024-02-01 09:34:34 +00:00
dstepanov-yadro force-pushed feat/disable_shard from 5e2cd54565 to 359841c15d 2024-02-01 09:38:59 +00:00 Compare
acid-ant approved these changes 2024-02-02 08:50:45 +00:00
aarifullin approved these changes 2024-02-02 09:10:50 +00:00
dstepanov-yadro force-pushed feat/disable_shard from 359841c15d to 80ce70443d 2024-02-02 14:56:46 +00:00 Compare
acid-ant approved these changes 2024-02-05 10:06:56 +00:00
fyrchik approved these changes 2024-02-06 08:18:13 +00:00
@ -347,0 +376,4 @@
zap.Error(err),
)
multiErrGuard.Lock()
multiErr = errors.Join(multiErr, fmt.Errorf("could not change shard (id:%s) mode to disabled: %w", sh.ID().String(), err))

`%s` doesn't require `.String()` in the argument list.
dstepanov-yadro (Poster, Collaborator):

Fixed.
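
For reference, a small standalone example of why the explicit call is redundant (`ShardID` here is an illustrative type, not the project's): `%s` and `%v` call `String()` automatically for any value implementing `fmt.Stringer`.

```go
package main

import "fmt"

type ShardID struct{ raw string }

func (id ShardID) String() string { return id.raw }

func main() {
	id := ShardID{raw: "VzQknC8PwYVMHFUyQnt2EE"}
	// Both lines print the same output; the second .String() is redundant.
	fmt.Printf("could not change shard (id:%s) mode\n", id)
	fmt.Printf("could not change shard (id:%s) mode\n", id.String())
}
```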
dstepanov-yadro force-pushed feat/disable_shard from 80ce70443d to d7838790c6 2024-02-06 11:50:55 +00:00 Compare
fyrchik merged commit d7838790c6 into master 2024-02-06 12:12:21 +00:00
Reference: TrueCloudLab/frostfs-node#945