Detach shard with frostfs-cli control shards detach command #945
Reference: TrueCloudLab/frostfs-node#945
Relates #917

Added a new command, frostfs-cli control shards detach. It temporarily detaches the specified shards: it removes the shard IDs from the engine and closes the metabase, writecache, blobstore, and pilorama.

Limitation: after SIGHUP or a restart, a detached shard is attached again.

The documentation was also updated, and the developer wallet dev/wallet.json was added as an allowed key to the VSCode debug config example.

Example:
Also checked the SIGHUP scenario on dev-env.

Title changed from "Add frostfs-cli control shards set-mode disabled command" to "WIP: Add frostfs-cli control shards set-mode disabled command"

bd6da4d41c to c03134a287
c03134a287 to c48144ccf1
7aa0ff295b to 01bcf5ff42
01bcf5ff42 to d743a4d9ed
d743a4d9ed to e2f2c95143
e2f2c95143 to f3754479eb

Title changed from "WIP: Add frostfs-cli control shards set-mode disabled command" to "Disable shard with frostfs-cli control shards set-mode --mode disabled command"

@@ -344,6 +345,87 @@ func (e *StorageEngine) HandleNewEpoch(ctx context.Context, epoch uint64) {
}
}
func (e *StorageEngine) DisableShards(ids []*shard.ID) error {

Why is it a separate method and not a SetMode extension?

Because this 'DISABLED' mode is not an actual shard mode like 'READ-ONLY' or the others. It has a different meaning: detach the shard (release all resources and close the shard). Currently, engine.SetShardMode with mode = mode.Disabled doesn't detach the shard; it just moves the shard to DISABLED mode, so the shard holds its resources but doesn't allow reading or writing any objects.

@@ -347,0 +350,4 @@
return logicerr.New("ids must be non-empty")
}
deletedShards, err := e.deleteShards(ids)

If we delete first, resources could leak: after an error in closeShards() we will have untraceable but still existing shards.

There are two points here:
1. The SIGHUP handler removes shards the same way: func (e *StorageEngine) removeShards(ids ...string) {
2. Also, the engine acquires a lock to delete shards, but closing a shard can take a lot of time.

Maybe in such a case (shard detached from the engine, but failed to close) the node must panic?

ok, let's leave it like this

Dangerous. I think we'd better postpone this until we support disabled mode, so that the shard is retained and alerts can be thrown.

@@ -347,0 +362,4 @@
// Returns single error with joined shard errors.
func (e *StorageEngine) closeShards(deletedShards []hashedShard) error {
var multiErr error
for _, sh := range deletedShards {
Do we have any reason not to do it in parallel (besides simplicity)?
Right, fixed.

@@ -45,0 +60,4 @@
require.NoError(t, e.DisableShards([]*shard.ID{ids[0]}))
require.Equal(t, 1, len(e.shards))

Can we also add a subsequent re-addition of the removed shard here? (This corresponds to SIGHUP.)

It is too hard to reproduce SIGHUP.

f3754479eb to 104292d43a

Discussed with @fyrchik: it's better to make a separate frostfs-cli control shards detach command.

Title changed from "Disable shard with frostfs-cli control shards set-mode --mode disabled command" to "WIP: Disable shard with frostfs-cli control shards set-mode --mode disabled command"

104292d43a to 5036a39509
5036a39509 to d7d0c1905f
d7d0c1905f to 5e2cd54565

Title changed from "WIP: Disable shard with frostfs-cli control shards set-mode --mode disabled command" to "Detach shard with frostfs-cli control shards detach command"

5e2cd54565 to 359841c15d
359841c15d to 80ce70443d

@@ -347,0 +376,4 @@
zap.Error(err),
)
multiErrGuard.Lock()
multiErr = errors.Join(multiErr, fmt.Errorf("could not change shard (id:%s) mode to disabled: %w", sh.ID().String(), err))

%s doesn't require .String() in the argument list

fixed
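
The point above can be illustrated with a short sketch: for an operand that implements fmt.Stringer, the %s verb calls String() itself, so passing the value directly produces the same text. The shardID type and the describe function are hypothetical and are not the shard.ID type from this repository.

```go
package main

import "fmt"

// shardID is a hypothetical identifier type used only for this illustration.
type shardID []byte

// String implements fmt.Stringer. The conversion to []byte avoids
// recursing back into String while formatting.
func (id shardID) String() string { return fmt.Sprintf("%x", []byte(id)) }

// describe formats the same ID twice: once relying on %s to call
// String() implicitly, once calling it explicitly.
func describe(id shardID) (implicit, explicit string) {
	implicit = fmt.Sprintf("shard (id:%s)", id)
	explicit = fmt.Sprintf("shard (id:%s)", id.String())
	return implicit, explicit
}

func main() {
	a, b := describe(shardID{0xde, 0xad})
	fmt.Println(a)
	fmt.Println(a == b)
}
```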

80ce70443d to d7838790c6