Async evacuate #329
Reference: TrueCloudLab/frostfs-node#329
No description provided.
Closes #109
8d25593d44 to 9822a7835b
9822a7835b to 9f0b4d7125
9f0b4d7125 to 24cf2a2fff
24cf2a2fff to eb1937eb1e
eb1937eb1e to 4964fa434b
4964fa434b to 44a3bec8ee
44a3bec8ee to e13ec5c662
WIP: Async evacuate to Async evacuate
e13ec5c662 to 6d3f2a4670
@ -0,0 +21,4 @@
prm.WithShardIDList(s.getShardIDList(req.GetBody().GetShard_ID()))
prm.WithIgnoreErrors(req.GetBody().GetIgnoreErrors())
prm.WithFaultHandler(s.replicate)
prm.WithAsync(true)
For me it's obvious that `WithAsync` sets the `async` flag, so there's no need to pass a boolean argument - just `WithAsync()`. WDYT?

It is usually more convenient, because you can write `WithAsync(async)` instead of `if async { WithAsync() }`.
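A tiny sketch of the trade-off (illustrative names, not the actual frostfs-node API):

```go
package main

import "fmt"

// Prm mimics a parameter struct configured through With* setters.
type Prm struct {
	async bool
}

// WithAsync takes the flag as an argument, so call sites can forward a
// variable unconditionally instead of branching.
func (p *Prm) WithAsync(async bool) { p.async = async }

func main() {
	async := true // e.g. parsed from a request
	var prm Prm

	prm.WithAsync(async) // argument style: one unconditional call
	// The toggle style would need a branch at every call site:
	// if async { prm.WithAsync() }

	fmt.Println(prm.async) // true
}
```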
@ -65,2 +117,3 @@
// The shard being moved must be in read-only mode.
func (e *StorageEngine) Evacuate(ctx context.Context, prm EvacuateShardPrm) (EvacuateShardRes, error) {
func (e *StorageEngine) Evacuate(ctx context.Context, prm EvacuateShardPrm) (*EvacuateShardRes, error) {
select {
Could you explain, please, why you have put these `select`s at the beginning of the methods? This `select` looks inconvenient to me. This check should be performed by the code that runs something asynchronously (like the errgroup below).

The RPC call has a deadline, so `<-ctx.Done()` can be true.

No doubt. But why don't we rely on the context cancellation at the `errgroup.Go` invocation level?

errgroup can use a detached context (`context.Background()`) in case of async execution, so this check will be the only one for `ctx.Done`.
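A minimal sketch of the pattern from this thread (hypothetical shape, not the real engine code): the caller's context is checked once at the entry point, because async execution then runs on a detached context that the RPC deadline can no longer cancel.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func evacuate(ctx context.Context, async bool) error {
	select {
	case <-ctx.Done():
		// The RPC deadline may already be exceeded when we get here.
		return ctx.Err()
	default:
	}

	work := func(ctx context.Context) {
		select {
		case <-ctx.Done():
			fmt.Println("cancelled")
		case <-time.After(50 * time.Millisecond):
			fmt.Println("evacuation finished")
		}
	}

	if async {
		// Detached context: the evacuation outlives the RPC call, so the
		// entry check above is the only place ctx.Done() is consulted.
		go work(context.Background())
		return nil
	}
	work(ctx)
	return nil
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_ = evacuate(ctx, true)
	time.Sleep(100 * time.Millisecond) // let the detached work finish
}
```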
@ -80,3 +83,3 @@
t.Parallel()
const objPerShard = 3
var objPerShard uint32 = 3
But why?
fixed
@ -262,0 +316,4 @@
prm.shardID = ids[1:2]
prm.handler = func(ctx context.Context, a oid.Address, o *objectSDK.Object) error {
running.Store(true)
for !blocker.Load() {
We set it only one time, what about closing the channel instead?
fixed
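The suggested fix is the usual Go one-shot signal: close a channel instead of spinning on an atomic flag. A minimal sketch (illustrative, not the test code itself):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// One-shot signal via channel close, instead of busy-waiting
	// on an atomic bool: for !blocker.Load() {}.
	blocker := make(chan struct{}) // struct{} carries no data; we only close it

	go func() {
		fmt.Println("handler running, waiting to be unblocked")
		<-blocker // blocks until the channel is closed; no busy-wait
		fmt.Println("unblocked")
	}()

	time.Sleep(10 * time.Millisecond)
	close(blocker) // signals every waiter exactly once
	time.Sleep(10 * time.Millisecond)
}
```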
6d3f2a4670 to 1bb04aa03a
1bb04aa03a to 785d58e76d
@ -0,0 +6,4 @@
To start the evacuation, it is necessary that the shard is in read-only mode (read more [here](./shard-modes.md)).
During evacuation, the data is first transferred to other shards of the same node; if that is not possible, the data is transferred to other nodes.
Is it true that if one wants to migrate all the data out of a storage node, they need to add all shards with a `shardAllFlag` toggled? If so, can we emphasize it in the documentation?

Added to the `Commands` section.

@ -0,0 +135,4 @@
const reportIntervalSeconds = 5
var resp *control.GetEvacuateShardStatusResponse
reportResponse := atomic.NewPointer(resp)
poolingCompleted := make(chan interface{})
Is it possible to use `any` here and below?

fixed to `struct{}`
@ -86,0 +166,4 @@
}
err = eg.Wait()
return res, err
Is it possible to squash it into one line?
fixed.
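Presumably the squashed form returns the errgroup result directly (an assumption; the merged diff is not shown here):

```go
return res, eg.Wait()
```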
@ -0,0 +36,4 @@
}
}
var duration *control.GetEvacuateShardStatusResponse_Body_Duration
if state.StartedAt() != nil {
Is it possible to use one `if` statement here and above?

It is possible. But the first `if` is for the `startedAt` value, and the second `if` is for the `duration` value. It's easier for me this way, I'm old.

@ -0,0 +135,4 @@
const reportIntervalSeconds = 5
var resp *control.GetEvacuateShardStatusResponse
reportResponse := atomic.NewPointer(resp)
poolingCompleted := make(chan interface{})
pooling or polling?
polling. fixed.
@ -0,0 +136,4 @@
var resp *control.GetEvacuateShardStatusResponse
reportResponse := atomic.NewPointer(resp)
poolingCompleted := make(chan interface{})
progressReportCompleted := make(chan interface{})
Also, why `interface{}` and not `struct{}`? It seems we only close it.

Ok, `struct{}`. Fixed.
@ -0,0 +138,4 @@
poolingCompleted := make(chan interface{})
progressReportCompleted := make(chan interface{})
go func() {
Why do we need a goroutine here? It seems we already sleep in the main loop.
The goroutine prints a report every N seconds; the main loop makes the request and sleeps.
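A minimal sketch of that division of labor, with names echoing the diff but otherwise illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

func main() {
	const reportInterval = time.Second

	var latest atomic.Value // last polled status, read by the reporter
	pollingCompleted := make(chan struct{})
	progressReportCompleted := make(chan struct{})

	// Reporter goroutine: prints the latest status every reportInterval,
	// independently of how often the main loop polls.
	go func() {
		defer close(progressReportCompleted)
		for {
			select {
			case <-pollingCompleted:
				return
			case <-time.After(reportInterval):
				if s, ok := latest.Load().(string); ok {
					fmt.Println("progress:", s)
				}
			}
		}
	}()

	// Main loop: makes the status request, stores the response, sleeps.
	for i := 0; i < 3; i++ {
		latest.Store(fmt.Sprintf("polled %d times", i+1)) // stand-in for an RPC call
		time.Sleep(700 * time.Millisecond)
	}
	close(pollingCompleted)
	<-progressReportCompleted
}
```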
@ -0,0 +95,4 @@
return s.errMessage
}
func (s *EvacuationState) NextTryAfterSeconds() int64 {
Do we need to specify the client polling interval here, in the engine?

To reduce the load, the server can increase this interval, but for now it's constant. Inspired by the OAuth2 device code flow: https://learn.microsoft.com/en-us/azure/active-directory/develop/v2-oauth2-device-code#device-authorization-response

Ok, dropped.
@ -0,0 +103,4 @@
if s == nil {
return nil
}
shardIDs := make([]string, len(s.shardIDs))
Isn't it read-only?
It is. But since the method is called DeepCopy, the copy must be deep.
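A minimal sketch of the point (the type and field names follow the diff; the rest is illustrative):

```go
package main

import "fmt"

// EvacuationState is a stand-in for the state type in the diff.
type EvacuationState struct {
	shardIDs []string
}

// DeepCopy copies the slice as well: even if callers treat shardIDs as
// read-only today, a method named DeepCopy should not share backing arrays.
func (s *EvacuationState) DeepCopy() *EvacuationState {
	if s == nil {
		return nil
	}
	shardIDs := make([]string, len(s.shardIDs))
	copy(shardIDs, s.shardIDs)
	return &EvacuationState{shardIDs: shardIDs}
}

func main() {
	orig := &EvacuationState{shardIDs: []string{"shard-1", "shard-2"}}
	cp := orig.DeepCopy()
	cp.shardIDs[0] = "mutated"
	fmt.Println(orig.shardIDs[0]) // still "shard-1": the copy is independent
}
```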
@ -0,0 +170,4 @@
l.guard.RLock()
defer l.guard.RUnlock()
return l.state.DeepCopy()
Why do we need a deep copy here? Pointers to atomics are OK to copy.
There were two ways to ensure consistency: mutex inside the structure or deep copy. I chose the second one. This is my engineering decision.
Could you elaborate, what consistency issues do we have if we do not do a deep copy?
`state` can be changed by the evacuation goroutine, so we could get, for example, a completed state without an error.

@ -30,2 +31,4 @@
rpc EvacuateShard (EvacuateShardRequest) returns (EvacuateShardResponse);
// StartEvacuateShard starts moving all data from one shard to the others.
rpc StartEvacuateShard (StartEvacuateShardRequest) returns (StartEvacuateShardResponse);
Why is it `evacuation` in the CLI and `evacuate` here?

fixed
785d58e76d to 6544054ae0
6544054ae0 to 3c9399c5c8
3c9399c5c8 to 7925476a54
7925476a54 to 96ee9c1475
96ee9c1475 to 84ce746fcc
84ce746fcc to 3aff900769
control shards evacuate #109
3aff900769 to 363105c8d3