engine: Allow to remove redundant object copies #191

fyrchik · 2023-03-30T13:40:32Z

fyrchik commented

2023-03-30 13:40:32 +00:00

RemoveDuplicates() accepts a single shard and removes all copies stored on
other shards. The naming comes from the fact that it could support
removing objects from shards that are lower in the HRW vector.

What to test:

1. Increasing the `--concurrency` flag (1 vs 1000) should have a noticeable effect on the command execution time.
2. Cancelling the CLI operation should cancel the background command (no delete operations in logs).
3. All deleted objects should be available from the same node with `object get --ttl 1`.

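The second test item depends on the deletion loop noticing that the request was cancelled. A minimal sketch of that pattern follows, assuming the context passed to the loop is cancelled when the CLI call is interrupted. The names `deleteAll` and `deletedBeforeCancel` are hypothetical and not taken from the PR.

```go
package main

import (
	"context"
	"fmt"
)

// deleteAll calls del for each address in order. It checks ctx before every
// deletion and stops as soon as the context is cancelled, returning the
// number of deletions that were actually performed.
func deleteAll(ctx context.Context, addrs []string, del func(addr string)) (int, error) {
	for i, addr := range addrs {
		if err := ctx.Err(); err != nil {
			return i, err
		}
		del(addr)
	}
	return len(addrs), nil
}

// deletedBeforeCancel runs deleteAll over `total` fake objects and cancels
// the context right after `cancelAfter` deletions. If cancelAfter is larger
// than total, the context is never cancelled during the run.
func deletedBeforeCancel(total, cancelAfter int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	addrs := make([]string, total)
	for i := range addrs {
		addrs[i] = fmt.Sprintf("obj-%d", i)
	}

	deleted := 0
	n, _ := deleteAll(ctx, addrs, func(string) {
		deleted++
		if deleted == cancelAfter {
			cancel() // simulates the operator interrupting the CLI
		}
	})
	return n
}

func main() {
	fmt.Println("deleted before cancel:", deletedBeforeCancel(10, 3))
	fmt.Println("deleted without cancel:", deletedBeforeCancel(10, 20))
}
```

With this shape, no delete is started after cancellation, which is what the "no delete operations in logs" check looks for.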

fyrchik commented

2023-04-05 11:13:39 +00:00

Blocked until further discussion.

fyrchik added the blocked label 2023-04-05 11:14:27 +00:00

fyrchik force-pushed shard-reinsertion from 7d2e088e68 to 945367bb94

2023-04-07 11:52:40 +00:00


fyrchik changed title from ~~WIP: engine: Allow to remove redundant object copies~~ to engine: Allow to remove redundant object copies

2023-04-07 11:52:48 +00:00

fyrchik added the frostfs-node label 2023-04-07 11:53:18 +00:00

aarifullin reviewed 2023-04-07 12:05:13 +00:00

cmd/frostfs-cli/modules/control/doctor.go Outdated

					
				@ -0,0 +24,4 @@

					pk := key.Get(cmd)

					req := &control.DoctorRequest{Body: new(control.DoctorRequest_Body)}

					req.Body.Concurrency, _ = cmd.Flags().GetUint32(concurrencyFlag)

aarifullin commented

2023-04-07 12:05:13 +00:00

[Optional]

Using GetUint32, GetBool is fine, but if new flags are added to the doctor command, it will not be obvious that concurrencyFlag is uint32 and someNewFlag is someType.

I think it is fine to use global variables to initialize flags:

ff := doctorCmd.Flags()
ff.BoolVar(&concurrencyVarFlag, concurrencyFlag, false, "Remove duplicate objects")


fyrchik commented

2023-04-07 13:34:15 +00:00

We had a problem with this, because in some cases "the same" flag should have different descriptions/defaults in different commands. With many global variables this had become a mess.

Anyway, I suggest discussing it separately and implementing in all CLI commands atomically, after reaching consensus.

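To illustrate the concern about "the same" flag needing different defaults per command, here is a minimal sketch. It uses the standard library `flag` package instead of the cobra/pflag API used in the PR, and the command names, defaults, and helper names (`newCommandFlags`, `concurrencyOf`) are assumptions made only for illustration.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

const concurrencyFlag = "concurrency"

// newCommandFlags builds an independent flag set for one command, so the
// same flag name can carry a command-specific default and description
// without any shared global variable.
func newCommandFlags(cmd string, defaultConcurrency uint, usage string) *flag.FlagSet {
	fs := flag.NewFlagSet(cmd, flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep parse errors out of stderr in this sketch
	fs.Uint(concurrencyFlag, defaultConcurrency, usage)
	return fs
}

// concurrencyOf parses args and reads the flag back by name, similar in
// spirit to calling a typed getter on the command's flag set.
func concurrencyOf(fs *flag.FlagSet, args []string) (uint, error) {
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	getter, ok := fs.Lookup(concurrencyFlag).Value.(flag.Getter)
	if !ok {
		return 0, fmt.Errorf("flag %q has no getter", concurrencyFlag)
	}
	v, ok := getter.Get().(uint)
	if !ok {
		return 0, fmt.Errorf("flag %q is not a uint", concurrencyFlag)
	}
	return v, nil
}

func main() {
	doctor := newCommandFlags("doctor", 1, "number of parallel workers")
	other := newCommandFlags("other", 20, "number of parallel transfers")

	a, _ := concurrencyOf(doctor, nil)
	b, _ := concurrencyOf(other, []string{"--concurrency=8"})
	fmt.Println(a, b)
}
```

The trade-off raised in the review remains: reading by name keeps commands independent, while bound variables make the flag type visible at the definition site.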

👍 1

aarifullin marked this conversation as resolved

aarifullin reviewed 2023-04-07 12:28:22 +00:00

pkg/local_object_storage/engine/remove_copies.go Outdated

					
				@ -0,0 +120,4 @@

							}

							var deletePrm shard.DeletePrm

							deletePrm.SetAddresses(addr)

aarifullin commented

2023-04-07 12:28:22 +00:00

Wouldn't it be helpful to log the shards that held the same object?

fyrchik commented

2023-04-07 12:45:09 +00:00

Given the amount of logs we have, no. The only use-case I see is for testing.
The deletion operation is already logged; maybe we can add a single log entry when we start processing a shard.


👍 1

aarifullin marked this conversation as resolved

dstepanov-yadro reviewed 2023-04-07 13:12:14 +00:00

pkg/local_object_storage/engine/remove_copies.go Outdated

					
				@ -0,0 +61,4 @@

							var cursor *meta.Cursor

							for {

								var listPrm shard.ListWithCursorPrm

								listPrm.WithCount(uint32(prm.Concurrency))

dstepanov-yadro commented

2023-04-07 13:05:23 +00:00

Why count = prm.Concurrency?


fyrchik commented

2023-04-07 13:18:15 +00:00

Even a named constant would be magic in this case, and it seems logical to depend on the number of workers that process the listed objects.

What else could we use here?


dstepanov-yadro commented

2023-04-07 13:26:24 +00:00

If prm.Concurrency = 1 then there will be too many bbolt requests, it seems to me.

> What else could we use here?

If I knew for sure... But if you don't have any other ideas, I agree with this approach.

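The trade-off discussed in this thread can be sketched as follows: the listing page size equals the number of workers, so a concurrency of 1 leads to one listing call per object. This is a simplified model under stated assumptions. `listPage` is a hypothetical stand-in for the shard's cursor-based listing, and `processAll` is not the PR's code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// listPage is a hypothetical cursor-based listing over an in-memory slice.
// It returns up to count addresses starting at cursor, plus the next cursor.
func listPage(all []string, cursor, count int) ([]string, int) {
	end := cursor + count
	if end > len(all) {
		end = len(all)
	}
	return all[cursor:end], end
}

// processAll lists addresses in pages of `concurrency` items and hands them
// to `concurrency` workers. It returns how many listing calls were made and
// how many addresses the workers handled.
func processAll(all []string, concurrency int, handle func(addr string)) (listCalls, processed int) {
	if concurrency < 1 {
		concurrency = 1
	}

	var done int64
	ch := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < concurrency; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for addr := range ch {
				handle(addr)
				atomic.AddInt64(&done, 1)
			}
		}()
	}

	for cursor := 0; cursor < len(all); {
		var page []string
		page, cursor = listPage(all, cursor, concurrency)
		listCalls++
		for _, addr := range page {
			ch <- addr
		}
	}
	close(ch)
	wg.Wait()

	return listCalls, int(atomic.LoadInt64(&done))
}

func main() {
	addrs := make([]string, 10)
	for i := range addrs {
		addrs[i] = fmt.Sprintf("obj-%d", i)
	}
	for _, c := range []int{1, 5} {
		calls, n := processAll(addrs, c, func(string) {})
		fmt.Printf("concurrency=%d: %d listing calls, %d objects\n", c, calls, n)
	}
}
```

In this model, 10 objects need 10 listing calls at concurrency 1 and 2 calls at concurrency 5, which is the overhead the bbolt remark points at.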

dstepanov-yadro marked this conversation as resolved

pkg/local_object_storage/engine/remove_copies.go Outdated

					
				@ -0,0 +81,4 @@

							}

						})

						for i := 0; i < defaultRemoveDuplicatesConcurrency; i++ {

dstepanov-yadro commented

2023-04-07 13:04:08 +00:00

Not defaultRemoveDuplicatesConcurrency, but prm.Concurrency?


👍 1

fyrchik commented

2023-04-07 13:36:11 +00:00

Fixed.

dstepanov-yadro marked this conversation as resolved

pkg/local_object_storage/engine/remove_copies.go Outdated

					
				@ -0,0 +109,4 @@

							var existsPrm shard.ExistsPrm

							existsPrm.SetAddress(addr)

							res, err := shards[i].Exists(existsPrm)

dstepanov-yadro commented

2023-04-07 13:10:29 +00:00

Do we need to exclude the shard where the object was found? If an object is placed on a single shard, will it be deleted?

aarifullin commented

2023-04-07 13:19:23 +00:00

> If object is placed on single shard, will it be deleted?

It won't, and I guess that is why the found flag is needed. The flag makes the deletion skip the first copy found. If the object is met again, then _, err = shards[i].Delete(deletePrm) runs.


fyrchik commented

2023-04-07 13:23:19 +00:00

Here is the logic:

1. Take object X from the shard A.
2. Sort shards with HRW.
3. The first shard an object is found in is considered "the best".
4. The object is removed from all other shards.

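The four steps above can be sketched with in-memory shards. This is only an illustration under stated assumptions: the `shard` type, the FNV-based `weight` function, and `removeRedundantCopies` are hypothetical, and the real engine uses its own shard API and HRW implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// shard is a hypothetical in-memory stand-in for a storage shard.
type shard struct {
	id      string
	objects map[string]struct{}
}

func newShard(id string, addrs ...string) *shard {
	s := &shard{id: id, objects: make(map[string]struct{})}
	for _, a := range addrs {
		s.objects[a] = struct{}{}
	}
	return s
}

// weight is an illustrative rendezvous (HRW) score for a shard/object pair.
// FNV-1a is an assumption of this sketch, not the hash used by the engine.
func weight(shardID, addr string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(shardID))
	h.Write([]byte{0})
	h.Write([]byte(addr))
	return h.Sum64()
}

// sortByHRW returns a copy of shards ordered by descending weight for addr.
func sortByHRW(shards []*shard, addr string) []*shard {
	sorted := append([]*shard(nil), shards...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return weight(sorted[i].id, addr) > weight(sorted[j].id, addr)
	})
	return sorted
}

// removeRedundantCopies keeps addr on the first shard in HRW order that
// holds it and deletes it from every other shard. It returns the number of
// deleted copies; an object stored on a single shard is never deleted.
func removeRedundantCopies(shards []*shard, addr string) int {
	found := false
	deleted := 0
	for _, s := range sortByHRW(shards, addr) {
		if _, ok := s.objects[addr]; !ok {
			continue
		}
		if !found {
			found = true // the "best" shard keeps its copy
			continue
		}
		delete(s.objects, addr)
		deleted++
	}
	return deleted
}

func main() {
	shards := []*shard{newShard("a", "x"), newShard("b", "x", "y"), newShard("c", "x")}
	fmt.Println("copies of x removed:", removeRedundantCopies(shards, "x"))
	fmt.Println("copies of y removed:", removeRedundantCopies(shards, "y"))
}
```

The `found` flag here plays the same role as in the review discussion: the first hit in HRW order is kept and only later hits are deleted.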

dstepanov-yadro marked this conversation as resolved

pkg/services/control/rpc.go Outdated

					
				@ -194,0 +198,4 @@

					wResp := &doctorResponseWrapper{new(DoctorResponse)}

					wReq := &requestWrapper{m: req}

					err := client.SendUnary(cli, common.CallMethodInfoUnary(serviceName, rpcDoctor), wReq, wResp, opts...)

dstepanov-yadro commented

2023-04-07 12:57:35 +00:00

Is there no timeout for the RPC call?

fyrchik commented

2023-04-07 13:38:30 +00:00

It is hidden inside the client. https://git.frostfs.info/TrueCloudLab/frostfs-node/src/branch/master/cmd/frostfs-cli/internal/client/sdk.go#L53 (yes, we could improve this)
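For reference, a caller-side deadline can be expressed with a context regardless of what the client configures internally. This is a generic sketch and does not describe what the linked client code does. `slowCall` is a hypothetical stand-in for the RPC, and `timedOut` exists only to make the example easy to run.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall is a hypothetical stand-in for an RPC. It completes after d
// unless the context is done first, in which case it returns ctx.Err().
func slowCall(ctx context.Context, d time.Duration) error {
	select {
	case <-time.After(d):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// timedOut runs slowCall under a deadline of timeoutMs milliseconds and
// reports whether the call was aborted because the deadline expired.
func timedOut(timeoutMs, callMs int) bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeoutMs)*time.Millisecond)
	defer cancel()

	err := slowCall(ctx, time.Duration(callMs)*time.Millisecond)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow call timed out:", timedOut(20, 2000))
	fmt.Println("fast call timed out:", timedOut(2000, 1))
}
```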

dstepanov-yadro marked this conversation as resolved

pkg/services/control/server/doctor.go Outdated

					
				@ -0,0 +9,4 @@

					"google.golang.org/grpc/status"

				)

				func (s *Server) Doctor(ctx context.Context, req *control.DoctorRequest) (*control.DoctorResponse, error) {