node: Add ability to evacuate objects from REP 1 only #1350

Merged
acid-ant merged 1 commit from acid-ant/frostfs-node:feat/evac-skip-rep-one into master 2024-09-30 06:34:32 +00:00
Member

Signed-off-by: Anton Nikiforov an.nikiforov@yadro.com

Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
acid-ant added 2 commits 2024-09-03 12:48:49 +00:00
[#xx] node: Evacuate objects without setting mode to MAINTENANCE
Some checks failed
DCO action / DCO (pull_request) Failing after 3m5s
Tests and linters / Run gofumpt (pull_request) Successful in 3m3s
Tests and linters / Tests (1.23) (pull_request) Successful in 4m45s
Tests and linters / Tests (1.22) (pull_request) Successful in 4m47s
Tests and linters / Lint (pull_request) Successful in 5m3s
Pre-commit hooks / Pre-commit (pull_request) Successful in 5m12s
Vulncheck / Vulncheck (pull_request) Successful in 4m57s
Tests and linters / Staticcheck (pull_request) Successful in 5m39s
Tests and linters / Tests with -race (pull_request) Successful in 6m47s
Build / Build Components (1.23) (pull_request) Successful in 6m58s
Build / Build Components (1.22) (pull_request) Successful in 8m34s
Tests and linters / gopls check (pull_request) Successful in 8m50s
d2c27341a9
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
[#xx] node: Add ability to evacuate objects from REP 1 only
Some checks failed
DCO action / DCO (pull_request) Failing after 5m5s
Tests and linters / Run gofumpt (pull_request) Successful in 5m20s
Vulncheck / Vulncheck (pull_request) Successful in 5m42s
Tests and linters / Tests (1.22) (pull_request) Successful in 7m54s
Tests and linters / Staticcheck (pull_request) Successful in 7m58s
Tests and linters / Tests (1.23) (pull_request) Successful in 8m4s
Pre-commit hooks / Pre-commit (pull_request) Successful in 7m52s
Tests and linters / Lint (pull_request) Successful in 9m10s
Tests and linters / gopls check (pull_request) Successful in 9m3s
Build / Build Components (1.23) (pull_request) Successful in 9m11s
Build / Build Components (1.22) (pull_request) Successful in 9m13s
Tests and linters / Tests with -race (pull_request) Successful in 9m44s
26a007a017
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
acid-ant force-pushed feat/evac-skip-rep-one from 26a007a017 to d64712f25a 2024-09-03 12:50:41 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from d64712f25a to 99441bf077 2024-09-03 13:43:59 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 99441bf077 to 0574590325 2024-09-04 13:40:25 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 0574590325 to 37ceddedc4 2024-09-05 13:33:42 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 37ceddedc4 to aa6083f90c 2024-09-05 13:46:14 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from aa6083f90c to a6986dfcf3 2024-09-06 06:56:27 +00:00 Compare
acid-ant changed title from WIP: node: Add ability to evacuate objects from REP 1 only to node: Add ability to evacuate objects from REP 1 only 2024-09-06 07:01:27 +00:00
acid-ant requested review from storage-core-committers 2024-09-06 07:01:37 +00:00
acid-ant requested review from storage-core-developers 2024-09-06 07:01:38 +00:00
aarifullin reviewed 2024-09-06 07:37:29 +00:00
@ -654,3 +656,3 @@
}
addr := toEvacuate[i].Address
if prm.RepOneOnly {
Member

Could you clarify, please, what a problem does this PR solve? Is there a specific case that requires to evacuate only REP 1 containers? May it be that we'll need to evacuate REP 2, REP 3 etc. only?

Could you clarify, please, what a problem does this PR solve? Is there a specific case that requires to evacuate only `REP 1` containers? May it be that we'll need to evacuate `REP 2`, `REP 3` etc. only?
Author
Member

Just to do it quicker. It is not a recommended way, but may be helpful in emergency situation, when disk is going to die.

Just to do it quicker. It is not a recommended way, but may be helpful in emergency situation, when disk is going to die.
aarifullin marked this conversation as resolved
dstepanov-yadro requested changes 2024-09-06 10:54:56 +00:00
Dismissed
@ -706,0 +720,4 @@
return false, err
}
p := c.Value.PlacementPolicy()
for i := range p.NumberOfReplicas() {

Looks like REP 1 SELECT 1 FROM X1 REP 1 SELECT 1 FROM X2 will defined as repOne.

Looks like `REP 1 SELECT 1 FROM X1 REP 1 SELECT 1 FROM X2` will defined as repOne.
Author
Member

Right, and I think that condition should be strong and simple. In other case, we need to check if current node should store REP 1 or not. Should we do that check instead of the currently implemented?

Right, and I think that condition should be strong and simple. In other case, we need to check if current node should store `REP 1` or not. Should we do that check instead of the currently implemented?

I don't know. But I think there is no difference between REP 1 SELECT 1 FROM X1 REP 1 SELECT 1 FROM X2 and REP 2 SELECT 2 FROM X policies, as both of then require 2 object instances.

I don't know. But I think there is no difference between `REP 1 SELECT 1 FROM X1 REP 1 SELECT 1 FROM X2` and `REP 2 SELECT 2 FROM X` policies, as both of then require 2 object instances.

I was wrong. REP 1 SELECT 1 FROM X1 REP 1 SELECT 1 FROM X2 policy may result in both copies being stored on the same node (without UNIQUE).

I was wrong. `REP 1 SELECT 1 FROM X1 REP 1 SELECT 1 FROM X2` policy may result in both copies being stored on the same node (without `UNIQUE`).
Author
Member

Updated, please review. Also added fix for tests.

Updated, please review. Also added fix for tests.
dstepanov-yadro marked this conversation as resolved
acid-ant force-pushed feat/evac-skip-rep-one from a6986dfcf3 to e281d31fbc 2024-09-09 08:57:04 +00:00 Compare
acid-ant changed title from node: Add ability to evacuate objects from REP 1 only to WIP: node: Add ability to evacuate objects from REP 1 only 2024-09-10 11:20:31 +00:00
acid-ant force-pushed feat/evac-skip-rep-one from e281d31fbc to 69c75a660b 2024-09-18 12:04:28 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 69c75a660b to 94e8629888 2024-09-18 12:06:01 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 94e8629888 to 763a7cef1c 2024-09-18 15:04:20 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 763a7cef1c to bcc0fed99c 2024-09-22 18:19:52 +00:00 Compare
acid-ant added 1 commit 2024-09-23 09:19:50 +00:00
[#1350] node: Add unit test to measure time required for evacuation
All checks were successful
DCO action / DCO (pull_request) Successful in 50s
Tests and linters / Run gofumpt (pull_request) Successful in 1m13s
Vulncheck / Vulncheck (pull_request) Successful in 2m0s
Pre-commit hooks / Pre-commit (pull_request) Successful in 2m14s
Build / Build Components (pull_request) Successful in 2m27s
Tests and linters / Staticcheck (pull_request) Successful in 2m43s
Tests and linters / gopls check (pull_request) Successful in 2m43s
Tests and linters / Lint (pull_request) Successful in 3m24s
Tests and linters / Tests (pull_request) Successful in 4m18s
Tests and linters / Tests with -race (pull_request) Successful in 5m49s
2218e42a10
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
acid-ant force-pushed feat/evac-skip-rep-one from 2218e42a10 to 417fa6d4cb 2024-09-24 05:39:24 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from 417fa6d4cb to fb3d0e23eb 2024-09-24 12:08:35 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from fb3d0e23eb to c2a5a987ee 2024-09-24 12:10:16 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from c2a5a987ee to b76a7f125a 2024-09-24 14:55:15 +00:00 Compare
acid-ant force-pushed feat/evac-skip-rep-one from b76a7f125a to 9fea4f65b2 2024-09-24 14:55:49 +00:00 Compare
acid-ant changed title from WIP: node: Add ability to evacuate objects from REP 1 only to node: Add ability to evacuate objects from REP 1 only 2024-09-24 15:20:40 +00:00
acid-ant requested review from dstepanov-yadro 2024-09-24 15:20:50 +00:00
acid-ant force-pushed feat/evac-skip-rep-one from 9fea4f65b2 to 3ec9900790 2024-09-25 07:49:19 +00:00 Compare
acid-ant added 1 commit 2024-09-25 08:18:31 +00:00
[#1350] Fix typo
All checks were successful
DCO action / DCO (pull_request) Successful in 1m5s
Tests and linters / Run gofumpt (pull_request) Successful in 1m16s
Vulncheck / Vulncheck (pull_request) Successful in 2m0s
Pre-commit hooks / Pre-commit (pull_request) Successful in 2m18s
Build / Build Components (pull_request) Successful in 2m26s
Tests and linters / gopls check (pull_request) Successful in 2m31s
Tests and linters / Tests (pull_request) Successful in 2m43s
Tests and linters / Staticcheck (pull_request) Successful in 2m43s
Tests and linters / Lint (pull_request) Successful in 3m23s
Tests and linters / Tests with -race (pull_request) Successful in 5m59s
517d9bdbbe
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
aarifullin approved these changes 2024-09-25 09:55:44 +00:00
Dismissed
aarifullin reviewed 2024-09-25 10:08:37 +00:00
@ -784,0 +804,4 @@
return false, err
}
p := c.Value.PlacementPolicy()
for i := range p.NumberOfReplicas() {
Member

You'd had already the discussion with @dstepanov-yadro.
I believe your PR solves the problem for evacuation of objects that have single instance across all nodes. But if the placement policy has a few vectors (REP 1 ... REP 1) it seems like it has replicated copy. Could you explain, then, why do you NOT check if p.NumberOfReplicas() == 1?

You'd had already the [discussion](https://git.frostfs.info/TrueCloudLab/frostfs-node/pulls/1350#issuecomment-50772) with @dstepanov-yadro. I believe your PR solves the problem for evacuation of objects that have single instance across all nodes. But if the placement policy has a few vectors (`REP 1 ... REP 1`) it seems like it has replicated copy. Could you explain, then, why do you NOT check if `p.NumberOfReplicas() == 1`?
Author
Member

I have only one argument for this - network connectivity between nodes for two REP 1 may be bad. I see that it is not strong and because you don't see any problems I'll add this check.

I have only one argument for this - network connectivity between nodes for two `REP 1` may be bad. I see that it is not strong and because you don't see any problems I'll add this check.
aarifullin marked this conversation as resolved
dstepanov-yadro requested changes 2024-09-25 12:11:27 +00:00
Dismissed
@ -781,6 +798,20 @@ func (e *StorageEngine) evacuateObject(ctx context.Context, shardID string, objI
return nil
}
func (e *StorageEngine) isNotRepOne(cid cid.ID) (bool, error) {

Looks like the case with deleted container is missing. It could happen that there are object from already deleted container.

Also should check moving to other node in case of deleted container.

Looks like the case with deleted container is missing. It could happen that there are object from already deleted container. Also should check moving to other node in case of deleted container.
Author
Member

Now all objects are skipped for evacuation in case when container already removed.

Now all objects are skipped for evacuation in case when container already removed.
dstepanov-yadro marked this conversation as resolved
acid-ant force-pushed feat/evac-skip-rep-one from 517d9bdbbe to fe2dda1caa 2024-09-26 11:20:15 +00:00 Compare
acid-ant dismissed aarifullin's review 2024-09-26 11:20:15 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

acid-ant force-pushed feat/evac-skip-rep-one from fe2dda1caa to a5b83d3ba8 2024-09-26 11:20:42 +00:00 Compare
Author
Member

Added check for container existence.

Added check for container existence.
acid-ant force-pushed feat/evac-skip-rep-one from a5b83d3ba8 to 318ac167ba 2024-09-26 11:54:13 +00:00 Compare
aarifullin approved these changes 2024-09-26 14:59:08 +00:00
Dismissed
acid-ant force-pushed feat/evac-skip-rep-one from 318ac167ba to 0230e37bda 2024-09-27 12:39:47 +00:00 Compare
acid-ant dismissed aarifullin's review 2024-09-27 12:39:47 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

acid-ant force-pushed feat/evac-skip-rep-one from 0230e37bda to d0ed29b3c7 2024-09-27 12:41:52 +00:00 Compare
acid-ant requested review from dstepanov-yadro 2024-09-27 13:40:30 +00:00
dstepanov-yadro approved these changes 2024-09-27 14:43:26 +00:00
aarifullin approved these changes 2024-09-30 06:31:59 +00:00
acid-ant merged commit d0ed29b3c7 into master 2024-09-30 06:34:32 +00:00
acid-ant deleted branch feat/evac-skip-rep-one 2024-09-30 06:34:34 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: TrueCloudLab/frostfs-node#1350
No description provided.