Fix maintenance node processing in policer #1604

Merged
fyrchik merged 5 commits from fyrchik/frostfs-node:fix-policer into master 2025-01-17 08:50:09 +00:00
Owner

A minor refactoring (a simplification) is in progress, but I may do it in a separate PR.

Consider a `REP 1 REP 1` placement policy (selects/filters are omitted).
The placement is `[1, 2], [1, 0]`, and we are node 0.
In the first replication group, node 1 is under maintenance, so we do not
replicate the object to node 2. In the second replication group, node 1 is
again under maintenance, but the current caching logic treats it as a
"replica holder" and removes the local copy. Voilà, we have data loss (DL)
if the object is missing from node 1.

TBD: write a testing scenario for QA
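
The scenario above can be reproduced with a small, self-contained model. This is only a sketch under stated assumptions, not the actual policer code: the types `nodeStatus` and `decision`, the function `process`, and the `trustMaintenance` flag are all hypothetical names, result caching is not modelled, and the "fixed" branch shows just one plausible way to avoid dropping the local copy.

```go
package main

import "fmt"

// nodeStatus is a hypothetical stand-in for what the policer knows about a remote node.
type nodeStatus int

const (
	holdsObject nodeStatus = iota
	missingObject
	underMaintenance
)

// decision is the outcome of processing every replication group for one object.
type decision struct {
	replicateTo []int // remote nodes that would receive a new copy
	removeLocal bool  // whether the local copy is treated as redundant
}

// process walks the placement groups, each requiring rep copies.
// trustMaintenance=true mimics the behaviour described in the PR: a node
// under maintenance is treated as a real replica holder.
func process(self int, placement [][]int, rep int, status map[int]nodeStatus, trustMaintenance bool) decision {
	var d decision
	needLocal, localIsExtra := false, false
	for _, group := range placement {
		shortage := rep
		viaMaintenance := false // group satisfied with help of a maintenance node
		for _, n := range group {
			switch {
			case n == self:
				if shortage > 0 {
					shortage--
					needLocal = true
				} else if viaMaintenance && !trustMaintenance {
					// The copy on the maintenance node is unconfirmed: keep ours.
					needLocal = true
				} else {
					localIsExtra = true
				}
			case shortage == 0:
				// Enough copies are already accounted for in this group.
			case status[n] == holdsObject:
				shortage--
			case status[n] == underMaintenance:
				// Counted optimistically, so no new replica is created.
				shortage--
				viaMaintenance = true
			default:
				d.replicateTo = append(d.replicateTo, n)
				shortage--
			}
		}
	}
	d.removeLocal = localIsExtra && !needLocal
	return d
}

func main() {
	placement := [][]int{{1, 2}, {1, 0}} // REP 1 REP 1, we are node 0
	status := map[int]nodeStatus{1: underMaintenance, 2: missingObject}
	fmt.Println(process(0, placement, 1, status, true).removeLocal)  // prints: true
	fmt.Println(process(0, placement, 1, status, false).removeLocal) // prints: false
}
```

With `trustMaintenance` set to `true`, nothing is replicated to node 2 and the local copy is dropped, so the object survives only if node 1 really has it. With `false`, the local copy is kept.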

fyrchik added 5 commits 2025-01-16 13:37:05 +00:00
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
A node can have the MAINTENANCE status in the network map, but it can also
be ONLINE in the network map while responding with MAINTENANCE. These are
2 different code paths, so let's test them separately.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
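
As a rough illustration of the two code paths mentioned in the commit message above, here is a minimal sketch. The names `netmapState`, `errMaintenance`, and `inMaintenance` are hypothetical and are not taken from frostfs-node; the only point is that one path never contacts the node, while the other depends on the node's reply.

```go
package main

import (
	"errors"
	"fmt"
)

// netmapState is a hypothetical stand-in for the node state in the network map.
type netmapState int

const (
	online netmapState = iota
	maintenance
)

// errMaintenance is a hypothetical error a node returns while in maintenance mode.
var errMaintenance = errors.New("node is under maintenance")

// inMaintenance reports whether a node should be treated as under maintenance.
// Path 1: the network map already says MAINTENANCE, so no request is sent.
// Path 2: the node is ONLINE in the network map but answers with errMaintenance.
func inMaintenance(state netmapState, head func() error) bool {
	if state == maintenance {
		return true
	}
	return errors.Is(head(), errMaintenance)
}

func main() {
	replyOK := func() error { return nil }
	replyMaintenance := func() error { return errMaintenance }

	fmt.Println(inMaintenance(maintenance, replyOK))     // path 1
	fmt.Println(inMaintenance(online, replyMaintenance)) // path 2
	fmt.Println(inMaintenance(online, replyOK))          // healthy node
}
```

Testing the paths separately matters because a check that only looks at the network map would miss the second case, and a check that only looks at replies would send a needless request in the first.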
policer: Properly handle maintenance nodes
Some checks failed
DCO action / DCO (pull_request) Failing after 31s
Tests and linters / Run gofumpt (pull_request) Successful in 28s
Vulncheck / Vulncheck (pull_request) Successful in 1m17s
Pre-commit hooks / Pre-commit (pull_request) Successful in 1m50s
Build / Build Components (pull_request) Successful in 2m11s
Tests and linters / Staticcheck (pull_request) Successful in 2m8s
Tests and linters / Lint (pull_request) Successful in 3m5s
Tests and linters / gopls check (pull_request) Successful in 5m3s
Tests and linters / Tests (pull_request) Successful in 5m46s
Tests and linters / Tests with -race (pull_request) Successful in 5m44s
c47b639fe3
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
fyrchik requested review from storage-core-committers 2025-01-16 13:37:06 +00:00
fyrchik requested review from storage-core-developers 2025-01-16 13:37:06 +00:00
fyrchik force-pushed fix-policer from c47b639fe3 to 0ab7818c81 2025-01-16 13:37:24 +00:00
fyrchik force-pushed fix-policer from 0ab7818c81 to 57efa0bc8e 2025-01-16 13:40:07 +00:00
fyrchik added the bug label 2025-01-16 13:40:25 +00:00
fyrchik reviewed 2025-01-16 13:42:59 +00:00
@@ -127,7 +128,7 @@ func TestProcessObject(t *testing.T) {
 			nodeCount:       2,
 			policy:          `REP 2 REP 2`,
 			placement:       [][]int{{0, 1}, {0, 1}},
 			wantReplicateTo: []int{1, 1}, // is this actually good?
Author
Owner

@ale64bit's question has finally been answered: no, it is not :)

dstepanov-yadro approved these changes 2025-01-16 14:10:43 +00:00
acid-ant approved these changes 2025-01-16 15:39:45 +00:00
a-savchuk approved these changes 2025-01-17 08:09:57 +00:00
achuprov approved these changes 2025-01-17 08:12:06 +00:00
fyrchik merged commit 57efa0bc8e into master 2025-01-17 08:50:09 +00:00
fyrchik deleted branch fix-policer 2025-01-17 08:50:11 +00:00
fyrchik added this to the v0.45.0 milestone 2025-01-17 12:32:53 +00:00
Reference: TrueCloudLab/frostfs-node#1604