The current flow is hard to reason about; #1601 is a notorious example of
accidental complexity.
1. Flatten the multiple nested ifs to a nesting depth of 1.
2. Process each status exactly once, hopefully preventing bugs like
#1601.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Consider a `REP 1 REP 1` placement (selects/filters are omitted).
The placement is `[1, 2], [1, 0]`. We are the 0-th node.
Node 1 is under maintenance, so we do not replicate the object
to node 2. In the second replication group node 1 is still under maintenance,
but the current caching logic treats it as a "replica holder" and removes the
local copy. Voilà, we have data loss if the object is missing from node 1.
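The scenario above can be sketched as follows. This is a minimal illustration, not the actual Policer code: `nodeState`, `result`, `processGroups` and the `status` callback are hypothetical names, and the sketch assumes one required copy per group. The point is that a node under maintenance is cached in its own state, so it reduces the shortage without being treated as a confirmed holder.

```go
package main

import "fmt"

// nodeState is a hypothetical per-node result cached while one object is processed.
type nodeState int

const (
	stateNoObject    nodeState = iota // node answered that the object is missing
	stateHolder                       // node confirmed that it stores the object
	stateMaintenance                  // node is under maintenance, its copy is unverified
)

// result summarises what may be done after all replica groups are walked.
type result struct {
	needReplication bool
	canRemoveLocal  bool
}

// processGroups walks the replica groups, each requiring one copy (REP 1).
// local is our own node index; status asks a remote node for its state.
func processGroups(groups [][]int, local int, status func(int) nodeState) result {
	cache := map[int]nodeState{}
	res := result{canRemoveLocal: true}
	for _, group := range groups {
		shortage := 1
		for _, n := range group {
			if shortage == 0 {
				break
			}
			if n == local {
				// The local copy is needed to satisfy this group.
				shortage--
				res.canRemoveLocal = false
				continue
			}
			st, ok := cache[n]
			if !ok {
				st = status(n)
				cache[n] = st
			}
			switch st {
			case stateHolder:
				shortage--
			case stateMaintenance:
				// Counts towards the shortage, but is not a verified holder,
				// so the local copy must survive.
				shortage--
				res.canRemoveLocal = false
			}
		}
		if shortage > 0 {
			res.needReplication = true
			res.canRemoveLocal = false
		}
	}
	return res
}

func main() {
	status := func(n int) nodeState {
		if n == 1 {
			return stateMaintenance
		}
		return stateNoObject
	}
	res := processGroups([][]int{{1, 2}, {1, 0}}, 0, status)
	fmt.Printf("replicate=%v removeLocal=%v\n", res.needReplication, res.canRemoveLocal)
}
```

With node 1 under maintenance the sketch neither schedules replication nor allows removing the local copy; if node 1 were cached as a plain holder, the local copy would be reported as redundant.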
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Use `zap.Error` instead of `zap.String` for logging errors: change all expressions like
`zap.String("error", err.Error())` or `zap.String("err", err.Error())` to `zap.Error(err)`.
Leave similar expressions with other messages unchanged, for example,
`zap.String("last_error", lastErr.Error())` or `zap.String("reason", ctx.Err().Error())`.
This change was made by applying the following patch:
```diff
@@
var err expression
@@
-zap.String("error", err.Error())
+zap.Error(err)
@@
var err expression
@@
-zap.String("err", err.Error())
+zap.Error(err)
```
Signed-off-by: Aleksey Savchuk <a.savchuk@yadro.com>
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the other container nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Node response with `NODE_UNDER_MAINTENANCE` status signals that the node
was switched to maintenance mode. There is a delay between the actual
switch and the reflection in the network map of up to one epoch. To
speed up the reaction to the maintenance, it is required to recognize
such node responses in the Policer.
Make `Policer.processNodes` exclude a node and decrease the shortage
when the node responds with `NODE_UNDER_MAINTENANCE` status.
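A rough sketch of this behaviour, under stated assumptions: `errNodeUnderMaintenance`, `errObjectNotFound`, `headFunc` and this simplified `processNodes` signature are hypothetical stand-ins, not the real Policer API. A maintenance response is handled like a confirmed copy: the node is dropped from the replication candidates and the shortage is decreased.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the status errors returned by a remote node.
var (
	errNodeUnderMaintenance = errors.New("node is under maintenance")
	errObjectNotFound       = errors.New("object not found")
)

// headFunc is a hypothetical probe asking a node whether it holds the object.
type headFunc func(node string) error

// processNodes returns the nodes that may receive a new replica and the
// shortage that remains after probing.
func processNodes(nodes []string, shortage int, head headFunc) ([]string, int) {
	candidates := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if shortage == 0 {
			break
		}
		err := head(n)
		switch {
		case err == nil:
			// The node holds the object.
			shortage--
		case errors.Is(err, errNodeUnderMaintenance):
			// Excluded from candidates, but counted as if the copy existed.
			shortage--
		case errors.Is(err, errObjectNotFound):
			// The node is healthy and lacks the object: replication candidate.
			candidates = append(candidates, n)
		default:
			// Any other error: skip the node without touching the shortage.
		}
	}
	return candidates, shortage
}

func main() {
	responses := map[string]error{
		"a": errNodeUnderMaintenance,
		"b": errObjectNotFound,
		"c": nil,
	}
	head := func(node string) error { return responses[node] }
	candidates, shortage := processNodes([]string{"a", "b", "c"}, 2, head)
	fmt.Println(candidates, shortage)
}
```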
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Nodes under maintenance SHOULD NOT respond to object requests. Based on
this, the storage node's Policer SHOULD consider such nodes problematic.
However, to avoid flooding the network with new replicas, the Policer
should instead treat them as normal.
Make `Policer.processNodes` exclude nodes for which `IsMaintenance()` is
true and decrease the shortage.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Make `replicator.TaskResult` accept the `netmap.NodeInfo` type instead of
`uint64` to clarify the meaning and prevent passing arbitrary
numbers.
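For illustration only, the idea can be sketched with simplified types. `NodeInfo` here is a hypothetical stand-in for `netmap.NodeInfo`, and the method name `SubmitSuccessfulReplication` and the `collector` type are assumptions, not confirmed by this message. With a typed parameter, a bare integer no longer compiles as an argument.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// NodeInfo is a simplified, hypothetical stand-in for netmap.NodeInfo.
type NodeInfo struct {
	PublicKey []byte
}

// TaskResult receives the outcome of a replication task.
// The method name is an assumption made for this sketch.
type TaskResult interface {
	SubmitSuccessfulReplication(NodeInfo)
}

// collector is a hypothetical TaskResult that records which nodes got a copy.
type collector struct {
	holders []string
}

func (c *collector) SubmitSuccessfulReplication(n NodeInfo) {
	c.holders = append(c.holders, hex.EncodeToString(n.PublicKey))
}

func main() {
	var res TaskResult = &collector{}
	res.SubmitSuccessfulReplication(NodeInfo{PublicKey: []byte{0x01, 0x02}})
	// res.SubmitSuccessfulReplication(42) would not compile: 42 is not a NodeInfo.
	fmt.Println(res.(*collector).holders)
}
```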
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Error checkers now support wrapped errors, so there is no need to
explicitly unwrap errors in `Policer`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>