Commit graph

41 commits

Author SHA1 Message Date
6db46257c0
[#1437] node: Use ctx for logging
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-11-13 10:36:07 +03:00
4dc9a1b300 [#1413] engine: Remove error counting methods from Shard
All checks were successful
Tests and linters / Run gofumpt (pull_request) Successful in 2m4s
DCO action / DCO (pull_request) Successful in 2m22s
Pre-commit hooks / Pre-commit (pull_request) Successful in 4m10s
Vulncheck / Vulncheck (pull_request) Successful in 4m5s
Build / Build Components (pull_request) Successful in 4m31s
Tests and linters / Staticcheck (pull_request) Successful in 4m21s
Tests and linters / gopls check (pull_request) Successful in 4m43s
Tests and linters / Lint (pull_request) Successful in 4m58s
Tests and linters / Tests (pull_request) Successful in 6m36s
Tests and linters / Tests with -race (pull_request) Successful in 7m41s
All error counting and hangling logic is present on the engine level.
Currently, we pass engine metrics with shard ID metric to shard, then
export 3 methods to manipulate these metrics.
In this commits all methods are removed and error counter is tracked on
the engine level exlusively.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-10-04 15:10:17 +03:00
963faa615a [#1413] engine: Cleanup shard error reporting
- `reportShardErrorBackground()` no longer differs from
  `reportShardError()`, reflect this in its name;
- reuse common pieces of code to make it simpler.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-10-04 15:10:17 +03:00
9a87acb87a [#1410] engine: Provide the default implementation to MetricsRegister
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-10-03 08:23:06 +00:00
a61201a987 [#1337] config: Move rebuild_worker_count to shard section
This makes it simple to limit performance degradation for every shard
because of rebuild.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-09-06 13:57:27 +03:00
9d73f9c2c6 Reapply "[#446] engine: Move to read-only on blobstor errors"
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-06-13 07:35:22 +00:00
40781b3a20 [#1086] engine: Change mode in case of errors async
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 12:29:43 +00:00
f23e38c285 Revert "[#446] engine: Move to read-only on blobstor errors"
All checks were successful
DCO action / DCO (pull_request) Successful in 2m14s
Build / Build Components (1.20) (pull_request) Successful in 4m7s
Vulncheck / Vulncheck (pull_request) Successful in 3m30s
Build / Build Components (1.21) (pull_request) Successful in 4m15s
Tests and linters / Staticcheck (pull_request) Successful in 5m41s
Tests and linters / Lint (pull_request) Successful in 6m6s
Tests and linters / gopls check (pull_request) Successful in 6m42s
Tests and linters / Tests (1.20) (pull_request) Successful in 7m47s
Tests and linters / Tests (1.21) (pull_request) Successful in 8m21s
Tests and linters / Tests with -race (pull_request) Successful in 8m20s
This reverts commit 69df0d21c2.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-01 12:48:30 +03:00
d75e7e9a21 [#864] engine: Drop container size metric if container deleted
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-10 10:44:54 +03:00
f1c7905263 [#661] blobovniczatree: Make Rebuild concurrent
Different DBs can be rebuild concurrently.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00
79088baa06 [#772] node: Apply gofumpt
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-10-31 17:03:03 +03:00
d15199c5d8 [#596] engine: Consider context errors as logical
Some checks failed
Vulncheck / Vulncheck (pull_request) Successful in 2m59s
DCO action / DCO (pull_request) Successful in 2m54s
Build / Build Components (1.20) (pull_request) Successful in 4m2s
Build / Build Components (1.21) (pull_request) Successful in 4m51s
Tests and linters / Staticcheck (pull_request) Successful in 14m8s
Tests and linters / Tests (1.20) (pull_request) Failing after 14m56s
Tests and linters / Lint (pull_request) Successful in 15m27s
Tests and linters / Tests (1.21) (pull_request) Failing after 15m36s
Tests and linters / Tests with -race (pull_request) Failing after 16m18s
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-08-16 10:39:41 +03:00
cac4ed93d6 [#428] engine: Add low_mem config parameter
Concurrent initialization in case of the metabase resync leads to
high memory consumption and potential OOM.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-26 13:29:39 +00:00
69df0d21c2 [#446] engine: Move to read-only on blobstor errors
All checks were successful
ci/woodpecker/pr/pre-commit Pipeline was successful
ci/woodpecker/push/pre-commit Pipeline was successful
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-06-16 14:53:32 +03:00
20b84f183a [#446] engine: Simplify logs for shard mode change
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-06-16 14:51:29 +03:00
263c6fdc50 [#372] node: Add metrics for the error counter in the engine
All checks were successful
ci/woodpecker/push/pre-commit Pipeline was successful
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2023-06-07 13:04:47 +00:00
faca861451 [#411] Remove unnecessary pointers for sync objects
All checks were successful
ci/woodpecker/push/pre-commit Pipeline was successful
Signed-off-by: Alejandro Lopez <a.lopez@yadro.com>
2023-05-31 10:19:14 +00:00
4b768fd115 [#381] *: Move to sync/atomic
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-05-23 08:18:01 +03:00
e4889e06ba [#329] node: Make evacuate async
Now it's possible to run evacuate shard in async.
Also only one evacuate process can be in progress.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-19 08:43:52 +00:00
0e31c12e63 [#240] logs: Move log messages to constants
Drop duplicate entities.
Format entities.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-04-14 05:06:09 +00:00
dbc3811ff4 [#191] engine: Allow to remove redundant object copies
RemoveDuplicates() removes all duplicate object copies stored on
multiple shards. All shards are processed and the command tries to leave
a copy on the best shard according to HRW.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-04-07 17:25:50 +00:00
20de74a505 Rename package name
Due to source code relocation from GitHub.

Signed-off-by: Alex Vanin <a.vanin@yadro.com>
2023-03-07 16:38:26 +03:00
c3a7039801 [TrueCloudLab/hrw#2] node: Optimize shard hash
Compute shard hash only once

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-02-28 13:36:25 +03:00
cb016d53a6 [#1] Fix comments and error messages
Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
2023-02-06 17:41:14 +03:00
Pavel Karpy
923f84722a Move to frostfs-node
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2022-12-28 15:04:29 +03:00
Pavel Karpy
b673d9e472 [#2053] engine: Do not switch mode because of logical errors
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
f2d7e65e39 [#2035] engine: Allow moving to degraded from background workers
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
777fd32d4f [#1818] writecache: Increase error counter on background errors
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-02 14:24:02 +03:00
Evgenii Stratonikov
fcdbf5e509 [#1969] local_object_storage: Add a type for logical errors
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-31 11:41:24 +03:00
Evgenii Stratonikov
3b939d190c [#1957] engine: Move shard to read-only if cannot move to degraded
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-26 08:20:53 +03:00
Pavel Karpy
f037022a7a [#1770] logger: Refactor Logger component
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-12 18:11:05 +03:00
Evgenii Stratonikov
4944490ffb [#1559] local_object_storage: Move shard to the DegradedReadOnly mode
`Degraded` mode can be set by the administrator if needed.
Modifying operations in this mode can lead node into an inconsistent state
because metabase checks such as lock checking are not performed.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-07-21 17:56:06 +03:00
Evgenii Stratonikov
339864b720 [#1559] local_object_storage: Move shard.Mode to a separate package
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-07-21 17:56:06 +03:00
Elizaveta Chichindaeva
cc7a723d77 [#1320] English Check
Signed-off-by: Elizaveta Chichindaeva <elizaveta@nspcc.ru>
2022-05-11 10:40:02 +03:00
Evgenii Stratonikov
6472a170eb [#1143] shard: Introduce explicit Degraded mode
`Degraded` mode is set automatically after error counter is over the
threshold. `ReadOnly` mode can still be set by an administrator.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-31 15:33:22 +03:00
Evgenii Stratonikov
6ad2624552 [#1118] engine: allow to set error threshold
There are certain errors which are not expected during usual node
operation and which tell us that something is wrong with the shard.
To prevent possible data corruption, move shard in read-only mode after
amount of errors exceeded some threshold. By default no actions are performed.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-02-03 15:14:27 +03:00
Leonard Lyubich
ec04e787aa [#922] storage engine: Support operation blocking
There is a need to disable execution of local data operation on storage
engine in runtime. If storage engine ops are blocked, node will act like
always but all local object operations will be denied.

Implement `BlockExecution` / `ResumeExecution` methods on `StorageEngine`
which blocks / resumes the execution of data ops. Wait for the completion of
all operations executed at the time of the call. Return error passed to
`BlockExecution` from all data-related methods until `ResumeExecution` call.
Make `Close` to block operations as well.

Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
2021-11-12 17:28:38 +03:00
Leonard Lyubich
5b1975d52a [#674] storage engine: Use per-shard worker pools for PUT operation
Make `StorageEngine` to use non-blocking worker pools with the same
(configurable) size for PUT operation. This allows you to switch to using
more free shards when overloading others, thereby more evenly distributing
the write load.

Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
2021-10-14 10:20:39 +03:00
Alex Vanin
b8e10571c6 [#426] Put prometheus behind pkg/metrics
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
2021-03-17 10:58:00 +03:00
Alex Vanin
980b774af2 [#426] engine: Support duration metrics
With `enable metrics` option, engine will collect
durations for all public methods.

Signed-off-by: Alex Vanin <alexey@nspcc.ru>
2021-03-17 10:58:00 +03:00
Leonard Lyubich
09750484f9 [#176] localstore: Draft storage engine structure and ops
Implement the primary structure and operation of the local object
storage engine.

Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
2020-12-11 17:19:37 +03:00