TrueCloudLab/frostfs-node

Author	SHA1	Message	Date
Evgenii Stratonikov	4dc9a1b300	[#1413] engine: Remove error counting methods from Shard All error counting and handling logic is present on the engine level. Currently, we pass engine metrics with shard ID metric to shard, then export 3 methods to manipulate these metrics. In this commit all methods are removed and the error counter is tracked on the engine level exclusively. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2024-10-04 15:10:17 +03:00
Evgenii Stratonikov	963faa615a	[#1413] engine: Cleanup shard error reporting - `reportShardErrorBackground()` no longer differs from `reportShardError()`, reflect this in its name; - reuse common pieces of code to make it simpler. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2024-10-04 15:10:17 +03:00
Evgenii Stratonikov	9a87acb87a	[#1410] engine: Provide the default implementation to MetricsRegister Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2024-10-03 08:23:06 +00:00
Dmitrii Stepanov	a61201a987	[#1337] config: Move `rebuild_worker_count` to shard section This makes it simple to limit the performance degradation caused by rebuild on every shard. Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2024-09-06 13:57:27 +03:00
Evgenii Stratonikov	9d73f9c2c6	Reapply "[#446] engine: Move to read-only on blobstor errors" Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2024-06-13 07:35:22 +00:00
Dmitrii Stepanov	40781b3a20	[#1086] engine: Change mode in case of errors async Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2024-04-10 12:29:43 +00:00
Evgenii Stratonikov	f23e38c285	Revert "[#446] engine: Move to read-only on blobstor errors" This reverts commit `69df0d21c2`. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2024-04-01 12:48:30 +03:00
Dmitrii Stepanov	d75e7e9a21	[#864] engine: Drop container size metric if container deleted Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2024-01-10 10:44:54 +03:00
Dmitrii Stepanov	f1c7905263	[#661] blobovniczatree: Make Rebuild concurrent Different DBs can be rebuilt concurrently. Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-12-07 15:37:33 +03:00
Dmitrii Stepanov	79088baa06	[#772] node: Apply gofumpt Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-10-31 17:03:03 +03:00
Evgenii Stratonikov	d15199c5d8	[#596] engine: Consider context errors as logical Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-08-16 10:39:41 +03:00
Dmitrii Stepanov	cac4ed93d6	[#428] engine: Add low_mem config parameter Concurrent initialization in case of the metabase resync leads to high memory consumption and potential OOM. Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-06-26 13:29:39 +00:00
Evgenii Stratonikov	69df0d21c2	[#446] engine: Move to read-only on blobstor errors Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-06-16 14:53:32 +03:00
Evgenii Stratonikov	20b84f183a	[#446] engine: Simplify logs for shard mode change Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-06-16 14:51:29 +03:00
Anton Nikiforov	263c6fdc50	[#372] node: Add metrics for the error counter in the engine Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>	2023-06-07 13:04:47 +00:00
Alejandro Lopez	faca861451	[#411] Remove unnecessary pointers for sync objects Signed-off-by: Alejandro Lopez <a.lopez@yadro.com>	2023-05-31 10:19:14 +00:00
Evgenii Stratonikov	4b768fd115	[#381] *: Move to sync/atomic Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-05-23 08:18:01 +03:00
Dmitrii Stepanov	e4889e06ba	[#329] node: Make evacuate async Now it is possible to run shard evacuation asynchronously. Also, only one evacuation process can be in progress at a time. Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-05-19 08:43:52 +00:00
Evgenii Stratonikov	0e31c12e63	[#240] logs: Move log messages to constants Drop duplicate entities. Format entities. Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com> Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-04-14 05:06:09 +00:00
Evgenii Stratonikov	dbc3811ff4	[#191] engine: Allow to remove redundant object copies RemoveDuplicates() removes all duplicate object copies stored on multiple shards. All shards are processed and the command tries to leave a copy on the best shard according to HRW. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-04-07 17:25:50 +00:00
Alex Vanin	20de74a505	Rename package name Due to source code relocation from GitHub. Signed-off-by: Alex Vanin <a.vanin@yadro.com>	2023-03-07 16:38:26 +03:00
Dmitrii Stepanov	c3a7039801	[TrueCloudLab/hrw#2] node: Optimize shard hash Compute shard hash only once Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-02-28 13:36:25 +03:00
Stanislav Bogatyrev	cb016d53a6	[#1] Fix comments and error messages Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>	2023-02-06 17:41:14 +03:00
Pavel Karpy	923f84722a	Move to frostfs-node Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2022-12-28 15:04:29 +03:00
Pavel Karpy	b673d9e472	[#2053] engine: Do not switch mode because of logical errors Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>	2022-11-19 11:01:04 +03:00
Evgenii Stratonikov	f2d7e65e39	[#2035] engine: Allow moving to degraded from background workers Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-11-19 11:01:04 +03:00
Evgenii Stratonikov	777fd32d4f	[#1818] writecache: Increase error counter on background errors Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-11-02 14:24:02 +03:00
Evgenii Stratonikov	fcdbf5e509	[#1969] local_object_storage: Add a type for logical errors All logic errors are wrapped in `logicerr.Logical` type and do not affect shard error counter. Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-10-31 11:41:24 +03:00
Evgenii Stratonikov	3b939d190c	[#1957] engine: Move shard to read-only if cannot move to degraded Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-10-26 08:20:53 +03:00
Pavel Karpy	f037022a7a	[#1770] logger: Refactor `Logger` component Make it store its internal `zap.Logger`'s level. Also, make all the components accept internal `logger.Logger` instead of `zap.Logger`; it will simplify future refactoring. Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>	2022-10-12 18:11:05 +03:00
Evgenii Stratonikov	4944490ffb	[#1559] local_object_storage: Move shard to the `DegradedReadOnly` mode `Degraded` mode can be set by the administrator if needed. Modifying operations in this mode can lead the node into an inconsistent state because metabase checks such as lock checking are not performed. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 17:56:06 +03:00
Evgenii Stratonikov	339864b720	[#1559] local_object_storage: Move `shard.Mode` to a separate package Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 17:56:06 +03:00
Elizaveta Chichindaeva	cc7a723d77	[#1320] English Check Signed-off-by: Elizaveta Chichindaeva <elizaveta@nspcc.ru>	2022-05-11 10:40:02 +03:00
Evgenii Stratonikov	6472a170eb	[#1143] shard: Introduce explicit `Degraded` mode `Degraded` mode is set automatically after error counter is over the threshold. `ReadOnly` mode can still be set by an administrator. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-03-31 15:33:22 +03:00
Evgenii Stratonikov	6ad2624552	[#1118] engine: allow to set error threshold There are certain errors which are not expected during usual node operation and which tell us that something is wrong with the shard. To prevent possible data corruption, move the shard to read-only mode after the number of errors exceeds some threshold. By default, no actions are performed. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-02-03 15:14:27 +03:00
Leonard Lyubich	ec04e787aa	[#922] storage engine: Support operation blocking There is a need to disable the execution of local data operations on the storage engine at runtime. If storage engine ops are blocked, the node will act as usual, but all local object operations will be denied. Implement `BlockExecution` / `ResumeExecution` methods on `StorageEngine` which block / resume the execution of data ops. Wait for the completion of all operations executed at the time of the call. Return the error passed to `BlockExecution` from all data-related methods until `ResumeExecution` is called. Make `Close` block operations as well. Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>	2021-11-12 17:28:38 +03:00
Leonard Lyubich	5b1975d52a	[#674] storage engine: Use per-shard worker pools for PUT operation Make `StorageEngine` use non-blocking worker pools with the same (configurable) size for PUT operation. This allows switching to less loaded shards when others are overloaded, thereby distributing the write load more evenly. Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>	2021-10-14 10:20:39 +03:00
Alex Vanin	b8e10571c6	[#426] Put prometheus behind pkg/metrics Signed-off-by: Alex Vanin <alexey@nspcc.ru>	2021-03-17 10:58:00 +03:00
Alex Vanin	980b774af2	[#426] engine: Support duration metrics With the `enable metrics` option, the engine will collect durations for all public methods. Signed-off-by: Alex Vanin <alexey@nspcc.ru>	2021-03-17 10:58:00 +03:00
Leonard Lyubich	09750484f9	[#176] localstore: Draft storage engine structure and ops Implement the primary structure and operation of the local object storage engine. Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>	2020-12-11 17:19:37 +03:00

40 commits