Commit graph

342 commits

Author SHA1 Message Date
059e9e88a2 [#373] metabase: Add metrics
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-21 15:13:26 +03:00
01a0c97760 [#453] engine: Set Disabled mode to deleted shard
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-20 12:04:07 +03:00
4449006862 [#424] metrics: Use mode value as metric value for shard
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-14 18:26:19 +03:00
1b364d8cf4 [#424] metrics: Refactor engine metrics
Use histogram vector to measure request duration.
Fix naming like in Prometheus best practice.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-14 14:53:32 +03:00
2541d319de [#266] pilorama: Allow to get current tree height
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-13 10:00:45 +00:00
263c6fdc50 [#372] node: Add metrics for the error counter in the engine
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2023-06-07 13:04:47 +00:00
74578052f9 [#412] node: Replace tracing package
Use observability module.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-06-01 13:23:11 +00:00
3220c4df9f [#376] metrics: Add GC metrics
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-31 10:22:12 +00:00
365a7ca0f4 [#366] node: Stop GC once termination signal received
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2023-05-29 09:35:08 +03:00
802168c0c6 [#364] node: Stop flushing big object when termination signal received
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2023-05-26 16:46:58 +03:00
20a489bdb5 [#393] gc: Use defer to mark handler done
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-26 12:14:02 +00:00
2613351008 [#387] gc: Cancel GC is change mode requested
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-25 09:38:16 +03:00
2ce43935f9 [#312] metrics: Add writecache metrcis
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-24 10:18:39 +00:00
4b768fd115 [#381] *: Move to sync/atomic
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-05-23 08:18:01 +03:00
e4889e06ba [#329] node: Make evacuate async
Now it's possible to run evacuate shard in async.
Also only one evacuate process can be in progress.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-19 08:43:52 +00:00
869fcbf591 [#332] gc: Fix expired complex object deletion
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-16 12:44:57 +00:00
ab07bad33d [#332] gc: Add complex object unit test
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-16 12:44:57 +00:00
4578d00619 [#321] shard/test: Execute tests in parallel
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-05-12 09:45:03 +00:00
d35e4c389f [#321] shard/test: Parallelize TestWriteCacheObjectLoss
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-05-12 09:45:03 +00:00
969bfb603f [#321] shard/test: Parallelize TestShard_List
```
go test -count=1 -run TestShard_List -race .
Before: 2.492s
After:  0.109s
```

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-05-12 09:45:03 +00:00
a181c9e434 [#332] gc: Add additional logging
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-05-10 17:34:40 +03:00
8466894fdf [#250] control: remove DumpShard and RestoreShard RPC
We have `Evacuate` with a cleaner interface.
Also, remove them from CLI and engine.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-04-14 12:28:49 +00:00
6121b541b5 [#242] treesvc: Add tracing spans
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-14 10:25:53 +00:00
d62c6e4ce6 [#242] node: Add tracing spans
Add tracing spans for PUT requests.
Add tracing spans for DELETE requests.
Add tracing spans for SELECT requests.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-14 10:25:53 +00:00
0e31c12e63 [#240] logs: Move log messages to constants
Drop duplicate entities.
Format entities.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-04-14 05:06:09 +00:00
0920d848d0 [#135] get-object: Add tracing spans
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-12 06:52:00 +00:00
cb172e73a6 [#228] node: Use uber atomic package instead standard
Signed-off-by: Airat Arifullin a.arifullin@yadro.com
2023-04-07 15:37:27 +00:00
8908798f59 [#203] node: Resolve unused vars
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-06 16:33:36 +03:00
9e2df4b7c7 [#203] node: Fix double imports
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-06 16:33:36 +03:00
23575e1ac0 [#210] policier: Resolve contextcheck linter
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-05 14:55:52 +03:00
9098d0eec0 [#211] engine: Unify shard mode checks for tree operations
All operations must ensure the shard is not in a degraded mode.
Write operations must also ensure the shard is not in a read-only mode.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-04-05 11:10:39 +00:00
8e5a0dcf27 [#204] gc: Fix GC handlers start
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-04-04 06:48:27 +00:00
a7c79c773a [#168] node: Refactor node config
Resolve containedctx linter for cfg

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-03-31 09:32:59 +03:00
9f0bce5c15 [#183] gc: Fix drop expired locked simple objects
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-03-30 15:33:42 +03:00
341fe1688f [#139] test: Add test storage implementation
This aims to reduce the usage of chmod hackery to induce or simulate
OS-related failures.

Signed-off-by: Alejandro Lopez <a.lopez@yadro.com>
2023-03-29 14:28:49 +00:00
aarifullin
34329d67ff [#86] node: Fix unit test and linter errors
Signed-off-by: Airat Arifullin <aarifullin@yadro.com>
2023-03-23 12:42:58 +03:00
9808dec591 [#86] node: Move testing utils to one package
Move testing utils from tests in local_object_storage package to
unified testutil package

Signed-off-by: Airat Arifullin <aarifullin@yadro.com>
2023-03-23 08:19:15 +00:00
fb13902db9 [#156] shard: Make refillMetabase() pass linter checks
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-03-22 07:14:18 +00:00
47e8c5bf23 [#156] pilorama: Remove CIDDescriptor from TreeApply()
Initially it was there to check whether an update is being initiated by
a proper node. It is now obsolete for 2 reasons:
1. Background synchronization fetches all operations from a single node.
2. There are a lot more problems with trust in the tree service, it is
   only used in controlled environments.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-03-22 07:14:18 +00:00
5059dcc19d [#145] shard-gc: Delete expired objects after locks
GC deletes expired locks and objects sequentially. Expired locks and
objects are now being deleted concurrently in batches. Added a config
parameter that controls the number of concurrent workers and batch size.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-03-21 11:31:08 +03:00
6c4a1699ef [#145] shard-gc: Expired locked unit test
Added unit test that verifies that GC deletes expired
locked objects in one epoch.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-03-21 11:31:08 +03:00
97c36ed3ec [#148] linter: Add funlen linter
Long functions are hard to understand and source of errors

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-03-21 09:54:41 +03:00
3e6fd4c611 [#82] pilorama: Allow to store last sync height
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-03-13 11:25:44 +00:00
20de74a505 Rename package name
Due to source code relocation from GitHub.

Signed-off-by: Alex Vanin <a.vanin@yadro.com>
2023-03-07 16:38:26 +03:00
Pavel Karpy
3beef10f89 [#61] node: Do not fetch missing objects
If an object is missing in a `meta`, shard should not look for it in
a `blobstor`.

Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2023-02-20 14:47:38 +03:00
Pavel Karpy
07ec51ea60 [#2244] node: Add object address to WC's operations
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2023-02-20 13:53:27 +03:00
427fe276f2 [#2238] shard: Try closing all components
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-02-20 13:53:27 +03:00
362f24953a [#47] shard: Switch container size metric from physical to logical capacity
Signed-off-by: Artem Tataurov <a.tataurov@yadro.com>
2023-02-17 12:03:42 +03:00
ab21d90cfb [#1794] shard: Add increasing case for the payload size metric
Signed-off-by: Artem Tataurov <a.tataurov@yadro.com>
2023-02-09 13:30:23 +03:00
cb016d53a6 [#1] Fix comments and error messages
Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
2023-02-06 17:41:14 +03:00
Pavel Karpy
89a0266f5e [#1794] metrics: Track physical object capacity per shard
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2023-01-26 20:06:28 +03:00
Evgenii Stratonikov
9513f163aa [#2116] metrics: Track physical object capacity in the container
Currently we track based on `PayloadSize`, because it is already stored
in the metabase and it is easier to calculate without slowing down the
whole system.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2023-01-26 20:06:28 +03:00
d65a95a2c6 [#28] pilorama: Remove LogMove struct
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
c72576e72f [#2208] engine: Log time-consuming shard operations
Currently the only way to tell whether `evacuate/set-mode` is finished
is to set a very big timeout and _hope_ that the operation will finish.
In this commit we add INFO logs for such operations which should
simplify the life of an administrator.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
Pavel Karpy
64a5294b27 [#2200] shard: Do not fetch big objects from blobovniczas
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
Pavel Karpy
91757329ae [#2200] shard: Fix blobstor obj fetching
In the previous implementation any non-nil error that preceded object
fetching from blobstor led to iterating over every storage (in other words,
no storage ID information was taken into account). Now storage ID is
skipped only if metabase (storage ID source) returns any error.

Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
6451f019d2 [#2203] shard: Do not panic in Close after unsuccessful Init
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
b4e90cdf51 [#2165] pilorama: Optimize TreeApply when used for synchronization
Because synchronization _most likely_ will have apply already existing
operations, it is much faster to check their presence in a read
transaction. However, always doing this will degrade the perfomance
for normal `Apply`. And, let's be honest, it is already not good.
Thus we add a separate parameter which specifies whether this logic is
enabled.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2022-12-30 11:07:35 +03:00
Pavel Karpy
21717262ec [#2016] shard: Check meta first on Get
`meta` should prevent returning removed objects (`GCMark` and `TS` relations
are `meta` abstractions).

Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2022-12-30 11:07:35 +03:00
Pavel Karpy
74ec71446f [#2167] shard: Do not use write-cache by default in Head
Both `meta` and `write-cache` are expected to have a fast underlying disk,
so it does not seem like an optimisation. Moreover, `write-cache`'s `Head`
is a `Get` with payload cutting, it _must_ use more memory for no reason
(`meta` was created for such requests). Also, `write-cache` does not allow
performing any "meta" relations checks (such as locking, tombstoning).

Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2022-12-30 11:07:35 +03:00
Pavel Karpy
eea2892109 [#1956] node: Lock shard's mode on its methods switch
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2022-12-30 11:07:35 +03:00
edb1428248 [#2022] Add metric readonly to get shards mode
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2022-12-30 11:07:35 +03:00
Pavel Karpy
923f84722a Move to frostfs-node
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2022-12-28 15:04:29 +03:00
Pavel Karpy
634792077e [#1502] node: Store lock object on every container node
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-11-19 11:01:04 +03:00
Pavel Karpy
34e8d2ba56 [#1502] shard: Add IsLocked method
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
d65604ad30 [#1985] blobstor: Allow to report multiple errors to caller
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
b0e94b6a6b [#1906] writecache: Do not require read-only mode in Flush
It was needed before we started to flush during transition to
`degraded` mode. Now it is confusing.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
d8d3588e1b [#1996] engine: Always select proper shard for a tree
Currently there is a possibility for modifying operations to fail
because of I/O errors and a new tree to be created on another shard.
This commit adds existence check for modifying operations.
Read operations remain as they are, not to slow things.
`TreeDrop` is an exception, because this is a tree removal and trying
multiple shards is not an unwanted behaviour.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-03 15:29:23 +03:00
Evgenii Stratonikov
777fd32d4f [#1818] writecache: Increase error counter on background errors
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-02 14:24:02 +03:00
Evgenii Stratonikov
34501685b7 [#1969] local_object_storage: Move ErrObjectIsExpired to another package
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-31 11:41:24 +03:00
Evgenii Stratonikov
56de2f1363 [#1969] local_object_storage: Simplify logic error construction
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-31 11:41:24 +03:00
Evgenii Stratonikov
fcdbf5e509 [#1969] local_object_storage: Add a type for logical errors
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-31 11:41:24 +03:00
Leonard Lyubich
e8c5f03c30 [#1905] shard: Don't log read-only errors of write-cache
There is no need to log `writecache.ErrReadOnly` errors in `Delete`
method of the `Shard`.

Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
2022-10-28 18:30:45 +03:00
Leonard Lyubich
b1fa084756 [#1905] shard: Decrease severity level of write-cache failure logs
In previous implementation `Shard.Delete` logged writecache's removal
failures in `error` level. There is a need to decrease severity of these
log records since they aren't critical and don't require individual
review.

Change level of the message to `info`.

Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
2022-10-28 18:30:45 +03:00
Evgenii Stratonikov
1e6588e761 [#1974] shard: Do not panic in degraded mode
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-26 12:41:12 +03:00
Evgenii Stratonikov
713fdab177 [#1907] shard: Return from Close after GC has stopped
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-25 11:54:45 +03:00
Pavel Karpy
0371d15b2f [#1953] shard: Fix debug log messages
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-24 21:52:18 +03:00
Pavel Karpy
f8180447a1 [#1938] meta: Make version error messages more descriptive
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-24 21:46:18 +03:00
Evgenii Stratonikov
1beafea0b5 [#1869] shard: Add logs for SetMode operations on reload
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-24 13:28:26 +03:00
Evgenii Stratonikov
87be4f1629 [#1869] shard: Restore shard mode on failed reloads
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-24 13:28:26 +03:00
Evgenii Stratonikov
c785e11b20 [#1869] shard: Allow to reload metabase on SIGHUP
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-24 13:28:26 +03:00
Evgenii Stratonikov
f769fc83fc [#1869] shard: Embed gcCfg as raw struct
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-24 13:28:26 +03:00
Pavel Karpy
24e9e3f3bf [#1902] engine, shard: Implement TreeList method
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-20 16:17:57 +03:00
Pavel Karpy
2e199c7ab1 [#1902] shard: Fix pilorama disabled err message
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-20 16:17:57 +03:00
Pavel Karpy
19850ef157 [#1902] pilorama: Add TreeList method
To both `bolt` and `memory` forests; extend `Forest` interface.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-20 16:17:57 +03:00
Evgenii Stratonikov
d772e35aba [#1910] .golangci.yml: Add godot linker
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-18 15:08:26 +03:00
Pavel Karpy
31c623636d [#1863] node: Fix shard id in the object counter metrics
If shard ID is stored in metabase (it is not the first time boot), read it,
set it, use it (not a generated one) in the metrics writer.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-13 13:06:41 +03:00
Pavel Karpy
f037022a7a [#1770] logger: Refactor Logger component
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-12 18:11:05 +03:00
Evgenii Stratonikov
4b005d3178 [#1840] blobstor: Return info about all components
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-10 11:14:55 +03:00
Evgenii Stratonikov
9b241e4a17 [#1840] neofs-node: Allow to use mode: disabled in config
Currently, when removing shard special care must be taken with respect
to shard numbering. `mode: disabled` allows to leave shard configuration
in place while also ignoring it during initialization. This makes
disk replacement much more convenient.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-10 11:14:55 +03:00
Pavel Karpy
4eb0ed11f8 [#1809] node: Do not boot up if metabase is outdated
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-04 12:32:10 +03:00
Evgenii Stratonikov
8b3b16fe62 [#1825] writecache: Flush cache when moving to the DEGRADED mode
Degraded mode allows us to operate without an SSD,
thus writecache should be unavailable in this mode.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-04 12:13:09 +03:00
Pavel Karpy
2d7166f8d0 [#1770] shard: Move NewEpoch event routing on SE level
It will allow dynamic shard management. Closing a shard does not allow
removing event handlers.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-04 10:08:55 +03:00
Evgenii Stratonikov
0a411908ee [#1806] writecache: Allow to ignore read errors during flush
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-28 09:28:01 +03:00
Evgenii Stratonikov
f2045c10d7 [#1806] shard: Check each component mode when setting mode
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-28 09:28:01 +03:00
Evgenii Stratonikov
3d882e9f47 [#1806] engine: Allow to flush write-cache
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-28 09:28:01 +03:00
Pavel Karpy
431e331373 [#1658] shard: Update metric counters
Use meta's operation results to change the metrics. Support typed object
counters.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-09-13 21:32:37 +04:00
Pavel Karpy
d872862710 [#1658] meta: Force counters resync process
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-09-13 21:32:37 +04:00
Pavel Karpy
beb7f2a048 [#1634] node: Change default epoch in tests
Do not treat objects with expiration as expired by default.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-09-13 21:32:37 +04:00
Evgenii Stratonikov
a2bb3a2a96 [#1630] pilorama: Support dropping trees
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-12 09:54:15 +03:00