Commit graph

205 commits

Author SHA1 Message Date
c72576e72f [#2208] engine: Log time-consuming shard operations
Currently the only way to tell whether `evacuate/set-mode` is finished
is to set a very big timeout and _hope_ that the operation will finish.
In this commit we add INFO logs for such operations which should
simplify the life of an administrator.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
ba393e3e91 [#2188] engine: Fix panic during setting shard mode
Under load changing shard mode can lead to it being removed from the
list during some other PUT.
```
Dec 28 07:01:26 az neofs-node[364505]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 28 07:01:26 az neofs-node[364505]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc9fbb1]
Dec 28 07:01:26 az neofs-node[364505]: goroutine 11791912 [running]:
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).putToShard(0xc000435490, {0xc0003f7a28?, 0xc0001192c0?}, 0x2, {0x0, 0x>
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:91 +0x1b1
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put.func1(0xc000435490?, {0xc0003f7a28?, 0xc0001192c0?})
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:71 +0x19c
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).iterateOverSortedShards(0x1?, {{0x62, 0x23, 0xfe, 0x60, 0x67, 0xd5, 0x>
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/shards.go:225 +0xc8
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put(0xc000435490, {0x1?})
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:66 +0x2a9
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put.func1()
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:43 +0x2a
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).execIfNotBlocked(0x8?, 0x38?)
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/control.go:147 +0xcf
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put(0xc4df775a80?, {0x0?})
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:42 +0x65
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.Put(0xc06d928b80?, 0xc06b1b8dc8?)
Dec 28 07:01:26 az neofs-node[364505]:         github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:158 +0x19
Dec 28 07:01:26 az neofs-node[364505]: main.engineWithoutNotifications.Put({0x20301b?}, 0x20301b?)
```

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-01-25 15:31:47 +03:00
3d57f4c961 [#2179] test: Fix test TestEvacuateNetwork/multiple_shards,_evacuate_many
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2023-01-24 13:37:49 +03:00
b4e90cdf51 [#2165] pilorama: Optimize TreeApply when used for synchronization
Because synchronization _most likely_ will have apply already existing
operations, it is much faster to check their presence in a read
transaction. However, always doing this will degrade the perfomance
for normal `Apply`. And, let's be honest, it is already not good.
Thus we add a separate parameter which specifies whether this logic is
enabled.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2022-12-30 11:07:35 +03:00
edb1428248 [#2022] Add metric readonly to get shards mode
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2022-12-30 11:07:35 +03:00
0b78af467e [#2140] engine: Fix error handling in TreeMove
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2022-12-30 11:07:35 +03:00
Pavel Karpy
923f84722a Move to frostfs-node
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
2022-12-28 15:04:29 +03:00
Pavel Karpy
b673d9e472 [#2053] engine: Do not switch mode because of logical errors
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-11-19 11:01:04 +03:00
Pavel Karpy
634792077e [#1502] node: Store lock object on every container node
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-11-19 11:01:04 +03:00
Pavel Karpy
3b61cb4f49 [#1502] engine: Check all shards for LOCK'ing before inhuming
It allows keeping all the locked objects safe after metabase
resynchronization. Currently, all `LOCK` objects are broadcast to all nodes
in a container, it guarantees `LOCK` object presence in a regular situation.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
f2d7e65e39 [#2035] engine: Allow moving to degraded from background workers
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
2ef38cfbc4 [#1996] engine: Ignore pilorama.ErrTreeNotFound for write operations
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-19 11:01:04 +03:00
Evgenii Stratonikov
d8d3588e1b [#1996] engine: Always select proper shard for a tree
Currently there is a possibility for modifying operations to fail
because of I/O errors and a new tree to be created on another shard.
This commit adds existence check for modifying operations.
Read operations remain as they are, not to slow things.
`TreeDrop` is an exception, because this is a tree removal and trying
multiple shards is not an unwanted behaviour.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-03 15:29:23 +03:00
Evgenii Stratonikov
777fd32d4f [#1818] writecache: Increase error counter on background errors
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-11-02 14:24:02 +03:00
Evgenii Stratonikov
56de2f1363 [#1969] local_object_storage: Simplify logic error construction
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-31 11:41:24 +03:00
Evgenii Stratonikov
fcdbf5e509 [#1969] local_object_storage: Add a type for logical errors
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-31 11:41:24 +03:00
Evgenii Stratonikov
1e6588e761 [#1974] shard: Do not panic in degraded mode
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-26 12:41:12 +03:00
Evgenii Stratonikov
3b939d190c [#1957] engine: Move shard to read-only if cannot move to degraded
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-26 08:20:53 +03:00
Evgenii Stratonikov
c785e11b20 [#1869] shard: Allow to reload metabase on SIGHUP
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-24 13:28:26 +03:00
Pavel Karpy
49c38d642d [#1902] engine: Search for the tree IDs in every shard
Iterate over every shard and search for the container's trees. Final result
is a concatenation of shards' results. It is considered that one fixed tree
is placed on one fixed shard but the different trees of a fixed container
could be placed on different shards.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-20 16:17:57 +03:00
Pavel Karpy
24e9e3f3bf [#1902] engine, shard: Implement TreeList method
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-20 16:17:57 +03:00
Pavel Karpy
19850ef157 [#1902] pilorama: Add TreeList method
To both `bolt` and `memory` forests; extend `Forest` interface.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-20 16:17:57 +03:00
Evgenii Stratonikov
d772e35aba [#1910] .golangci.yml: Add godot linker
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-18 15:08:26 +03:00
Pavel Karpy
31c623636d [#1863] node: Fix shard id in the object counter metrics
If shard ID is stored in metabase (it is not the first time boot), read it,
set it, use it (not a generated one) in the metrics writer.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-13 13:06:41 +03:00
Pavel Karpy
f037022a7a [#1770] logger: Refactor Logger component
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-12 18:11:05 +03:00
Evgenii Stratonikov
19c0a74e94 [#1867] services/control: Allow to provide multiple shard IDs to some commands
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-12 11:20:48 +03:00
Evgenii Stratonikov
2d43892fc9 [#1840] neofs-node: Use blobstor paths to identify shard
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-10 11:14:55 +03:00
Evgenii Stratonikov
4b005d3178 [#1840] blobstor: Return info about all components
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-10 11:14:55 +03:00
Evgenii Stratonikov
b2aa9947c2 [#1829] engine: Delete split objects properly
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-07 16:35:46 +03:00
Evgenii Stratonikov
6557f5d249 [#1839] engine: Handle Inhume errors properly
If shard is in read-only or degraded mode, there is no need to increase
error counter.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-07 14:25:52 +03:00
Evgenii Stratonikov
2e3ef817f4 [#1819] engine: Increase error counter for PUT errors
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-04 10:11:52 +03:00
Evgenii Stratonikov
af56574849 [#1819] engine: Fix error counter in Inhume
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-10-04 10:11:52 +03:00
Pavel Karpy
8ebe95747e [#1770] node: Do not lock on shard's Close call
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-04 10:08:55 +03:00
Pavel Karpy
887afeaddb [#1770] engine: Do not lock on shard init
Init can take a lot of time. Because the mutex is taken, all new operations
are blocked.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-04 10:08:55 +03:00
Pavel Karpy
fbd5bc8c38 [#1770] engine: Support configuration reload
Currently, it only supports changing the compound of the shards.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-04 10:08:55 +03:00
Pavel Karpy
2d7166f8d0 [#1770] shard: Move NewEpoch event routing on SE level
It will allow dynamic shard management. Closing a shard does not allow
removing event handlers.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-10-04 10:08:55 +03:00
Evgenii Stratonikov
0a411908ee [#1806] writecache: Allow to ignore read errors during flush
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-28 09:28:01 +03:00
Evgenii Stratonikov
3d882e9f47 [#1806] engine: Allow to flush write-cache
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-28 09:28:01 +03:00
Evgenii Stratonikov
a49137349b [#1731] engine: Allow to use user handler for evacuated objects
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-24 13:47:48 +03:00
Evgenii Stratonikov
3df98ce7ba [#1731] engine: Return the amount of actually moved objects in Evacuate
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-19 11:33:52 +03:00
Evgenii Stratonikov
a51b76056e [#1731] engine: Add Evacuate command
Make it possible to move all data from 1 shard to other shards.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-19 11:33:52 +03:00
Evgenii Stratonikov
7377979e12 [#1731] engine: Move single shard PUT to a separate function
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-19 11:33:52 +03:00
Evgenii Stratonikov
5321f8ef9c [#1786] engine: Unify parameter setters
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-15 10:28:48 +03:00
Evgenii Stratonikov
b064fb24d8 [#1616] engine: Do not use batches in delete
Use a simple loop at the callsite. This way we remove as much as we can.
Also, `Delete` metrics is more meaningful now.

Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-15 10:28:48 +03:00
Pavel Karpy
beb7f2a048 [#1634] node: Change default epoch in tests
Do not treat objects with expiration as expired by default.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-09-13 21:32:37 +04:00
Pavel Karpy
edef26a4fd [#1658] engine: Update metrics interfaces
It now supports typed object counter metrics.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-09-13 21:32:37 +04:00
Evgenii Stratonikov
a2bb3a2a96 [#1630] pilorama: Support dropping trees
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
2022-09-12 09:54:15 +03:00
Pavel Karpy
9c7b3ce799 [#1658] node: Read metrics from meta on startup
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-08-25 19:20:33 +03:00
Pavel Karpy
745f72fff0 [#1658] node: Support object counter metrics
Increment shard's object counter on successful `Put` calls and decrement on
`Delete`.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-08-25 19:20:33 +03:00
Pavel Karpy
30341f2192 [#1687] *: Perform go fmt using go v1.19
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
2022-08-22 18:59:57 +03:00