Because synchronization _most likely_ will have apply already existing
operations, it is much faster to check their presence in a read
transaction. However, always doing this will degrade the perfomance
for normal `Apply`. And, let's be honest, it is already not good.
Thus we add a separate parameter which specifies whether this logic is
enabled.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows keeping all the locked objects safe after metabase
resynchronization. Currently, all `LOCK` objects are broadcast to all nodes
in a container, it guarantees `LOCK` object presence in a regular situation.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Currently there is a possibility for modifying operations to fail
because of I/O errors and a new tree to be created on another shard.
This commit adds existence check for modifying operations.
Read operations remain as they are, not to slow things.
`TreeDrop` is an exception, because this is a tree removal and trying
multiple shards is not an unwanted behaviour.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Iterate over every shard and search for the container's trees. Final result
is a concatenation of shards' results. It is considered that one fixed tree
is placed on one fixed shard but the different trees of a fixed container
could be placed on different shards.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If shard ID is stored in metabase (it is not the first time boot), read it,
set it, use it (not a generated one) in the metrics writer.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Use a simple loop at the callsite. This way we remove as much as we can.
Also, `Delete` metrics is more meaningful now.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
1. Move compression parameters to the `shard` section.
2. Allow to use multiple sub-storage components in the blobstor.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
If an object has not been marked for removal by the GC in the current epoch
yet but has already expired, respond with `ErrObjectNotFound` api status.
Also, optimize shard iteration: a node must stop any iteration if the object
is found but gonna be removed soon.
All the checks are performed by the Metabase.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After a4adb79db new logical error could be returned. Do not increase
error counter in this case.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
`Degraded` mode can be set by the administrator if needed.
Modifying operations in this mode can lead node into an inconsistent state
because metabase checks such as lock checking are not performed.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
There is a need to support working w/o shard if it has problems with
blobovnicza tree.
Make `BlobStor.Init` to return new `ErrInitBlobovniczas` error. Remove
shard from storage engine's shard set if it returned this error from
`Init` call. So if some of the shards (but not all) return this error,
the node will be able to continue working without them.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
1. Modifying operations are not expected to fail, unless the shard is
read-only.
2. `Get*` operations should increase error counter too, unless the
error is `ErrTreeNotFound`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
The tricky part here is the engine itself: we stop iteration on
`ErrReadOnly` because it is better to synchronize the shard later than
to have partial trees stored in 2 shards.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Replace `ErrRangeOutOfBounds` error from `pkg/core/object` package with
`ObjectOutOfRange` from `apistatus` package. That error is returned by
storage node's server as NeoFS API statuses.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Shard is intended to be used as a separate failure domain,
which usually resides on a separate disk. Thus, sequential
initialization is bound by IO and this change speeds up thing a bit.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Core changes:
* avoid package-colliding variable naming
* avoid using pointers to IDs where unnecessary
* avoid using `idSDK` import alias pattern
* use `EncodeToString` for protocol string calculation and `String` for
printing
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
- Delete objects physically on tombstone's arrival;
- Store information about tombstones in the Graveyard;
- Clear Graveyard every epoch based on the information about TS in the
network.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Morph "NewEpoch" event handling was registered in a closure over
`addNewEpochNotificationHandler` func. That may lead to the data race:
if a shard was initialized before the event registration, everything works
as planned, but if registration was made earlier, it was not able to
include GC handlers since a shard has not called `eventChanInit` yet and,
therefore, it has not registered handler yet.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
`Degraded` mode is set automatically after error counter is over the
threshold. `ReadOnly` mode can still be set by an administrator.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`apistatus` package provides types which implement build-in `error`
interface. Add `error of type` pattern when documenting these errors in
order to clarify how these errors should be handled (e.g. `errors.Is` is
not good).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace `ErrNotFound`/`ErrAlreadyRemoved` error from
`pkg/core/object` package with `ObjectNotFound`/`ObjectAlreadyRemoved`
one from `apistatus` package. These errors are returned by storage
node's server as NeoFS API statuses.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to process expired `LOCK` objects similar to `TOMBSTONE`
ones: we collect them on `Shard`, notify all other shards about
expiration so they could unlock the objects, and only after that mark
lockers as garbage.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `FormatValidator.ValidateContent` to verify payload of `LOCK`
objects. Pass locked objects to `Locker` interface. Require from
`Locker.Lock` to return `apistatus.IrregularObjectLock` error on a
corresponding condition.
Also add error return to `DeleteHandler.DeleteObjects` method. Require
from method to return `apistatus.ObjectLocked` error on a corresponding
condition. Adopt implementations.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `StorageEngine.Delete` to forward first encountered
`apistatus.ObjectLocked` error during shard processing.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `StorageEngine.Inhume` to forward first encountered
`apistatus.ObjectLocked` error during shard processing.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `StorageEngine.Lock` method which works similar to `Inhume`
but calls `Lock` on the processing shards.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Remove `Object` and `RawObject` types from `pkg/core/object` package.
Use `Object` type from NeoFS SDK Go library everywhere. Avoid using the
deprecated elements.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Similarly to `Get`. Also fix a bug where `ErrNotFound` is returned
instead of `ErrRangeOutOfBounds`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Metabase is expected to contain actual information about objects stored
in shard. If the object is present in metabase but is missing from
blobstor, peform an additional attempt to fetch it directly without
consulting metabase. Such a situation is unexpected, so error counter
is increased for the shard which has the object in the metabase. We
don't increase error counter for the shard which has the object in
blobstor, because some garbage can be expected there. In this
implementation there is no overhead for objects which are really
missing, i.e. are not present in any metabase.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
There are certain errors which are not expected during usual node
operation and which tell us that something is wrong with the shard.
To prevent possible data corruption, move shard in read-only mode after
amount of errors exceeded some threshold. By default no actions are performed.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Make `BlockExecution` / `ResumeExecution` to not release per-shard worker
pools. Make `StorageEngine.Close` to block these methods and any
data-related operations. It is still releases the pools.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to disable execution of local data operation on storage
engine in runtime. If storage engine ops are blocked, node will act like
always but all local object operations will be denied.
Implement `BlockExecution` / `ResumeExecution` methods on `StorageEngine`
which blocks / resumes the execution of data ops. Wait for the completion of
all operations executed at the time of the call. Return error passed to
`BlockExecution` from all data-related methods until `ResumeExecution` call.
Make `Close` to block operations as well.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `StorageEngine` to use non-blocking worker pools with the same
(configurable) size for PUT operation. This allows you to switch to using
more free shards when overloading others, thereby more evenly distributing
the write load.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
If object to be inhumed is root we need to continue first traverse over the
shards. In case when several children are stored in different shards,
inhuming object in a single shard leads to appearance of inhumed object in
subsequent selections. Also, any object can be already inhumed, and this
case is equivalent to successful inhume.
Do not fail on `object.ErrAlreadyRemoved` error. Continue first iterating
over shards if we detected root object (`SplitInfoError`).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Write unit tests of `StorageEngine.Inhume` which assert that inhumed objects
don't appear in `Select` result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
This function already reused in different storage engine parts
so it makes sense to keep it in separate package.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Different SplitInfo parts may be stored in different shards. Storage
engine must not stop at first SplitInfoError and should make
best effort to complete SplitInfo structure if needed.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
There were no unit tests of storage engine. This commit
adds first test to reproduce missing link ID in split info
at `engine.Head(raw)` request.
Engine tests uses some constructors from metabase tests,
so it is better to locate such functions in common
package at local_object_storage.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Add `InhumePrm.MarkAsGarbage` method which marks passed objects to be
removed from local storage. Update `InhumePrm.WithTarget` doc to prevent
conflicting use with the new method.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add new epoch event handler to GC that finds all expired tombstones and
marks them and underlying objects to be removed. Shard uses callbacks
provided by the storage engine to mark underlying objects.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace single target address in `InhumePrm` with the list of addresses.
Change corresponding parameter in `WithTarget` and `MarkAsGarbage` methods
to variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `StorageEngine.Delete` to execute `Inhume` operation with
`MarkAsGarbage` parameter on the `Shard` that holds the object. Searching of
the particular shard is performed through iterating over HRW-sorted shards.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation StorageEngine.Inhume operation forced Shard
.Inhume call on all internal shards. There is a need to inhume object in a
single shard. To achieve this, Inhume operation is performed in next steps:
1. iterate over sorted shards, check object presence through Exists call;
2. if object exists at any shard in step 1 => inhume it and return on
success;
3. if no shards contain the object => iterate over sorted shards again and
try to inhume the object at first possible shard;
4. if all Inhume calls are failed => return an error.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Container listing already supported in the metabase for `engine.List`
operation. To get container statistics engine should provide both the
option to get container volume estimation and list of all containers.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Objects of one container can be split among shards, so engine
should iterate over all available shards to sum all size
estimations.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Replace ErrNotFound and ErrRangeOutOfBounds to core/object package in order
to share them across the libraries.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement Put/Get/GetRange/Select/SelectAll functions over storage
engine. These functions are going to be used by Object service.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation each operation on local storage
locked engine mutex. This was done under the assumption that
the weights of the shards change as a result of write operations.
With the transition to static weights of shards, it is no longer
necessary to lock the global mutex during the execution of operations.
However, since the set of engine shards is dynamic, there is still a
need to control multiple access to this set. The same mutex is used
for synchronization.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>