Currently the only way to tell whether `evacuate/set-mode` is finished
is to set a very big timeout and _hope_ that the operation will finish.
In this commit we add INFO logs for such operations which should
simplify the life of an administrator.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In the previous implementation any non-nil error that preceded object
fetching from blobstor led to iterating over every storage (in other words,
no storage ID information was taken into account). Now storage ID is
skipped only if metabase (storage ID source) returns any error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Because synchronization _most likely_ will have apply already existing
operations, it is much faster to check their presence in a read
transaction. However, always doing this will degrade the perfomance
for normal `Apply`. And, let's be honest, it is already not good.
Thus we add a separate parameter which specifies whether this logic is
enabled.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Both `meta` and `write-cache` are expected to have a fast underlying disk,
so it does not seem like an optimisation. Moreover, `write-cache`'s `Head`
is a `Get` with payload cutting, it _must_ use more memory for no reason
(`meta` was created for such requests). Also, `write-cache` does not allow
performing any "meta" relations checks (such as locking, tombstoning).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It was needed before we started to flush during transition to
`degraded` mode. Now it is confusing.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Currently there is a possibility for modifying operations to fail
because of I/O errors and a new tree to be created on another shard.
This commit adds existence check for modifying operations.
Read operations remain as they are, not to slow things.
`TreeDrop` is an exception, because this is a tree removal and trying
multiple shards is not an unwanted behaviour.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation `Shard.Delete` logged writecache's removal
failures in `error` level. There is a need to decrease severity of these
log records since they aren't critical and don't require individual
review.
Change level of the message to `info`.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
If shard ID is stored in metabase (it is not the first time boot), read it,
set it, use it (not a generated one) in the metrics writer.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Currently, when removing shard special care must be taken with respect
to shard numbering. `mode: disabled` allows to leave shard configuration
in place while also ignoring it during initialization. This makes
disk replacement much more convenient.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Degraded mode allows us to operate without an SSD,
thus writecache should be unavailable in this mode.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
1. Move compression parameters to the `shard` section.
2. Allow to use multiple sub-storage components in the blobstor.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
1. Remove in-memory cache. It doesn't persist objects and if we want
more speed, `NoSync` option can be used for the bolt DB.
2. Put to the metabase in a synchronous fashion. This considerably
simplifies overall logic and plays nicely with the metabase bolt DB
batch settings.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Allow to extend blobstor with more storage sub-systems. Currently
objects stored in the FSTree have empty byte slice descriptor and object
from blobovnicza tree have the same id as earlier. Each such change in
the identifier formation should be accompanied with metabase version
increase.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
If an object has not been marked for removal by the GC in the current epoch
yet but has already expired, respond with `ErrObjectNotFound` api status.
Also, optimize shard iteration: a node must stop any iteration if the object
is found but gonna be removed soon.
All the checks are performed by the Metabase.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If metabase can't be opened in the default mode, try opening shard
first in `ReadOnly` mode and then in `DegradedReadOnly`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`Degraded` mode can be set by the administrator if needed.
Modifying operations in this mode can lead node into an inconsistent state
because metabase checks such as lock checking are not performed.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Reduce public interface of this package. Later each result will contain
an additional status, so it makes more sense to use the same functions
and result processing everywhere.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not return backend type from the service for now, because memory
backend is expected to vanish.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
The tricky part here is the engine itself: we stop iteration on
`ErrReadOnly` because it is better to synchronize the shard later than
to have partial trees stored in 2 shards.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Replace `ErrRangeOutOfBounds` error from `pkg/core/object` package with
`ObjectOutOfRange` from `apistatus` package. That error is returned by
storage node's server as NeoFS API statuses.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Core changes:
* avoid package-colliding variable naming
* avoid using pointers to IDs where unnecessary
* avoid using `idSDK` import alias pattern
* use `EncodeToString` for protocol string calculation and `String` for
printing
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
- Delete objects physically on tombstone's arrival;
- Store information about tombstones in the Graveyard;
- Clear Graveyard every epoch based on the information about TS in the
network.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add offset element to the iterations over deleted objects (both the
Graveyard and the Garbage buckets).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows storing information about object in both ways at the same time:
1. Metabase should know if an object is covered by a tombstone (that is
not expired yet);
2. It should be possible to physically delete objects covered by a
tombstone immediately (mark with GC) but keep tombstone knowledge.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Morph "NewEpoch" event handling was registered in a closure over
`addNewEpochNotificationHandler` func. That may lead to the data race:
if a shard was initialized before the event registration, everything works
as planned, but if registration was made earlier, it was not able to
include GC handlers since a shard has not called `eventChanInit` yet and,
therefore, it has not registered handler yet.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
`Degraded` mode is set automatically after error counter is over the
threshold. `ReadOnly` mode can still be set by an administrator.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`apistatus` package provides types which implement build-in `error`
interface. Add `error of type` pattern when documenting these errors in
order to clarify how these errors should be handled (e.g. `errors.Is` is
not good).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace `ErrNotFound`/`ErrAlreadyRemoved` error from
`pkg/core/object` package with `ObjectNotFound`/`ObjectAlreadyRemoved`
one from `apistatus` package. These errors are returned by storage
node's server as NeoFS API statuses.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to process expired `LOCK` objects similar to `TOMBSTONE`
ones: we collect them on `Shard`, notify all other shards about
expiration so they could unlock the objects, and only after that mark
lockers as garbage.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Remove `Object` and `RawObject` types from `pkg/core/object` package.
Use `Object` type from NeoFS SDK Go library everywhere. Avoid using the
deprecated elements.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Similarly to `Get`. Also fix a bug where `ErrNotFound` is returned
instead of `ErrRangeOutOfBounds`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Metabase is expected to contain actual information about objects stored
in shard. If the object is present in metabase but is missing from
blobstor, peform an additional attempt to fetch it directly without
consulting metabase. Such a situation is unexpected, so error counter
is increased for the shard which has the object in the metabase. We
don't increase error counter for the shard which has the object in
blobstor, because some garbage can be expected there. In this
implementation there is no overhead for objects which are really
missing, i.e. are not present in any metabase.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
It was added back in 2fb379b7 when we had many shard modes. Now we have
only two and comments for constants are rather descriptive.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
We could also ignore errors during evacuate, but this requires
unmarshaling objects first which slowers the process considerably.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In read-only mode modifying operations are immediately returned with
error and all background operations are suspended.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Dump contains magic and a list of objects prefixed by object size in bytes.
We can't use proto-marshaled list because this requires having all dump
in memory. Using TAR induces 512 byte overhead for each object which can
be a problem in some cases.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Provide shard mode information via `DumpInfo()`. Delete atomic field from
Shard structure since it duplicates new field.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Shard's mode was not used in the Node, so added only two modes whose roles
are clear. More modes will be added in the future.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Use `sync.Once` to prevent locks of stopping GC. It will also allow to
safely call `Shard.Close` multiple times.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Some of the pools are initialized during config initialization,
so it isn't possible currently to release them in one place.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`List` method of `Shard` must return only physically stored objects.
Use `AddPhyFilter` to select only phy objects.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Tombstone and "alive" objects can be both stored in BlobStor. They can
appear during iterating in different order. Metabase returns
`ErrAlreadyRemoved` error if object is inhumed.
Ignore `object.ErrAlreadyRemoved` errors of `metabase.Put`in Shard's
`refillMetabase` operation.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to refill Metabase data with the objects from BlobStor.
Implement `refillMetabase` method which iterates over all objects from
BlobStor and saves them in Metabase.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Shard should try to read object headers from write-cache if it is enabled.
Extend `writecache.Cache` interface with `Head` method. Call the method in
`Shard.Head` if `Shard.hasWriteCache` returns true.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Change Shard's garbage remover to interrupt iterating over the metabase
graveyard when the buffer is full to the max size (`WithRemoverBatchSize`
Shard's option).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add new epoch event handler to GC that finds all expired tombstones and
marks them and underlying objects to be removed. Shard uses callbacks
provided by the storage engine to mark underlying objects.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add new epoch event handler to GC that finds all expired non-tombstone
objects and marks them to be removed.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Group handlers of the particular event to a WaitGroup and wait for it before
the next event handling. This will ensure that all handlers complete and
prevent potential conflicts between past and present jobs.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Shard.Init` method creates a new GC instance from shard configuration and
starts GC's workers through `init` call. In initial implementation GC
routines are indefinite and can be killed only with by application shutdown.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Shard's GC component consists of:
* asynchronous remover that periodically wake up and removes all garbage
objects from the shard, and goes to sleep for particular time interval;
* external event listener that distributes jobs between workers;
* group of workers that can handle a single job related to particular
external event.
Remover and event listener represents go-routines which are started by
`init` method (calls from `Shard.Init`). In initial version all event
handlers are interrupted: this means that next event of the same type will
interrupt previous handling and start the new one.
GC is fully encapsulated in Shard. All GC configurations are reflected in
Shard's configuration.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace single target address in `InhumePrm` with the list of addresses.
Change corresponding parameter in `WithTarget` and `MarkAsGarbage` methods
to variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `InhumePrm.MarkAsGarbage` method that leads to marking object as
garbage in metabase. Update `InhumePrm.WithTarget` doc indicating a conflict
with the new method.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Container listing already supported in the metabase for `engine.List`
operation. To get container statistics engine should provide both the
option to get container volume estimation and list of all containers.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
There is a codecov issue because objects are not placed
in the engine the same way every unit test. Therefore
sometimes there are more coverage, sometimes there are
less. Seeded RNG should solve this issue for engine tests.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>