There is a need to limit the disk space used by the write-cache. Since it is
almost impossible to calculate the value exactly, it is proposed to estimate
the size of the cache from the number of objects stored in it.
Track the number of objects saved in the DB and in FSTree separately. To do
this, the `ObjectCounters` interface is defined. It is generalized to a store
of numbers that can be made persistent (new option `WithObjectCounters`). By
default, the DB counter is initialized with the number of keys in the default
bucket, and the FS counter is set equal to the DB one, since it is currently
hard to read the actual value from the `FSTree` instance. Each PUT/DELETE
operation on the DB or FS increments/decrements the corresponding counter.
Before each PUT operation, an overflow check is performed using the following
formula to estimate the occupied space:
`NumDB * MaxDBSize + NumFS * MaxFSSize`. If the next PUT could overflow the
write-cache, the object is written to the main storage.
By default, the maximum write-cache size is set to 1GB.
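A minimal sketch of the counters and the overflow check described above,
assuming plain in-memory counters. The `counters` and `limits` types and
their method names are hypothetical and do not mirror the real
`ObjectCounters` interface.

```go
package main

import "sync/atomic"

// defaultMaxCacheSize mirrors the 1GB default from the text.
const defaultMaxCacheSize = 1 << 30

// counters is a hypothetical in-memory stand-in for ObjectCounters:
// it tracks how many objects are stored in the DB and in FSTree.
type counters struct {
	db, fs uint64
}

func (c *counters) IncDB() { atomic.AddUint64(&c.db, 1) }
func (c *counters) DecDB() { atomic.AddUint64(&c.db, ^uint64(0)) } // atomic decrement
func (c *counters) IncFS() { atomic.AddUint64(&c.fs, 1) }
func (c *counters) DecFS() { atomic.AddUint64(&c.fs, ^uint64(0)) }

// limits holds hypothetical per-object upper bounds and the total limit.
type limits struct {
	maxDBSize, maxFSSize, maxCacheSize uint64
}

// estimate evaluates NumDB * MaxDBSize + NumFS * MaxFSSize.
func (l limits) estimate(c *counters) uint64 {
	return atomic.LoadUint64(&c.db)*l.maxDBSize + atomic.LoadUint64(&c.fs)*l.maxFSSize
}

// fits reports whether one more object can be cached without exceeding
// maxCacheSize; if not, the caller writes the object to the main storage.
func (l limits) fits(c *counters, toDB bool) bool {
	next := l.maxFSSize
	if toDB {
		next = l.maxDBSize
	}
	return l.estimate(c)+next <= l.maxCacheSize
}
```

Because every counter is multiplied by a per-kind maximum, the estimate is an
upper bound: the cache may redirect objects to the main storage even when the
real disk usage is lower.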
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to keep track of each local storage change. Log messages are
the most convenient way to do it.
Implement a function that writes a log message about a completed write
operation in the storage engine.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Shard should try to read object headers from the write-cache if it is
enabled. Extend the `writecache.Cache` interface with a `Head` method. Call
the method in `Shard.Head` if `Shard.hasWriteCache` returns true.
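A minimal sketch of the lookup order, assuming that on a write-cache miss the
shard falls back to its main storage. The `shard`, `header` and
`errNotFound` names below are hypothetical stand-ins, not the real types.

```go
package main

import "errors"

// errNotFound is a hypothetical "object not found" error.
var errNotFound = errors.New("object not found")

// header is a hypothetical object header.
type header struct{ addr string }

// shard is a hypothetical shard with an optional write-cache.
type shard struct {
	wcHead   func(addr string) (*header, error) // nil when write-cache is disabled
	mainHead func(addr string) (*header, error) // metabase/blobstor lookup
}

func (s *shard) hasWriteCache() bool { return s.wcHead != nil }

// Head tries the write-cache first; on any error (e.g. the object was
// already flushed) it falls back to the main storage.
func (s *shard) Head(addr string) (*header, error) {
	if s.hasWriteCache() {
		if h, err := s.wcHead(addr); err == nil {
			return h, nil
		}
	}
	return s.mainHead(addr)
}
```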
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The write-cache should be able to execute HEAD operations according to the
spec. Add a simple implementation of the `Head` method through the `Get` one.
Leave notes for future optimization.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Changes:
* replace `ioutil` elements with the ones from the `os` package;
* replace `os.FileMode` with `fs.FileMode`;
* use `signal.NotifyContext` instead of `NewGracefulContext` (removed).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
If the object to be inhumed is a root object, we need to continue the first
traversal over the shards. When several children are stored in different
shards, inhuming the object in a single shard leads to the inhumed object
appearing in subsequent selections. Also, any object can already be inhumed,
and this case is equivalent to a successful inhume.
Do not fail on the `object.ErrAlreadyRemoved` error. Continue the first
iteration over shards if a root object is detected (`SplitInfoError`).
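A minimal sketch of the traversal rule, assuming a shard can report a
`SplitInfoError` for root objects through an existence check. The `shard`
struct, `splitInfoError` and `errAlreadyRemoved` are hypothetical stand-ins
for the real types and errors.

```go
package main

import "errors"

// Hypothetical stand-ins for object.ErrAlreadyRemoved and SplitInfoError.
var errAlreadyRemoved = errors.New("object already removed")

type splitInfoError struct{}

func (splitInfoError) Error() string { return "root object" }

// shard is a hypothetical minimal shard, modelled as function fields.
type shard struct {
	Exists func(addr string) error // may return splitInfoError for root objects
	Inhume func(addr string) error
}

// inhume walks over sorted shards. A regular object stops the walk on the
// first success; a root object keeps it going, because its children may
// live in other shards. An already removed object counts as success.
func inhume(shards []shard, addr string) bool {
	var ok, root bool
	for _, sh := range shards {
		var siErr splitInfoError
		if errors.As(sh.Exists(addr), &siErr) {
			root = true
		}
		err := sh.Inhume(addr)
		if err == nil || errors.Is(err, errAlreadyRemoved) {
			ok = true
			if !root {
				return true
			}
		}
	}
	return ok
}
```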
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Write unit tests for `StorageEngine.Inhume` that assert that inhumed objects
don't appear in the `Select` result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Evicting from the cache requires closing a blobovnicza, which
in turn needs to lock `activeMtx`. This lock is not needed on
every addition, but our LRU library doesn't return evicted keys.
In the future we may consider switching to another implementation.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
This function is already reused in different storage engine parts,
so it makes sense to keep it in a separate package.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Different SplitInfo parts may be stored in different shards. The storage
engine must not stop at the first SplitInfoError and should make a
best effort to complete the SplitInfo structure if needed.
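A minimal sketch of the merging idea, assuming a simplified `splitInfo` with
string fields and assuming the structure counts as complete once both the
last part and the link are known; both are assumptions, not the real
SplitInfo type.

```go
package main

// splitInfo is a hypothetical simplified SplitInfo; each shard may
// know only some of its fields.
type splitInfo struct {
	SplitID  string
	LastPart string
	Link     string
}

// complete reports whether the parts needed by the caller are known.
func (s splitInfo) complete() bool { return s.LastPart != "" && s.Link != "" }

// mergeSplitInfo fills empty fields of dst from src.
func mergeSplitInfo(dst *splitInfo, src splitInfo) {
	if dst.SplitID == "" {
		dst.SplitID = src.SplitID
	}
	if dst.LastPart == "" {
		dst.LastPart = src.LastPart
	}
	if dst.Link == "" {
		dst.Link = src.Link
	}
}

// collect merges per-shard answers instead of returning the first
// partial SplitInfo, stopping once the structure is complete.
func collect(parts []splitInfo) splitInfo {
	var res splitInfo
	for _, p := range parts {
		mergeSplitInfo(&res, p)
		if res.complete() {
			break
		}
	}
	return res
}
```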
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
There were no unit tests of the storage engine. This commit
adds the first test, which reproduces a missing link ID in split info
on an `engine.Head(raw)` request.
Engine tests use some constructors from metabase tests,
so it is better to place such functions in a common
package in local_object_storage.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
The `Inhume` operation can be performed on already deleted objects, and in
this case the entry will be added to the graveyard. The `Delete` operation
finishes with an error if the object is not present in the metabase.
However, the entry in the graveyard must be deleted regardless of the
presence of the object.
Additionally, `Delete` now does not return an error in the absence of an
object.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Metabase should not store object payloads. Make the Put operation cut the
object payload before saving the binary object in the metabase.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Metabase should not store object payloads. Set a payload in the generated
test object. Ascertain that objects returned by the Get method have no
payload.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add the `InhumePrm.MarkAsGarbage` method, which marks the passed objects to
be removed from the local storage. Update the `InhumePrm.WithTarget` doc to
prevent conflicting use with the new method.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Creating tombstones for tombstones is prohibited in the NeoFS system. The
metabase graveyard contains records of the form {address: address}: the key
is the address of an inhumed object, the value is the address of its
tombstone. To prevent the creation of tombstones for tombstones, the metabase
must control incoming Inhume calls:
* if the Inhume target is a tombstone, then the "grave" should not be added;
* if the {a1:a2} "grave" was created earlier and the {a2:a3} "grave" came
later, then the first "grave" must be removed as tomb-on-tomb.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Change the Shard's garbage remover to interrupt iterating over the metabase
graveyard when the buffer is filled to its max size (the
`WithRemoverBatchSize` Shard option).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.IterateOverGraveyard` immediately return nil if the GraveHandler
returns ErrInterruptIterator.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add a new epoch event handler to the GC that finds all expired tombstones
and marks them and the underlying objects to be removed. The shard uses
callbacks provided by the storage engine to mark the underlying objects.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement the `DB.IterateCoveredByTombstones` method, which iterates over
graves and handles all objects buried under one of the tombstones.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add a new epoch event handler to the GC that finds all expired non-tombstone
objects and marks them to be removed.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement the `DB.IterateExpired` method, which iterates over the objects in
the metabase that are expired at a particular epoch.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Group the handlers of a particular event into a WaitGroup and wait for it
before handling the next event. This ensures that all handlers complete and
prevents potential conflicts between past and present jobs.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The `Shard.Init` method creates a new GC instance from the shard
configuration and starts the GC's workers through an `init` call. In the
initial implementation, GC routines run indefinitely and can be killed only
by application shutdown.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Shard's GC component consists of:
* an asynchronous remover that periodically wakes up, removes all garbage
objects from the shard, and goes to sleep for a particular time interval;
* an external event listener that distributes jobs between workers;
* a group of workers, each of which can handle a single job related to a
particular external event.
The remover and the event listener are goroutines started by the `init`
method (called from `Shard.Init`). In the initial version all event handlers
are interruptible: the next event of the same type interrupts the previous
handling and starts a new one.
GC is fully encapsulated in Shard. All GC configurations are reflected in
Shard's configuration.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement the `DB.IterateOverGraveyard` method, which iterates over all
graves and passes their descriptors (new type `Grave`) to a handler (new type
`GraveHandler`). `Grave` currently has the buried object address and a
garbage flag.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace the single target address in `InhumePrm` with a list of addresses.
Change the corresponding parameter of the `WithTarget` and `MarkAsGarbage`
methods to variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace the single target address in `InhumePrm` with a list of addresses.
Rename the `WithAddress` method to `WithAddresses` and make its parameter
variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `StorageEngine.Delete` execute the `Inhume` operation with the
`MarkAsGarbage` parameter on the `Shard` that holds the object. The shard is
found by iterating over HRW-sorted shards.
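A minimal sketch of HRW (rendezvous) ordering of shards for one object,
assuming FNV-1a over the shard ID and the object address as the score. The
hash choice, `weight` and `sortShards` are hypothetical; the engine may use a
dedicated HRW implementation with different scoring.

```go
package main

import (
	"hash/fnv"
	"sort"
)

// weight is a hypothetical HRW score for a (shard, object) pair.
func weight(shardID, addr string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(shardID))
	h.Write([]byte(addr))
	return h.Sum64()
}

// sortShards orders shard IDs by descending HRW weight for addr, so the
// same object always visits shards in the same order.
func sortShards(ids []string, addr string) []string {
	res := append([]string(nil), ids...)
	sort.Slice(res, func(i, j int) bool {
		return weight(res[i], addr) > weight(res[j], addr)
	})
	return res
}
```

Since the order depends only on the shard IDs and the address, every
operation on the same object probes the shards in the same sequence.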
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement the `InhumePrm.MarkAsGarbage` method, which marks the object as
garbage in the metabase. Update the `InhumePrm.WithTarget` doc to indicate
the conflict with the new method.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement the `InhumePrm.WithGCMark` method, which marks the object as
garbage in the graveyard. Update the `InhumePrm.WithTombstoneAddress` doc to
indicate the conflict with the new method. Update the `Inhume` function doc
regarding the tombstone address parameter.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The Metabase Delete operation is performed on a group of objects. The set
being removed can contain descendants of a common parent. When all
descendants of a parent object are deleted, the parent must also be deleted
from the metabase. In the previous implementation this was not done due to
the chosen approach to counting references to the parent.
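A minimal sketch of batch-wide reference counting, assuming in-memory maps
instead of metabase buckets: `parentOf` and `children` are hypothetical
structures, and the reference count is simply the size of a parent's
remaining child set.

```go
package main

// deleteWithParents removes the given objects and also removes every
// parent whose last stored child is deleted within the same batch.
// parentOf maps a child to its parent; children maps a parent to the set
// of its children still stored. Both maps are modified in place.
func deleteWithParents(toDelete []string, parentOf map[string]string,
	children map[string]map[string]struct{}) []string {
	removed := make([]string, 0, len(toDelete))
	for _, addr := range toDelete {
		removed = append(removed, addr)
		p, ok := parentOf[addr]
		if !ok {
			continue // not a child of any parent
		}
		set, ok := children[p]
		if !ok {
			continue // parent already removed
		}
		delete(set, addr) // drop one reference to the parent
		if len(set) == 0 {
			delete(children, p)
			removed = append(removed, p)
		}
	}
	return removed
}
```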
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation the StorageEngine.Inhume operation forced a
Shard.Inhume call on all internal shards. There is a need to inhume the
object in a single shard. To achieve this, the Inhume operation is performed
in the following steps:
1. iterate over sorted shards and check object presence through an Exists
call;
2. if the object exists in any shard in step 1 => inhume it there and return
on success;
3. if no shard contains the object => iterate over sorted shards again and
try to inhume the object in the first possible shard;
4. if all Inhume calls fail => return an error.
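The four steps can be sketched as follows, assuming the shards are already
sorted and each exposes boolean `Exists` and error-returning `Inhume`
functions. The `store` type and `errInhumeFailed` are hypothetical.

```go
package main

import "errors"

// store is a hypothetical minimal shard.
type store struct {
	Exists func(addr string) bool
	Inhume func(addr string) error
}

var errInhumeFailed = errors.New("could not inhume object in any shard")

// inhumeSingle inhumes addr in exactly one of the (sorted) shards.
func inhumeSingle(shards []store, addr string) error {
	// Steps 1-2: prefer a shard that already holds the object.
	for _, sh := range shards {
		if sh.Exists(addr) && sh.Inhume(addr) == nil {
			return nil
		}
	}
	// Step 3: otherwise use the first shard that accepts the inhume.
	for _, sh := range shards {
		if sh.Inhume(addr) == nil {
			return nil
		}
	}
	// Step 4: every attempt failed.
	return errInhumeFailed
}
```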
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Container listing is already supported in the metabase for the `engine.List`
operation. To get container statistics, the engine should provide both the
option to get a container volume estimation and a list of all containers.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Objects of one container can be split among shards, so the engine
should iterate over all available shards to sum all size
estimations.
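A minimal sketch of the summation, assuming each shard exposes its estimation
as a function. `shardSize` and `containerSize` are hypothetical, and this
sketch aborts on the first shard error, while the real engine might instead
skip failing shards.

```go
package main

// shardSize is a hypothetical per-shard container size estimator.
type shardSize func(cid string) (uint64, error)

// containerSize sums the estimations of all shards, since objects of
// one container may be spread among them.
func containerSize(shards []shardSize, cid string) (uint64, error) {
	var total uint64
	for _, sh := range shards {
		sz, err := sh(cid)
		if err != nil {
			return 0, err
		}
		total += sz
	}
	return total, nil
}
```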
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Storage nodes keep container size estimations so they
can announce this info and hope for some basic income
settlements. This is also useful for monitoring.
Container size does not include the sizes of non-regular
or inhumed objects.
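A minimal sketch of the counting rule, assuming a flat list of records with
a payload size, an object type and an inhumed flag. The `objectInfo` record
and the type constants are hypothetical, not the metabase schema.

```go
package main

// objectType is a hypothetical subset of object types.
type objectType int

const (
	typeRegular objectType = iota
	typeTombstone
	typeStorageGroup
)

// objectInfo is a hypothetical metabase record.
type objectInfo struct {
	container string
	typ       objectType
	size      uint64 // payload size
	inhumed   bool
}

// containerSize counts only live regular objects of the container.
func containerSize(objs []objectInfo, cid string) uint64 {
	var total uint64
	for _, o := range objs {
		if o.container != cid || o.typ != typeRegular || o.inhumed {
			continue
		}
		total += o.size
	}
	return total
}
```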
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
There is a codecov issue because objects are not placed
in the engine the same way in every unit test run. Therefore
coverage is sometimes higher and sometimes lower. A seeded
RNG should solve this issue for engine tests.
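A minimal sketch of a seeded generator for tests using `math/rand`; the seed
value and the `newTestRand`/`randomIDs` helpers are hypothetical.

```go
package main

import "math/rand"

// testSeed is a hypothetical fixed seed for engine tests.
const testSeed = 42

// newTestRand returns a deterministic generator, so generated objects
// (and therefore their placement among shards) repeat on every run.
func newTestRand() *rand.Rand {
	return rand.New(rand.NewSource(testSeed))
}

// randomIDs produces n pseudo-random IDs from r.
func randomIDs(r *rand.Rand, n int) []uint64 {
	ids := make([]uint64, n)
	for i := range ids {
		ids[i] = r.Uint64()
	}
	return ids
}
```

Creating a fresh generator per test, rather than sharing one, also keeps the
sequence independent of test execution order.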
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
In the previous implementation the DB.Containers method could return an
error about an invalid container ID string format. This could happen if some
of the top-level buckets had names without a "_" substring.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
A deadlock occurs when the `getActivate` function opens a new blobovnicza and
that invokes eviction in the LRU cache of open blobovniczas. `getActivate`
calls `activeMtx.Lock()`, then the cache eviction calls `activeMtx.RLock()`,
and a deadlock happens.
The fix consists of two steps:
- add a separate mutex for opening blobovniczas (1),
- split the single Lock outside of `updateAndGet` (2).
As for (1), `bbolt.Open()` blocks when it tries to open the same file from
two threads, so a separate mutex prevents that.
As for (2), the `updateAndGet` function consists of two parts. First, it
checks whether the required blobovnicza is ready and, if so, returns it. In
this case a simple RLock is enough. But it may also need to open a new
blobovnicza and update the map of active blobovniczas.
In this case we call `openBlobovnicza` without holding the activeMtx lock.
Cache eviction happens there and won't cause a deadlock.
Then we lock activeMtx to update the map of active blobovniczas. Concurrency
can happen there. However, `openBlobovnicza` will not open the same
blobovnicza twice, so we can check once more whether the opened blobovnicza
was activated while the thread was waiting on activeMtx. If so, return the
active blobovnicza; otherwise finish the activation.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
In the previous implementation Blobovnicza could incorrectly initialize
dimensional buckets: if SmallSizeLimit = 2 ^ X + Y && 0 < Y < 2 ^ X, then the
largest dimensional bucket was [2 ^ (X - 1) : 2 ^ X]. This was caused by an
incorrect condition for stopping the iterator along the dimensional
boundaries.
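A minimal sketch of a stopping condition that always covers the limit,
assuming the first bucket is [0 : firstBound] and every next bucket doubles
the upper bound. `dimensionalBounds` and `firstBound` are hypothetical and
may not match Blobovnicza's real bucket layout.

```go
package main

// dimensionalBounds returns [lower, upper] bounds of the buckets needed
// to hold objects of up to limit bytes. Iteration stops only once the
// upper bound reaches the limit, so for limit = 2^X + Y (0 < Y < 2^X)
// the last bucket is [2^X : 2^(X+1)] rather than [2^(X-1) : 2^X].
func dimensionalBounds(firstBound, limit uint64) [][2]uint64 {
	if firstBound == 0 {
		firstBound = 1 // avoid an endless loop on zero
	}
	var res [][2]uint64
	lower, upper := uint64(0), firstBound
	for {
		res = append(res, [2]uint64{lower, upper})
		if upper >= limit {
			return res
		}
		lower, upper = upper, upper*2
	}
}
```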
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
All parameters and resulting values of all metabase operations are
structured in new types. The most common usage scenarios of the operations
are moved to auxiliary functions.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation Blobovnicza stored objects in protocol format,
which did not allow working with externally compressed objects. To achieve
this goal, the Get and Put operations no longer work with the structure of
the object, but only with abstract binary data. The GetRange operation no
longer fits its original purpose of receiving a payload range. Therefore,
BlobStor receives the payload range of the object through the Get
operation. In the future either Blobovnicza will learn to compress objects
by itself, or the GetRange operation will be eliminated.
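A minimal sketch of cutting a payload range out of a payload that was
fetched in full through Get, assuming the payload is already decompressed
and unmarshaled. `payloadRange` and the local `errRangeOutOfBounds` are
hypothetical.

```go
package main

import "errors"

// errRangeOutOfBounds is a hypothetical local out-of-bounds error.
var errRangeOutOfBounds = errors.New("payload range is out of bounds")

// payloadRange returns payload[off : off+ln], checking the bounds
// without risking uint64 overflow in off+ln.
func payloadRange(payload []byte, off, ln uint64) ([]byte, error) {
	size := uint64(len(payload))
	if off > size || ln > size-off {
		return nil, errRangeOutOfBounds
	}
	return payload[off : off+ln], nil
}
```

Reading the whole object to serve a range costs extra I/O, which is why the
text leaves room for either in-Blobovnicza compression or dropping GetRange.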
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Blobovnicza returns an object, so we can't put compressed
data there: compressed data won't be deserialized correctly.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Move ErrNotFound and ErrRangeOutOfBounds to the core/object package in order
to share them across the libraries.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add a blobovnicza instance to the BlobStor structure. Create the blobovnicza
tree in the BlobStor constructor. Implement Open/Init/Close methods.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to support a single blobovnicza in a blobovnicza tree. This
can be achieved with a width of 1 and a depth of 0 or 1. With depth = 1, one
redundant directory is created, containing the blobovnicza. If the depth is
zero, the blobovnicza is located at the root path. Fix negative capacity in
the iterateDeepest method with zero depth.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>