`Inhume` operation can be performed on already deleted objects, and in this
case the entry will be added to the graveyard. `Delete` operation finishes
with error if object is not presented in metabase. However, the entry in the
cemetery must be deleted regardless of the presence of the object.
Additionally, now `Delete` does not return an error in the absence of an
object.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Metabase should not store payloads of objects. Make Put operation to cut
object payload before saving binary object in metabase.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Metabase should not store payloads of objects. Set payload in generated test
object. Ascertain that objects returned by Get method have no payload.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add `InhumePrm.MarkAsGarbage` method which marks passed objects to be
removed from local storage. Update `InhumePrm.WithTarget` doc to prevent
conflicting use with the new method.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Creating tombstones for tombstones is prohibited in NeoFS system. Metabase
graveyard contains records of the form {address: address}: key is an address
of inhumed object, value is an address of the tombstone. To prevent creation
tombstones for tombstones metabase must control incoming Inhume calls:
* if Inhume target is a tombstone, then "grave" should not be added;
* if {a1:a2} "grave" was created earlier and {a2: a3} "grave" came later,
then first "grave" must be removed as tomb-on-tomb.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Change Shard's garbage remover to interrupt iterating over the metabase
graveyard when the buffer is full to the max size (`WithRemoverBatchSize`
Shard's option).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.IterateOverGraveyard` to immediately return nil if GraveHandler
returns ErrInterruptIterator.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add new epoch event handler to GC that finds all expired tombstones and
marks them and underlying objects to be removed. Shard uses callbacks
provided by the storage engine to mark underlying objects.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.IterateCoveredByTombstones` method that iterates over graves
and handles all objects under one of the tombstones.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add new epoch event handler to GC that finds all expired non-tombstone
objects and marks them to be removed.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.IterateExpired` method that iterates over the objects in
metabase that are expired at particular epoch.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Group handlers of the particular event to a WaitGroup and wait for it before
the next event handling. This will ensure that all handlers complete and
prevent potential conflicts between past and present jobs.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Shard.Init` method creates a new GC instance from shard configuration and
starts GC's workers through `init` call. In initial implementation GC
routines are indefinite and can be killed only with by application shutdown.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Shard's GC component consists of:
* asynchronous remover that periodically wake up and removes all garbage
objects from the shard, and goes to sleep for particular time interval;
* external event listener that distributes jobs between workers;
* group of workers that can handle a single job related to particular
external event.
Remover and event listener represents go-routines which are started by
`init` method (calls from `Shard.Init`). In initial version all event
handlers are interrupted: this means that next event of the same type will
interrupt previous handling and start the new one.
GC is fully encapsulated in Shard. All GC configurations are reflected in
Shard's configuration.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.IterateOverGraveyard` method that iterates over all graves and
passes passes their descriptors (new type `Grave`) to handler (new type
`GraveHandler`). `Grave` currently have buried object address and garbage
flag.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace single target address in `InhumePrm` with the list of addresses.
Change corresponding parameter in `WithTarget` and `MarkAsGarbage` methods
to variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace single target address in `InhumePrm` with the list of addresses.
Rename `WithAddress` method to `WithAddresses` and change parameter to
variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `StorageEngine.Delete` to execute `Inhume` operation with
`MarkAsGarbage` parameter on the `Shard` that holds the object. Searching of
the particular shard is performed through iterating over HRW-sorted shards.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `InhumePrm.MarkAsGarbage` method that leads to marking object as
garbage in metabase. Update `InhumePrm.WithTarget` doc indicating a conflict
with the new method.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `InhumePrm.WithGCMark` method that marks the object as garbage in
graveyard. Update `InhumePrm.WithTombstoneAddress` doc indicating a conflict
with the new method. Update `Inhume` function doc about tombstone address
parameter.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Delete operation of Metabase is performed on group of objects. The set being
removed can contain descendants of a common parent. In the case when all
descendants of a parent object are deleted, it must also be deleted from
the metabase. In the previous implementation, this was not done due to the
chosen approach to counting references to the parent.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation StorageEngine.Inhume operation forced Shard
.Inhume call on all internal shards. There is a need to inhume object in a
single shard. To achieve this, Inhume operation is performed in next steps:
1. iterate over sorted shards, check object presence through Exists call;
2. if object exists at any shard in step 1 => inhume it and return on
success;
3. if no shards contain the object => iterate over sorted shards again and
try to inhume the object at first possible shard;
4. if all Inhume calls are failed => return an error.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Container listing already supported in the metabase for `engine.List`
operation. To get container statistics engine should provide both the
option to get container volume estimation and list of all containers.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Objects of one container can be split among shards, so engine
should iterate over all available shards to sum all size
estimations.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Storage nodes keep container size estimation so they
can announce this info and hope for some basic income
settlements. This is also useful for monitoring.
Container size does not include non regular or inhumed
object sizes.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
There is a codecov issue because objects are not placed
in the engine the same way every unit test. Therefore
sometimes there are more coverage, sometimes there are
less. Seeded RNG should solve this issue for engine tests.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
In previous implementation DB.Containers method could return an error about
invalid container ID string format. This could happen if some of top-level
buckets had name w/o "_" substring.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Deadlock occurs when `getActivate` function opens new blobovnicza and that
invokes evict in LRU cache of open blobovniczas. `getActivate` makes
`activeMtx.Lock()` and then cache evict makes `activeMtx.RLock()` and deadlock
happens.
Fix contains two steps:
- add separate mutex to open blobovniczas (1),
- split single Lock outside of `updateAndGet` (2).
As for the (1) `bbolt.Open()` locks when it tries to open the same file from
two threads. So separate mutex will prevent that.
As for the (2) `updateAndGet` function contains from two parts. At first it
checks if required blobovnicza is ready and it returns it. In this case we can
use the simple RLock. But then there is an option when we should open new
blobovnicza and update map of active blobovniczas.
In this case we call `openBlobovnicza` without activeMtx lock. Cache evict
happens there and it won't cause deadlock.
Then we lock activeMtx to update the map of active blobovniczas. Concurrency can
happen there. However `openBlobovnicza` will not open the same blobovnicza twice,
so we can make one more check if opened blobovnicza was activated while thread was
locked in activeMtx. If so, then return active blobovnicza, else finish activation.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
In previous implementation Blobovnicza could incorrectly initialize
dimensional buckets: if SmallSizeLimit = 2 ^ X + Y && Y < 2 ^ X, then
largest dimensional bucket was [2 ^ (X - 1) : 2 ^ X]. This was caused by an
incorrect condition for stopping the iterator along the dimensional
boundaries.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
All parameters and resulting values of all metabase operations are
structured in new types. The most popular scenarios for using operations are
moved to auxiliary functions.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation Blobovnicza's stored objects in protocol format
which did not allow working with externally compressed objects. To achieve
this goal, operations Get and Put no longer work with the structure of the
object, but only with abstract binary data. Operation GetRange has become
incorrect in its original purpose to receive the payload range. In this
regard, BlobStor receives the payload range of the object through Get
operation. In the future either Blobovnicza will learn to compress objects
by itself, or the GetRange operation will be eliminated.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Blobovnicza returns object, so we can't put compressed
data there. Compressed data won't be deserialized correctly.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Replace ErrNotFound and ErrRangeOutOfBounds to core/object package in order
to share them across the libraries.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Add blobovnicza instance to BlobStor structure. Create blobovnicza tree in
BlobStor constructor. Implement Open/Init/Close methods.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to support single blobovnicza in blobovnicza tree. This can
be achieved with a width of 1, and a depth of 0 or 1. With depth = 1 one
redundant directory is created, inside which there is a blobovnicza. If the
depth is zero, the blobobnivza will be in the root path. Fix negative
capacity in iterateDeepest method with zero depth.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
With exist check we should index parent first, because
as soon as child will be added to metabase, exist on
parent will return true even if it was not indexed yet.
Also this commit makes one db.Update instead of two for
parent and child.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Place the root of blobovnicza tree in a subdirectory of BlobStor with same
permissions. Abolish WithBlobovniczaRootPath and WithBlobovniczaPersmissions
options.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Blobovnicza ID parameter provides the ability to specify particular
blobovnicza to delete object from. In this case only specified blobovnicza
is processed.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation objects were classified by size according to
payload size. From now they are classified by the size of their binary
representation.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement "big or small" property classifier (only the size of the payload
is temporarily considered). Save "big" objects in shallow dir. Save "small"
objects in shallow dir until the moment of implementation of blobovnicza.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement Put/Get/GetRange/Select/SelectAll functions over storage
engine. These functions are going to be used by Object service.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation Shard accessed the BlobStor to get the
object header. However, the shard must take headers from the metabase.
From now zero length of the requested payload range seens as object
header request. In this case shard calls metabase to get the header.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation WithCompressObjects returned Option
than could panic if zstd (de)compressor creation failed with error.
From now errors with (de)compressor creation result in an option
without using data compression. In this case, the error is written
to the log passed to the option constructor.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation each operation on local storage
locked engine mutex. This was done under the assumption that
the weights of the shards change as a result of write operations.
With the transition to static weights of shards, it is no longer
necessary to lock the global mutex during the execution of operations.
However, since the set of engine shards is dynamic, there is still a
need to control multiple access to this set. The same mutex is used
for synchronization.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation each shard operation locked
RW shard mutex. With this approach RW operations were executed
one-by-one and blocked the execution of RO operations.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Correct the calculation of maximum value of fs tree depth. Fix check
of the max depth overflow in WithShallowDepth function.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
GetPrm has WithPayloadRange option to specify the requested
payload range. In previous implementation StorageEngine.Get
method ignored this option. From now zero length matches
full payload request.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
With updated specification of object related operation
we don't have this search attribute any more and we
should not use functions related to this attribute.
This commit breaks object service logic, however it will
be fixed later.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Now root and phy (leaf) filters work like flags. They work with
any matcher and any value. So meta-storage sets `true` value for
all root and phy objects and puts them into separate bucket.
We also do not work with inversion anymore, so it either added
to the bucket or not. We don't need to store both options.
This is the reason `selectAll` function is changed a bit. Now
it performs some low-level parsing from primary bucket and root
bucket.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Revert commit 0faa40e4 to increase the disk space consumed by the
metabase in favor of the speed of index updates.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation of the metabase, it was necessary to write
virtual objects to the primary index to be able to select them. In this
approach, virtual objects can be obtained directly using Head operation.
This has a side effect in handling object operations that do not expect to
receive a virtual object header in a single operation. With recent changes,
it is no longer necessary to have records of virtual objects in the primary
index, so this no longer happens for system integrity.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Fix a bug in the selection when removed object that matches search query
provoked the return of an empty result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Fix a bug in the selection when an object could be added to the result after
a mismatch in the previous filter.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation of the metabase, the unique value of the
header was assigned a bucket, the elements of which were leaves with a
key-address and an empty value. This approach was relatively efficient in
terms of write speed. However, a large number of buckets led to a rapid
increase in the database volume (~4GB for 100K objects with unique
attributes). An approach is presented with storing indexes on the value of
headers in the leaves of the tree, where the keys are the unique values of
the header, and the values are a serialized list of addresses (gob
encoding is temporarily used for serialization).
The new approach gave a good result in saving space (~350MB), however, it
significantly reduced the write speed with an increase in the number of
objects (~ 80x after 100K objects).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The previous metabase implementation took an exclusionary approach: filters
narrowed the set of all objects to those that match all filters. An
inclusive approach is presented. In it, when traversing the indexed headers,
the object becomes a candidate for selection. If at least one of the
subsequent filters is not passed, the object ceases to be a candidate. At
the end of the traversal, the remaining candidates are added to the
resulting sample. The borderline case of no filters is handled in a special
way: all stored objects are added to the resulting selection.
Presented inclusive approach showed better performance in most scenarios
(although not all).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous metabase implementation the absence of an attribute presented in
the search filter did not exclude the object from the result. Change this
behavior to exclude the object from the result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>