Allow to extend blobstor with more storage sub-systems. Currently
objects stored in the FSTree have empty byte slice descriptor and object
from blobovnicza tree have the same id as earlier. Each such change in
the identifier formation should be accompanied with metabase version
increase.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Return all the objects on the empty common prefix search without search
optimizations that breaks boltDB logic.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If an object has not been marked for removal by the GC in the current epoch
yet but has already expired, respond with `ErrObjectNotFound` api status.
Also, optimize shard iteration: a node must stop any iteration if the object
is found but gonna be removed soon.
All the checks are performed by the Metabase.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If metabase can't be opened in the default mode, try opening shard
first in `ReadOnly` mode and then in `DegradedReadOnly`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`Degraded` mode can be set by the administrator if needed.
Modifying operations in this mode can lead node into an inconsistent state
because metabase checks such as lock checking are not performed.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Reduce public interface of this package. Later each result will contain
an additional status, so it makes more sense to use the same functions
and result processing everywhere.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
The main problem is to distinguish the case of initial initialization
and update from version 0. We can't do this at `Open`, because of
`resync_metabase` flag. Thus, the following approach was taken:
1. During `Open` check whether the metabase was initialized.
2. Check for the version in `Init` or write the new one if the metabase
is new.
3. Update version in `Reset`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Core changes:
* avoid package-colliding variable naming
* avoid using pointers to IDs where unnecessary
* avoid using `idSDK` import alias pattern
* use `EncodeToString` for protocol string calculation and `String` for
printing
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
- Delete objects physically on tombstone's arrival;
- Store information about tombstones in the Graveyard;
- Clear Graveyard every epoch based on the information about TS in the
network.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add offset element to the iterations over deleted objects (both the
Graveyard and the Garbage buckets).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows storing information about object in both ways at the same time:
1. Metabase should know if an object is covered by a tombstone (that is
not expired yet);
2. It should be possible to physically delete objects covered by a
tombstone immediately (mark with GC) but keep tombstone knowledge.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
`apistatus` package provides types which implement build-in `error`
interface. Add `error of type` pattern when documenting these errors in
order to clarify how these errors should be handled (e.g. `errors.Is` is
not good).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace `ErrNotFound`/`ErrAlreadyRemoved` error from
`pkg/core/object` package with `ObjectNotFound`/`ObjectAlreadyRemoved`
one from `apistatus` package. These errors are returned by storage
node's server as NeoFS API statuses.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to process expired `LOCK` objects similar to `TOMBSTONE`
ones: we collect them on `Shard`, notify all other shards about
expiration so they could unlock the objects, and only after that mark
lockers as garbage.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `FormatValidator.ValidateContent` to verify payload of `LOCK`
objects. Pass locked objects to `Locker` interface. Require from
`Locker.Lock` to return `apistatus.IrregularObjectLock` error on a
corresponding condition.
Also add error return to `DeleteHandler.DeleteObjects` method. Require
from method to return `apistatus.ObjectLocked` error on a corresponding
condition. Adopt implementations.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Inhume` operation can potentially mark lockers as garbage. There is a
need to update locker list in locked bucket.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.Lock` to return `apistatus.IrregularObjectLock` if at least one
of the locked objects is irregular (not of type REGULAR).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.Inhume` to return `apistatus.ObjectLocked` if at least one of
the inhumed objects is locked.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.IterateCoveredByTombstones` to not pass locked objects to the
handler. The method is used by GC, therefore it will not consider locked
objects as candidates for deletion even if their tombstone is expired.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.IterateExpired` to not pass locked objects to the handler. The
method is used by GC, therefore it will not consider them as candidates
for deletion.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After introduction of LOCK objects (of type `TypeLock`) complicated
extended its behavior:
* create `lockers` container bucket (LCB) during PUT;
* remove object from LCB during DELETE;
* look up object in LCB during EXISTS;
* get object from LCB during GET;
* list objects from LCB during LIST with cursor;
* select objects from LCB during SELECT with '*'.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.Lock` method which marks list of the objects as locked by
another object. Only regular objects can be locked.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Create class of container buckets with `LOCKED` suffix. Put identifiers
of the objects of type `LOCK` to these buckets during `DB.Put`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Remove `Object` and `RawObject` types from `pkg/core/object` package.
Use `Object` type from NeoFS SDK Go library everywhere. Avoid using the
deprecated elements.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
According to BoltDB documentation bucket `value is only valid for the
life of the transaction`.
Make `DB.IsSmall` copy value slice in order to prevent potential memory
corruptions (e.g. `runtime.stringtobyteslice` cast).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
ListWithCursor allows listing physically stored objects
from metabase with small chunks. Cursor tracks last
processed object, therefore new chunks are returned
on each request.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Container listing should not ignore tombstone and
storage group objects which are not stored in
primary buckets.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Each object from graveyard has tombstone or GC mark. If object has
tombstone, metabase should return `ErrAlreadyRemoved` on object requests.
This is the case when user clearly removed the object from container. GC
marks are used for physical removal which can appear even if object is still
presented in container (Control service, Policer job, etc.). In this case
metabase should return 404 error on object requests.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation of the metabase, there was no possibility of
reinitializing the metabase: clearing information about existing objects and
bringing it back to its initial state. This operation can be useful in
cases when the stored metadata about objects has lost (or possibly lost)
relevance, and you need to generate data from scratch. Also at the
initialization stage, static resources of the base were not created -
container-independent buckets.
Make `Metabase.Init` method to allocate graveyard, container-size and
to-move-it buckets in underlying BoltDB instance. Implement `Metabase.Reset`
method: it works like `Init` but clean up all static buckets and removes
other ones. Due to the logical similarity, the methods share a single piece
of code.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Changes:
* replace `iotuil` elements with the ones from `os` package;
* replace `os.Filemode` with `fs.FileMode`;
* use `signal.NotifyContext` instead of `NewGracefulContext` (removed).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
This function already reused in different storage engine parts
so it makes sense to keep it in separate package.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Different SplitInfo parts may be stored in different shards. Storage
engine must not stop at first SplitInfoError and should make
best effort to complete SplitInfo structure if needed.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
`Inhume` operation can be performed on already deleted objects, and in this
case the entry will be added to the graveyard. `Delete` operation finishes
with error if object is not presented in metabase. However, the entry in the
cemetery must be deleted regardless of the presence of the object.
Additionally, now `Delete` does not return an error in the absence of an
object.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Metabase should not store payloads of objects. Make Put operation to cut
object payload before saving binary object in metabase.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Metabase should not store payloads of objects. Set payload in generated test
object. Ascertain that objects returned by Get method have no payload.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Creating tombstones for tombstones is prohibited in NeoFS system. Metabase
graveyard contains records of the form {address: address}: key is an address
of inhumed object, value is an address of the tombstone. To prevent creation
tombstones for tombstones metabase must control incoming Inhume calls:
* if Inhume target is a tombstone, then "grave" should not be added;
* if {a1:a2} "grave" was created earlier and {a2: a3} "grave" came later,
then first "grave" must be removed as tomb-on-tomb.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `DB.IterateOverGraveyard` to immediately return nil if GraveHandler
returns ErrInterruptIterator.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.IterateCoveredByTombstones` method that iterates over graves
and handles all objects under one of the tombstones.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.IterateExpired` method that iterates over the objects in
metabase that are expired at particular epoch.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `DB.IterateOverGraveyard` method that iterates over all graves and
passes passes their descriptors (new type `Grave`) to handler (new type
`GraveHandler`). `Grave` currently have buried object address and garbage
flag.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace single target address in `InhumePrm` with the list of addresses.
Rename `WithAddress` method to `WithAddresses` and change parameter to
variadic.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement `InhumePrm.WithGCMark` method that marks the object as garbage in
graveyard. Update `InhumePrm.WithTombstoneAddress` doc indicating a conflict
with the new method. Update `Inhume` function doc about tombstone address
parameter.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Delete operation of Metabase is performed on group of objects. The set being
removed can contain descendants of a common parent. In the case when all
descendants of a parent object are deleted, it must also be deleted from
the metabase. In the previous implementation, this was not done due to the
chosen approach to counting references to the parent.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Storage nodes keep container size estimation so they
can announce this info and hope for some basic income
settlements. This is also useful for monitoring.
Container size does not include non regular or inhumed
object sizes.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
There is a codecov issue because objects are not placed
in the engine the same way every unit test. Therefore
sometimes there are more coverage, sometimes there are
less. Seeded RNG should solve this issue for engine tests.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
In previous implementation DB.Containers method could return an error about
invalid container ID string format. This could happen if some of top-level
buckets had name w/o "_" substring.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
All parameters and resulting values of all metabase operations are
structured in new types. The most popular scenarios for using operations are
moved to auxiliary functions.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace ErrNotFound and ErrRangeOutOfBounds to core/object package in order
to share them across the libraries.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
With exist check we should index parent first, because
as soon as child will be added to metabase, exist on
parent will return true even if it was not indexed yet.
Also this commit makes one db.Update instead of two for
parent and child.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
With updated specification of object related operation
we don't have this search attribute any more and we
should not use functions related to this attribute.
This commit breaks object service logic, however it will
be fixed later.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Now root and phy (leaf) filters work like flags. They work with
any matcher and any value. So meta-storage sets `true` value for
all root and phy objects and puts them into separate bucket.
We also do not work with inversion anymore, so it either added
to the bucket or not. We don't need to store both options.
This is the reason `selectAll` function is changed a bit. Now
it performs some low-level parsing from primary bucket and root
bucket.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Revert commit 0faa40e4 to increase the disk space consumed by the
metabase in favor of the speed of index updates.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation of the metabase, it was necessary to write
virtual objects to the primary index to be able to select them. In this
approach, virtual objects can be obtained directly using Head operation.
This has a side effect in handling object operations that do not expect to
receive a virtual object header in a single operation. With recent changes,
it is no longer necessary to have records of virtual objects in the primary
index, so this no longer happens for system integrity.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Fix a bug in the selection when removed object that matches search query
provoked the return of an empty result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Fix a bug in the selection when an object could be added to the result after
a mismatch in the previous filter.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation of the metabase, the unique value of the
header was assigned a bucket, the elements of which were leaves with a
key-address and an empty value. This approach was relatively efficient in
terms of write speed. However, a large number of buckets led to a rapid
increase in the database volume (~4GB for 100K objects with unique
attributes). An approach is presented with storing indexes on the value of
headers in the leaves of the tree, where the keys are the unique values of
the header, and the values are a serialized list of addresses (gob
encoding is temporarily used for serialization).
The new approach gave a good result in saving space (~350MB), however, it
significantly reduced the write speed with an increase in the number of
objects (~ 80x after 100K objects).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The previous metabase implementation took an exclusionary approach: filters
narrowed the set of all objects to those that match all filters. An
inclusive approach is presented. In it, when traversing the indexed headers,
the object becomes a candidate for selection. If at least one of the
subsequent filters is not passed, the object ceases to be a candidate. At
the end of the traversal, the remaining candidates are added to the
resulting sample. The borderline case of no filters is handled in a special
way: all stored objects are added to the resulting selection.
Presented inclusive approach showed better performance in most scenarios
(although not all).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous metabase implementation the absence of an attribute presented in
the search filter did not exclude the object from the result. Change this
behavior to exclude the object from the result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Replace meta Bucket with meta.DB instance in local storage implementation.
Adopt all dependent components to new local storage.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Process parent objects in Put method. Headers of parent object are stored as
regular leaf objects in metabase from now. Build indexes for ROOT, LEAF and
CHILDFREE properties.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement Delete method on DB structure that adds deleted addresses to
tombstone index. Do not attach addresses from tombstone index to Select
result. Return error from Get method if address is presented in tombstone
index.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement bolt-based metabase that is going to be used in local object
storage. Implement Put/Get/Select methods.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>