The dump consists of a magic prefix followed by a list of objects, each
prefixed by its size in bytes. We can't use a proto-marshaled list because
that would require holding the whole dump in memory. TAR adds 512 bytes of
overhead per object, which can be a problem in some cases.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
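
The dump layout described above (magic, then size-prefixed objects) can be
sketched as follows. This is only an illustration: the magic value, the
4-byte little-endian size prefix and all function names are assumptions, not
taken from the actual implementation.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// dumpMagic is a placeholder; the real magic value is not stated in the commit.
var dumpMagic = []byte("DUMP")

// writeDump writes the magic followed by every object prefixed by its size.
func writeDump(w io.Writer, objs [][]byte) error {
	if _, err := w.Write(dumpMagic); err != nil {
		return err
	}
	var size [4]byte // assumed prefix width and byte order
	for _, o := range objs {
		binary.LittleEndian.PutUint32(size[:], uint32(len(o)))
		if _, err := w.Write(size[:]); err != nil {
			return err
		}
		if _, err := w.Write(o); err != nil {
			return err
		}
	}
	return nil
}

// readDump checks the magic and passes objects to h one at a time, so only a
// single object has to be held in memory.
func readDump(r io.Reader, h func([]byte) error) error {
	magic := make([]byte, len(dumpMagic))
	if _, err := io.ReadFull(r, magic); err != nil {
		return err
	}
	if !bytes.Equal(magic, dumpMagic) {
		return errors.New("invalid dump magic")
	}
	var size [4]byte
	for {
		_, err := io.ReadFull(r, size[:])
		if err == io.EOF {
			return nil // clean end of the dump
		}
		if err != nil {
			return err
		}
		obj := make([]byte, binary.LittleEndian.Uint32(size[:]))
		if _, err := io.ReadFull(r, obj); err != nil {
			return err
		}
		if err := h(obj); err != nil {
			return err
		}
	}
}

// roundTrip writes objs to an in-memory dump and reads them back.
func roundTrip(objs [][]byte) ([][]byte, error) {
	var buf bytes.Buffer
	if err := writeDump(&buf, objs); err != nil {
		return nil, err
	}
	var out [][]byte
	err := readDump(&buf, func(o []byte) error {
		out = append(out, o)
		return nil
	})
	return out, err
}

func main() {
	objs, err := roundTrip([][]byte{[]byte("first"), []byte("second")})
	if err != nil {
		panic(err)
	}
	for _, o := range objs {
		fmt.Printf("%d bytes: %s\n", len(o), o)
	}
}
```

With this layout the per-object overhead is the size prefix only, and the
reader never needs more than one object in memory.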
For some data, compression makes little sense because it is already
compressed. This commit allows leaving such data uncompressed based on the
`Content-Type` attribute. Currently exact, prefix and suffix matching are
supported.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
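
A minimal sketch of exact, prefix and suffix matching on `Content-Type`. The
`*` wildcard syntax and the `skipCompression` name are assumptions made for
illustration; the commit does not specify how patterns are written.

```go
package main

import (
	"fmt"
	"strings"
)

// skipCompression reports whether contentType matches any of the patterns.
// Assumed pattern syntax: "video/*" is a prefix match, "*/zip" is a suffix
// match, anything else must match exactly.
func skipCompression(contentType string, patterns []string) bool {
	for _, p := range patterns {
		switch {
		case strings.HasSuffix(p, "*"):
			if strings.HasPrefix(contentType, strings.TrimSuffix(p, "*")) {
				return true
			}
		case strings.HasPrefix(p, "*"):
			if strings.HasSuffix(contentType, strings.TrimPrefix(p, "*")) {
				return true
			}
		default:
			if contentType == p {
				return true
			}
		}
	}
	return false
}

func main() {
	patterns := []string{"video/*", "*/zip", "image/jpeg"}
	for _, ct := range []string{"video/mp4", "application/zip", "image/jpeg", "text/plain"} {
		fmt.Printf("%s: skip compression = %v\n", ct, skipCompression(ct, patterns))
	}
}
```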
Provide shard mode information via `DumpInfo()`. Delete the atomic field
from the Shard structure since it duplicates the new field.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Shard's mode was not used in the Node, so only two modes whose roles are
clear were added. More modes will be added in the future.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Make the `flushBigObjects` routine mark objects that have been written to
`BlobStor`. This prevents already flushed objects from being written again
on the next iterator tick.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
To estimate the fullness of `Blobovnicza` we use the number of objects
stored in each size bucket. The previous implementation multiplied that
number by the difference between the bucket boundaries. That expression
effectively estimated the minimum volume of objects in the bucket (and, for
the smallest bucket, the maximum).
Multiply the number of objects by the mean bucket size instead.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
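
The fullness estimate described above can be sketched as a sum over buckets
of the object count times the mean bucket size. The `bucket` type, its field
names and the example bounds are assumptions for illustration only.

```go
package main

import "fmt"

// bucket is a hypothetical description of one size range: objects whose size
// lies between lower and upper bytes, and how many of them are stored.
type bucket struct {
	lower, upper, count uint64
}

// estimateFullness sums count * mean bucket size over all buckets.
func estimateFullness(buckets []bucket) uint64 {
	var total uint64
	for _, b := range buckets {
		mean := b.lower + (b.upper-b.lower)/2
		total += b.count * mean
	}
	return total
}

func main() {
	buckets := []bucket{
		{lower: 0, upper: 32 << 10, count: 10},
		{lower: 32 << 10, upper: 64 << 10, count: 4},
	}
	fmt.Println("estimated bytes:", estimateFullness(buckets))
}
```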
Make `syncFullnessCounter` accept a `bbolt.Tx` argument: the Bolt
transaction within which the counter should be synchronized. Pass the
corresponding transaction during `Init`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
According to the BoltDB documentation, a bucket `value is only valid for the
life of the transaction`.
Make `DB.IsSmall` copy the value slice in order to prevent potential memory
corruption (e.g. `runtime.stringtobyteslice` cast).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
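
Why the copy matters can be shown without BoltDB itself. The sketch below
simulates a transaction whose backing memory is reused once it ends; `view`,
`cloneBytes` and `demo` are hypothetical stand-ins, not BoltDB or `DB.IsSmall`
code.

```go
package main

import "fmt"

// view simulates a read transaction: the slice passed to fn is only valid
// until view returns, after which the backing memory is reused.
func view(fn func(value []byte)) {
	page := []byte("small")
	fn(page)
	copy(page, "XXXXX") // memory reused after the "transaction" ends
}

// cloneBytes returns a copy of b that is safe to keep after the transaction.
func cloneBytes(b []byte) []byte {
	c := make([]byte, len(b))
	copy(c, b)
	return c
}

// demo keeps both an aliased slice and a copy, then reports what each holds
// once the simulated transaction is over.
func demo() (aliased, copied string) {
	var a, c []byte
	view(func(v []byte) {
		a = v
		c = cloneBytes(v)
	})
	return string(a), string(c)
}

func main() {
	aliased, copied := demo()
	fmt.Printf("aliased=%q copied=%q\n", aliased, copied)
}
```

Only the copied slice still holds the original value after the transaction.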
`ListWithCursor` lists physically stored objects from the metabase in small
chunks. The cursor tracks the last processed object, so each request
returns a new chunk.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
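
Cursor-based listing as described above can be sketched over a sorted slice
of addresses. The signature below is an assumption and does not mirror the
real `ListWithCursor` parameters; an empty cursor means "start from the
beginning".

```go
package main

import (
	"fmt"
	"sort"
)

// listWithCursor returns up to count addresses that come strictly after
// cursor, plus the new cursor (the last returned address). addrs must be
// sorted, as keys in a BoltDB bucket are.
func listWithCursor(addrs []string, cursor string, count int) ([]string, string) {
	start := 0
	if cursor != "" {
		start = sort.SearchStrings(addrs, cursor)
		if start < len(addrs) && addrs[start] == cursor {
			start++ // skip the object that was already processed
		}
	}
	end := start + count
	if end > len(addrs) {
		end = len(addrs)
	}
	chunk := addrs[start:end]
	if len(chunk) > 0 {
		cursor = chunk[len(chunk)-1]
	}
	return chunk, cursor
}

func main() {
	addrs := []string{"a", "b", "c", "d", "e"}
	cursor := ""
	for {
		var chunk []string
		chunk, cursor = listWithCursor(addrs, cursor, 2)
		if len(chunk) == 0 {
			break
		}
		fmt.Println(chunk)
	}
}
```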
Make `BlockExecution` / `ResumeExecution` not release per-shard worker
pools. Make `StorageEngine.Close` block these methods and any data-related
operations. It still releases the pools.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Use `sync.Once` to prevent locks when stopping GC. This also makes it safe
to call `Shard.Close` multiple times.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
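
A minimal sketch of stopping a background routine with `sync.Once` so that
`Close` can be called repeatedly. The `gc` type and its fields are
hypothetical and only illustrate the pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// gc is a hypothetical background worker stopped via a channel.
type gc struct {
	stopOnce sync.Once
	stop     chan struct{}
	done     chan struct{}
}

func newGC() *gc {
	g := &gc{stop: make(chan struct{}), done: make(chan struct{})}
	go func() {
		defer close(g.done)
		<-g.stop // a real GC would do periodic work until stopped
	}()
	return g
}

// Close signals the worker exactly once and waits for it to finish.
// Repeated calls neither panic on a double close nor block.
func (g *gc) Close() {
	g.stopOnce.Do(func() { close(g.stop) })
	<-g.done
}

func main() {
	g := newGC()
	g.Close()
	g.Close()
	fmt.Println("closed twice without blocking")
}
```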
There is a need to disable execution of local data operations on the storage
engine at runtime. If storage engine ops are blocked, the node behaves as
usual but all local object operations are denied.
Implement `BlockExecution` / `ResumeExecution` methods on `StorageEngine`
which block / resume the execution of data ops. `BlockExecution` waits for
the completion of all operations executing at the time of the call. Return
the error passed to `BlockExecution` from all data-related methods until the
`ResumeExecution` call. Make `Close` block operations as well.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
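
One way to get the blocking behaviour described above is a read-write mutex
guarding a "block error". This is a sketch under that assumption; the
`engine` type, the `exec` helper and `errBlocked` are hypothetical, and the
real `StorageEngine` may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBlocked is an example error a caller might pass to BlockExecution.
var errBlocked = errors.New("storage engine is blocked")

// engine is a hypothetical stand-in for the storage engine.
type engine struct {
	mtx      sync.RWMutex
	blockErr error
}

// exec runs a data operation unless execution is blocked.
func (e *engine) exec(op func() error) error {
	e.mtx.RLock()
	defer e.mtx.RUnlock()
	if e.blockErr != nil {
		return e.blockErr
	}
	return op()
}

// BlockExecution waits for in-flight operations (they hold the read lock)
// and makes subsequent ones fail with err. err must be non-nil.
func (e *engine) BlockExecution(err error) {
	e.mtx.Lock()
	e.blockErr = err
	e.mtx.Unlock()
}

// ResumeExecution allows data operations again.
func (e *engine) ResumeExecution() {
	e.mtx.Lock()
	e.blockErr = nil
	e.mtx.Unlock()
}

func main() {
	var e engine
	op := func() error { return nil }

	fmt.Println("before block:", e.exec(op))
	e.BlockExecution(errBlocked)
	fmt.Println("while blocked:", e.exec(op))
	e.ResumeExecution()
	fmt.Println("after resume:", e.exec(op))
}
```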
a1696a8 introduced some logic which in some situations prevented big objects
from being persisted in FSTree. This commit refactors the code with the goal
of simplifying it and also checking issue #866.
1. Split a monstrous function into multiple simple ones: memory objects
can only be small and for writing through the cache we can do a dispatch
in `Put` itself.
2. Determine objects to be put in database before the actual update
as setting up a transaction has non-zero overhead.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Container listing should not ignore tombstone and
storage group objects which are not stored in
primary buckets.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>
Some of the pools are initialized during config initialization,
so it is currently not possible to release them in one place.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Make `StorageEngine` use non-blocking worker pools of the same
(configurable) size for PUT operations. This allows switching to less loaded
shards when others are overloaded, thereby distributing the write load more
evenly.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
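
The idea of non-blocking per-shard pools can be sketched with a buffered
channel used as a slot counter. The `shard` type, `trySubmit` and `put` are
hypothetical names; the real engine uses its own worker pool implementation.

```go
package main

import "fmt"

// shard is a hypothetical shard with a fixed number of worker slots.
type shard struct {
	name  string
	slots chan struct{}
}

func newShard(name string, size int) *shard {
	return &shard{name: name, slots: make(chan struct{}, size)}
}

// trySubmit starts task in a new goroutine if a worker slot is free and
// reports whether the task was accepted. It never blocks.
func (s *shard) trySubmit(task func()) bool {
	select {
	case s.slots <- struct{}{}:
		go func() {
			defer func() { <-s.slots }() // free the slot when done
			task()
		}()
		return true
	default:
		return false // pool is busy, let the caller try another shard
	}
}

// put offers the task to the shards in order and returns the name of the
// first shard that accepted it.
func put(shards []*shard, task func()) (string, bool) {
	for _, s := range shards {
		if s.trySubmit(task) {
			return s.name, true
		}
	}
	return "", false
}

func main() {
	shards := []*shard{newShard("A", 1), newShard("B", 1)}
	release := make(chan struct{})
	slow := func() { <-release } // simulates a long-running write

	for i := 0; i < 3; i++ {
		name, ok := put(shards, slow)
		fmt.Printf("task %d: shard=%q accepted=%v\n", i, name, ok)
	}
	close(release)
}
```

When the first shard's pool is full, the write goes to the next shard instead
of waiting in a queue.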
We should be able to read whatever we have written earlier.
Compression setting applies only to the new objects.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not log in option constructors. Also, a failure to initialize the
compression module (possibly due to invalid options) is certainly an error
deserving proper treatment.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
There is a need to list addresses of the small objects stored in WriteCache
database.
Implement the `IterateDB` function which accepts a BoltDB instance, iterates
over all saved objects and passes their addresses to the handler.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to open Blobovnicza instances in read-only mode in some
cases.
Add the `ReadOnly` option. Do not create the dir path in RO mode. Open the
underlying BoltDB instance with the ReadOnly flag. Document that writing
operations should not be called in RO mode (otherwise BoltDB txs fail).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Blobovnicza` can be initialized with any number of range buckets and later
reconstructed with a different size limit. In the previous implementation
`Iterate` could miss some stored objects if `Blobovnicza` was constructed
with a smaller number of ranges.
Make `Iterate` traverse all buckets regardless of the current instance
bounds.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation the `Blobovnicza.Iterate` op only decoded
object data and passed it to the handler. There is a need to iterate over
the addresses of all stored objects.
Add `DecodeAddresses` and `WithoutData` methods to the `IteratePrm` type.
Add an `Address` method to the `IterationElement` type. Make `Iterate`
decode object addresses if `DecodeAddresses` was called and skip reading the
data if `WithoutData` was called. Implement the `IterateAddresses` helper
function to simplify the code.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Each object in the graveyard has either a tombstone or a GC mark. If an
object has a tombstone, the metabase should return `ErrAlreadyRemoved` on
object requests. This is the case when the user explicitly removed the
object from the container. GC marks are used for physical removal, which can
happen even if the object is still present in the container (Control
service, Policer job, etc.). In this case the metabase should return a 404
error on object requests.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`List` method of `Shard` must return only physically stored objects.
Use `AddPhyFilter` to select only phy objects.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Tombstone and "alive" objects can both be stored in BlobStor. They can
appear in any order during iteration. Metabase returns the
`ErrAlreadyRemoved` error if the object is inhumed.
Ignore `object.ErrAlreadyRemoved` errors of `metabase.Put` in Shard's
`refillMetabase` operation.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to refill Metabase data with the objects from BlobStor.
Implement `refillMetabase` method which iterates over all objects from
BlobStor and saves them in Metabase.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to be able to process all objects saved in `BlobStor`.
Implement `BlobStor.Iterate` method which iterates over all objects.
Implement `IterateBinaryObjects` and `IterateObjects` helper functions to
simplify the code.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to be able to process all stored objects saved in
`Blobovnicza`.
Implement `Blobovnicza.Iterate` method which iterates over all objects.
Implement `IterateObjects` helper function to simplify the code.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In the previous implementation of the metabase, there was no way to
reinitialize it: to clear information about existing objects and bring it
back to its initial state. This operation can be useful when the stored
object metadata has lost (or may have lost) relevance and the data needs to
be generated from scratch. Also, the static resources of the base
(container-independent buckets) were not created at the initialization
stage.
Make the `Metabase.Init` method allocate graveyard, container-size and
to-move-it buckets in the underlying BoltDB instance. Implement the
`Metabase.Reset` method: it works like `Init` but cleans up all static
buckets and removes the other ones. Due to their logical similarity, the
methods share a single piece of code.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to limit disk space used by write-cache. It is almost
impossible to calculate the value exactly. It is proposed to estimate the
size of the cache by the number of objects stored in it.
Track the number of objects saved in DB and FSTree separately. To do this,
the `ObjectCounters` interface is defined. It is generalized to a store of
numbers that can be made persistent (new option `WithObjectCounters`). By
default the DB number is calculated as the number of keys in the default
bucket, and the FS number is set equal to the DB number since it is
currently hard to read the actual value from the `FSTree` instance. Each
PUT/DELETE operation on DB or FS increases/decreases the corresponding
counter. Before each PUT op an overflow check is performed using the
following formula for the occupied space:
`NumDB * MaxDBSize + NumFS * MaxFSSize`. If the next PUT can cause
write-cache overflow, the object is written to the main storage.
By default the maximum write-cache size is set to 1GB.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
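
The overflow check with the formula `NumDB * MaxDBSize + NumFS * MaxFSSize`
can be sketched as follows. The `limits` type, the function names and the
example sizes are assumptions for illustration; only the formula comes from
the commit.

```go
package main

import "fmt"

// limits holds hypothetical write-cache settings in bytes.
type limits struct {
	maxDBSize    uint64 // largest object stored in the DB
	maxFSSize    uint64 // largest object stored in FSTree
	maxCacheSize uint64 // total write-cache limit
}

// estimateSize applies the formula NumDB*MaxDBSize + NumFS*MaxFSSize.
func estimateSize(numDB, numFS uint64, l limits) uint64 {
	return numDB*l.maxDBSize + numFS*l.maxFSSize
}

// fitsDB reports whether one more DB object keeps the estimate within the limit.
func fitsDB(numDB, numFS uint64, l limits) bool {
	return estimateSize(numDB+1, numFS, l) <= l.maxCacheSize
}

// fitsFS reports whether one more FSTree object keeps the estimate within the limit.
func fitsFS(numDB, numFS uint64, l limits) bool {
	return estimateSize(numDB, numFS+1, l) <= l.maxCacheSize
}

func main() {
	l := limits{maxDBSize: 4 << 10, maxFSSize: 1 << 20, maxCacheSize: 1 << 30}
	fmt.Println("estimate:", estimateSize(100, 1000, l))
	fmt.Println("small object fits:", fitsDB(100, 1000, l))
	fmt.Println("big object fits:", fitsFS(100, 1023, l))
}
```

If a check fails, the object would bypass the write-cache and go to the main
storage, as the commit describes.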
There is a need to keep track of each local storage change. Log messages are
the most convenient way to do it.
Implement function which writes log message about the completed writing
operation in storage engine.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Shard should try to read object headers from write-cache if it is enabled.
Extend `writecache.Cache` interface with `Head` method. Call the method in
`Shard.Head` if `Shard.hasWriteCache` returns true.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Write cache should be able to execute HEAD operations according to spec.
Add simple implementation of `Head` method through the `Get` one. Leave
notes for future optimization.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Changes:
* replace `ioutil` elements with the ones from `os` package;
* replace `os.FileMode` with `fs.FileMode`;
* use `signal.NotifyContext` instead of `NewGracefulContext` (removed).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
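
A short sketch of the replacements listed above, using only standard library
calls available since Go 1.16. The `writeAndRead` helper and the file name
are made up for the example.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"os"
	"os/signal"
	"path/filepath"
)

// writeAndRead uses os.MkdirTemp, os.WriteFile and os.ReadFile in place of
// ioutil.TempDir, ioutil.WriteFile and ioutil.ReadFile, and takes the
// permissions as fs.FileMode.
func writeAndRead(data []byte, perm fs.FileMode) ([]byte, error) {
	dir, err := os.MkdirTemp("", "example")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "object.bin")
	if err := os.WriteFile(path, data, perm); err != nil {
		return nil, err
	}
	return os.ReadFile(path)
}

func main() {
	// signal.NotifyContext cancels ctx when the process is interrupted.
	ctx, stop := signal.NotifyContext(context.Background(), os.Interrupt)
	defer stop()

	data, err := writeAndRead([]byte("payload"), 0o600)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %q, context error: %v\n", data, ctx.Err())
}
```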
If the object to be inhumed is a root object, we need to continue the first
traversal over the shards. When several children are stored in different
shards, inhuming the object in a single shard leads to the inhumed object
appearing in subsequent selections. Also, any object may already be inhumed,
and this case is equivalent to a successful inhume.
Do not fail on the `object.ErrAlreadyRemoved` error. Continue the first
iteration over shards if a root object is detected (`SplitInfoError`).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Write unit tests of `StorageEngine.Inhume` which assert that inhumed objects
don't appear in `Select` result.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Evicting from the cache requires closing the blobovnicza, which
in turn needs to lock `activeMtx`. This lock is not needed on
every addition, but our LRU library doesn't return evicted keys.
In the future we may consider switching to another implementation.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
This function is already reused in different storage engine parts,
so it makes sense to keep it in a separate package.
Signed-off-by: Alex Vanin <alexey@nspcc.ru>