In case we have many small objects in the write-cache, `indices` should
not be reused between iterations.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Previously, node could get an "infinite" small object: it could be expired
and thus could not be flushed (update its storage ID) to metabase => could
not be marked as flushed => node never removes such object and repeat all
the cycle one more time. If object exists and is not marked with GC (meta
returns `ErrObjectIsExpired`, not `ObjectNotFound` and not
`ObjectAlreadyRemoved`), its ID is safe to update _in the same_ bbolt
transaction.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Currently, DELETE service sets tombstone expiration epoch to
`current epoch + 5`. This works less than ideal in private networks
where an epoch can be e.g. 10 minutes. In this case, after a node is
unavailable for more than 1 hour, already deleted objects have a chance
to reappear.
After this commit tombstone lifetime can be configured.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
LRU `Peek`/`Contains` take LRU mutex _inside_ of a `View` transaction.
`View` transaction itself takes `mmapLock` [1], which is lifted after tx
finishes (in `tx.Commit()` -> `tx.close()` -> `tx.db.removeTx`)
When we evict items from LRU cache mutex order is different:
first we take LRU mutex and then execute `Batch` which _does_ take
`mmapLock` in case we need to remap. Thus the deadlock.
[1] 8f4a7e1f92/db.go (L708)
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
To achieve high performance we must choose proper values for both
batch size and delay. For user operations we want to set low delay.
However it would prevent tree synchronization operations to form big
enough batches. For these operations, batching gives the most benefit
not only in terms of on-CPU execution cost, but also by speeding up
transaction persist (`fsync`).
In this commit we try merging batches that are already
_triggered_, but not yet _started to execute_. This way we can still
query batches for execution after the provided delay while also allowing
multiple formed batches to execute faster.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
If we had lots of domains in one zone, `dump-hashes` for all others
can miss some domains, because we need to restrict ourselves with _some_
number.
In this commit we use neo-go sessions by default, with a proper
failback to in-script iterator unwrapping.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
"Object is expired" means that object is presented in `meta` but it is not
`ObjectNotFound` error. Previous implementation made `shard` search for an
object without `meta` which was an error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Currently we track based on `PayloadSize`, because it is already stored
in the metabase and it is easier to calculate without slowing down the
whole system.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
We rarely need to list all containers: as one example
we need it for tree service synchronization once per epoch.
Given that cache TTL has the order of block time it makes no sense
to cache the list of all containers.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In the previous implementation any non-nil error that preceded object
fetching from blobstor led to iterating over every storage (in other words,
no storage ID information was taken into account). Now storage ID is
skipped only if metabase (storage ID source) returns any error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
After the reconnection interval feature there was an bug related to the big
objects collecting: split error is returned from a client directly, not
via API status and was considered as a connection error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
`GETRANGEHASH` request spawns `GETRANGE` requests if an object could not be
found locally. If the original request contains session, it can be static
and, therefore, fetching session key can not be performed successfully.
As the best effort a node could request object's range with its own key.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Both `meta` and `write-cache` are expected to have a fast underlying disk,
so it does not seem like an optimisation. Moreover, `write-cache`'s `Head`
is a `Get` with payload cutting, it _must_ use more memory for no reason
(`meta` was created for such requests). Also, `write-cache` does not allow
performing any "meta" relations checks (such as locking, tombstoning).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
We use cache to avoid policing the same object multiple times in a short
time span (< 30 seconds). If we have 200_000 objects in a blobstor, it is a bit useless
-- if it takes 1 second to process an object and we have `replicator.pool_size: 20`
in config, the next iteration will happen in 10_000 second which is much
larger than 30 second. However we still consume a lot of memory, so it
makes sense to use saner default.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In NeoFS containers can be removed on behalf of its owner only. To
improve user experience, there is a need to add ownership check to the
removal command of the NeoFS CLI.
Check container ownership in `container delete` command `Run` function.
The check can be skipped by `--force` option.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Stop child objects collection if the last returned object (the most "left"
object in the collected chain) starts exactly from the `GETRANGE`'s `from`
value.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
That could happen if a node forwards request to a node that closed the
connection during the original object stream.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Could be related to "websocket users limit reached" on the `neo-go` server
side when an SN/IR is rebooting repeatedly.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Includes:
1. mode change read lock operation in every exported method that r/w the
underlying database;
2. returning `ErrDegradedMode` logical error if any exported method is
called in degraded (without a metabase) mode.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
A container node is expected to have full "get" access to assemble the
object.
A non-container node is expected to forward any request to a container node.
Any token is expected to be issued for an original request sender not for a
node so any new request is invalid by design with that token.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not lose meta information of the original requests: cache session and
bearer tokens of the original request b/w a new generated ones. Middle
request wrappers should not contain any meta information, since it is
useless (e.g. ACL service checks only the original tokens).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After presenting request statuses on the API level, all the errors are
unwrapped before sending to the caller side. It led to a losing invalid
request's context.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>