New network status of storage nodes is going to be introduced. To
simplify the addition, it would be useful to prepare the code for this.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation Inner Ring allowed storage nodes with any
state to register in the network. According to the current design, only
nodes with ONLINE state are allowed to enter the network map.
Create new `state` sub-package of `nodevalidation` package of Inner Ring
application. Define `state.NetMapCandidateValidator` type and provide
`NodeValidator` interface required by the Inner Ring's processor of
`Netmap` contract's notification events. Embed new validator into the
one used by the Inner Ring application.
From now all `AddPeer` notifications with node state other than `ONLINE`
will be denied.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Degraded mode allows us to operate without an SSD,
thus writecache should be unavailable in this mode.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Add a common error for this case because it is not an error
which should increase error counter. Single error simplifies checks on
the call-site.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Negative values have no sense. On the other hand it differs from the
blobovnicza's configuration and prevents unification.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Allow user to initiate flushing objects from a writecache.
We need this in 2 cases:
1. During writecache storage schema update, it should be flushed with
the old version of node and started clean with a new one.
2. During SSD replacement, to avoid data loss.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Replace `ProcessCurrentNetMap` method of `NodeState` interface with
`ReadCurrentNetMap` one with two changes:
* Replace network map type from NeoFS SDK package with the
protocol-generated message. This replaces all the business logic to
the application layer.
* Support error return. This allows to cover problem node states.
Return an error from `NodeState.ReadCurrentNetMap` method implemeted
through `atomic.Value` if `Store` method has not been called yet.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Use a simple loop at the callsite. This way we remove as much as we can.
Also, `Delete` metrics is more meaningful now.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
- Meta now supports (and requires) inc/dec labeled counters
- The new logic counter is incremented on `Put` operations and is
decremented on `Inhume` and `Delete` operations that are performed on
_stored_ objects only
- Allow force counters sync. "Force" mode should be used on metabase resync
and should not be used on a regular meta start
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It will allow to use labeled object counters, e.g.: "phy" for all
the physically stored objects, "logic" for only available objects. Also, it
will allow to add typed (regular, TS, SG, LOCK) counters if needed.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not call `CalculateAction` for the eACL checks since it requires object
headers that are meaningless in the tree context.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Do not require a request to be signed by the container owner if a
bearer token is missing
2. Do not check the system role since public requests are not expected to
be signed by IR or a container node (unlike the object requests)
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Sometimes it is useful to track sidechain events processed by the node.
For example, we need some easy way to look up for log messages about new
epoch notifications.
Make `subscriber.Subscriber` to log names of all events received from
the sidechain notification channel.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Re-configuration of Blobovnicza's object size limit must not affect
already stored objects. In previous implementation `Blobovnicza` didn't
see objects stored in buckets which became too big after size limit
re-configuration. For example, lets consider 1st configuration with 64KB
size limit for stored objects. Lets assume that we stored object of 64KB
size, and re-configured `Blobovnicza` with 32KB limit. After reboot
object should be still available, but actually it isn't. This is caused
by `Get` operation algorithm which iterates over configured size ranges
only, and doesn't process any other existing size bucket. By the way,
increasing of the object size limit didn't lead to the problem even in
previous implementation.
Make `Blobovnicza.Get` method to iterate over all size buckets
regardless of current configuration. This covers the described scenario.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Previously, the depth was restricted because with BFS the amount of
nodes we have in memory blows up exponentially. With DFS is is linear,
so we can process trees of arbitrary depth.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Some methods add "IR" suffix to its names in notary enabled envs
because of contract logic. It was broken due to incorrect notary state
reading (tryNotary != notary is enabled).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Set flush mark in the inside the flush worker because writing to the blobstor
can fail. Because each evicted object must be deleted, it is reasonable
to do this in the evict callback.
The evict callback is protected by LRU mutex and thus potentially interferes
with `Get` and `Iterate` methods. This problem will be addressed in the
future.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
There is a need to sync container-related caching mechanism with the
actual Sidechain changes. To do this, node should be able to listen
incoming notifications about container ops.
Define `PutSuccess` / `DeleteSuccess` notification event's parsers.
Subscribe to these events in node app. As initial implementation node
will log event receipts. Later handling is going to be practically
complicated.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Maintain an invariant that any blobovnicza is present either
in `opened` or in `active` map. Otherwise, the logic becomes too
complicate because it is not obvious when we should close the blobovnicza.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
This check should occur on the shard level, but because
blobstor components expose `Open(readOnly bool)` interface,
it is reasonable to expect an error here.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
If the file doesn't exist, return `apistatus.ObjectNotFound`.
First check is still there as a shortcut.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
This tests check that each blobstor component behaves similarly when
same methods are being used. It is intended to serve as a specification
for all future components.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Includes:
1. Renaming counter key to distinguish logical and physical objects
2. Version update dropping since changes could be done in a compatible way
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Move compression parameters to the `shard` section.
2. Allow to use multiple sub-storage components in the blobstor.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
1. Remove in-memory cache. It doesn't persist objects and if we want
more speed, `NoSync` option can be used for the bolt DB.
2. Put to the metabase in a synchronous fashion. This considerably
simplifies overall logic and plays nicely with the metabase bolt DB
batch settings.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Allow to extend blobstor with more storage sub-systems. Currently
objects stored in the FSTree have empty byte slice descriptor and object
from blobovnicza tree have the same id as earlier. Each such change in
the identifier formation should be accompanied with metabase version
increase.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Close#1647.
Initially, `Sync: false` was provided because we can already lose
objects cached in memory. However, future changes in writecache will
remove inmemory cache and speed up it via other means.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
If an object is found in the Write-cache and is placed at the end of
the in-memory cache, the memory counter update operation tries to
dereference the index that is out of the sliced array. Moreover, even if
panic does not appear, the counter is updated with the wrong value.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Return all the objects on the empty common prefix search without search
optimizations that breaks boltDB logic.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If an object has not been marked for removal by the GC in the current epoch
yet but has already expired, respond with `ErrObjectNotFound` api status.
Also, optimize shard iteration: a node must stop any iteration if the object
is found but gonna be removed soon.
All the checks are performed by the Metabase.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Sort the endpoint by their priority before the first WS client creation to
start the morph client with the highest priority endpoint.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Handling notification in a synchronous manner may lead to a blocking state
if a handler uses neo-go client.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Return listen errors in a synchronous fashion.
Another solution would be to use buffered channel, but this is not
scalable: for each new similar runner we would need to extend the
buffer.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
After a4adb79db new logical error could be returned. Do not increase
error counter in this case.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Also, try to fetch object header info from the local storage to find as much
object info as possible for the requests which do not assume returning
object header as a response.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If metabase can't be opened in the default mode, try opening shard
first in `ReadOnly` mode and then in `DegradedReadOnly`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`Degraded` mode can be set by the administrator if needed.
Modifying operations in this mode can lead node into an inconsistent state
because metabase checks such as lock checking are not performed.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
There is a need to support working w/o shard if it has problems with
blobovnicza tree.
Make `BlobStor.Init` to return new `ErrInitBlobovniczas` error. Remove
shard from storage engine's shard set if it returned this error from
`Init` call. So if some of the shards (but not all) return this error,
the node will be able to continue working without them.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Reduce public interface of this package. Later each result will contain
an additional status, so it makes more sense to use the same functions
and result processing everywhere.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not check that a node indeed belongs to the container, because the
synchronization will fail in this case anyway.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
1. Modifying operations are not expected to fail, unless the shard is
read-only.
2. `Get*` operations should increase error counter too, unless the
error is `ErrTreeNotFound`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not return backend type from the service for now, because memory
backend is expected to vanish.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
The tricky part here is the engine itself: we stop iteration on
`ErrReadOnly` because it is better to synchronize the shard later than
to have partial trees stored in 2 shards.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Current implementation prevents invalid operations to become valid at
some later point (consider adding a child to the non-existent parent and
then adding the parent). This seems to diverge from the paper algorithm
and complicates implementation. Make it simpler.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In case node is down or failing for some reason, we can expect `Dial` to
fail. In case we actively try to replicate and `Dial` always takes 2
seconds, replication-related channels quickly become full. That affects
latency of all other write operations.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Before this commit the replication channel was quickly filled under
heavy load. This lead to the continuously increasing latency for all
write operations. Now it looks better.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Also fix a bug with replicator using the multiaddress instead of
<host>:<port> format expected by gRPC library.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Currently to find a node by path we iterate over all the children on
each level. This is far from optimal and scales badly with the number of
nodes on a single level. Thus we introduce "indexed attributes" for
which an additional information is stored and which can be use in
`*ByPath` operations. Currently this set only includes `FileName`
attribute but this may change in future.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Consider a node `{FileName: "dir", Attribute: "xxx"}`. In case we add
a new node by path `["dir", "file.txt"]`, create a new intermediate node
with a single attribute.
`GetByPath` now also considers only nodes with a single attribute while building a path.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In this commit we implement algorithm for CRDT trees from
https://martin.klepmann.com/papers/move-op.pdf
Each tree is identified by the ID of a container it belongs to
and the tree name itself. Essentially, it is a sequence of operations
which should be applied in chronological order to get a usual tree
representation.
There are 2 backends for now: bbolt database and in-memory.
In-memory backend is here for debugging and will eventually act
as a memory-cache for the on-disk database.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Replace `ErrRangeOutOfBounds` error from `pkg/core/object` package with
`ObjectOutOfRange` from `apistatus` package. That error is returned by
storage node's server as NeoFS API statuses.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Shard is intended to be used as a separate failure domain,
which usually resides on a separate disk. Thus, sequential
initialization is bound by IO and this change speeds up thing a bit.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not calculate and do not write homomorphic hash for containers that were
configured to store objects without hash.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not use homomorphic hash in storage group for containers that have
`homomorphic_hashing_disabled` set to `true`.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If the container ID is not nil and not equal to the container ID in the
request, consider bearer token invalid.
See also nspcc-dev/neofs-api#207.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Allocate memory only if a node chosen as the forwarded request receiver
has responded with a successful status.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After fixing version fields in forwarded requests, a node does not check
statuses since errors are not covered by direct call error checks.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Error checkers now support wrapped errors so there is no need to
explicitly unwrap errors in `Policer`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Forwarded requests contained zero version in their meta header. It did not
allow responding with API statuses (`v0.0` version considered to be older
than `v2.11`) to the forwarding node and, therefore, did not allow analyzing
responses.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
`auditor` does not need to request SG: processor will fetch that info before
audit context initialization.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
That allows using `ClientCache` for storage group searching before task
context is initialized. Also, that makes it more general purpose.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not use `Marshal()` with object's payload. Use `ReadFromObject` func from
SDK instead. That allows checking both attributes and SG body's expiration
epoch.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After recent changes in NeoFS SDK Go library session tokens aren't
embedded into `container.Container` and `eacl.Table` structures.
Group value, session token and signature in a structure for container
and eACL.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The main problem is to distinguish the case of initial initialization
and update from version 0. We can't do this at `Open`, because of
`resync_metabase` flag. Thus, the following approach was taken:
1. During `Open` check whether the metabase was initialized.
2. Check for the version in `Init` or write the new one if the metabase
is new.
3. Update version in `Reset`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
After recent changes `buildContainer` method returns two-dimensional
slice of `NodeInfo` so there is no need to flatten it to build slice of
`common.NodeInfo`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `ReadNetworkConfiguration` method to return `NetworkConfiguration`
by value in order to follow storage engine improvements and prevent heap
escaping.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Netmap` contract exports enumeration of the node states.
Replace using literals and constants from NeoFS API Go V2 with the
values provided by contract.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Cache object that are being processed. That prevents concurrent
object handling when there is a few number of objects and object handling
takes more time that the policer needs for starting that object handling one
more time.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If placement contains two vectors with intersecting nodes it was possible to
send the object to the nodes twice.
Also optimizes requests: do not ask about storing the object twice from the
same node.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
The node does not support asynchronous object replication anymore, so it
does not need to have replicator worker, channel and `AddTask` function.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In case we have lots of objects in a single container,
`GetContainerNodes` invoked indirectly by a policer can be seen in
pprof.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In previous implementation `verifySignature` method of container
processor worked incorrectly for operations without a key and with
session: processor tried to verify signature with one of the bound owner
keys instead of session one.
Use `VerifySessionDataSignature` method to check the signature if
session is used. Refactor `verifySignature` a bit with session check
highlighting for readability.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In order to extend container ACL `F` bit must be set in basic ACL.
Make `Container` contract processor to deny eACL tables bound to
non-extendable containers.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Log errors for network operations. The only places where we are not
interested in errors are `Submit` in pool and unmarshaling.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Node shouldn't perform eACL verification during GET/HEAD request
processing until full object header is received. Otherwise, for some
eACL tables request may be falsely rejected.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Scenario:
* HEAD request of some object
* 1st eACL record allows op for objects with specific user attribute
* 2nd eACL record forbids op by object ID
* node doesn't store the requested object locally
With this scenario node shouldn't deny request.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
It is redundant to process object headers in responses w/o object field
since result will be the same.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Request processing should not be interrupted in case of local storage
failure since error case in normal for relay nodes.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
ACL service should not deny request on local storage failure since in
this case relay nodes won't be able to continue the operation.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
It is useless process since subnet owner is able to delete subnet without an
Alphabet approval. The Alphabet should only validate netmap state after
removal:
1. Update nodes' attributes if they were included in the deleted subnet;
2. Remove nodes without any subnet entrance.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
`log.With` is suitable during initialization, but in other places it induces
some overhead, even when branches with logging are not taken.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
If we should process address based on some condition, there is no need
to read file content in memory.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Currently we use `(*bbolt.Bucket).Stats().KeyN` for estimating database
size. However, it iterates over all pages in bucket and thus heavily
depends on the bucket size. This commit replaces initial size estimation
with a single `os.Stat` call.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Core changes:
* avoid package-colliding variable naming
* avoid using pointers to IDs where unnecessary
* avoid using `idSDK` import alias pattern
* use `EncodeToString` for protocol string calculation and `String` for
printing
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Helper function `client.IsErrObjectNotFound` doesn't support error
unwrapping, so we need to do it on caller side.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Policer` should pass list of selected candidates into `WithNodes`
method of `replicator.Task`. In previous implementation `processNodes`
method passed an opposite list: failed nodes and/or the local one.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Not all the NeoFS requests must contain OID in their bodies (or must NOT
contain them at all). Do not pass object address in helper functions, pass
CID and OID separately instead.
Also, fixed NPE in the ACL service: updated SDK library brought errors
when working with `Put` and `Search` requests without OID fields.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In previous implementation `Policer` considered local object copy as
redundant on processing single placement vector.
Make `Policer` to call redundant copy callback after full placement
processing. Also fix 404 error parsing.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Call `UpdateStateIR` and `AddPeerIR` method instead of `UpdateState` and
`AddPeer` if calling client is configured as Alphabet in notary enabled
environment.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Parse all headers beforehand and reject invalid requests.
Another approach would be to remember the error and check
it after `CalculateAction`, which is a bit faster.
The rule of thumb here is "first validate, then use".
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
It is nice to have different paths for different components and also
check that the information returned is different for different shards.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In previous implementation NeoFS CLI app used `network.Address.HostAddr`
as a server URI, which caused scheme loss since host address doesn't
contain it.
Rename `HostAddr` to `URIAddr` and make it to return URI address with
`grpcs` scheme if TLS is enabled. Make `TLSEnabled` unexported since it
was used to provide default `tls.Config` only (it is used by default in
SDK).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>