Currently, under a mixed load one failed PUT can lead to closing
connection for all concurrent GETs. For PUT it does no harm: we have
many other nodes to choose from. For GET we are limited by `REP N`
factor, so in case of failover we can close the connection with the only
node posessing an object, which leads to failing the whole operation.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
The problem is that accidental timeout errors can make us to ignore
other nodes for some time. The primary purpose of the whole ignore
mechanism is not to degrade in case of failover. For this case,
closing connection and limiting the amount of dials is enough.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In case we have many small objects in the write-cache, `indices` should
not be reused between iterations.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Missing `ReportError` method did not allow casing multi-client interface to
`errorReporter` interface and dropping broken connections.
`replicationClient` embeds that interface, and it is widely used across
node's code. Embedded interface does not allow casting its parent structure
to `errorReporter` and breaks multi client error reporting logic.
Multi-client scheme is extremely hard to maintain, it makes unpredictable
casts and does not allow tracking code flow, so it will be refactored in the
future anyway.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Previously, node could get an "infinite" small object: it could be expired
and thus could not be flushed (update its storage ID) to metabase => could
not be marked as flushed => node never removes such object and repeat all
the cycle one more time. If object exists and is not marked with GC (meta
returns `ErrObjectIsExpired`, not `ObjectNotFound` and not
`ObjectAlreadyRemoved`), its ID is safe to update _in the same_ bbolt
transaction.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Currently, DELETE service sets tombstone expiration epoch to
`current epoch + 5`. This works less than ideal in private networks
where an epoch can be e.g. 10 minutes. In this case, after a node is
unavailable for more than 1 hour, already deleted objects have a chance
to reappear.
After this commit tombstone lifetime can be configured.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
LRU `Peek`/`Contains` take LRU mutex _inside_ of a `View` transaction.
`View` transaction itself takes `mmapLock` [1], which is lifted after tx
finishes (in `tx.Commit()` -> `tx.close()` -> `tx.db.removeTx`)
When we evict items from LRU cache mutex order is different:
first we take LRU mutex and then execute `Batch` which _does_ take
`mmapLock` in case we need to remap. Thus the deadlock.
[1] 8f4a7e1f92/db.go (L708)
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
To achieve high performance we must choose proper values for both
batch size and delay. For user operations we want to set low delay.
However it would prevent tree synchronization operations to form big
enough batches. For these operations, batching gives the most benefit
not only in terms of on-CPU execution cost, but also by speeding up
transaction persist (`fsync`).
In this commit we try merging batches that are already
_triggered_, but not yet _started to execute_. This way we can still
query batches for execution after the provided delay while also allowing
multiple formed batches to execute faster.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
If we had lots of domains in one zone, `dump-hashes` for all others
can miss some domains, because we need to restrict ourselves with _some_
number.
In this commit we use neo-go sessions by default, with a proper
failback to in-script iterator unwrapping.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
"Object is expired" means that object is presented in `meta` but it is not
`ObjectNotFound` error. Previous implementation made `shard` search for an
object without `meta` which was an error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
```
name old time/op new time/op delta
_addressFromString-8 1.25µs ±30% 1.02µs ± 6% -18.49% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
_addressFromString-8 352B ± 0% 256B ± 0% -27.27% (p=0.000 n=9+10)
name old allocs/op new allocs/op delta
_addressFromString-8 6.00 ± 0% 4.00 ± 0% -33.33% (p=0.000
n=10+10)
```
Also, assure compiler that `s` doesn't escape:
Before this commit:
```
./fstree.go:74:24: leaking param: s
./fstree.go:90:6: moved to heap: addr
```
After this commit:
```
./fstree.go:74:24: s does not escape
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
- Update sample configs to match unit tests
- Remove Docker container for N3 testnet
Will return with updates when FrostFS enters N3
Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
Currently we track based on `PayloadSize`, because it is already stored
in the metabase and it is easier to calculate without slowing down the
whole system.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
We rarely need to list all containers: as one example
we need it for tree service synchronization once per epoch.
Given that cache TTL has the order of block time it makes no sense
to cache the list of all containers.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Currently the only way to tell whether `evacuate/set-mode` is finished
is to set a very big timeout and _hope_ that the operation will finish.
In this commit we add INFO logs for such operations which should
simplify the life of an administrator.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Currently, under high load clients are blocked on channel send
and the number of goroutines can increase indefinitely.
In this commit we drop replication messages if send/recv queue is full
and rely on a background synchronization.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
1. Reduce allocations inside transactions.
2. Do not encode container ID to string: it allocates a lot and takes more
space.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Under high load we are limited by the _amount_ of keys we need to update
in a single transaction. In this commit we try storing all state
with a single key.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In the previous implementation any non-nil error that preceded object
fetching from blobstor led to iterating over every storage (in other words,
no storage ID information was taken into account). Now storage ID is
skipped only if metabase (storage ID source) returns any error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
After the reconnection interval feature there was an bug related to the big
objects collecting: split error is returned from a client directly, not
via API status and was considered as a connection error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
It should be similar to a `TreeAddByPath`. `applyOperation` is used for
`Apply` when the operation can be inserted in the middle of a log.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Under load changing shard mode can lead to it being removed from the
list during some other PUT.
```
Dec 28 07:01:26 az neofs-node[364505]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 28 07:01:26 az neofs-node[364505]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc9fbb1]
Dec 28 07:01:26 az neofs-node[364505]: goroutine 11791912 [running]:
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).putToShard(0xc000435490, {0xc0003f7a28?, 0xc0001192c0?}, 0x2, {0x0, 0x>
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:91 +0x1b1
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put.func1(0xc000435490?, {0xc0003f7a28?, 0xc0001192c0?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:71 +0x19c
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).iterateOverSortedShards(0x1?, {{0x62, 0x23, 0xfe, 0x60, 0x67, 0xd5, 0x>
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/shards.go:225 +0xc8
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put(0xc000435490, {0x1?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:66 +0x2a9
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put.func1()
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:43 +0x2a
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).execIfNotBlocked(0x8?, 0x38?)
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/control.go:147 +0xcf
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put(0xc4df775a80?, {0x0?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:42 +0x65
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.Put(0xc06d928b80?, 0xc06b1b8dc8?)
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:158 +0x19
Dec 28 07:01:26 az neofs-node[364505]: main.engineWithoutNotifications.Put({0x20301b?}, 0x20301b?)
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Because synchronization _most likely_ will have apply already existing
operations, it is much faster to check their presence in a read
transaction. However, always doing this will degrade the perfomance
for normal `Apply`. And, let's be honest, it is already not good.
Thus we add a separate parameter which specifies whether this logic is
enabled.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
`GETRANGEHASH` request spawns `GETRANGE` requests if an object could not be
found locally. If the original request contains session, it can be static
and, therefore, fetching session key can not be performed successfully.
As the best effort a node could request object's range with its own key.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Both `meta` and `write-cache` are expected to have a fast underlying disk,
so it does not seem like an optimisation. Moreover, `write-cache`'s `Head`
is a `Get` with payload cutting, it _must_ use more memory for no reason
(`meta` was created for such requests). Also, `write-cache` does not allow
performing any "meta" relations checks (such as locking, tombstoning).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Node children are not sorted and could occur in any order.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
We use cache to avoid policing the same object multiple times in a short
time span (< 30 seconds). If we have 200_000 objects in a blobstor, it is a bit useless
-- if it takes 1 second to process an object and we have `replicator.pool_size: 20`
in config, the next iteration will happen in 10_000 second which is much
larger than 30 second. However we still consume a lot of memory, so it
makes sense to use saner default.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In NeoFS containers can be removed on behalf of its owner only. To
improve user experience, there is a need to add ownership check to the
removal command of the NeoFS CLI.
Check container ownership in `container delete` command `Run` function.
The check can be skipped by `--force` option.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Stop child objects collection if the last returned object (the most "left"
object in the collected chain) starts exactly from the `GETRANGE`'s `from`
value.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
That could happen if a node forwards request to a node that closed the
connection during the original object stream.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Could be related to "websocket users limit reached" on the `neo-go` server
side when an SN/IR is rebooting repeatedly.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Replace `mn` function with a `sigCount`.
2. Use `notary.FakeMultisigAccount` for account creation.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Separate User data and Service data:
- /var/lib/neofs/storage for service persistence
- /srv/neofs for user data
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Includes:
1. mode change read lock operation in every exported method that r/w the
underlying database;
2. returning `ErrDegradedMode` logical error if any exported method is
called in degraded (without a metabase) mode.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
```
2022/11/15 08:40:56 worker exits from a panic: runtime error: index out of range [0] with length 0
2022/11/15 08:40:56 worker exits from panic: goroutine 1188 [running]:
github.com/panjf2000/ants/v2.(*goWorker).run.func1.1()
github.com/panjf2000/ants/v2@v2.4.0/worker.go:58 +0x10c
panic({0x1042b60, 0xc0015ae018})
runtime/panic.go:1038 +0x215
github.com/nspcc-dev/neofs-node/pkg/services/policer.(*Policer).shardPolicyWorker.func1()
github.com/nspcc-dev/neofs-node/pkg/services/policer/process.go:65 +0x366
github.com/panjf2000/ants/v2.(*goWorker).run.func1()
github.com/panjf2000/ants/v2@v2.4.0/worker.go:68 +0x97
created by github.com/panjf2000/ants/v2.(*goWorker).run
github.com/panjf2000/ants/v2@v2.4.0/worker.go:48 +0x68
```
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows keeping all the locked objects safe after metabase
resynchronization. Currently, all `LOCK` objects are broadcast to all nodes
in a container, it guarantees `LOCK` object presence in a regular situation.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Remove a layer of indirection for mutex, `ClientCache` is already
used by pointer.
2. Fix duplication of a `AllowExternal` field.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
A container node is expected to have full "get" access to assemble the
object.
A non-container node is expected to forward any request to a container node.
Any token is expected to be issued for an original request sender not for a
node so any new request is invalid by design with that token.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not lose meta information of the original requests: cache session and
bearer tokens of the original request b/w a new generated ones. Middle
request wrappers should not contain any meta information, since it is
useless (e.g. ACL service checks only the original tokens).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If an external session is provided and is not opened by CLI itself, add
children objects to it too. It fixes "not found" errors when removing a big
object with a predefined session (from a file).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After presenting request statuses on the API level, all the errors are
unwrapped before sending to the caller side. It led to a losing invalid
request's context.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If a `neofs-cli object delete` operation is performing using a bearer token,
add it to the new `HEAD` requests that collects children OIDs.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It was needed before we started to flush during transition to
`degraded` mode. Now it is confusing.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Do not use node's local storage if it is clear that an object will be
removed anyway as a redundant. It requires moving the changing local storage
logic from the validation step to the local target implementation.
It allows performing any relations checks (e.g. object locking) only if a
node is considered as a valid container member and is expected to store
(stored previously) all the helper objects (e.g. `LOCK`, `TOMBSTONE`, etc).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It is not an error: removing virtual object is expected and should be just
skipped. Getting a virtual object with `raw` flag is considered as an
impossible action, all the virtual objects removals will be handled via
their children's removals implicitly.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Currently there is a possibility for modifying operations to fail
because of I/O errors and a new tree to be created on another shard.
This commit adds existence check for modifying operations.
Read operations remain as they are, not to slow things.
`TreeDrop` is an exception, because this is a tree removal and trying
multiple shards is not an unwanted behaviour.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
By default writecache puts the whole object to update storage ID.
This logic comes from the times when we needed to put objects
in the metabase by the writecache itself. Now this is done by the
blobstor at unmarshaling objects during flush only to update storage ID
is an overkill.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation node lost maintenance status after successful
switching to it. For example, after some period of time node sent
bootstrap requests with the "online" state instead of "maintenance".
Make `startMaintenance` method to set maintenance status in the
`networkState`.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In previous implementation bootstrapping state was chosen according to
bool flag which was not convenient.
Create separate method to boostrap with "online" and the current state.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In previous implementation `Shard.Delete` logged writecache's removal
failures in `error` level. There is a need to decrease severity of these
log records since they aren't critical and don't require individual
review.
Change level of the message to `info`.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Object removal session should reflect all objects related to the
removing one.
Make `OpenSessionViaClient` to gather the split members of the original
object in order to spread the session to them.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
From the `Bucket.ForEach` doc:
```
The provided function must not modify the bucket; this will result in undefined behavior.
```
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Add new shard modes as a map entry to automatically parse them in
`set-mode` command. The change also automatically adds new modes to help
message.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
After the refactor there are new storage characteristics: a type and
a general storage id (that could be stringified).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
There is a need to support NeoFS-binary sessions along with JSON ones in
NeoFS CLI.
Provide generic `common.ReadBinaryOrJSON` functions which tries to
decode NeoFS-binary structure and falls back to JSON format. Use this
function in all places with token reading.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In previous implementation turning to maintenance mode using NeoFS CLI
required NeoFS API endpoint. This was not convenient from the user
perspective. It's worth to move networks settings' check to the server
side.
Add `force_maintenance` field to `SetNetmapStatusRequest.Body` message
of Control API. Add `force` flag to `neofs-cli control set-status`
command which sets corresponding field in the requests body if status is
`maintenance`. Force flag is ignored for any other status.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Node's maintenance state can be set only if the network allows it.
Make `neofs-cli control set-status` to read current network settings and
check is maintenance state is allowed. Fail the execution if the mode is
not allowed.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Iterate over every shard and search for the container's trees. Final result
is a concatenation of shards' results. It is considered that one fixed tree
is placed on one fixed shard but the different trees of a fixed container
could be placed on different shards.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add `SynchronizeAllTrees` method of the Tree service. It allows fetching
tree IDs and sync all of them. Share common logic b/w the new method and
the `SynchronizeTree`.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add the node position in a container and the container size to the CID
descriptor that is passed to the `TreeApply`. Previously, `checkValid` does
not allow any log operarations.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows general usage of `controlNetmapStatus` getter: does not matter of
handler (`NewEpoch` handler of initial setter).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In previous implementation node always sent bootstrap requests with
"online" network state. This provoked switching to online of the nodes
under maintenance.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In the 2nd version, there was a database format change: buckets have changed
their keys, so it becomes impossible to check the version in the 1 -> 2+
migrations because of different buckets that store info about the version.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Storage node should not provide NeoFS Object API service when it is
under maintenance.
Declare `Common` service that unifies behavior of all object operations.
The implementation pre-checks if node is under maintenance and returns
`apistatus.NodeUnderMaintenance` if so. Use `Common` service as a first
logical processor in object service pipeline.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In some scenarios original session can be unrelated to the objects which
are read internally by the node. For example, node requests child
objects when removing the parent one.
Tune internal NeoFS API client used by node's Object API server to
ignore unrelated sessions in `GetObject` / `HeadObject` / `PayloadRange`
ops.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
If shard ID is stored in metabase (it is not the first time boot), read it,
set it, use it (not a generated one) in the metrics writer.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Current spec allows denying GET_RANGE requests from other storage nodes.
However, GET should always be allowed and it is enough to perform
GET_RANGE locally
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation `ObjectService.Get` RPC handler failed with
`parent address in child object differs` while assembling the "big"
object. This was caused by the child check which required parent
reference to be set in all child objects. The check was impracticable
because not all elements of the split-chain have a link to the parent.
Make `execCtx.isChild` to return `true` if parameterized object has no
parent header in its own header.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
It does not make sense to open remote sessions with the storage node in
`get`, `head`, `search`, `range` and `hash` sub-commands of `neofs-cli
object` command.
Do not use NeoFS API `SessionService` in mentioned commands. Decode
object session from JSON file specified `--session` flag. Perform some
sanity checks instantly on CLI side.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
If the contract returns a netmap that does not contain the node, update
local `NodeInfo`. It fixes `neofs-cli netmap nodeinfo` command that printed
"state: online" previously.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Node response with `NODE_UNDER_MAINTENANCE` status signals that the node
was switched to maintenance mode. There is a delay between the actual
switch and the reflection in the network map of up to one epoch. To
speed up the reaction to the maintenance, it is required to recognize
such node responses in the Policer.
Make `Policer.processNodes` to exclude elements with shortage decreasing
on `NODE_UNDER_MAINTENANCE` status response.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Nodes under maintenance SHOULD not respond to object requests. Based on
this, storage node's Policer SHOULD consider such nodes as problem ones.
However, to prevent spam with the new replicas, on the contrary, Policer
should consider them normal.
Make `Policer.processNodes` to exclude elements if `IsMaintenance()`
with shortage decreasing.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Make `replicator.TaskResult` to accept `netmap.NodeInfo` type instead of
uint64 in order to clarify the meaning and prevent passing the random
numbers.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
It doesn't make sense to check object relation in session check of
`ObjectService.Put` RPC which has been spawned by `ObjectService.Delete`
with session. Session issuer can't predict identifier of the tombstone
object to be created.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Currently, when removing shard special care must be taken with respect
to shard numbering. `mode: disabled` allows to leave shard configuration
in place while also ignoring it during initialization. This makes
disk replacement much more convenient.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation of `neofs-node` app object session was not
checked for substitution of the object related to it. Also, for access
checks, the session object was substituted instead of the one from the
request. This, on the one hand, made it possible to inherit the session
from the parent object for authorization for certain actions. On the
other hand, it covered the mentioned object substitution, which is a
critical vulnerability.
Next changes are applied to processing of all Object service requests:
- check if object session relates to the requested object
- use requested object in access checks.
Disclosed problem of object context inheritance will be solved within
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Split all the fields in `cfg` structure on:
1. `applicationConfiguration`;
2. `internals`; // shared entities for an application work, such as
`context.Context`
3. `shared`; // holder for the shared entities b/w;
4. `cfgXXX`; // configuration for internal services.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In previous implementation node blocked any operation of local object
storage in maintenance mode. There is a need to perform some storage
operations like data evacuation or restoration.
Do not call block storage engine in maintenance mode. Make all Object
service operations to return `apistatus.NodeUnderMaintenance` error from
each local op.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Make `netmap snapshot` command to print `MAINTENANCE` state of the nodes
with `IsMaintenance()` flag set.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent Netmap contract changes all read methods which return
network map (either candidates or snapshots) encode node descriptors
into same structure.
Decode `netmap.Node` contract-side structure from the call results.
Replace node state with the value from the `netmap.Node.State` field.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Make storage node to return `NODE_UNDER_MAINTENANCE` status
error on each local object operation if the node is in `MAINTENANCE`
mode.
Pass `apistatus.NodeUnderMaintenance` to `StorageEngine.BlockExecution`
during `ControlService.SetNetmapStatus` RPC processing.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes `MAINTENANCE` state is reflected in the Sidechain.
Storage node should switch its state to "maintenance" during serving the
`ControlService.SetNetmapStatus` RPC with correspoding status in the
request.
Call `UpdatePeerState` operation of Netmap contract's client in
`control.NodeState` provider on Storage node app side. The op is
executed if `BlockExecution` on local object storage is succeeded.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Storage node can be requested to be switched into `MAINTENANCE` state.
Inner Ring should accept such requests only if network configuration
allows it.
Make `Processor` of Netmap contract's notifications to depend on
`state.NetworkSettings`. Make `Processor.processUpdatePeer` to call
`MaintenanceModeAllowed` if notification event relates to `MAINTENANCE`
mode`. Share singe `state.NetworkSettings` provider in Inner Ring
application.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes Netmap contract can send `UpdateState` notification
event with `MAINTENANCE` node's state. There is a need to provide
functionality to work with the status.
Provide `UpdatePeer.Maintenance` method. Support new state in
`ParseUpdatePeer` and `ParseUpdatePeerNotary` functions.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes in NeoFS protocol storage nodes can be in
`MAINTENANCE` state. There is a need to support this state in
`UpdateState` operation.
Add `UpdatePeerPrm.SetMaintenance` method which makes node to be
switched into `MAINTENANCE` mode after the `UpdatePeerState` operation.
New functionality is going to be used in Storage node application for
Control API serving.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Inner Ring should allow registering of storage nodes with `MAINTENANCE`
state in the NeoFS network only if its configuration allows this status.
Make `networkSettings.MaintenanceModeAllowed` to call
`MaintenanceModeAllowed` method of underlying Netmap contract's client
in order to assert state allowance.
From now nodes will be accepted to the network with `MAINTENANCE` state
only with the appropriate network configuration.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes network configuration provided by NeoFS storage
nodes contains `MaintenanceModeAllowed` flag. There is
a need to support this value in NeoFS CLI application.
Print `MaintenanceModeAllowed` flag in `netmap netinfo` command.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes network configuration stored in the Netmap contract
of the NeoFS Sidechain contains `MaintenanceModeAllowed` flag. There is
a need to support this value in Storage node application.
Make `NetmapService.NetworkInfo` RPC server of the storage node to set
`MaintenanceModeAllowed` flag according to corresponding value in the
Sidechain.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes in the NeoFS API protocol network configuration
contains `MaintenanceModeAllowed` boolean flag. There is a need to
support the config value in all NeoFS applications.
Provide `Client.MaintenanceModeAllowed` method which read the config
from the Sidechain. Extend `NetworkConfiguration` structure with
`MaintenanceModeAllowed` field.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`NetworkConfiguration` represents NeoFS network configuration stored in
the Sidechain. In previous implementation the configuration missed flag
of disabled homomorphic hashing.
Add `NetworkConfiguration.HomomorphicHashingDisabled` boolean field.
Decode the field in `Client.ReadNetworkConfiguration` method. Print this
value in `netmap netinfo` command of NeoFS CLI.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`readBoolConfig` method is going to be reused for reading other
configuration values. All boolean settings are `false` by default, so it
makes sense to return default value on missing key directly from
`readBoolConfig`.
Handle `ErrConfigNotFound` case in `readBoolConfig` method. Change
`HomomorphicHashDisabled` method to call `readBoolConfig` directly.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to prevent limitless abuse of MAINTENANCE status of the
storage nodes. To do this, configuration of the NeoFS network is going
to be extended with the flag which allows the state. Until this is done,
it makes sense to prepare a site for this in the code.
Define `state.NetworkSettings` interface as an abstraction of global
network configuration within the `state` package. Make
`NetMapCandidateValidator` to depend on `NetworkSettings` and provide
corresponding field setter. Change `VerifyAndUpdate` method's behavior
to return an error for candidates with MAINTENANCE state if this state
is disallowed by the network configuration. Provide `NetworkSettings`
from the wrapper over Netmap contract's client on Inner Ring application
side. The provider is implemented to statically disallow MAINTENANCE
mode in order to save previous behavior.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
New network status of storage nodes is going to be introduced. To
simplify the addition, it would be useful to prepare the code for this.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation storage node interpreted all status values
sent in `SetNetmapStatus` RPC as `OFFLINE` except `ONLINE` and
`MAINTENANCE`. This could lead to incorrect processing of new values,
and also didn't allow detection of problems with sending garbage values.
Make implementation of `NodeState` interface used by Control API server
to deny requests with statuses other than protocol-declared enum.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation Inner Ring allowed storage nodes with any
state to register in the network. According to the current design, only
nodes with ONLINE state are allowed to enter the network map.
Create new `state` sub-package of `nodevalidation` package of Inner Ring
application. Define `state.NetMapCandidateValidator` type and provide
`NodeValidator` interface required by the Inner Ring's processor of
`Netmap` contract's notification events. Embed new validator into the
one used by the Inner Ring application.
From now all `AddPeer` notifications with node state other than `ONLINE`
will be denied.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Degraded mode allows us to operate without an SSD,
thus writecache should be unavailable in this mode.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Add a common error for this case because it is not an error
which should increase error counter. Single error simplifies checks on
the call-site.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
It will allow rereading config values and will simplify distinguishing them
from the custom values in the `cfg` structure.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Negative values have no sense. On the other hand it differs from the
blobovnicza's configuration and prevents unification.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Allow user to initiate flushing objects from a writecache.
We need this in 2 cases:
1. During writecache storage schema update, it should be flushed with
the old version of node and started clean with a new one.
2. During SSD replacement, to avoid data loss.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Replace `ProcessCurrentNetMap` method of `NodeState` interface with
`ReadCurrentNetMap` one with two changes:
* Replace network map type from NeoFS SDK package with the
protocol-generated message. This replaces all the business logic to
the application layer.
* Support error return. This allows to cover problem node states.
Return an error from `NodeState.ReadCurrentNetMap` method implemeted
through `atomic.Value` if `Store` method has not been called yet.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Use a simple loop at the callsite. This way we remove as much as we can.
Also, `Delete` metrics is more meaningful now.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
- Meta now supports (and requires) inc/dec labeled counters
- The new logic counter is incremented on `Put` operations and is
decremented on `Inhume` and `Delete` operations that are performed on
_stored_ objects only
- Allow force counters sync. "Force" mode should be used on metabase resync
and should not be used on a regular meta start
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It will allow to use labeled object counters, e.g.: "phy" for all
the physically stored objects, "logic" for only available objects. Also, it
will allow to add typed (regular, TS, SG, LOCK) counters if needed.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In current implementation storage node app uses in-memory session
storage if `persistent_sessions.path` value is empty/missing in the
config.
Clarify default behavior in `config/example/node.yaml`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Do not call `CalculateAction` for the eACL checks since it requires object
headers that are meaningless in the tree context.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Do not require a request to be signed by the container owner if a
bearer token is missing
2. Do not check the system role since public requests are not expected to
be signed by IR or a container node (unlike the object requests)
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
There is a need to have the ability to track NeoFS timeline on storage
nodes. Epochs tick on notifications receipt, so the most obvious way to
know about received epochs is logging the events.
Wrap `morphEvent.ParseNewEpoch` event parser into function which writes
log message about new epoch number.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Sometimes it is useful to track sidechain events processed by the node.
For example, we need some easy way to look up for log messages about new
epoch notifications.
Make `subscriber.Subscriber` to log names of all events received from
the sidechain notification channel.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Re-configuration of Blobovnicza's object size limit must not affect
already stored objects. In previous implementation `Blobovnicza` didn't
see objects stored in buckets which became too big after size limit
re-configuration. For example, lets consider 1st configuration with 64KB
size limit for stored objects. Lets assume that we stored object of 64KB
size, and re-configured `Blobovnicza` with 32KB limit. After reboot
object should be still available, but actually it isn't. This is caused
by `Get` operation algorithm which iterates over configured size ranges
only, and doesn't process any other existing size bucket. By the way,
increasing of the object size limit didn't lead to the problem even in
previous implementation.
Make `Blobovnicza.Get` method to iterate over all size buckets
regardless of current configuration. This covers the described scenario.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Implement methods that `actor.Actor` requires:
1. `InvokeFunction` -- wrapper over `InvokeScript`
2. `GetVersion` returns default struct
3. `CalculateNetworkFee` copied and simplified from Neo-go server side.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Previously, the depth was restricted because with BFS the amount of
nodes we have in memory blows up exponentially. With DFS is is linear,
so we can process trees of arbitrary depth.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
NNS proposal describes string N3 address format, however we must also
have hex-string for backwards compatibility.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In case we already have the record and it is invalid, we should
overwrite it instead of having several conflicting records.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
NEO NNS proposal uses addresses. We should eventually use the same,
but must stay compatible now.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Flag `--pre-check` of `set-eacl` command found to be in demand in most
cases. based on this, it makes sense to add its action to the default
behavior.
Pre-check container extensibility by default. Rename flag to
`--no-precheck` and invert its action.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Container ACL in NeoFS can be extended only for container in which the
corresponding option is enabled. In previous implementation command
`set-eacl` could hang up on modifying eACL of the non-existent or
non-extendable container. To improve UX, there is a need to pre-check
the availability of `SETEACL` operation.
Add boolean `precheck` flag to `set-eacl` cmd which reads the container
before the actual transaction formation. If flag is set, command fails
on non-extendable container ACL.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Some methods add "IR" suffix to its names in notary enabled envs
because of contract logic. It was broken due to incorrect notary state
reading (tryNotary != notary is enabled).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Set flush mark in the inside the flush worker because writing to the blobstor
can fail. Because each evicted object must be deleted, it is reasonable
to do this in the evict callback.
The evict callback is protected by LRU mutex and thus potentially interferes
with `Get` and `Iterate` methods. This problem will be addressed in the
future.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
If container listing cache on node's side is missing (for particular
owner), then updating it as a reaction to successful container creation
leads to potentially invalid cache value for a period of time equivalent
to cache TTL.
Immediately return from `ttlContainerLister.update` method if owner's
container list isn't cached.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
From now cache TTL can be parameterized in the `neofs-node` app using
`cache_ttl` config key. `disable_cache` value is no longer supported.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation storage node responded with the outdated
container list after successful creation/removal up until cache
invalidation due to TTL. In order to decrease the probability of
outdated responses node should update its cache on event receipts.
Implement `ttlContainerLister.update` method which actualizes cached
list of the owner's containers. Make node to call `update` method
on `PutSuccess`/`DeleteSuccess` notifications from the `Container`
contract.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation storage node responded with the removed
container up until cache invalidation due to TTL. In order to avoid
false-positive responses node should update its cache on `DeleteSuccess`
events.
Make node to call `handleRemoval` method of the container cache which
leads to subsequent `apistatus.ErrContainerNotFound` errors.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation failed requests to the Sidechain weren't
cached. It makes sense to cache errors along with the values in order to
decrease potential load spikes onto Sidechain nodes.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to sync container-related caching mechanism with the
actual Sidechain changes. To do this, node should be able to listen
incoming notifications about container ops.
Define `PutSuccess` / `DeleteSuccess` notification event's parsers.
Subscribe to these events in node app. As initial implementation node
will log event receipts. Later handling is going to be practically
complicated.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Maintain an invariant that any blobovnicza is present either
in `opened` or in `active` map. Otherwise, the logic becomes too
complicate because it is not obvious when we should close the blobovnicza.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
This check should occur on the shard level, but because
blobstor components expose `Open(readOnly bool)` interface,
it is reasonable to expect an error here.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
If the file doesn't exist, return `apistatus.ObjectNotFound`.
First check is still there as a shortcut.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
This tests check that each blobstor component behaves similarly when
same methods are being used. It is intended to serve as a specification
for all future components.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Includes:
1. Renaming counter key to distinguish logical and physical objects
2. Version update dropping since changes could be done in a compatible way
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Move compression parameters to the `shard` section.
2. Allow to use multiple sub-storage components in the blobstor.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
1. Remove in-memory cache. It doesn't persist objects and if we want
more speed, `NoSync` option can be used for the bolt DB.
2. Put to the metabase in a synchronous fashion. This considerably
simplifies overall logic and plays nicely with the metabase bolt DB
batch settings.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Allow to extend blobstor with more storage sub-systems. Currently
objects stored in the FSTree have empty byte slice descriptor and object
from blobovnicza tree have the same id as earlier. Each such change in
the identifier formation should be accompanied with metabase version
increase.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Close#1647.
Initially, `Sync: false` was provided because we can already lose
objects cached in memory. However, future changes in writecache will
remove inmemory cache and speed up it via other means.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Define `--with-attr` flag of `container list` which makes the command to
request and print user attributes for each found element.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Define `--with-attr` flag of `container list-objects` which makes the
command to request and print user attributes for each object from the
container.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
If an object is found in the Write-cache and is placed at the end of
the in-memory cache, the memory counter update operation tries to
dereference the index that is out of the sliced array. Moreover, even if
panic does not appear, the counter is updated with the wrong value.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Return all the objects on the empty common prefix search without search
optimizations that breaks boltDB logic.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If an object has not been marked for removal by the GC in the current epoch
yet but has already expired, respond with `ErrObjectNotFound` api status.
Also, optimize shard iteration: a node must stop any iteration if the object
is found but gonna be removed soon.
All the checks are performed by the Metabase.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Sort the endpoint by their priority before the first WS client creation to
start the morph client with the highest priority endpoint.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Handling notification in a synchronous manner may lead to a blocking state
if a handler uses neo-go client.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Return listen errors in a synchronous fashion.
Another solution would be to use buffered channel, but this is not
scalable: for each new similar runner we would need to extend the
buffer.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
We have the default value which is also printed in the help messages but any
call that does not specify that flag leads to an error.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After a4adb79db new logical error could be returned. Do not increase
error counter in this case.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Also, try to fetch object header info from the local storage to find as much
object info as possible for the requests which do not assume returning
object header as a response.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If metabase can't be opened in the default mode, try opening shard
first in `ReadOnly` mode and then in `DegradedReadOnly`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
`Degraded` mode can be set by the administrator if needed.
Modifying operations in this mode can lead node into an inconsistent state
because metabase checks such as lock checking are not performed.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
There is a need to support working w/o shard if it has problems with
blobovnicza tree.
Make `BlobStor.Init` to return new `ErrInitBlobovniczas` error. Remove
shard from storage engine's shard set if it returned this error from
`Init` call. So if some of the shards (but not all) return this error,
the node will be able to continue working without them.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Reduce public interface of this package. Later each result will contain
an additional status, so it makes more sense to use the same functions
and result processing everywhere.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not check that a node indeed belongs to the container, because the
synchronization will fail in this case anyway.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
1. Modifying operations are not expected to fail, unless the shard is
read-only.
2. `Get*` operations should increase error counter too, unless the
error is `ErrTreeNotFound`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not return backend type from the service for now, because memory
backend is expected to vanish.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
The tricky part here is the engine itself: we stop iteration on
`ErrReadOnly` because it is better to synchronize the shard later than
to have partial trees stored in 2 shards.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Current implementation prevents invalid operations to become valid at
some later point (consider adding a child to the non-existent parent and
then adding the parent). This seems to diverge from the paper algorithm
and complicates implementation. Make it simpler.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In case node is down or failing for some reason, we can expect `Dial` to
fail. In case we actively try to replicate and `Dial` always takes 2
seconds, replication-related channels quickly become full. That affects
latency of all other write operations.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Before this commit the replication channel was quickly filled under
heavy load. This lead to the continuously increasing latency for all
write operations. Now it looks better.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Also fix a bug with replicator using the multiaddress instead of
<host>:<port> format expected by gRPC library.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Currently to find a node by path we iterate over all the children on
each level. This is far from optimal and scales badly with the number of
nodes on a single level. Thus we introduce "indexed attributes" for
which an additional information is stored and which can be use in
`*ByPath` operations. Currently this set only includes `FileName`
attribute but this may change in future.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Consider a node `{FileName: "dir", Attribute: "xxx"}`. In case we add
a new node by path `["dir", "file.txt"]`, create a new intermediate node
with a single attribute.
`GetByPath` now also considers only nodes with a single attribute while building a path.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In this commit we implement algorithm for CRDT trees from
https://martin.klepmann.com/papers/move-op.pdf
Each tree is identified by the ID of a container it belongs to
and the tree name itself. Essentially, it is a sequence of operations
which should be applied in chronological order to get a usual tree
representation.
There are 2 backends for now: bbolt database and in-memory.
In-memory backend is here for debugging and will eventually act
as a memory-cache for the on-disk database.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Replace `ErrRangeOutOfBounds` error from `pkg/core/object` package with
`ObjectOutOfRange` from `apistatus` package. That error is returned by
storage node's server as NeoFS API statuses.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Shard is intended to be used as a separate failure domain,
which usually resides on a separate disk. Thus, sequential
initialization is bound by IO and this change speeds up thing a bit.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
There is an error only if the flag is not defined, such errors should be
caught during debugging.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Do not calculate and do not write homomorphic hash for containers that were
configured to store objects without hash.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not use homomorphic hash in storage group for containers that have
`homomorphic_hashing_disabled` set to `true`.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If the container ID is not nil and not equal to the container ID in the
request, consider bearer token invalid.
See also nspcc-dev/neofs-api#207.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Allocate memory only if a node chosen as the forwarded request receiver
has responded with a successful status.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After fixing version fields in forwarded requests, a node does not check
statuses since errors are not covered by direct call error checks.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Error checkers now support wrapped errors so there is no need to
explicitly unwrap errors in `Policer`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Forwarded requests contained zero version in their meta header. It did not
allow responding with API statuses (`v0.0` version considered to be older
than `v2.11`) to the forwarding node and, therefore, did not allow analyzing
responses.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
- Add VERSION file to make it work without .git folder
- Remove obsolete GO111MODULE flag
- Move linter version to variable
- clean .cache on `make clean`
- Move help target to a separate help.mk file
Signed-off-by: Stanislav Bogatyrev <stanislav@nspcc.ru>
`auditor` does not need to request SG: processor will fetch that info before
audit context initialization.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
That allows using `ClientCache` for storage group searching before task
context is initialized. Also, that makes it more general purpose.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not use `Marshal()` with object's payload. Use `ReadFromObject` func from
SDK instead. That allows checking both attributes and SG body's expiration
epoch.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
The updated version supports reading network configuration uint64 values
that consist of less than 8 bytes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Eventually more parameters will be supported (#1390) and after blobstor
configuration refactoring the output will certainly change. Implement
the simplest approach now.
Marshaling the result directly results in too ugly names and they cannot
be easily customized. Marshaling the results via `jsonpb` is better but
is not that flexible in terms of what we want to output.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
After recent changes in NeoFS SDK Go library session tokens aren't
embedded into `container.Container` and `eacl.Table` structures.
Group value, session token and signature in a structure for container
and eACL.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
The main problem is to distinguish the case of initial initialization
and update from version 0. We can't do this at `Open`, because of
`resync_metabase` flag. Thus, the following approach was taken:
1. During `Open` check whether the metabase was initialized.
2. Check for the version in `Init` or write the new one if the metabase
is new.
3. Update version in `Reset`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
After recent changes `buildContainer` method returns two-dimensional
slice of `NodeInfo` so there is no need to flatten it to build slice of
`common.NodeInfo`.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Make `ReadNetworkConfiguration` method to return `NetworkConfiguration`
by value in order to follow storage engine improvements and prevent heap
escaping.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Netmap` contract exports enumeration of the node states.
Replace using literals and constants from NeoFS API Go V2 with the
values provided by contract.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Cache object that are being processed. That prevents concurrent
object handling when there is a few number of objects and object handling
takes more time that the policer needs for starting that object handling one
more time.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If placement contains two vectors with intersecting nodes it was possible to
send the object to the nodes twice.
Also optimizes requests: do not ask about storing the object twice from the
same node.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
The node does not support asynchronous object replication anymore, so it
does not need to have replicator worker, channel and `AddTask` function.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In case we have lots of objects in a single container,
`GetContainerNodes` invoked indirectly by a policer can be seen in
pprof.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In previous implementation `verifySignature` method of container
processor worked incorrectly for operations without a key and with
session: processor tried to verify signature with one of the bound owner
keys instead of session one.
Use `VerifySessionDataSignature` method to check the signature if
session is used. Refactor `verifySignature` a bit with session check
highlighting for readability.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In order to extend container ACL `F` bit must be set in basic ACL.
Make `Container` contract processor to deny eACL tables bound to
non-extendable containers.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Log errors for network operations. The only places where we are not
interested in errors are `Submit` in pool and unmarshaling.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Node shouldn't perform eACL verification during GET/HEAD request
processing until full object header is received. Otherwise, for some
eACL tables request may be falsely rejected.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Scenario:
* HEAD request of some object
* 1st eACL record allows op for objects with specific user attribute
* 2nd eACL record forbids op by object ID
* node doesn't store the requested object locally
With this scenario node shouldn't deny request.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
It is redundant to process object headers in responses w/o object field
since result will be the same.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Request processing should not be interrupted in case of local storage
failure since error case in normal for relay nodes.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
ACL service should not deny request on local storage failure since in
this case relay nodes won't be able to continue the operation.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
It is useless process since subnet owner is able to delete subnet without an
Alphabet approval. The Alphabet should only validate netmap state after
removal:
1. Update nodes' attributes if they were included in the deleted subnet;
2. Remove nodes without any subnet entrance.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
`log.With` is suitable during initialization, but in other places it induces
some overhead, even when branches with logging are not taken.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
If we should process address based on some condition, there is no need
to read file content in memory.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Currently we use `(*bbolt.Bucket).Stats().KeyN` for estimating database
size. However, it iterates over all pages in bucket and thus heavily
depends on the bucket size. This commit replaces initial size estimation
with a single `os.Stat` call.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Core changes:
* avoid package-colliding variable naming
* avoid using pointers to IDs where unnecessary
* avoid using `idSDK` import alias pattern
* use `EncodeToString` for protocol string calculation and `String` for
printing
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Helper function `client.IsErrObjectNotFound` doesn't support error
unwrapping, so we need to do it on caller side.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`Policer` should pass list of selected candidates into `WithNodes`
method of `replicator.Task`. In previous implementation `processNodes`
method passed an opposite list: failed nodes and/or the local one.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Not all the NeoFS requests must contain OID in their bodies (or must NOT
contain them at all). Do not pass object address in helper functions, pass
CID and OID separately instead.
Also, fixed NPE in the ACL service: updated SDK library brought errors
when working with `Put` and `Search` requests without OID fields.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
If the key can't be fetched, an error is always returned, so it makes
sense to fail the whole command inside of a `key.Get*()`.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In previous implementation `Policer` considered local object copy as
redundant on processing single placement vector.
Make `Policer` to call redundant copy callback after full placement
processing. Also fix 404 error parsing.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Call `UpdateStateIR` and `AddPeerIR` method instead of `UpdateState` and
`AddPeer` if calling client is configured as Alphabet in notary enabled
environment.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Parse all headers beforehand and reject invalid requests.
Another approach would be to remember the error and check
it after `CalculateAction`, which is a bit faster.
The rule of thumb here is "first validate, then use".
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
It is nice to have different paths for different components and also
check that the information returned is different for different shards.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
In previous implementation NeoFS CLI app used `network.Address.HostAddr`
as a server URI, which caused scheme loss since host address doesn't
contain it.
Rename `HostAddr` to `URIAddr` and make it to return URI address with
`grpcs` scheme if TLS is enabled. Make `TLSEnabled` unexported since it
was used to provide default `tls.Config` only (it is used by default in
SDK).
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
- Delete objects physically on tombstone's arrival;
- Store information about tombstones in the Graveyard;
- Clear Graveyard every epoch based on the information about TS in the
network.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
The service fetches tombstones from the network via object service, every
request is handled in the following order:
1. checks local LRU cache;
2. checks local storage engine;
3. tries to find object in the placement nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add offset element to the iterations over deleted objects (both the
Graveyard and the Garbage buckets).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows storing information about object in both ways at the same time:
1. Metabase should know if an object is covered by a tombstone (that is
not expired yet);
2. It should be possible to physically delete objects covered by a
tombstone immediately (mark with GC) but keep tombstone knowledge.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Morph "NewEpoch" event handling was registered in a closure over
`addNewEpochNotificationHandler` func. That may lead to the data race:
if a shard was initialized before the event registration, everything works
as planned, but if registration was made earlier, it was not able to
include GC handlers since a shard has not called `eventChanInit` yet and,
therefore, it has not registered handler yet.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In previous implementation `Container.Delete` operation caused local
node's cache invalidation (container itself, eACL and listings). Any
subsequent `Container.Get` operation reversed invalidation. Given the
low latency sensitivity of deleting a container, there is no need to
touch the cache. With this approach, all pending deletion operations on
the node via the NeoFS API protocol will be delayed by the cache TTL.
Do not call cache invalidation ops in `morphContainerWriter.Delete`.
Remove no longer needed `InvalidateContainerListByCID` and
`InvalidateContainer` methods.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
2022-04-25 10:34:12 +03:00
1134 changed files with 49798 additions and 26605 deletions
d="m 0,0 c 2.054,-1.831 3.083,-4.465 3.083,-7.902 v -17.935 h -4.484 v 16.366 c 0,2.914 -0.626,5.024 -1.877,6.332 -1.253,1.308 -2.924,1.962 -5.016,1.962 -1.495,0 -2.896,-0.327 -4.204,-0.981 -1.308,-0.654 -2.381,-1.719 -3.222,-3.194 -0.841,-1.477 -1.261,-3.335 -1.261,-5.576 v -14.909 h -4.484 V 1.328 l 4.086,-1.674 0.118,-1.84 c 0.933,1.681 2.222,2.923 3.867,3.727 1.643,0.803 3.493,1.205 5.548,1.205 C -4.671,2.746 -2.055,1.83 0,0"
<ahref="https://fs.neo.org">NeoFS</a> is a decentralized distributed object storage integrated with the <ahref="https://neo.org">NEO Blockchain</a>.
<ahref="https://frostfs.info">FrostFS</a> is a decentralized distributed object storage integrated with the <ahref="https://neo.org">NEO Blockchain</a>.