Once the node was processed it skipped, at the step of forming
result in case when all nodes skipped, because processed for
previous REP, service mark the whole request as incomplete.
Example of policies which are unblocked:
- REP 1 REP 1 CBF 1
- REP 4 IN X REP 4 IN Y
CBF 4
SELECT 2 FROM FX AS X SELECT 2 FROM FY AS Y
FILTER Country EQ Russia OR Country EQ Sweden OR Country EQ Finland AS FY
FILTER Price GE 0 AS FX
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
Estimate cache size after delete objects to update metric.
Update counters on small object deletion.
Do not count bbolt DB file as FSTree object.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
We do not use the return result from Close() and we always execute both
methods in succession. It makes sense to unite them.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Current implementation has some quirks. For example,
using only half of object.put.pool_size_remote threads
tells replicator that is node is 50% loaded,
but in reality we could be putting lot's of big objects.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Previously, we can continue to return `EndOfListing` infinitely.
Reflect iterator reuse via Rewind() method.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Concurrent initialization in case of the metabase resync leads to
high memory consumption and potential OOM.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Do not accept requests until initial sync is finished.
`Apply` is deliberately left out -- we don't want to miss anything new.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Dynamically growing an unbounded buffers can cause a large
amount of memory to be pinned and never be freed.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
1. In redo() we save the old state.
2. If we do redo() for the same operation twice, the old state will be
overritten with the new one.
3. This in turn affects undo() and subsequent isAncestor() check.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Now it's possible to run evacuate shard in async.
Also only one evacuate process can be in progress.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
It's unused and not needed, default fallback lifetime is set by Notary
actor.
Signed-off-by: Anna Shaleva <anna@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
It's not really needed, closing the connection works fine when exiting and
normally the app doesn't need to unsubscribe at all.
Signed-off-by: Roman Khimov <roman@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
If a connection has not been established earlier, it stores `nil` in LRU
cache. Cache eviction tries to close every connection (even a `nil` one) and
panics but not crash the app because we are using pools.
That ugly bug also leads to a deadlock where `Unlock` is not called via
`defer` func (and that is the way I found it).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Memory forest is here to check the correctness of boltdb optimized
implementation. Let's keep it simple.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
If operation with WC are _fast enough_ (e.g. `Init` failed and `Close` is
called immediately) there is a race and a deadlock that do not allow finish
(and start, in fact) an initialization routine because of taken `modeMtx`
and also do not allow finish `Close` call because of awaiting initialization
finish. So do stop initialization _before_ any mutex is taken.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
After 75d7891ca1
`neo-go` does claim that an empty invocation script is the only way to
fill missing signature for unsigned notary requests. The new notary actor
does it that way and, therefore, breaks notary request parsing by the
Alphabet because of skipping any request that is not filled with a dummy (64
zeros) invocation script. Support both way. The "Dummy" approach will be
dropped later.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Do not use WC's internals in the initialization routines without mode
protection. WC should be able to change its mode even if the initialization
is not finished yet.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
It can't be uint64 in fact, but this error is buried deeply in the NetworkInfo
API structure, so we're not touching MagicNumber() for now.
Signed-off-by: Roman Khimov <roman@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
That's the reason #2230 and #2263 were not detected earlier, we actually had
Global scope being used before reconnection to RPC node.
Signed-off-by: Roman Khimov <roman@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Fixes#2230, fixes#2263. CustomGroups are nice while we're only calling NeoFS
contracts, but it doesn't work at all for standard ones like GAS or Notary.
Signed-off-by: Roman Khimov <roman@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
`blobovniczatree` takes a really long time to prefill, because each
batch takes at least 10ms, so for 10k iterations we have at least 100s of
prefill.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
I tried to add 4 more tests and suddenly, it became harder to navigate in
code. Move directory creation in a common function.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Doctor RPC performs complex operations on the storage engine.
Currently only duplicate removal is supported.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
RemoveDuplicates() removes all duplicate object copies stored on
multiple shards. All shards are processed and the command tries to leave
a copy on the best shard according to HRW.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
All operations must ensure the shard is not in a degraded mode.
Write operations must also ensure the shard is not in a read-only mode.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
With non-blocking pool restricted by 10 in capacity, the probability of
dropping events is unexpectedly big. Notifications are an essential part of the FrostFS,
we should not drop anything, especially new epochs.
```
Mar 31 07:07:03 vedi neofs-ir[19164]: 2023-03-31T07:07:03.901Z debug subscriber/subscriber.go:154 new notification event from sidechain {"name": "NewEpoch"}
Mar 31 07:07:03 vedi neofs-ir[19164]: 2023-03-31T07:07:03.901Z warn event/listener.go:248 listener worker pool drained {"chain": "morph", "capacity": 10}
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
It led to a neo-go dead-lock in the `subscriber` component. Subscribing to
notifications is the same RPC as any others, so it could also be blocked
forever if no async listening (reading the notification channel) routine
exists. If a number of subscriptions is big enough (or a caller is lucky
enough) subscribing loop might have not finished subscribing before the
first notification is received and then: subscribing RPC is blocked by
received notification (non)handling and listening notifications routine is
blocked by not finished subscription loop.
That commit starts listening notification channel _before_ any subscription
actions.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
We have already had and solved plenty of deposit issues and notary balance
is a really important thing. Deserves to be INFO even before the huge logs
severity refactor, happens on an app start only.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
It does not use deprecated methods anymore but also adds more code that
removes. Future refactor that will affect more components will optimize
usage of the updated API.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
A directory is read and files are saved to a local variable. The iteration
over such files may lead to a non-existing files reading due to a normal SN
operation cycle and, therefore, may lead to a returning the OS error to a
caller. Skip just removed (or lost) files as the golang std library does in
similar situations:
5f1a0320b9/src/os/dir_unix.go (L128-L133).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
In case of session token (ST) with object IDs search should
return only objects allowed in static session
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Initially it was there to check whether an update is being initiated by
a proper node. It is now obsolete for 2 reasons:
1. Background synchronization fetches all operations from a single node.
2. There are a lot more problems with trust in the tree service, it is
only used in controlled environments.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Fix sending GAS to an empty extra wallets receivers list. Also, send GAS to
extra wallets even if netmap is empty.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
GC deletes expired locks and objects sequentially. Expired locks and
objects are now being deleted concurrently in batches. Added a config
parameter that controls the number of concurrent workers and batch size.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
This reverts commit 2567f8020e. It assumes
that assembling logic could break some failover scenarios if request
forwarding is done. However, it also breaks requesting big objects via a
non-container node with TTL=2. Failover has been rechecked without that
commit and no problems were found. Any (if found) other bugs related to
the forwarding and object assembling must be solved more carefully.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Allow replication of any (expired too) locked object. Information about
object locking is considered to be presented on the _container nodes_.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Previously a token could've expired in the middle of an object.PUT
stream, leading to upload being interrupted. This is bad, because user
doesn't always now what is the right values for the session token
lifetime. More than that, setting it to a very high value will
eventually blow up the session token database.
In this commit we read the session token once and reuse it for the whole
stream duration.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
It will prevent test fails with `-race` flag on components that have
background processes and make some actions on test framework.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
The problem is that accidental timeout errors can make us to ignore
other nodes for some time. The primary purpose of the whole ignore
mechanism is not to degrade in case of failover. For this case,
closing connection and limiting the amount of dials is enough.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In case we have many small objects in the write-cache, `indices` should
not be reused between iterations.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Missing `ReportError` method did not allow casing multi-client interface to
`errorReporter` interface and dropping broken connections.
`replicationClient` embeds that interface, and it is widely used across
node's code. Embedded interface does not allow casting its parent structure
to `errorReporter` and breaks multi client error reporting logic.
Multi-client scheme is extremely hard to maintain, it makes unpredictable
casts and does not allow tracking code flow, so it will be refactored in the
future anyway.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Previously, node could get an "infinite" small object: it could be expired
and thus could not be flushed (update its storage ID) to metabase => could
not be marked as flushed => node never removes such object and repeat all
the cycle one more time. If object exists and is not marked with GC (meta
returns `ErrObjectIsExpired`, not `ObjectNotFound` and not
`ObjectAlreadyRemoved`), its ID is safe to update _in the same_ bbolt
transaction.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
LRU `Peek`/`Contains` take LRU mutex _inside_ of a `View` transaction.
`View` transaction itself takes `mmapLock` [1], which is lifted after tx
finishes (in `tx.Commit()` -> `tx.close()` -> `tx.db.removeTx`)
When we evict items from LRU cache mutex order is different:
first we take LRU mutex and then execute `Batch` which _does_ take
`mmapLock` in case we need to remap. Thus the deadlock.
[1] 8f4a7e1f92/db.go (L708)
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
To achieve high performance we must choose proper values for both
batch size and delay. For user operations we want to set low delay.
However it would prevent tree synchronization operations to form big
enough batches. For these operations, batching gives the most benefit
not only in terms of on-CPU execution cost, but also by speeding up
transaction persist (`fsync`).
In this commit we try merging batches that are already
_triggered_, but not yet _started to execute_. This way we can still
query batches for execution after the provided delay while also allowing
multiple formed batches to execute faster.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
"Object is expired" means that object is presented in `meta` but it is not
`ObjectNotFound` error. Previous implementation made `shard` search for an
object without `meta` which was an error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
```
name old time/op new time/op delta
_addressFromString-8 1.25µs ±30% 1.02µs ± 6% -18.49% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
_addressFromString-8 352B ± 0% 256B ± 0% -27.27% (p=0.000 n=9+10)
name old allocs/op new allocs/op delta
_addressFromString-8 6.00 ± 0% 4.00 ± 0% -33.33% (p=0.000
n=10+10)
```
Also, assure compiler that `s` doesn't escape:
Before this commit:
```
./fstree.go:74:24: leaking param: s
./fstree.go:90:6: moved to heap: addr
```
After this commit:
```
./fstree.go:74:24: s does not escape
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Currently we track based on `PayloadSize`, because it is already stored
in the metabase and it is easier to calculate without slowing down the
whole system.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Currently the only way to tell whether `evacuate/set-mode` is finished
is to set a very big timeout and _hope_ that the operation will finish.
In this commit we add INFO logs for such operations which should
simplify the life of an administrator.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Currently, under high load clients are blocked on channel send
and the number of goroutines can increase indefinitely.
In this commit we drop replication messages if send/recv queue is full
and rely on a background synchronization.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
1. Reduce allocations inside transactions.
2. Do not encode container ID to string: it allocates a lot and takes more
space.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Under high load we are limited by the _amount_ of keys we need to update
in a single transaction. In this commit we try storing all state
with a single key.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In the previous implementation any non-nil error that preceded object
fetching from blobstor led to iterating over every storage (in other words,
no storage ID information was taken into account). Now storage ID is
skipped only if metabase (storage ID source) returns any error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
After the reconnection interval feature there was an bug related to the big
objects collecting: split error is returned from a client directly, not
via API status and was considered as a connection error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
It should be similar to a `TreeAddByPath`. `applyOperation` is used for
`Apply` when the operation can be inserted in the middle of a log.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Under load changing shard mode can lead to it being removed from the
list during some other PUT.
```
Dec 28 07:01:26 az neofs-node[364505]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 28 07:01:26 az neofs-node[364505]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc9fbb1]
Dec 28 07:01:26 az neofs-node[364505]: goroutine 11791912 [running]:
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).putToShard(0xc000435490, {0xc0003f7a28?, 0xc0001192c0?}, 0x2, {0x0, 0x>
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:91 +0x1b1
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put.func1(0xc000435490?, {0xc0003f7a28?, 0xc0001192c0?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:71 +0x19c
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).iterateOverSortedShards(0x1?, {{0x62, 0x23, 0xfe, 0x60, 0x67, 0xd5, 0x>
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/shards.go:225 +0xc8
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put(0xc000435490, {0x1?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:66 +0x2a9
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put.func1()
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:43 +0x2a
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).execIfNotBlocked(0x8?, 0x38?)
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/control.go:147 +0xcf
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put(0xc4df775a80?, {0x0?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:42 +0x65
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.Put(0xc06d928b80?, 0xc06b1b8dc8?)
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:158 +0x19
Dec 28 07:01:26 az neofs-node[364505]: main.engineWithoutNotifications.Put({0x20301b?}, 0x20301b?)
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Because synchronization _most likely_ will have apply already existing
operations, it is much faster to check their presence in a read
transaction. However, always doing this will degrade the perfomance
for normal `Apply`. And, let's be honest, it is already not good.
Thus we add a separate parameter which specifies whether this logic is
enabled.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
`GETRANGEHASH` request spawns `GETRANGE` requests if an object could not be
found locally. If the original request contains session, it can be static
and, therefore, fetching session key can not be performed successfully.
As the best effort a node could request object's range with its own key.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Both `meta` and `write-cache` are expected to have a fast underlying disk,
so it does not seem like an optimisation. Moreover, `write-cache`'s `Head`
is a `Get` with payload cutting, it _must_ use more memory for no reason
(`meta` was created for such requests). Also, `write-cache` does not allow
performing any "meta" relations checks (such as locking, tombstoning).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Node children are not sorted and could occur in any order.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
We use cache to avoid policing the same object multiple times in a short
time span (< 30 seconds). If we have 200_000 objects in a blobstor, it is a bit useless
-- if it takes 1 second to process an object and we have `replicator.pool_size: 20`
in config, the next iteration will happen in 10_000 second which is much
larger than 30 second. However we still consume a lot of memory, so it
makes sense to use saner default.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Stop child objects collection if the last returned object (the most "left"
object in the collected chain) starts exactly from the `GETRANGE`'s `from`
value.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
That could happen if a node forwards request to a node that closed the
connection during the original object stream.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Could be related to "websocket users limit reached" on the `neo-go` server
side when an SN/IR is rebooting repeatedly.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Replace `mn` function with a `sigCount`.
2. Use `notary.FakeMultisigAccount` for account creation.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Includes:
1. mode change read lock operation in every exported method that r/w the
underlying database;
2. returning `ErrDegradedMode` logical error if any exported method is
called in degraded (without a metabase) mode.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
```
2022/11/15 08:40:56 worker exits from a panic: runtime error: index out of range [0] with length 0
2022/11/15 08:40:56 worker exits from panic: goroutine 1188 [running]:
github.com/panjf2000/ants/v2.(*goWorker).run.func1.1()
github.com/panjf2000/ants/v2@v2.4.0/worker.go:58 +0x10c
panic({0x1042b60, 0xc0015ae018})
runtime/panic.go:1038 +0x215
github.com/nspcc-dev/neofs-node/pkg/services/policer.(*Policer).shardPolicyWorker.func1()
github.com/nspcc-dev/neofs-node/pkg/services/policer/process.go:65 +0x366
github.com/panjf2000/ants/v2.(*goWorker).run.func1()
github.com/panjf2000/ants/v2@v2.4.0/worker.go:68 +0x97
created by github.com/panjf2000/ants/v2.(*goWorker).run
github.com/panjf2000/ants/v2@v2.4.0/worker.go:48 +0x68
```
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Includes extending listing methods in the Storage Engine with object types.
It allows tuning replication/policer algorithms: container nodes do
not remove `LOCK` objects as redundant and try to fulfill `LOCK` placement
on the ohter container nodes.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It allows keeping all the locked objects safe after metabase
resynchronization. Currently, all `LOCK` objects are broadcast to all nodes
in a container, it guarantees `LOCK` object presence in a regular situation.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Remove a layer of indirection for mutex, `ClientCache` is already
used by pointer.
2. Fix duplication of a `AllowExternal` field.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
A container node is expected to have full "get" access to assemble the
object.
A non-container node is expected to forward any request to a container node.
Any token is expected to be issued for an original request sender not for a
node so any new request is invalid by design with that token.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not lose meta information of the original requests: cache session and
bearer tokens of the original request b/w a new generated ones. Middle
request wrappers should not contain any meta information, since it is
useless (e.g. ACL service checks only the original tokens).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
After presenting request statuses on the API level, all the errors are
unwrapped before sending to the caller side. It led to a losing invalid
request's context.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It was needed before we started to flush during transition to
`degraded` mode. Now it is confusing.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Do not use node's local storage if it is clear that an object will be
removed anyway as a redundant. It requires moving the changing local storage
logic from the validation step to the local target implementation.
It allows performing any relations checks (e.g. object locking) only if a
node is considered as a valid container member and is expected to store
(stored previously) all the helper objects (e.g. `LOCK`, `TOMBSTONE`, etc).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It is not an error: removing virtual object is expected and should be just
skipped. Getting a virtual object with `raw` flag is considered as an
impossible action, all the virtual objects removals will be handled via
their children's removals implicitly.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Currently there is a possibility for modifying operations to fail
because of I/O errors and a new tree to be created on another shard.
This commit adds existence check for modifying operations.
Read operations remain as they are, not to slow things.
`TreeDrop` is an exception, because this is a tree removal and trying
multiple shards is not an unwanted behaviour.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
By default writecache puts the whole object to update storage ID.
This logic comes from the times when we needed to put objects
in the metabase by the writecache itself. Now this is done by the
blobstor at unmarshaling objects during flush only to update storage ID
is an overkill.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
All logic errors are wrapped in `logicerr.Logical` type and do not
affect shard error counter.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation `Shard.Delete` logged writecache's removal
failures in `error` level. There is a need to decrease severity of these
log records since they aren't critical and don't require individual
review.
Change level of the message to `info`.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
From the `Bucket.ForEach` doc:
```
The provided function must not modify the bucket; this will result in undefined behavior.
```
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
After the refactor there are new storage characteristics: a type and
a general storage id (that could be stringified).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In previous implementation turning to maintenance mode using NeoFS CLI
required NeoFS API endpoint. This was not convenient from the user
perspective. It's worth to move networks settings' check to the server
side.
Add `force_maintenance` field to `SetNetmapStatusRequest.Body` message
of Control API. Add `force` flag to `neofs-cli control set-status`
command which sets corresponding field in the requests body if status is
`maintenance`. Force flag is ignored for any other status.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Iterate over every shard and search for the container's trees. Final result
is a concatenation of shards' results. It is considered that one fixed tree
is placed on one fixed shard but the different trees of a fixed container
could be placed on different shards.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add `SynchronizeAllTrees` method of the Tree service. It allows fetching
tree IDs and sync all of them. Share common logic b/w the new method and
the `SynchronizeTree`.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Add the node position in a container and the container size to the CID
descriptor that is passed to the `TreeApply`. Previously, `checkValid` does
not allow any log operarations.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
In the 2nd version, there was a database format change: buckets have changed
their keys, so it becomes impossible to check the version in the 1 -> 2+
migrations because of different buckets that store info about the version.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Storage node should not provide NeoFS Object API service when it is
under maintenance.
Declare `Common` service that unifies behavior of all object operations.
The implementation pre-checks if node is under maintenance and returns
`apistatus.NodeUnderMaintenance` if so. Use `Common` service as a first
logical processor in object service pipeline.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In some scenarios original session can be unrelated to the objects which
are read internally by the node. For example, node requests child
objects when removing the parent one.
Tune internal NeoFS API client used by node's Object API server to
ignore unrelated sessions in `GetObject` / `HeadObject` / `PayloadRange`
ops.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
If shard ID is stored in metabase (it is not the first time boot), read it,
set it, use it (not a generated one) in the metrics writer.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Make it store its internal `zap.Logger`'s level. Also, make all the
components to accept internal `logger.Logger` instead of `zap.Logger`; it
will simplify future refactor.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Current spec allows denying GET_RANGE requests from other storage nodes.
However, GET should always be allowed and it is enough to perform
GET_RANGE locally
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation `ObjectService.Get` RPC handler failed with
`parent address in child object differs` while assembling the "big"
object. This was caused by the child check which required parent
reference to be set in all child objects. The check was impracticable
because not all elements of the split-chain have a link to the parent.
Make `execCtx.isChild` to return `true` if parameterized object has no
parent header in its own header.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Node response with `NODE_UNDER_MAINTENANCE` status signals that the node
was switched to maintenance mode. There is a delay between the actual
switch and the reflection in the network map of up to one epoch. To
speed up the reaction to the maintenance, it is required to recognize
such node responses in the Policer.
Make `Policer.processNodes` to exclude elements with shortage decreasing
on `NODE_UNDER_MAINTENANCE` status response.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Nodes under maintenance SHOULD not respond to object requests. Based on
this, storage node's Policer SHOULD consider such nodes as problem ones.
However, to prevent spam with the new replicas, on the contrary, Policer
should consider them normal.
Make `Policer.processNodes` to exclude elements if `IsMaintenance()`
with shortage decreasing.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Make `replicator.TaskResult` to accept `netmap.NodeInfo` type instead of
uint64 in order to clarify the meaning and prevent passing the random
numbers.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
It doesn't make sense to check object relation in session check of
`ObjectService.Put` RPC which has been spawned by `ObjectService.Delete`
with session. Session issuer can't predict identifier of the tombstone
object to be created.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Currently, when removing shard special care must be taken with respect
to shard numbering. `mode: disabled` allows to leave shard configuration
in place while also ignoring it during initialization. This makes
disk replacement much more convenient.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
In previous implementation of `neofs-node` app object session was not
checked for substitution of the object related to it. Also, for access
checks, the session object was substituted instead of the one from the
request. This, on the one hand, made it possible to inherit the session
from the parent object for authorization for certain actions. On the
other hand, it covered the mentioned object substitution, which is a
critical vulnerability.
Next changes are applied to processing of all Object service requests:
- check if object session relates to the requested object
- use requested object in access checks.
Disclosed problem of object context inheritance will be solved within
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
In previous implementation node blocked any operation of local object
storage in maintenance mode. There is a need to perform some storage
operations like data evacuation or restoration.
Do not call block storage engine in maintenance mode. Make all Object
service operations to return `apistatus.NodeUnderMaintenance` error from
each local op.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
After recent Netmap contract changes all read methods which return
network map (either candidates or snapshots) encode node descriptors
into same structure.
Decode `netmap.Node` contract-side structure from the call results.
Replace node state with the value from the `netmap.Node.State` field.
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
Storage node can be requested to be switched into `MAINTENANCE` state.
Inner Ring should accept such requests only if network configuration
allows it.
Make `Processor` of Netmap contract's notifications to depend on
`state.NetworkSettings`. Make `Processor.processUpdatePeer` to call
`MaintenanceModeAllowed` if notification event relates to `MAINTENANCE`
mode`. Share singe `state.NetworkSettings` provider in Inner Ring
application.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes Netmap contract can send `UpdateState` notification
event with `MAINTENANCE` node's state. There is a need to provide
functionality to work with the status.
Provide `UpdatePeer.Maintenance` method. Support new state in
`ParseUpdatePeer` and `ParseUpdatePeerNotary` functions.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes in NeoFS protocol storage nodes can be in
`MAINTENANCE` state. There is a need to support this state in
`UpdateState` operation.
Add `UpdatePeerPrm.SetMaintenance` method which makes node to be
switched into `MAINTENANCE` mode after the `UpdatePeerState` operation.
New functionality is going to be used in Storage node application for
Control API serving.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Inner Ring should allow registering of storage nodes with `MAINTENANCE`
state in the NeoFS network only if its configuration allows this status.
Make `networkSettings.MaintenanceModeAllowed` to call
`MaintenanceModeAllowed` method of underlying Netmap contract's client
in order to assert state allowance.
From now nodes will be accepted to the network with `MAINTENANCE` state
only with the appropriate network configuration.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
After recent changes in the NeoFS API protocol network configuration
contains `MaintenanceModeAllowed` boolean flag. There is a need to
support the config value in all NeoFS applications.
Provide `Client.MaintenanceModeAllowed` method which read the config
from the Sidechain. Extend `NetworkConfiguration` structure with
`MaintenanceModeAllowed` field.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`NetworkConfiguration` represents NeoFS network configuration stored in
the Sidechain. In previous implementation the configuration missed flag
of disabled homomorphic hashing.
Add `NetworkConfiguration.HomomorphicHashingDisabled` boolean field.
Decode the field in `Client.ReadNetworkConfiguration` method. Print this
value in `netmap netinfo` command of NeoFS CLI.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
`readBoolConfig` method is going to be reused for reading other
configuration values. All boolean settings are `false` by default, so it
makes sense to return default value on missing key directly from
`readBoolConfig`.
Handle `ErrConfigNotFound` case in `readBoolConfig` method. Change
`HomomorphicHashDisabled` method to call `readBoolConfig` directly.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
There is a need to prevent limitless abuse of MAINTENANCE status of the
storage nodes. To do this, configuration of the NeoFS network is going
to be extended with the flag which allows the state. Until this is done,
it makes sense to prepare a site for this in the code.
Define `state.NetworkSettings` interface as an abstraction of global
network configuration within the `state` package. Make
`NetMapCandidateValidator` to depend on `NetworkSettings` and provide
corresponding field setter. Change `VerifyAndUpdate` method's behavior
to return an error for candidates with MAINTENANCE state if this state
is disallowed by the network configuration. Provide `NetworkSettings`
from the wrapper over Netmap contract's client on Inner Ring application
side. The provider is implemented to statically disallow MAINTENANCE
mode in order to save previous behavior.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Signed-off-by: Leonard Lyubich <ctulhurider@gmail.com>
New network status of storage nodes is going to be introduced. To
simplify the addition, it would be useful to prepare the code for this.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
In previous implementation Inner Ring allowed storage nodes with any
state to register in the network. According to the current design, only
nodes with ONLINE state are allowed to enter the network map.
Create new `state` sub-package of `nodevalidation` package of Inner Ring
application. Define `state.NetMapCandidateValidator` type and provide
`NodeValidator` interface required by the Inner Ring's processor of
`Netmap` contract's notification events. Embed new validator into the
one used by the Inner Ring application.
From now all `AddPeer` notifications with node state other than `ONLINE`
will be denied.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Degraded mode allows us to operate without an SSD,
thus writecache should be unavailable in this mode.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Add a common error for this case because it is not an error
which should increase error counter. Single error simplifies checks on
the call-site.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Negative values have no sense. On the other hand it differs from the
blobovnicza's configuration and prevents unification.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Allow user to initiate flushing objects from a writecache.
We need this in 2 cases:
1. During writecache storage schema update, it should be flushed with
the old version of node and started clean with a new one.
2. During SSD replacement, to avoid data loss.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Replace `ProcessCurrentNetMap` method of `NodeState` interface with
`ReadCurrentNetMap` one with two changes:
* Replace network map type from NeoFS SDK package with the
protocol-generated message. This replaces all the business logic to
the application layer.
* Support error return. This allows to cover problem node states.
Return an error from `NodeState.ReadCurrentNetMap` method implemeted
through `atomic.Value` if `Store` method has not been called yet.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Use a simple loop at the callsite. This way we remove as much as we can.
Also, `Delete` metrics is more meaningful now.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
- Meta now supports (and requires) inc/dec labeled counters
- The new logic counter is incremented on `Put` operations and is
decremented on `Inhume` and `Delete` operations that are performed on
_stored_ objects only
- Allow force counters sync. "Force" mode should be used on metabase resync
and should not be used on a regular meta start
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
It will allow to use labeled object counters, e.g.: "phy" for all
the physically stored objects, "logic" for only available objects. Also, it
will allow to add typed (regular, TS, SG, LOCK) counters if needed.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Do not call `CalculateAction` for the eACL checks since it requires object
headers that are meaningless in the tree context.
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
1. Do not require a request to be signed by the container owner if a
bearer token is missing
2. Do not check the system role since public requests are not expected to
be signed by IR or a container node (unlike the object requests)
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Sometimes it is useful to track sidechain events processed by the node.
For example, we need some easy way to look up for log messages about new
epoch notifications.
Make `subscriber.Subscriber` to log names of all events received from
the sidechain notification channel.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Re-configuration of Blobovnicza's object size limit must not affect
already stored objects. In previous implementation `Blobovnicza` didn't
see objects stored in buckets which became too big after size limit
re-configuration. For example, lets consider 1st configuration with 64KB
size limit for stored objects. Lets assume that we stored object of 64KB
size, and re-configured `Blobovnicza` with 32KB limit. After reboot
object should be still available, but actually it isn't. This is caused
by `Get` operation algorithm which iterates over configured size ranges
only, and doesn't process any other existing size bucket. By the way,
increasing of the object size limit didn't lead to the problem even in
previous implementation.
Make `Blobovnicza.Get` method to iterate over all size buckets
regardless of current configuration. This covers the described scenario.
Signed-off-by: Leonard Lyubich <leonard@nspcc.ru>
Previously, the depth was restricted because with BFS the amount of
nodes we have in memory blows up exponentially. With DFS is is linear,
so we can process trees of arbitrary depth.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>
Some methods add "IR" suffix to its names in notary enabled envs
because of contract logic. It was broken due to incorrect notary state
reading (tryNotary != notary is enabled).
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Set flush mark in the inside the flush worker because writing to the blobstor
can fail. Because each evicted object must be deleted, it is reasonable
to do this in the evict callback.
The evict callback is protected by LRU mutex and thus potentially interferes
with `Get` and `Iterate` methods. This problem will be addressed in the
future.
Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>