Due to the flushing data from the writecache to the storage
and simultaneous deletion, a partial deletion situation is possible.
So as a solution, deletion is allowed only when the object is in storage,
because object will be deleted from writecache by flush goroutine.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Now UpdateStorageID doesn't return error in case of logical error.
If object is in graveyard or GC market, it is still required to
update storage ID.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
It was introduced in 69e1e6ca to help node determine faulty shards.
However, the situation is possible in a real-life scenario:
1. Object O is evacuated from shard A to B.
2. Shard A is unmounted because of lower-level errors.
3. We now have object in meta on A and in blobstor on B. Technically we
have it in meta on shard B too, but we still got the error if B goes
to a degraded mode.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Supposedly, this was added to allow creating 2 different shards without
subtest. Now we use t.TempDir() everywhere, so this should not be a
problem.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
newCustomShard() has many parameters but only the first is obligatory.
`enableWriteCache` is left as-is, because it directly affects the
functionality.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
This was set in #348 to speed up tests.
It seems 100ms doesn't increase overall test time,
but it reduces the amount of logs by 100x factor.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In some places we have debug=false, in others debug=true.
Let's be consistent.
Semantic patch:
```
@@
@@
-test.NewLogger(..., false)
+test.NewLogger(..., true)
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Protect test metric store fields with a mutex. Probably, not every field
should be protected, but better safe than sorry.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Semantic patch:
```
@@
var f expression
var t expression
var a expression
@@
f(
...,
- zap.String(t, a.String()),
+ zap.Stringer(t, a),
...,
)
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Some of our pilorama tests fail on CI.
The reasons are not obvious, but one possible improvement
is using `WithNoSync` option for these. It should have much effect,
because we are writing on the tmpfs, but doesn't hurt anyway.
If I replace `t.TempDir()` with a local directory, test execution time
goes down from 5s (sync) to 0.4s (nosync), which is the same time as
with `t.TempDir()`. Maybe we have some strange CI configuration.
```
panic: test timed out after 10m0s
running tests:
TestForest_ApplyRandom (8m22s)
TestForest_ApplyRandom/bbolt (8m21s)
...
goroutine 170 [syscall]:
syscall.Syscall(0xc000100000?, 0xc00047b758?, 0x6aff9a?, 0xc00041c1b0?)
/opt/hostedtoolcache/go/1.20.7/x64/src/syscall/syscall_linux.go:69 +0x27
syscall.Fdatasync(0x9e35c0?)
/opt/hostedtoolcache/go/1.20.7/x64/src/syscall/zsyscall_linux_amd64.go:418 +0x2a
go.etcd.io/bbolt.fdatasync(0xc000189000?)
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Estimate cache size after delete objects to update metric.
Update counters on small object deletion.
Do not count bbolt DB file as FSTree object.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Concurrent initialization in case of the metabase resync leads to
high memory consumption and potential OOM.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
1. In redo() we save the old state.
2. If we do redo() for the same operation twice, the old state will be
overritten with the new one.
3. This in turn affects undo() and subsequent isAncestor() check.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Now it's possible to run evacuate shard in async.
Also only one evacuate process can be in progress.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
Memory forest is here to check the correctness of boltdb optimized
implementation. Let's keep it simple.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
If operation with WC are _fast enough_ (e.g. `Init` failed and `Close` is
called immediately) there is a race and a deadlock that do not allow finish
(and start, in fact) an initialization routine because of taken `modeMtx`
and also do not allow finish `Close` call because of awaiting initialization
finish. So do stop initialization _before_ any mutex is taken.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Do not use WC's internals in the initialization routines without mode
protection. WC should be able to change its mode even if the initialization
is not finished yet.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
`blobovniczatree` takes a really long time to prefill, because each
batch takes at least 10ms, so for 10k iterations we have at least 100s of
prefill.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
I tried to add 4 more tests and suddenly, it became harder to navigate in
code. Move directory creation in a common function.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
RemoveDuplicates() removes all duplicate object copies stored on
multiple shards. All shards are processed and the command tries to leave
a copy on the best shard according to HRW.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
All operations must ensure the shard is not in a degraded mode.
Write operations must also ensure the shard is not in a read-only mode.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
A directory is read and files are saved to a local variable. The iteration
over such files may lead to a non-existing files reading due to a normal SN
operation cycle and, therefore, may lead to a returning the OS error to a
caller. Skip just removed (or lost) files as the golang std library does in
similar situations:
5f1a0320b9/src/os/dir_unix.go (L128-L133).
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Initially it was there to check whether an update is being initiated by
a proper node. It is now obsolete for 2 reasons:
1. Background synchronization fetches all operations from a single node.
2. There are a lot more problems with trust in the tree service, it is
only used in controlled environments.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
GC deletes expired locks and objects sequentially. Expired locks and
objects are now being deleted concurrently in batches. Added a config
parameter that controls the number of concurrent workers and batch size.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
It will prevent test fails with `-race` flag on components that have
background processes and make some actions on test framework.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
In case we have many small objects in the write-cache, `indices` should
not be reused between iterations.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Previously, node could get an "infinite" small object: it could be expired
and thus could not be flushed (update its storage ID) to metabase => could
not be marked as flushed => node never removes such object and repeat all
the cycle one more time. If object exists and is not marked with GC (meta
returns `ErrObjectIsExpired`, not `ObjectNotFound` and not
`ObjectAlreadyRemoved`), its ID is safe to update _in the same_ bbolt
transaction.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
LRU `Peek`/`Contains` take LRU mutex _inside_ of a `View` transaction.
`View` transaction itself takes `mmapLock` [1], which is lifted after tx
finishes (in `tx.Commit()` -> `tx.close()` -> `tx.db.removeTx`)
When we evict items from LRU cache mutex order is different:
first we take LRU mutex and then execute `Batch` which _does_ take
`mmapLock` in case we need to remap. Thus the deadlock.
[1] 8f4a7e1f92/db.go (L708)
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
To achieve high performance we must choose proper values for both
batch size and delay. For user operations we want to set low delay.
However it would prevent tree synchronization operations to form big
enough batches. For these operations, batching gives the most benefit
not only in terms of on-CPU execution cost, but also by speeding up
transaction persist (`fsync`).
In this commit we try merging batches that are already
_triggered_, but not yet _started to execute_. This way we can still
query batches for execution after the provided delay while also allowing
multiple formed batches to execute faster.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
"Object is expired" means that object is presented in `meta` but it is not
`ObjectNotFound` error. Previous implementation made `shard` search for an
object without `meta` which was an error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
```
name old time/op new time/op delta
_addressFromString-8 1.25µs ±30% 1.02µs ± 6% -18.49% (p=0.000 n=9+9)
name old alloc/op new alloc/op delta
_addressFromString-8 352B ± 0% 256B ± 0% -27.27% (p=0.000 n=9+10)
name old allocs/op new allocs/op delta
_addressFromString-8 6.00 ± 0% 4.00 ± 0% -33.33% (p=0.000
n=10+10)
```
Also, assure compiler that `s` doesn't escape:
Before this commit:
```
./fstree.go:74:24: leaking param: s
./fstree.go:90:6: moved to heap: addr
```
After this commit:
```
./fstree.go:74:24: s does not escape
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Currently we track based on `PayloadSize`, because it is already stored
in the metabase and it is easier to calculate without slowing down the
whole system.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Currently the only way to tell whether `evacuate/set-mode` is finished
is to set a very big timeout and _hope_ that the operation will finish.
In this commit we add INFO logs for such operations which should
simplify the life of an administrator.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
1. Reduce allocations inside transactions.
2. Do not encode container ID to string: it allocates a lot and takes more
space.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Under high load we are limited by the _amount_ of keys we need to update
in a single transaction. In this commit we try storing all state
with a single key.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
In the previous implementation any non-nil error that preceded object
fetching from blobstor led to iterating over every storage (in other words,
no storage ID information was taken into account). Now storage ID is
skipped only if metabase (storage ID source) returns any error.
Signed-off-by: Pavel Karpy <p.karpy@yadro.com>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
It should be similar to a `TreeAddByPath`. `applyOperation` is used for
`Apply` when the operation can be inserted in the middle of a log.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
Under load changing shard mode can lead to it being removed from the
list during some other PUT.
```
Dec 28 07:01:26 az neofs-node[364505]: panic: runtime error: invalid memory address or nil pointer dereference
Dec 28 07:01:26 az neofs-node[364505]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc9fbb1]
Dec 28 07:01:26 az neofs-node[364505]: goroutine 11791912 [running]:
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).putToShard(0xc000435490, {0xc0003f7a28?, 0xc0001192c0?}, 0x2, {0x0, 0x>
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:91 +0x1b1
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put.func1(0xc000435490?, {0xc0003f7a28?, 0xc0001192c0?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:71 +0x19c
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).iterateOverSortedShards(0x1?, {{0x62, 0x23, 0xfe, 0x60, 0x67, 0xd5, 0x>
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/shards.go:225 +0xc8
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).put(0xc000435490, {0x1?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:66 +0x2a9
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put.func1()
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:43 +0x2a
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).execIfNotBlocked(0x8?, 0x38?)
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/control.go:147 +0xcf
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.(*StorageEngine).Put(0xc4df775a80?, {0x0?})
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:42 +0x65
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine.Put(0xc06d928b80?, 0xc06b1b8dc8?)
Dec 28 07:01:26 az neofs-node[364505]: github.com/nspcc-dev/neofs-node/pkg/local_object_storage/engine/put.go:158 +0x19
Dec 28 07:01:26 az neofs-node[364505]: main.engineWithoutNotifications.Put({0x20301b?}, 0x20301b?)
```
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>