abd502215f
[ #970 ] fstree: Move file locking to the generic writer
...
It is not a part of FSTree itself, but rather a way to solve concurrent
counter update on non-linux implementations. New linux implementations
is pretty simple: link fails when the file exists, unlink fails when the
file doesn't exist.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
fb74524ac7
[ #970 ] fstree: Move delete implementation to a separate file
...
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
7f692409cf
[ #970 ] fstree: Handle unsupported O_TMPFILE
...
Metabase test relied on this behaviour, so fix the test too.
Cherry-picking was hard and did too many conflicts,
here is an original PR:
https://github.com/nspcc-dev/neofs-node/pull/2624
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
Roman Khimov
fc31b9c947
[ #970 ] fstree: Add linux-specific file writer using O_TMPFILE
...
O_TMPFILE is implemented for all modern FSes and it's much easier and safer to
use. If application crashes in the middle of writing this file would be gone
and won't leave any garbage.
Notice that this implementation makes a different choice wrt EEXIST handling,
generic one always overwrites, while this one keeps the old data.
There is no real performance difference.
SSD (slow&old), XFS, Core i7-8565U:
Sync
```
name old time/op new time/op delta
Put/size=1024,thread=1/fstree-8 1.74ms ± 3% 0.06ms ± 7% -96.31% (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8 10.0ms ±41% 1.1ms ±18% -88.95% (p=0.000 n=9+10)
Put/size=1024,thread=100/fstree-8 32.3ms ±60% 6.5ms ±14% -79.97% (p=0.000 n=10+10)
Put/size=1048576,thread=1/fstree-8 17.8ms ±90% 3.4ms ±70% -81.08% (p=0.000 n=10+10)
Put/size=1048576,thread=20/fstree-8 103ms ±174% 112ms ±158% ~ (p=0.971 n=10+10)
Put/size=1048576,thread=100/fstree-8 949ms ±78% 583ms ±132% ~ (p=0.089 n=10+10)
name old alloc/op new alloc/op delta
Put/size=1024,thread=1/fstree-8 3.17kB ± 1% 1.96kB ± 0% -38.09% (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8 59.6kB ± 1% 39.2kB ± 1% -34.30% (p=0.000 n=8+10)
Put/size=1024,thread=100/fstree-8 299kB ± 0% 198kB ± 0% -33.90% (p=0.000 n=7+9)
Put/size=1048576,thread=1/fstree-8 3.38kB ± 1% 2.36kB ± 1% -30.22% (p=0.000 n=10+10)
Put/size=1048576,thread=20/fstree-8 65.7kB ± 4% 47.7kB ± 6% -27.27% (p=0.000 n=10+10)
Put/size=1048576,thread=100/fstree-8 351kB ± 8% 245kB ± 8% -30.22% (p=0.000 n=10+10)
name old allocs/op new allocs/op delta
Put/size=1024,thread=1/fstree-8 30.3 ± 2% 21.0 ± 0% -30.69% (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8 554 ± 1% 413 ± 0% -25.35% (p=0.000 n=8+10)
Put/size=1024,thread=100/fstree-8 2.77k ± 0% 2.07k ± 0% -25.27% (p=0.000 n=7+10)
Put/size=1048576,thread=1/fstree-8 32.0 ± 0% 25.0 ± 0% -21.88% (p=0.000 n=9+8)
Put/size=1048576,thread=20/fstree-8 609 ± 5% 494 ± 6% -18.93% (p=0.000 n=10+10)
Put/size=1048576,thread=100/fstree-8 3.25k ± 9% 2.50k ± 8% -23.21% (p=0.000 n=10+10)
```
No sync
```
name old time/op new time/op delta
Put/size=1024,thread=1/fstree-8 71.3µs ±10% 59.8µs ±10% -16.21% (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8 1.43ms ± 6% 1.22ms ±13% -14.53% (p=0.000 n=10+10)
Put/size=1024,thread=100/fstree-8 8.12ms ± 3% 6.36ms ± 2% -21.67% (p=0.000 n=8+9)
Put/size=1048576,thread=1/fstree-8 1.88ms ±70% 1.61ms ±78% ~ (p=0.393 n=10+10)
Put/size=1048576,thread=20/fstree-8 32.7ms ±28% 34.2ms ±112% ~ (p=0.968 n=9+10)
Put/size=1048576,thread=100/fstree-8 262ms ±56% 226ms ±34% ~ (p=0.447 n=10+9)
name old alloc/op new alloc/op delta
Put/size=1024,thread=1/fstree-8 2.89kB ± 0% 1.96kB ± 0% -32.28% (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8 58.2kB ± 0% 39.5kB ± 0% -32.09% (p=0.000 n=8+8)
Put/size=1024,thread=100/fstree-8 291kB ± 0% 198kB ± 0% -32.19% (p=0.000 n=9+9)
Put/size=1048576,thread=1/fstree-8 3.05kB ± 1% 2.13kB ± 1% -30.16% (p=0.000 n=10+9)
Put/size=1048576,thread=20/fstree-8 62.6kB ± 0% 44.3kB ± 0% -29.23% (p=0.000 n=9+9)
Put/size=1048576,thread=100/fstree-8 302kB ± 0% 210kB ± 1% -30.39% (p=0.000 n=9+9)
name old allocs/op new allocs/op delta
Put/size=1024,thread=1/fstree-8 27.0 ± 0% 21.0 ± 0% -22.22% (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8 539 ± 0% 415 ± 0% -22.98% (p=0.000 n=10+10)
Put/size=1024,thread=100/fstree-8 2.69k ± 0% 2.07k ± 0% -23.09% (p=0.000 n=9+9)
Put/size=1048576,thread=1/fstree-8 28.0 ± 0% 22.3 ± 3% -20.36% (p=0.000 n=8+10)
Put/size=1048576,thread=20/fstree-8 577 ± 0% 458 ± 0% -20.72% (p=0.000 n=9+9)
Put/size=1048576,thread=100/fstree-8 2.76k ± 0% 2.15k ± 0% -22.05% (p=0.000 n=9+8)
```
HDD (LVM), ext4, Ryzen 5 1600:
Sync
```
│ fs.sync-generic │ fs.sync-linux │
│ sec/op │ sec/op vs base │
Put/size=1024,thread=1/fstree-12 34.70m ± 19% 33.59m ± 16% ~ (p=0.529 n=10)
Put/size=1024,thread=20/fstree-12 188.8m ± 8% 189.2m ± 16% ~ (p=0.739 n=10)
Put/size=1024,thread=100/fstree-12 264.8m ± 22% 273.6m ± 28% ~ (p=0.353 n=10)
Put/size=1048576,thread=1/fstree-12 54.90m ± 14% 47.08m ± 18% ~ (p=0.063 n=10)
Put/size=1048576,thread=20/fstree-12 244.1m ± 14% 220.4m ± 22% ~ (p=0.579 n=10)
Put/size=1048576,thread=100/fstree-12 847.2m ± 5% 893.6m ± 3% +5.48% (p=0.000 n=10)
geomean 164.3m 158.9m -3.29%
│ fs.sync-generic │ fs.sync-linux │
│ B/op │ B/op vs base │
Put/size=1024,thread=1/fstree-12 3.375Ki ± 1% 2.471Ki ± 1% -26.80% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12 66.62Ki ± 6% 49.21Ki ± 6% -26.15% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12 319.2Ki ± 1% 230.9Ki ± 2% -27.64% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12 3.457Ki ± 1% 2.559Ki ± 1% -25.97% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12 66.91Ki ± 1% 49.16Ki ± 1% -26.52% (p=0.000 n=10)
Put/size=1048576,thread=100/fstree-12 338.8Ki ± 2% 252.3Ki ± 3% -25.54% (p=0.000 n=10)
geomean 42.17Ki 31.02Ki -26.44%
│ fs.sync-generic │ fs.sync-linux │
│ allocs/op │ allocs/op vs base │
Put/size=1024,thread=1/fstree-12 33.00 ± 0% 27.00 ± 0% -18.18% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12 639.5 ± 1% 519.0 ± 2% -18.84% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12 3.059k ± 1% 2.478k ± 2% -18.99% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12 33.50 ± 1% 28.00 ± 4% -16.42% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12 638.5 ± 1% 520.0 ± 1% -18.56% (p=0.000 n=10)
Put/size=1048576,thread=100/fstree-12 3.209k ± 2% 2.655k ± 2% -17.28% (p=0.000 n=10)
geomean 405.3 332.1 -18.05%
```
No sync
```
│ fs.nosync-generic │ fs.nosync-linux │
│ sec/op │ sec/op vs base │
Put/size=1024,thread=1/fstree-12 148.2µ ± 20% 136.6µ ± 19% -7.89% (p=0.029 n=10)
Put/size=1024,thread=20/fstree-12 1.140m ± 26% 1.364m ± 16% ~ (p=0.143 n=10)
Put/size=1024,thread=100/fstree-12 11.93m ± 68% 26.89m ± 62% ~ (p=0.123 n=10)
Put/size=1048576,thread=1/fstree-12 1.302m ± 3% 1.287m ± 5% ~ (p=0.481 n=10)
Put/size=1048576,thread=20/fstree-12 77.52m ± 8% 74.07m ± 7% ~ (p=0.278 n=10+9)
Put/size=1048576,thread=100/fstree-12 226.1m ± ∞ ¹
geomean 5.986m 3.434m +18.60% ²
¹ need >= 6 samples for confidence interval at level 0.95
² benchmark set differs from baseline; geomeans may not be comparable
│ fs.nosync-generic │ fs.nosync-linux │
│ B/op │ B/op vs base │
Put/size=1024,thread=1/fstree-12 2.879Ki ± 0% 1.972Ki ± 0% -31.51% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12 55.94Ki ± 1% 37.90Ki ± 1% -32.25% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12 272.6Ki ± 0% 182.1Ki ± 9% -33.21% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12 3.158Ki ± 0% 2.259Ki ± 0% -28.46% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12 58.87Ki ± 0% 41.03Ki ± 0% -30.30% (p=0.000 n=10+9)
Put/size=1048576,thread=100/fstree-12 299.8Ki ± ∞ ¹
geomean 36.71Ki 16.60Ki -31.17% ²
¹ need >= 6 samples for confidence interval at level 0.95
² benchmark set differs from baseline; geomeans may not be comparable
│ fs.nosync-generic │ fs.nosync-linux │
│ allocs/op │ allocs/op vs base │
Put/size=1024,thread=1/fstree-12 28.00 ± 0% 22.00 ± 0% -21.43% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12 530.0 ± 0% 407.5 ± 1% -23.11% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12 2.567k ± 0% 1.956k ± 9% -23.77% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12 30.00 ± 0% 24.00 ± 0% -20.00% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12 553.5 ± 0% 434.0 ± 0% -21.59% (n=10+9)
Put/size=1048576,thread=100/fstree-12 2.803k ± ∞ ¹
geomean 347.9 178.8 -21.99% ²
¹ need >= 6 samples for confidence interval at level 0.95
² benchmark set differs from baseline; geomeans may not be comparable
```
Signed-off-by: Roman Khimov <roman@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
ff488b53a1
[ #970 ] fstree: Move write functions to a separate file
...
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
9a622a750d
[ #970 ] fstree: Move temporary path handling in a separate function
...
Allow to easier test different implementations.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
d19ade23c8
[ #959 ] node: Set mode to shard's components when open it
...
Avoid opening database for `metabase` and `cache` in `Degraded` mode.
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-02-09 14:04:01 +00:00
db67c21d55
[ #947 ] engine: Evacuate trees to remote nodes
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-09 11:33:15 +03:00
728150d1d2
[ #947 ] engine: Evacuate trees to local shards
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-09 11:33:15 +03:00
15d853ea22
[ #947 ] controlSvc: Return tree evacuation stat
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-09 11:20:39 +03:00
b3f3505ada
[ #947 ] cli: Allow to specify evacuation scope
...
It may be required to evacuate only objects or only tree or all, so
now it spossible to specify.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-09 11:20:38 +03:00
a6eb66bf9c
[ #947 ] evacuate: Refactor evacuate parameters
...
Drop methods to make it easier to extend.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-09 11:20:38 +03:00
8e2a0611f4
[ #947 ] tree: Add method to list all trees
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-09 11:20:38 +03:00
cfc5ce7853
[ #964 ] metabase: Drop GC marks if object not found
...
GC inhumes expired locks and tombstones on all the shards.
So it could be GC mark without object.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-08 07:54:39 +00:00
9ba48c582d
[ #917 ] engine: Allow to detach shards
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-06 14:49:47 +03:00
d0eadf7ea2
[ #799 ] engine: Skip put when object removed from shard
...
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-02-01 17:49:22 +00:00
e3573de6db
[ #930 ] gc: Stop internal activity by context
...
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-01-31 08:30:34 +00:00
675eec91f3
[ #938 ] shard: Update only changed counters
...
If metric value hasn't changed, but we update metric, then
non existed metric will apear with zero value.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-30 12:37:48 +03:00
c681354afd
[ #938 ] engine: Fix container count removal
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-30 12:37:48 +03:00
5ed330e436
[ #927 ] metabase: Delete GC marks
...
`key` is changed inside `db.get`, so encode address again after get.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-24 18:51:16 +03:00
931a5e9aaf
[ #918 ] engine: Move shard to degraded mode if metabase open failed
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-23 11:16:40 +03:00
f526f49995
[ #874 ] engine: Check object existance concurrently
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-23 09:28:29 +03:00
f5160b27fc
[ #920 ] tests: Fix data races
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-19 14:06:05 +03:00
63d3ed1ad8
[ #904 ] tests: Close test engine after test
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-17 19:04:39 +03:00
57171907e3
[ #904 ] metabase: Return if object was actuall inserted
...
This requires to count metrics properly.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-17 19:04:39 +03:00
c1a80235db
[ #904 ] metabase: Log Inhume operation
...
It will be very useful for troubleshooting.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-17 18:42:52 +03:00
a2ab373a0a
[ #895 ] metabase: Do not delete GC mark for virtual objects
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-11 12:32:09 +00:00
7166e77c2b
[ #895 ] test: Add logger to test shard
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-11 12:32:09 +00:00
47dcfa20f3
[ #895 ] test: Use t.Cleanup only for external resources
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-11 12:32:09 +00:00
f1b2b8bffa
[ #895 ] test: Fix NewLogger arguments list
...
`debug` is always true.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-11 12:32:09 +00:00
4b8b4da681
[ #864 ] engine: Drop container count metric if container removed
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-10 10:45:32 +03:00
d75e7e9a21
[ #864 ] engine: Drop container size metric if container deleted
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-10 10:44:54 +03:00
dfd62ca6b1
[ #864 ] metabase: Refactor delete/inhume
...
Available -> Logic, Raw -> Phy for delete/inhume results.
Use single counter instead of vectors.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-01-09 09:59:42 +03:00
225fe2d4d5
[ #894 ] blobovniczatree: Speedup rebuild test
...
Down from 3s to 300ms.
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2023-12-29 16:28:54 +00:00
581887148a
[ #569 ] cli: Add control shards writecache seal
command
...
It does the same as `control shards flush-writecache --seal`, but
has better name.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-29 16:05:37 +03:00
7a9db5bcdd
[ #569 ] writecache: Do not wait modeMtx if mode changes
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-29 16:05:37 +03:00
32c282ca10
[ #569 ] writecache: Refactor flush
...
Make single RUnlock call instead of two.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-29 16:05:37 +03:00
0cb0fc1735
[ #569 ] writecache: Allow to seal writecache after flush
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-29 16:05:37 +03:00
8180a0664f
[ #887 ] node: Drop badger writecache implementation
...
Badger implementation isn't tested and works not well,
but requires human resources to maintain.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-22 13:00:54 +03:00
d9cbb16bd3
[ #866 ] Use TTL for blobovnicza tree cache
...
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2023-12-19 16:36:28 +00:00
7eb46404a1
[ #863 ] blobovnicza: Fix counters
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-13 13:34:29 +03:00
94ffe8bb45
[ #857 ] golangci: Add testifylint linter
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-12 16:27:02 +03:00
3b7c0362a8
[ #861 ] shard: Fix Delete object
...
It is possible that object doesn't exist in metabase.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-12 14:25:40 +03:00
681b2c5fd4
[ #825 ] policer: Do not drop required linking objects
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-12 11:04:03 +00:00
db49ad16cc
[ #826 ] blobovniczatree: Do not create DB's on init
...
Blobovniczas will be created on write requests.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00
ad0697adc4
[ #661 ] blobovnicza: Compute size with record size
...
To get more accurate size of blobovnicza use record
size (lenght of key + lenght of data).
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00
e54dc3dc7c
[ #698 ] blobovnicza: Store counter values
...
Blobovnicza initialization take a long time because of bucket
Stat() call. So now blobovnicza stores counters in META bucket.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00
5e8c08da3e
[ #661 ] blobstore: Add address to error logs
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00
8911656b1a
[ #661 ] metrcis: Add rebuild percent metric
...
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00
2407e5f5ff
[ #661 ] blobovniczatree: Do not sort DB's and indicies
...
Put stores object to next active DB, so there is no need to sort DBs.
In addition, it adds unnecessary DB openings.
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2023-12-07 15:37:33 +03:00