Commit graph

1163 commits

Author SHA1 Message Date
27caa8a72f [#1256] metabase: Put split parent object ID for EC chunks
It is required to save split parent ID too, not only split ID.
Otherwise inhume operation works incorrect: shard with last part may be skipped
and parent object will be available.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-17 18:08:07 +03:00
3940bc17c1 [#1251] pilorama: Allow traversing multiple branches in parallel
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-07-17 11:25:07 +03:00
3bf6e6dde6 [#1238] engine: Add non-blocking send in the shard's notification channel
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-07-10 11:37:11 +03:00
b027a7f91e [#1234] pilorama: Fix GetByPath() on duplicate directories
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-07-10 06:11:38 +00:00
6ace2f597e [#1234] pilorama: Add test for duplicate directory behaviour
When AddByPath() is called concurrently on 2 different nodes,
internal path components may be created twice. This violates some
of our assumptions in GetByPath() and, indirectly, in S3 handling of
GetSubTree() results.

Add a test for the correct behaviour, fixes will follow.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-07-10 06:11:38 +00:00
62cbb72a5e [#1226] blobovniczatree: Delete fix db extensions in Init()
Since several releases have been released, this code is not relevant.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-04 12:22:06 +03:00
78b1d9b18d [#1226] blobovniczatree: Drop leaf width limitation
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-04 12:22:06 +03:00
40c9ddb6ba [#1226] blobovniczatree: Drop init in advance option
To make blobovniczatree unlimited.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-04 12:22:06 +03:00
3a797e4682 [#1222] engine: Fix tree evacuation
Do not fail if it is unable to evacuate tree to other node.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-04 10:38:10 +03:00
2bac82cd6f [#1222] engine: Fix object evacuation
Do not fail evacuation if it unable to evacuate object to other node.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-04 10:38:10 +03:00
bbe95dac8b [#1225] engine: Log the error when check object existence
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-07-04 07:22:54 +00:00
56eeb630b6 [#1217] Fix grammar mistakes and misspelling
Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
2024-07-01 19:14:25 +03:00
7a8ac4907a [#1213] engine: Drop unused
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-07-01 06:49:35 +00:00
7085723c6b [#1074] pilorama: Allow empty filenames in SortedByFilename()
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-06-28 17:46:24 +03:00
4c7ff159ec [#1201] writecache: Cancel background flush without lock
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-28 09:02:36 +03:00
4951babd5f [#1208] blobstor: Fix delete without storage id
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-27 11:59:08 +03:00
81ea91de52 [#451] metrics: Move to internal
`metrics` don't look like something others want to import.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-25 08:52:37 +00:00
9ac74efc41 [#1173] shard: Use mode from config on reload
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-06-20 11:29:10 +00:00
75eedf71f3 [#1187] pilorama/test: Remove debug print
Introduced in e12fcc041d.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-06-18 15:09:44 +03:00
76cf7a051b [#1178] shard: Check metabase existence before read shard ID
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-17 09:59:15 +03:00
68ac490729 [#1174] shard: Update metric mode_info on Init
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-06-13 08:32:59 +00:00
6a39c3d15e [#1086] engine: Do not use metabase if shard looks bad
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-13 07:35:22 +00:00
9d73f9c2c6 Reapply "[#446] engine: Move to read-only on blobstor errors"
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-06-13 07:35:22 +00:00
b9fcaad21f [#1168] shard: Set Disabled as default mode for components
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-06-11 15:13:38 +00:00
6cf512e574 [#1166] blobovniczatree: Handle blobovnicza's NoSpaceLeft error
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-07 17:15:43 +03:00
e7d479f4c2 [#1166] blobovnicza: Return NoSpaceLeft error instead of syscall.ENOSPC
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-07 17:15:43 +03:00
3f1961157e [#1163] metabase: Handle multiple splitInfos for EC
For REP updating split info is handled explicitly by a high-level PUT logic.
For EC it is trickier, because the address of an object we put is only
distantly related to a split info.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-06-06 16:26:29 +03:00
2e074d3846 [#1163] metabase: Properly save EC parent split ID
Search by SplitID should return all parts of a complex object.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-06-05 12:40:16 +03:00
806236da78 [#1121] node: Change mode of shard components
Signed-off-by: Alexander Chuprov <a.chuprov@yadro.com>
2024-06-05 05:55:24 +00:00
6f2187a420 [#1121] node: Refactor mods of shard
Signed-off-by: Alexander Chuprov <a.chuprov@yadro.com>
2024-06-05 05:55:24 +00:00
cc2449beaf [#1158] metabase: Fix EC storage schema
Do not store EC info twice.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-06-04 17:24:40 +03:00
5aacb8fc86 [#1144] metabase: Save parent attributes for ec-chunks
Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
2024-05-31 19:55:32 +03:00
f8e33f8e3a [#1144] metabase: Proprely choose root OID for EC-splitted objects
* If EC-parent is a part of Split itself, then save to root bucket
  its parent;
* If EC-parent is not a part of Split itself, then save to root bucket
  OID of this EC-parent.

Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
2024-05-31 19:53:32 +03:00
f0edebea18 [#1144] metabase: Support ec parent filter for Search
Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
2024-05-31 19:53:32 +03:00
92e19feb57 [#1147] node: Use public fields for shard.ExistsPrm
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-05-30 08:13:04 +00:00
6130650bb6 [#1147] node: Implement Lock\Delete requests for EC object
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-05-30 08:13:04 +00:00
a82c8cc5b8 [#1147] gc: Execute callback for expired tombstones when they exists
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-05-30 08:13:04 +00:00
40b04c00ef [#1141] metabase: Fix IsUserObject method
Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
2024-05-20 13:24:15 +03:00
89a80e9a0f [#1141] metabase: Fix putUniqueIndexItem
* `GetECHeader` is not correct way to determine if an object's got
  EC-header: `ECHeader` must be used for that.

Signed-off-by: Airat Arifullin <a.arifullin@yadro.com>
2024-05-20 13:24:08 +03:00
d45d086acd [#1129] policer: Add EC chunk replication
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-05-16 16:28:48 +03:00
5c582e96fd [#1136] metabase: Fix creation of ECInfoError
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-05-15 11:04:27 +00:00
6e71ae3bda [#1130] fstree: Remove useless Stat() call
```
goos: linux
goarch: amd64
pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/blobstor
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                                           │     old     │                new                 │
                                           │   sec/op    │   sec/op     vs base               │
SubstorageReadPerf/fstree_nosync-seq100-8    2.689µ ± 2%   2.428µ ± 4%  -9.72% (p=0.000 n=10)
SubstorageReadPerf/fstree_nosync-rand100-8   2.727µ ± 1%   2.497µ ± 2%  -8.42% (p=0.000 n=10)
geomean                                      2.708µ        2.462µ       -9.07%
```

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-05-14 16:05:45 +03:00
00b2b77b26 [#1112] node: Implement Range\RangeHash requests for EC object
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-05-07 14:47:21 +03:00
e07869a8cf [#1100] Remove unused fields
Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
2024-05-06 10:14:36 +03:00
c9efaa5819 [#966] node: Add path of the write_cache to metric labels
Signed-off-by: Alexander Chuprov <a.chuprov@yadro.com>
2024-05-02 06:46:46 +00:00
4730ecfdb8 [#966] node: Refactor WriteCacheMetrics interface
Grouping common fields of methods will enhance the readability of the interface.

Signed-off-by: Alexander Chuprov <a.chuprov@yadro.com>
2024-05-02 06:46:46 +00:00
411a8d0245 [#1004] blobovnicza: Use TTL for blobovnicza tree cache
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-04-26 19:54:29 +03:00
112a7c690f [#1103] node: Implement Get\Head requests for EC object
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-04-24 18:15:53 +03:00
167c52a1a9 [#1103] node: Reduce amount of lines for method StorageEngine.head
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-04-24 16:31:04 +03:00
e5e0542482 [#1085] log: Move storage log message to constants package
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-15 07:57:00 +00:00
5be36924e3 [#41] log: Log storage operations in only in Debug
They are mostly useless unless we need to _debug_ a specific issue.
The amount of logs we produce is too big.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-15 07:57:00 +00:00
40781b3a20 [#1086] engine: Change mode in case of errors async
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 12:29:43 +00:00
5ef5734c4e Reapply "[#972] Drop x/exp/slices dependency"
This reverts commit 946f2ec2bf.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-10 12:09:34 +00:00
2c4b50a71e Reapply "[#972] Use require.ElementsMatch() where possible"
This reverts commit 7627d08914.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-10 12:09:34 +00:00
3dc81cb4fc Reapply "[#972] Use min/max builtins"
This reverts commit dad56d2e98.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-10 12:09:34 +00:00
d864945961 Reapply "[#972] pilorama: Remove removeDuplicatesInPlace()"
This reverts commit 9f68305c2e.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-10 12:09:34 +00:00
1b17258c04 [#1029] metabase: Add refill metrics
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 13:05:44 +03:00
e3d9dd6ee8 [#1024] blobovnicza: Copy data on iterate
DB value is only valid while the tx is alive.
But handler may to run something in other goroutine.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 10:21:11 +03:00
57466594fb [#1024] shard: Resync metabase concurrently
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 10:21:10 +03:00
1005bf4f56 [#1024] shard: Add refill metabase benchmark
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 10:21:10 +03:00
76398c06b0 [#1080] metabase: Add StorageID metric
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 10:00:08 +03:00
7b1adfed3e [#1080] metabase: Open bucket for container counter once
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-10 10:00:08 +03:00
5b8200de88 [#984] blobovnicza: Do not fail rebuild on big objects
If blobovnicza contains objects larger than object size parameter
value, then rebuild fails with an error, because there is no such
bucket in database. This commit forces to create bucket on rebuild.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-09 11:51:18 +00:00
d614f04a0a [#1072] Fix gofumpt issues
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-04-03 22:21:14 +03:00
17af91619a [#1070] pilorama: Fix cycling behaviour for sorted listing
In case there are no items left, return empty slice.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-02 14:41:31 +00:00
d5194ab2a6 [#949] metabase: fix shard.UpdateID()
metabase.Open() now reports metabase mode metric. shard.UpdateID()
needs to read shard ID from metabase => needs to open metabase.
It caused reporting 'shard undefined' metrics. To avoid reporting
wrong metrics metabase.GetShardID() was added which also opens
metabase and does not report metrics.

Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
2024-04-01 17:27:34 +03:00
81a0346a96 [#949] metabase: fix metabase mode metric
It used to always show CLOSED regardless of actual mode.
Now metric represents actual metabase mode of operations.

Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
2024-04-01 17:27:34 +03:00
e12fcc041d [#1059] services/tree: Fast sorted listing
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-01 12:37:34 +00:00
f23e38c285 Revert "[#446] engine: Move to read-only on blobstor errors"
This reverts commit 69df0d21c2.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-04-01 12:48:30 +03:00
942d83611b [#874] engine: Revert Check object existance concurrently
This reverts commit f526f49995.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-04-01 08:42:34 +00:00
0990a9b0bd [#1055] blobstor: fix mode metric
It used to always show CLOSED after setting shard mode
to read-only regardless of actual mode.
Now metric represents actual blobstor mode of operations.

Signed-off-by: Ekaterina Lebedeva <ekaterina.lebedeva@yadro.com>
2024-03-29 20:44:47 +00:00
c09c701613 [#1048] metabase: Fix drop buckets during resync
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-03-19 14:28:31 +03:00
31e2396a5f [#1043] control: Add ResetEvacuationStatus implementation
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-03-13 10:29:45 +00:00
3195142d67 [#959] writecache: Avoid manipulation with cache in DEGRADED mode
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-03-11 18:35:41 +00:00
d433b49265 [#973] node: Resolve perfsprint linter
`fmt.Errorf can be replaced with errors.New` and `fmt.Sprintf can be replaced with string addition`

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-03-11 17:55:50 +03:00
66a26b7775 [#973] node: Resolve revive: unused-parameter linter
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-03-11 17:11:49 +03:00
0882840bf5 [#634] shard: Add writecache inhume tests
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-03-06 13:12:34 +03:00
ae5bb87e70 Revert "[#866] Use TTL for blobovnicza tree cache"
This reverts commit d9cbb16bd3.

Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-03-01 19:29:33 +03:00
d6534fd755 [#1016] frostfs-node: Fix gopls issues
Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-03-01 12:13:43 +03:00
918613546f [#1008] metabase: Do not update storageID on put
There may be a race condition between put an object and
flushing the writecache:
1. Put object to the writecache
2. Writecache flushes object to the blobstore and sets blobstore's
storageID
3. Put object to the metabase, set writecache's storageID

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-28 11:01:50 +03:00
2ad433dbcb [#1005] engine: Drop shards weights
Unused.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-26 17:25:05 +03:00
7470c383dd [#997] metabase: Drop toMoveIt bucket
It is not used.

Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>
2024-02-21 10:06:05 +03:00
adf7ebab5b [#996] metabase: Speed up bucket creation
Most of the time it exits, e.g. when it is per-container and use on each
object PUT. Bbolt implementation first tries to create bucket and then
returns it if it exists. Create operation uses cursor and thus is not
very lightweight, we can avoid it.

```
goos: linux
goarch: amd64
pkg: git.frostfs.info/TrueCloudLab/frostfs-node/pkg/local_object_storage/metabase
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
                 │     old     │                new                 │
                 │   sec/op    │   sec/op     vs base               │
Put/parallel-8     174.4µ ± 3%   163.3µ ± 3%  -6.39% (p=0.000 n=10)
Put/sequential-8   263.3µ ± 2%   259.0µ ± 1%  -1.64% (p=0.000 n=10)
geomean            214.3µ        205.6µ       -4.05%

                 │     old      │                 new                 │
                 │     B/op     │     B/op      vs base               │
Put/parallel-8     275.3Ki ± 3%   281.1Ki ± 4%       ~ (p=0.063 n=10)
Put/sequential-8   413.0Ki ± 2%   426.6Ki ± 2%  +3.29% (p=0.003 n=10)
geomean            337.2Ki        346.3Ki       +2.70%

                 │     old     │                 new                 │
                 │  allocs/op  │  allocs/op   vs base                │
Put/parallel-8      678.0 ± 1%    524.5 ± 2%  -22.64% (p=0.000 n=10)
Put/sequential-8   1.329k ± 0%   1.183k ± 0%  -10.91% (p=0.000 n=10)
geomean             949.1         787.9       -16.98%
```

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-20 15:42:58 +00:00
9f68305c2e Revert "[#972] pilorama: Remove removeDuplicatesInPlace()"
This reverts commit 45fd4e4ff1.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 15:36:01 +00:00
dad56d2e98 Revert "[#972] Use min/max builtins"
This reverts commit 89784b2e0a.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 15:36:01 +00:00
7627d08914 Revert "[#972] Use require.ElementsMatch() where possible"
This reverts commit 6d9707ff1f.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 15:36:01 +00:00
946f2ec2bf Revert "[#972] Drop x/exp/slices dependency"
This reverts commit f3e50772fd.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 15:36:01 +00:00
f3e50772fd [#972] Drop x/exp/slices dependency
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 13:13:09 +00:00
6d9707ff1f [#972] Use require.ElementsMatch() where possible
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 13:13:09 +00:00
89784b2e0a [#972] Use min/max builtins
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 13:13:09 +00:00
45fd4e4ff1 [#972] pilorama: Remove removeDuplicatesInPlace()
Also, check that slices.CompareFunc() indeed passes all the tests before
removal.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-19 13:13:09 +00:00
05b5f5ca85 [#959] writecache: Fix panic on Get when it is not initialized
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-02-14 16:10:33 +03:00
2429508ac5 [#959] shard: Skip rebuild in DEGRADED mode
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-02-14 15:39:28 +03:00
0bd030507e [#948] metrics: Set actual value for shard_id after restart
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-02-13 09:43:21 +03:00
6a5769d1da [#948] Fix gofumpt issue
Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>
2024-02-13 09:40:46 +03:00
b36a453238 [#970] fstree: Add build tag to enable generic version on linux
Unless tested, generic version can start gaining bugs. With a separate
build tag we can have the best of both worlds:
1. Use optimized implementation for linux by default.
2. Run tests or benchmarks for both. Note that they are not actually
   run automatically now, but this is at leas possible.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
abd502215f [#970] fstree: Move file locking to the generic writer
It is not a part of FSTree itself, but rather a way to solve concurrent
counter update on non-linux implementations. New linux implementations
is pretty simple: link fails when the file exists, unlink fails when the
file doesn't exist.

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
fb74524ac7 [#970] fstree: Move delete implementation to a separate file
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
7f692409cf [#970] fstree: Handle unsupported O_TMPFILE
Metabase test relied on this behaviour, so fix the test too.

Cherry-picking was hard and did too many conflicts,
here is an original PR:
https://github.com/nspcc-dev/neofs-node/pull/2624

Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00
Roman Khimov
fc31b9c947 [#970] fstree: Add linux-specific file writer using O_TMPFILE
O_TMPFILE is implemented for all modern FSes and it's much easier and safer to
use. If application crashes in the middle of writing this file would be gone
and won't leave any garbage.

Notice that this implementation makes a different choice wrt EEXIST handling,
generic one always overwrites, while this one keeps the old data.

There is no real performance difference.

SSD (slow&old), XFS, Core i7-8565U:

Sync
```
name                                  old time/op    new time/op    delta
Put/size=1024,thread=1/fstree-8         1.74ms ± 3%    0.06ms ± 7%  -96.31%  (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8        10.0ms ±41%     1.1ms ±18%  -88.95%  (p=0.000 n=9+10)
Put/size=1024,thread=100/fstree-8       32.3ms ±60%     6.5ms ±14%  -79.97%  (p=0.000 n=10+10)
Put/size=1048576,thread=1/fstree-8      17.8ms ±90%     3.4ms ±70%  -81.08%  (p=0.000 n=10+10)
Put/size=1048576,thread=20/fstree-8     103ms ±174%    112ms ±158%     ~     (p=0.971 n=10+10)
Put/size=1048576,thread=100/fstree-8     949ms ±78%    583ms ±132%     ~     (p=0.089 n=10+10)

name                                  old alloc/op   new alloc/op   delta
Put/size=1024,thread=1/fstree-8         3.17kB ± 1%    1.96kB ± 0%  -38.09%  (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8        59.6kB ± 1%    39.2kB ± 1%  -34.30%  (p=0.000 n=8+10)
Put/size=1024,thread=100/fstree-8        299kB ± 0%     198kB ± 0%  -33.90%  (p=0.000 n=7+9)
Put/size=1048576,thread=1/fstree-8      3.38kB ± 1%    2.36kB ± 1%  -30.22%  (p=0.000 n=10+10)
Put/size=1048576,thread=20/fstree-8     65.7kB ± 4%    47.7kB ± 6%  -27.27%  (p=0.000 n=10+10)
Put/size=1048576,thread=100/fstree-8     351kB ± 8%     245kB ± 8%  -30.22%  (p=0.000 n=10+10)

name                                  old allocs/op  new allocs/op  delta
Put/size=1024,thread=1/fstree-8           30.3 ± 2%      21.0 ± 0%  -30.69%  (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8           554 ± 1%       413 ± 0%  -25.35%  (p=0.000 n=8+10)
Put/size=1024,thread=100/fstree-8        2.77k ± 0%     2.07k ± 0%  -25.27%  (p=0.000 n=7+10)
Put/size=1048576,thread=1/fstree-8        32.0 ± 0%      25.0 ± 0%  -21.88%  (p=0.000 n=9+8)
Put/size=1048576,thread=20/fstree-8        609 ± 5%       494 ± 6%  -18.93%  (p=0.000 n=10+10)
Put/size=1048576,thread=100/fstree-8     3.25k ± 9%     2.50k ± 8%  -23.21%  (p=0.000 n=10+10)
```

No sync
```
name                                  old time/op    new time/op    delta
Put/size=1024,thread=1/fstree-8         71.3µs ±10%    59.8µs ±10%  -16.21%  (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8        1.43ms ± 6%    1.22ms ±13%  -14.53%  (p=0.000 n=10+10)
Put/size=1024,thread=100/fstree-8       8.12ms ± 3%    6.36ms ± 2%  -21.67%  (p=0.000 n=8+9)
Put/size=1048576,thread=1/fstree-8      1.88ms ±70%    1.61ms ±78%     ~     (p=0.393 n=10+10)
Put/size=1048576,thread=20/fstree-8     32.7ms ±28%   34.2ms ±112%     ~     (p=0.968 n=9+10)
Put/size=1048576,thread=100/fstree-8     262ms ±56%     226ms ±34%     ~     (p=0.447 n=10+9)

name                                  old alloc/op   new alloc/op   delta
Put/size=1024,thread=1/fstree-8         2.89kB ± 0%    1.96kB ± 0%  -32.28%  (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8        58.2kB ± 0%    39.5kB ± 0%  -32.09%  (p=0.000 n=8+8)
Put/size=1024,thread=100/fstree-8        291kB ± 0%     198kB ± 0%  -32.19%  (p=0.000 n=9+9)
Put/size=1048576,thread=1/fstree-8      3.05kB ± 1%    2.13kB ± 1%  -30.16%  (p=0.000 n=10+9)
Put/size=1048576,thread=20/fstree-8     62.6kB ± 0%    44.3kB ± 0%  -29.23%  (p=0.000 n=9+9)
Put/size=1048576,thread=100/fstree-8     302kB ± 0%     210kB ± 1%  -30.39%  (p=0.000 n=9+9)

name                                  old allocs/op  new allocs/op  delta
Put/size=1024,thread=1/fstree-8           27.0 ± 0%      21.0 ± 0%  -22.22%  (p=0.000 n=10+10)
Put/size=1024,thread=20/fstree-8           539 ± 0%       415 ± 0%  -22.98%  (p=0.000 n=10+10)
Put/size=1024,thread=100/fstree-8        2.69k ± 0%     2.07k ± 0%  -23.09%  (p=0.000 n=9+9)
Put/size=1048576,thread=1/fstree-8        28.0 ± 0%      22.3 ± 3%  -20.36%  (p=0.000 n=8+10)
Put/size=1048576,thread=20/fstree-8        577 ± 0%       458 ± 0%  -20.72%  (p=0.000 n=9+9)
Put/size=1048576,thread=100/fstree-8     2.76k ± 0%     2.15k ± 0%  -22.05%  (p=0.000 n=9+8)
```

HDD (LVM), ext4, Ryzen 5 1600:

Sync
```
                                      │ fs.sync-generic │            fs.sync-linux            │
                                      │     sec/op      │    sec/op     vs base               │
Put/size=1024,thread=1/fstree-12           34.70m ± 19%   33.59m ± 16%       ~ (p=0.529 n=10)
Put/size=1024,thread=20/fstree-12          188.8m ±  8%   189.2m ± 16%       ~ (p=0.739 n=10)
Put/size=1024,thread=100/fstree-12         264.8m ± 22%   273.6m ± 28%       ~ (p=0.353 n=10)
Put/size=1048576,thread=1/fstree-12        54.90m ± 14%   47.08m ± 18%       ~ (p=0.063 n=10)
Put/size=1048576,thread=20/fstree-12       244.1m ± 14%   220.4m ± 22%       ~ (p=0.579 n=10)
Put/size=1048576,thread=100/fstree-12      847.2m ±  5%   893.6m ±  3%  +5.48% (p=0.000 n=10)
geomean                                    164.3m         158.9m        -3.29%

                                      │ fs.sync-generic │            fs.sync-linux             │
                                      │      B/op       │     B/op      vs base                │
Put/size=1024,thread=1/fstree-12           3.375Ki ± 1%   2.471Ki ± 1%  -26.80% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12          66.62Ki ± 6%   49.21Ki ± 6%  -26.15% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12         319.2Ki ± 1%   230.9Ki ± 2%  -27.64% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12        3.457Ki ± 1%   2.559Ki ± 1%  -25.97% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12       66.91Ki ± 1%   49.16Ki ± 1%  -26.52% (p=0.000 n=10)
Put/size=1048576,thread=100/fstree-12      338.8Ki ± 2%   252.3Ki ± 3%  -25.54% (p=0.000 n=10)
geomean                                    42.17Ki        31.02Ki       -26.44%

                                      │ fs.sync-generic │            fs.sync-linux            │
                                      │    allocs/op    │  allocs/op   vs base                │
Put/size=1024,thread=1/fstree-12             33.00 ± 0%    27.00 ± 0%  -18.18% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12            639.5 ± 1%    519.0 ± 2%  -18.84% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12          3.059k ± 1%   2.478k ± 2%  -18.99% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12          33.50 ± 1%    28.00 ± 4%  -16.42% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12         638.5 ± 1%    520.0 ± 1%  -18.56% (p=0.000 n=10)
Put/size=1048576,thread=100/fstree-12       3.209k ± 2%   2.655k ± 2%  -17.28% (p=0.000 n=10)
geomean                                      405.3         332.1       -18.05%
```

No sync
```
                                      │ fs.nosync-generic │             fs.nosync-linux              │
                                      │      sec/op       │    sec/op     vs base                    │
Put/size=1024,thread=1/fstree-12           148.2µ ± 20%     136.6µ ± 19%   -7.89% (p=0.029 n=10)
Put/size=1024,thread=20/fstree-12          1.140m ± 26%     1.364m ± 16%        ~ (p=0.143 n=10)
Put/size=1024,thread=100/fstree-12         11.93m ± 68%     26.89m ± 62%        ~ (p=0.123 n=10)
Put/size=1048576,thread=1/fstree-12        1.302m ±  3%     1.287m ±  5%        ~ (p=0.481 n=10)
Put/size=1048576,thread=20/fstree-12       77.52m ±  8%     74.07m ±  7%        ~ (p=0.278 n=10+9)
Put/size=1048576,thread=100/fstree-12      226.1m ±   ∞ ¹
geomean                                    5.986m           3.434m        +18.60%                  ²
¹ need >= 6 samples for confidence interval at level 0.95
² benchmark set differs from baseline; geomeans may not be comparable

                                      │ fs.nosync-generic │             fs.nosync-linux              │
                                      │       B/op        │     B/op      vs base                    │
Put/size=1024,thread=1/fstree-12           2.879Ki ± 0%     1.972Ki ± 0%  -31.51% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12          55.94Ki ± 1%     37.90Ki ± 1%  -32.25% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12         272.6Ki ± 0%     182.1Ki ± 9%  -33.21% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12        3.158Ki ± 0%     2.259Ki ± 0%  -28.46% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12       58.87Ki ± 0%     41.03Ki ± 0%  -30.30% (p=0.000 n=10+9)
Put/size=1048576,thread=100/fstree-12      299.8Ki ±  ∞ ¹
geomean                                    36.71Ki          16.60Ki       -31.17%                  ²
¹ need >= 6 samples for confidence interval at level 0.95
² benchmark set differs from baseline; geomeans may not be comparable

                                      │ fs.nosync-generic │            fs.nosync-linux            │
                                      │     allocs/op     │  allocs/op   vs base                  │
Put/size=1024,thread=1/fstree-12             28.00 ± 0%      22.00 ± 0%  -21.43% (p=0.000 n=10)
Put/size=1024,thread=20/fstree-12            530.0 ± 0%      407.5 ± 1%  -23.11% (p=0.000 n=10)
Put/size=1024,thread=100/fstree-12          2.567k ± 0%     1.956k ± 9%  -23.77% (p=0.000 n=10)
Put/size=1048576,thread=1/fstree-12          30.00 ± 0%      24.00 ± 0%  -20.00% (p=0.000 n=10)
Put/size=1048576,thread=20/fstree-12         553.5 ± 0%      434.0 ± 0%  -21.59% (n=10+9)
Put/size=1048576,thread=100/fstree-12       2.803k ±  ∞ ¹
geomean                                      347.9           178.8       -21.99%                ²
¹ need >= 6 samples for confidence interval at level 0.95
² benchmark set differs from baseline; geomeans may not be comparable
```

Signed-off-by: Roman Khimov <roman@nspcc.ru>
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2024-02-09 16:12:11 +00:00