frostfs-node

Author	SHA1	Message	Date
Dmitrii Stepanov	2541d319de	[#266 ] pilorama: Allow to get current tree height Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-06-13 10:00:45 +00:00
Dmitrii Stepanov	74578052f9	[#412 ] node: Replace tracing package Use observability module. Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-06-01 13:23:11 +00:00
Dmitrii Stepanov	6121b541b5	[#242 ] treesvc: Add tracing spans Signed-off-by: Dmitrii Stepanov <d.stepanov@yadro.com>	2023-04-14 10:25:53 +00:00
Alejandro Lopez	341fe1688f	[#139 ] test: Add test storage implementation This aims to reduce the usage of chmod hackery to induce or simulate OS-related failures. Signed-off-by: Alejandro Lopez <a.lopez@yadro.com>	2023-03-29 14:28:49 +00:00
Evgenii Stratonikov	47e8c5bf23	[#156 ] pilorama: Remove CIDDescriptor from TreeApply() Initially it was there to check whether an update is being initiated by a proper node. It is now obsolete for 2 reasons: 1. Background synchronization fetches all operations from a single node. 2. There are a lot more problems with trust in the tree service, it is only used in controlled environments. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-03-22 07:14:18 +00:00
Evgenii Stratonikov	3e6fd4c611	[#82 ] pilorama: Allow to store last sync height Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-03-13 11:25:44 +00:00
Alex Vanin	20de74a505	Rename package name Due to source code relocation from GitHub. Signed-off-by: Alex Vanin <a.vanin@yadro.com>	2023-03-07 16:38:26 +03:00
Evgenii Stratonikov	58367e4df6	[#2232 ] pilorama: Merge in-queue batches To achieve high performance we must choose proper values for both batch size and delay. For user operations we want to set a low delay. However, it would prevent tree synchronization operations from forming big enough batches. For these operations, batching gives the most benefit not only in terms of on-CPU execution cost, but also by speeding up transaction persist (`fsync`). In this commit we try merging batches that are already _triggered_, but not yet _started to execute_. This way we can still query batches for execution after the provided delay while also allowing multiple formed batches to execute faster. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-20 13:53:27 +03:00
Pavel Karpy	73bc1b0b68	[#38 ] node: Fix linter warnings Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2023-02-06 17:27:54 +03:00
Evgenii Stratonikov	d65a95a2c6	[#28 ] pilorama: Remove `LogMove` struct Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-01-25 15:31:47 +03:00
Evgenii Stratonikov	25d5995cef	[#2210 ] pilorama: Allocate bucket name outside of batches 1. Reduce allocations inside transactions. 2. Do not encode container ID to string: it allocates a lot and takes more space. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-01-25 15:31:47 +03:00
Evgenii Stratonikov	165a600624	[#2210 ] pilorama: Reduce the amount of keys per node Under high load we are limited by the _amount_ of keys we need to update in a single transaction. In this commit we try storing all state with a single key. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-01-25 15:31:47 +03:00
Evgenii Stratonikov	ac81c70c09	[#1621 ] pilorama: Batch related operations Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru> Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru> Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-01-25 15:31:47 +03:00
Evgenii Stratonikov	cedbd380f2	[#2197 ] pilorama: Close database in degraded mode Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-01-25 15:31:47 +03:00
Evgenii Stratonikov	b0ad1b9ed2	[#2193 ] pilorama: Use `do` in `TreeMove` It should be similar to a `TreeAddByPath`. `applyOperation` is used for `Apply` when the operation can be inserted in the middle of a log. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-01-25 15:31:47 +03:00
Evgenii Stratonikov	b4e90cdf51	[#2165 ] pilorama: Optimize `TreeApply` when used for synchronization Because synchronization _most likely_ will have to apply already existing operations, it is much faster to check their presence in a read transaction. However, always doing this will degrade the performance for normal `Apply`. And, let's be honest, it is already not good. Thus we add a separate parameter which specifies whether this logic is enabled. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2022-12-30 11:07:35 +03:00
Evgenii Stratonikov	1044adbe94	[#1621 ] pilorama: Improve memory allocation Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru> Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-12-30 11:07:35 +03:00
Evgenii Stratonikov	2539d466a6	[#1621 ] pilorama: Seek after cursor invalidation Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru> Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-12-30 11:07:35 +03:00
Evgenii Stratonikov	e9ba8931f8	[#1621 ] pilorama: Simplify bucket creation Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru> Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-12-30 11:07:35 +03:00
Evgenii Stratonikov	e5c304536b	[#2161 ] pilorama: Do not apply already existing operations Speeds up synchronization a bit. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2022-12-30 11:07:35 +03:00
Pavel Karpy	923f84722a	Move to frostfs-node Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2022-12-28 15:04:29 +03:00
Anton Nikiforov	9a20498f34	[#1940 ] Removing all trees by container ID if tree ID is empty in `pilorama.Forest.TreeDrop` Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>	2022-11-19 11:01:04 +03:00
Evgenii Stratonikov	a3e7365cbd	[#1732 ] pilorama: Fill parent mark correctly Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-11-19 11:01:04 +03:00
Evgenii Stratonikov	134f2ba02e	[#1732 ] pilorama: Fix backwards log insertion Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-11-19 11:01:04 +03:00
Evgenii Stratonikov	d8d3588e1b	[#1996 ] engine: Always select proper shard for a tree Currently there is a possibility for modifying operations to fail because of I/O errors and a new tree to be created on another shard. This commit adds existence check for modifying operations. Read operations remain as they are, not to slow things. `TreeDrop` is an exception, because this is a tree removal and trying multiple shards is not an unwanted behaviour. Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-11-03 15:29:23 +03:00
Pavel Karpy	19850ef157	[#1902 ] pilorama: Add `TreeList` method To both `bolt` and `memory` forests; extend `Forest` interface. Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>	2022-10-20 16:17:57 +03:00
Evgenii Stratonikov	d772e35aba	[#1910 ] .golangci.yml: Add `godot` linter Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-10-18 15:08:26 +03:00
Evgenii Stratonikov	a2bb3a2a96	[#1630 ] pilorama: Support dropping trees Signed-off-by: Evgenii Stratonikov <evgeniy@morphbits.ru>	2022-09-12 09:54:15 +03:00
Evgenii Stratonikov	3df62769c0	[#1559 ] local_object_storage: Allow to set mode for all components Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 17:56:06 +03:00
Evgenii Stratonikov	1e786233bf	[#1559 ] local_object_storage: Provide readOnly flag to `Open` We should be able to reopen storage in readonly in runtime. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 17:56:06 +03:00
Evgenii Stratonikov	d62723f038	[#1505 ] pilorama: Provide timeout to `bbolt.Open` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	26041f18bf	[#1505 ] pilorama: Allow to customize database parameters Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	735931c842	[#1481 ] pilorama: Fix `TreeApply` Current implementation prevents invalid operations from becoming valid at some later point (consider adding a child to a non-existent parent and then adding the parent). This seems to diverge from the paper algorithm and complicates implementation. Make it simpler. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	ad3038d16d	[#1444 ] pilorama: Fix `TreeMove` in bbolt backend Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	8027b7bb6b	[#1444 ] pilorama: Optimize internal encoding/decoding ``` name old time/op new time/op delta ApplySequential/bbolt-8 55.5µs ± 4% 55.5µs ± 3% ~ (p=1.000 n=10+7) ApplyReorderLast/bbolt-8 108µs ± 6% 112µs ± 8% ~ (p=0.077 n=9+9) name old alloc/op new alloc/op delta ApplySequential/bbolt-8 28.8kB ± 3% 27.7kB ± 6% -3.79% (p=0.005 n=10+10) ApplyReorderLast/bbolt-8 41.4kB ± 5% 38.9kB ± 5% -6.19% (p=0.001 n=10+9) name old allocs/op new allocs/op delta ApplySequential/bbolt-8 262 ± 2% 235 ±10% -10.41% (p=0.000 n=10+10) ApplyReorderLast/bbolt-8 684 ± 6% 616 ± 7% -10.04% (p=0.000 n=10+9) ``` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	4437cd7113	[#1442 ] pilorama: Generate timestamp based on node position in the container Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	3312924b82	[#1431 ] pilorama: Use `Batch` for write transactions Helps a lot in case of concurrent request flow. ``` name old time/op new time/op delta ApplySequential/bbolt-8 78.0µs ± 9% 59.8µs ± 4% -23.39% (p=0.000 n=10+9) ApplyReorderLast/bbolt-8 143µs ± 5% 113µs ±15% -21.06% (p=0.000 n=10+10) name old alloc/op new alloc/op delta ApplySequential/bbolt-8 56.9kB ± 8% 28.9kB ± 3% -49.22% (p=0.000 n=10+10) ApplyReorderLast/bbolt-8 87.3kB ± 3% 40.9kB ±10% -53.16% (p=0.000 n=10+10) name old allocs/op new allocs/op delta ApplySequential/bbolt-8 224 ±11% 262 ± 5% +16.93% (p=0.000 n=9+10) ApplyReorderLast/bbolt-8 518 ± 4% 674 ±11% +30.09% (p=0.000 n=10+10) ``` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	f0a67f948d	[#1431 ] pilorama: Cache attributes in the index Currently to find a node by path we iterate over all the children on each level. This is far from optimal and scales badly with the number of nodes on a single level. Thus we introduce "indexed attributes" for which additional information is stored and which can be used in `*ByPath` operations. Currently this set only includes the `FileName` attribute but this may change in the future. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	536857ea5a	[#1329 ] services/tree: Implement `GetOpLog` RPC Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	7703dd5d7f	[#1419 ] pilorama: Create new nodes in path if needed Consider a node `{FileName: "dir", Attribute: "xxx"}`. In case we add a new node by path `["dir", "file.txt"]`, create a new intermediate node with a single attribute. `GetByPath` now also considers only nodes with a single attribute while building a path. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	ad48918a97	[#1406 ] pilorama: Return parent from `TreeGetMeta` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	aea855e8f3	[#1326 ] services/tree: Implement GetSubTree RPC Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00
Evgenii Stratonikov	8cf71b7f1c	[#1324 ] local_object_storage: Implement tree service backend In this commit we implement the algorithm for CRDT trees from https://martin.klepmann.com/papers/move-op.pdf Each tree is identified by the ID of the container it belongs to and the tree name itself. Essentially, it is a sequence of operations which should be applied in chronological order to get a usual tree representation. There are 2 backends for now: bbolt database and in-memory. In-memory backend is here for debugging and will eventually act as a memory-cache for the on-disk database. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-07-21 15:08:24 +03:00

43 commits
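
Several of the messages above describe a tree as a log of operations applied in chronological order, where a late operation may land in the middle of the log and an operation that already exists is skipped. The following is a minimal sketch of that idea only, not the pilorama implementation: the `Op` and `Log` types and their methods are hypothetical names, timestamps are assumed to be unique per operation, and the cycle checks and undo/redo steps of the paper's algorithm are left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Op is a hypothetical, simplified log entry; real operations carry more state.
type Op struct {
	Time   uint64 // logical timestamp, assumed unique per operation
	Parent uint64
	Child  uint64
}

// Log keeps operations sorted by Time.
type Log struct {
	ops []Op
}

// Insert places op at its chronological position and reports whether it was added.
// An operation whose timestamp is already in the log is skipped.
func (l *Log) Insert(op Op) bool {
	i := sort.Search(len(l.ops), func(i int) bool { return l.ops[i].Time >= op.Time })
	if i < len(l.ops) && l.ops[i].Time == op.Time {
		return false
	}
	l.ops = append(l.ops, Op{})
	copy(l.ops[i+1:], l.ops[i:])
	l.ops[i] = op
	return true
}

// Parents replays the log in order and returns the resulting child -> parent map.
// Later operations override earlier ones for the same child.
func (l *Log) Parents() map[uint64]uint64 {
	parents := make(map[uint64]uint64, len(l.ops))
	for _, op := range l.ops {
		parents[op.Child] = op.Parent
	}
	return parents
}

func main() {
	var l Log
	l.Insert(Op{Time: 1, Parent: 0, Child: 10})
	l.Insert(Op{Time: 3, Parent: 20, Child: 10})
	l.Insert(Op{Time: 2, Parent: 0, Child: 20})               // arrives late, lands mid-log
	fmt.Println(l.Insert(Op{Time: 3, Parent: 20, Child: 10})) // duplicate, prints false
	fmt.Println(l.Parents()[10])                              // prints 20
}
```

Replaying the whole log on every read is done here only to keep the sketch short; a persistent backend would store the materialized tree alongside the log.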