Move changes from the support branch. #58

Merged

fyrchik merged 25 commits from move-changes into master

2023-02-20 10:53:28 +00:00

Author	SHA1	Message	Date
Anton Nikiforov	50e35de457	[#1868 ] Reload config for pprof and metrics on SIGHUP Signed-off-by: Anton Nikiforov <an.nikiforov@yadro.com>	2023-02-17 12:29:36 +03:00
Evgenii Stratonikov	b679a724ef	[#2260 ] node: Use a separate client cache for PUT service Currently, under a mixed load one failed PUT can lead to closing connection for all concurrent GETs. For PUT it does no harm: we have many other nodes to choose from. For GET we are limited by `REP N` factor, so in case of failover we can close the connection with the only node possessing an object, which leads to failing the whole operation. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	0872d2aa49	[#2260 ] network/cache: Ignore clients only on `Dial` errors The problem is that accidental timeout errors can make us ignore other nodes for some time. The primary purpose of the whole ignore mechanism is not to degrade in case of failover. For this case, closing connection and limiting the amount of dials is enough. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	c6a5e3f4ee	[#2260 ] network/cache: Ignore `context cancelled` errors Timeouts on client side should not affect inter-node communication. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	9c3a029941	[#2260 ] services/object: Do not assemble object with TTL=1 Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	f1469626f5	[#2234 ] writecache: Fix possible panic in `initFlushMarks` In case we have many small objects in the write-cache, `indices` should not be reused between iterations. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	a09543c0e1	[#2252 ] fstree: Allow concurrent writes Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Pavel Karpy	6c6319fc89	[#2164 ] node: Fix multi-client error reporting Missing `ReportError` method did not allow casting multi-client interface to `errorReporter` interface and dropping broken connections. `replicationClient` embeds that interface, and it is widely used across node's code. Embedded interface does not allow casting its parent structure to `errorReporter` and breaks multi client error reporting logic. Multi-client scheme is extremely hard to maintain, it makes unpredictable casts and does not allow tracking code flow, so it will be refactored in the future anyway. Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2023-02-17 12:24:23 +03:00
Pavel Karpy	31b011f4ae	[#2244 ] node: Fix subscriptions lock Subscribing without async listening could lead to a dead-lock in the `neo-go` client. Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2023-02-17 12:24:23 +03:00
Pavel Karpy	6afe96a171	[#2244 ] node: Add object address to WC's operations Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2023-02-17 12:24:23 +03:00
Pavel Karpy	90476d3bad	[#2244 ] node: Update expired storage ID by WC Previously, node could get an "infinite" small object: it could be expired and thus could not be flushed (update its storage ID) to metabase => could not be marked as flushed => node never removes such object and repeats the whole cycle one more time. If object exists and is not marked with GC (meta returns `ErrObjectIsExpired`, not `ObjectNotFound` and not `ObjectAlreadyRemoved`), its ID is safe to update _in the same_ bbolt transaction. Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	889702b4d5	[#2246 ] node: Allow to configure tombstone lifetime Currently, DELETE service sets tombstone expiration epoch to `current epoch + 5`. This works less than ideally in private networks where an epoch can be e.g. 10 minutes. In this case, after a node is unavailable for more than 1 hour, already deleted objects have a chance to reappear. After this commit tombstone lifetime can be configured. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	64350f7b0f	[#2241 ] metrics: Fix request count metrics names Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	d213461bf8	[#2238 ] engine: Add test for component initialization failures Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	b2159b9608	[#2238 ] engine: Add test for component initialization failures Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	1fe9a650c7	[#2238 ] neofs-node: Gracefully handle shard initialization errors Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	53067a7db0	[#2238 ] shard: Try closing all components Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	9a61e53162	[#2238 ] engine: Make `Open` and `Init` similar 1. Both could initialize shards in parallel. 2. Both should close shards after an error. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	75de3f1c1f	[#2239 ] writecache: Fix possible deadlock LRU `Peek`/`Contains` take LRU mutex _inside_ of a `View` transaction. `View` transaction itself takes `mmapLock` [1], which is lifted after tx finishes (in `tx.Commit()` -> `tx.close()` -> `tx.db.removeTx`). When we evict items from LRU cache mutex order is different: first we take LRU mutex and then execute `Batch` which _does_ take `mmapLock` in case we need to remap. Thus the deadlock. [1] `8f4a7e1f92/db.go (L708)` Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	de344e9223	[#2232 ] pilorama: Merge in-queue batches To achieve high performance we must choose proper values for both batch size and delay. For user operations we want to set low delay. However, it would prevent tree synchronization operations from forming big enough batches. For these operations, batching gives the most benefit not only in terms of on-CPU execution cost, but also by speeding up transaction persist (`fsync`). In this commit we try merging batches that are already _triggered_, but not yet _started to execute_. This way we can still query batches for execution after the provided delay while also allowing multiple formed batches to execute faster. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	f1e3309ca3	[#2224 ] adm: Use native neo-go sessions in `dump-hashes` If we had lots of domains in one zone, `dump-hashes` for all others can miss some domains, because we need to restrict ourselves with _some_ number. In this commit we use neo-go sessions by default, with a proper fallback to in-script iterator unwrapping. Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:23 +03:00
Pavel Karpy	e17bf71d19	[#2213 ] node: Do not return expired object "Object is expired" means that object is present in `meta` but it is not `ObjectNotFound` error. Previous implementation made `shard` search for an object without `meta` which was an error. Signed-off-by: Pavel Karpy <p.karpy@yadro.com>	2023-02-17 12:24:23 +03:00
Roman Khimov	7b0708f50b	CHANGELOG: add more fancy glyphs How could you forget adding it? Signed-off-by: Roman Khimov <roman@nspcc.ru>	2023-02-17 12:24:23 +03:00
Roman Khimov	3b38aedb38	CHANGELOG: fix whitespacing errors Signed-off-by: Roman Khimov <roman@nspcc.ru>	2023-02-17 12:24:23 +03:00
Evgenii Stratonikov	1f01a0a71a	[#2212 ] morph: Fix subscription restoration Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>	2023-02-17 12:24:22 +03:00
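
The multi-client fix in 6c6319fc89 appears to rest on a general Go rule: a struct that embeds an interface only gets the methods that interface declares, so a type assertion on the struct to a second interface fails even when the wrapped value implements it. The sketch below is one reading of that commit message, not the node's actual code. Only `errorReporter` and `ReportError` come from the commit text; `client`, `multiClient`, `wrapper`, `fixedWrapper` and `canReport` are hypothetical names introduced for illustration.

```go
package main

import "fmt"

// client is a hypothetical narrow interface that does NOT declare ReportError.
type client interface {
	Get(id string) error
}

// errorReporter is the optional interface that callers probe for.
type errorReporter interface {
	ReportError(err error)
}

// multiClient (hypothetical) implements both client and errorReporter.
type multiClient struct{ reported int }

func (m *multiClient) Get(id string) error   { return nil }
func (m *multiClient) ReportError(err error) { m.reported++ }

// wrapper embeds the narrow interface, so only Get is promoted to it.
type wrapper struct {
	client
}

// fixedWrapper embeds the same interface but forwards ReportError explicitly.
type fixedWrapper struct {
	client
}

// ReportError delegates to the wrapped value when it supports reporting.
func (w fixedWrapper) ReportError(err error) {
	if r, ok := w.client.(errorReporter); ok {
		r.ReportError(err)
	}
}

// canReport tells whether v can be type-asserted to errorReporter.
func canReport(v interface{}) bool {
	_, ok := v.(errorReporter)
	return ok
}

func main() {
	mc := &multiClient{}
	fmt.Println(canReport(mc))               // true: the concrete type has the method
	fmt.Println(canReport(wrapper{mc}))      // false: the method is hidden by the embedded interface
	fmt.Println(canReport(fixedWrapper{mc})) // true: the method is forwarded explicitly
}
```

Declaring the method on the embedded interface itself would have the same effect as the explicit forwarder; which of the two the commit actually chose is not stated in the message above.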