Commit graph

4781 commits

Author SHA1 Message Date
Roman Khimov
fb09670fd1 rpcclient: extract more detailed server-side error on WS connection problem
If available. Fixes #2818.
2022-11-23 12:19:35 +03:00
Roman Khimov
ab0b23625b
Merge pull request #2817 from nspcc-dev/always-return-hash-vub-from-sender
Some actor/waiter interaction fixes
2022-11-23 14:37:38 +07:00
Roman Khimov
fd04b2befd actor: don't abort waiter on "already exists" error
It can happen in many cases of distributed tx generation/submission, we can
just wait normally in this case and there will be some proper result.
2022-11-23 10:07:37 +03:00
Roman Khimov
66ddeccdad
Merge pull request #2813 from nspcc-dev/fix-state-reset
core: fix broken state reset
2022-11-23 13:43:42 +07:00
Anna Shaleva
f3ef2890f0 core: check headers at the proper state on state reset
And fix the comment along the way.
2022-11-23 09:16:33 +03:00
Roman Khimov
cd6bb68246 actor: check for tx after the subscription in wsWaiter, fix #2805
Don't wait for VUB block, solve this race immediately.
2022-11-22 17:28:55 +03:00
Roman Khimov
c95d140113 rpcclient: always return tx hash from sendrawtransaction
Let upper-layer APIs like actor.Send() return it as well. Server can return
"already exists" which is an error and yet at the same time a very special
one, in many cases it means we can proceed with waiting for the TX to settle.
2022-11-22 15:18:37 +03:00
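The "already exists" handling described in c95d140113 can be sketched roughly as follows. This is a minimal illustration, not the rpcclient or actor code: `errAlreadyExists`, `errRejected` and `shouldWait` are hypothetical names, and the real client reports this condition in its own way.

```go
package main

import "errors"

// Hypothetical sentinel errors, used only for this illustration.
var (
	errAlreadyExists = errors.New("already exists")
	errRejected      = errors.New("rejected")
)

// shouldWait reports whether the caller can go on waiting for the transaction
// to settle after a send attempt that returned err.
func shouldWait(err error) bool {
	// No error: the transaction was accepted. "Already exists": the same
	// transaction was submitted earlier (possibly by another party), so
	// waiting for it still makes sense.
	return err == nil || errors.Is(err, errAlreadyExists)
}
```

Since the hash is returned together with the error, the caller has everything it needs to start waiting in both cases.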
Anna Shaleva
b27a9bcf95 core: adjust info message for proper-stated chains
Make it prettier for those cases when `db reset` command was called
after interrupted reset.
2022-11-22 11:53:39 +03:00
Anna Shaleva
b82374823e core: increase persist batch size for reset storage changes 2022-11-22 11:53:39 +03:00
Anna Shaleva
bdc42cd595 core: reset blocks, txs and AERs in several stages
Sometimes it can be hard to persist all changes at once, the process
can take almost all RAM and a lot of time. Here's the example of reset
for mainnet from 2.4M to 1:
```
anna@kiwi:~/Documents/GitProjects/nspcc-dev/neo-go$ ./bin/neo-go db reset -m --height 1
2022-11-20T17:16:48.236+0300	INFO	MaxBlockSize is not set or wrong, setting default value	{"MaxBlockSize": 262144}
2022-11-20T17:16:48.236+0300	INFO	MaxBlockSystemFee is not set or wrong, setting default value	{"MaxBlockSystemFee": 900000000000}
2022-11-20T17:16:48.237+0300	INFO	MaxTransactionsPerBlock is not set or wrong, using default value	{"MaxTransactionsPerBlock": 512}
2022-11-20T17:16:48.237+0300	INFO	MaxValidUntilBlockIncrement is not set or wrong, using default value	{"MaxValidUntilBlockIncrement": 5760}
2022-11-20T17:16:48.240+0300	INFO	restoring blockchain	{"version": "0.2.6"}
2022-11-20T17:16:48.297+0300	INFO	initialize state reset	{"target height": 1}
2022-11-20T17:16:48.300+0300	INFO	trying to reset blocks, transactions and AERs
2022-11-20T17:19:29.313+0300	INFO	blocks, transactions ans AERs are reset	{"took": "2m41.015126493s", "keys": 3958420}
...
```
To avoid OOM killer, split blocks reset into multiple stages. It increases
operation time due to intermediate DB persists, but makes things cleaner, the
result for almost the same DB height with the new approach:
```
anna@kiwi:~/Documents/GitProjects/nspcc-dev/neo-go$ ./bin/neo-go db reset -m --height 1
2022-11-20T17:39:42.023+0300	INFO	MaxBlockSize is not set or wrong, setting default value	{"MaxBlockSize": 262144}
2022-11-20T17:39:42.023+0300	INFO	MaxBlockSystemFee is not set or wrong, setting default value	{"MaxBlockSystemFee": 900000000000}
2022-11-20T17:39:42.023+0300	INFO	MaxTransactionsPerBlock is not set or wrong, using default value	{"MaxTransactionsPerBlock": 512}
2022-11-20T17:39:42.023+0300	INFO	MaxValidUntilBlockIncrement is not set or wrong, using default value	{"MaxValidUntilBlockIncrement": 5760}
2022-11-20T17:39:42.026+0300	INFO	restoring blockchain	{"version": "0.2.6"}
2022-11-20T17:39:42.071+0300	INFO	initialize state reset	{"target height": 1}
2022-11-20T17:39:42.073+0300	INFO	trying to reset blocks, transactions and AERs
2022-11-20T17:40:11.735+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 1, "took": "29.66363737s", "keys": 210973}
2022-11-20T17:40:33.574+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 2, "took": "21.839208683s", "keys": 241203}
2022-11-20T17:41:29.325+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 3, "took": "55.750698386s", "keys": 250593}
2022-11-20T17:42:12.532+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 4, "took": "43.205892757s", "keys": 321896}
2022-11-20T17:43:07.978+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 5, "took": "55.445398156s", "keys": 334822}
2022-11-20T17:43:35.603+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 6, "took": "27.625292032s", "keys": 317131}
2022-11-20T17:43:51.747+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 7, "took": "16.144359017s", "keys": 355832}
2022-11-20T17:44:05.176+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 8, "took": "13.428733899s", "keys": 357690}
2022-11-20T17:44:32.895+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 9, "took": "27.718548783s", "keys": 393356}
2022-11-20T17:44:51.814+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 10, "took": "18.917954658s", "keys": 366492}
2022-11-20T17:45:07.208+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 11, "took": "15.392642196s", "keys": 326030}
2022-11-20T17:45:18.776+0300	INFO	intermediate batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 12, "took": "11.568255716s", "keys": 299884}
2022-11-20T17:45:25.862+0300	INFO	last batch of removed blocks, transactions and AERs is persisted	{"batches persisted": 13, "took": "7.086079594s", "keys": 190399}
2022-11-20T17:45:25.862+0300	INFO	blocks, transactions ans AERs are reset	{"took": "5m43.791214084s", "overall persisted keys": 3966301}
...
```
2022-11-22 11:53:39 +03:00
Anna Shaleva
d67f0df516 core: reset block headers together with header height info
We need to keep the headers information consistent with header batches
and headers. This commit fixes the bug with failing blockchain
initialization on recovering from state reset interrupted after the
second stage (blocks/txs/AERs removal):
```
anna@kiwi:~/Documents/GitProjects/nspcc-dev/neo-go$ ./bin/neo-go db reset -t --height 83000
2022-11-20T16:28:29.437+0300	INFO	MaxValidUntilBlockIncrement is not set or wrong, using default value	{"MaxValidUntilBlockIncrement": 5760}
2022-11-20T16:28:29.440+0300	INFO	restoring blockchain	{"version": "0.2.6"}
failed to create Blockchain instance: could not initialize blockchain: could not get header 1898cd356a4a2688ed1c6c7ba1fd6ba7d516959d8add3f8dd26232474d4539bd: key not found
```
2022-11-22 11:53:39 +03:00
Anna Shaleva
283da8f599 core: use DAO-provided block height during state reset
Don't use cache because it's not yet initialized. Also, perform
safety checks only if state reset wasn't yet started. These fixes
allow to solve the following problem while recovering from
interrupted state reset:
```
anna@kiwi:~/Documents/GitProjects/nspcc-dev/neo-go$ ./bin/neo-go db reset -t --height 83000
2022-11-20T15:51:31.431+0300	INFO	MaxValidUntilBlockIncrement is not set or wrong, using default value	{"MaxValidUntilBlockIncrement": 5760}
2022-11-20T15:51:31.434+0300	INFO	restoring blockchain	{"version": "0.2.6"}
failed to create Blockchain instance: could not initialize blockchain: current block height is 0, can't reset state to height 83000
```
2022-11-22 11:53:39 +03:00
Anna Shaleva
7d55bf2cc1 core: log persisted storage item batches count during state reset 2022-11-22 11:53:39 +03:00
Anna Shaleva
f52451e582 core: fix state reset with broken contract
Sync up with #2802, bad contract -> no contract ID at all.
2022-11-22 11:53:39 +03:00
Anna Shaleva
ecda07736e core: stop storage items reset after any seek error 2022-11-22 11:53:39 +03:00
Anna Shaleva
bfe7aeae7b core: stop storage items reset after the first persist error
It's a bug, we mustn't continue if something bad has happened on persist,
otherwise this error will be overwritten by a subsequent successful persist.
2022-11-22 11:53:39 +03:00
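The rule from bfe7aeae7b (stop at the first persist error instead of letting a later success hide it) can be sketched as below. This is an assumption-laden illustration, not the code of the state reset: `persistAll` and `errPersist` are hypothetical names.

```go
package main

import "errors"

// errPersist is a hypothetical error used only for this illustration.
var errPersist = errors.New("persist failed")

// persistAll runs the given persist steps in order and returns the first
// error immediately, so that no later successful step can overwrite it.
func persistAll(steps []func() error) error {
	for _, step := range steps {
		if err := step(); err != nil {
			return err
		}
	}
	return nil
}
```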
Anna Shaleva
235518eb6c core: reset batch counter to zero after each persist in resetStateInternal
It's a bug, otherwise we'll persist each storage item after the 10K-th one,
which is the reason for the abnormally long storage items resetting stage.
2022-11-22 11:53:39 +03:00
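The effect of the missing counter reset from 235518eb6c can be illustrated with a small counting model. It is only a sketch under simplifying assumptions: `persistCalls` is a hypothetical helper, not resetStateInternal, and it only counts how often a persist would be triggered.

```go
package main

// persistCalls counts how many times a persist would be triggered while
// processing n items with the given batch size. With resetCounter set to
// false the counter is never zeroed, which models the described bug: once
// the threshold is reached, every further item triggers a persist.
func persistCalls(n, batchSize int, resetCounter bool) int {
	calls, inBatch := 0, 0
	for i := 0; i < n; i++ {
		inBatch++
		if inBatch >= batchSize {
			calls++ // a real implementation would persist the batch here
			if resetCounter {
				inBatch = 0
			}
		}
	}
	return calls
}
```

For 25 items and a batch size of 10 the fixed variant triggers 2 persists, while the buggy one triggers 16.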
Anna Shaleva
9f23fafc03 core: improve logging of resetStateInternal
Inform when starting subsequent stage, inform about keys persisted.
2022-11-22 11:53:36 +03:00
Roman Khimov
0039615ae3
Merge pull request #2816 from nspcc-dev/fix-pointer-serialization
Fix pointer serialization
2022-11-20 22:59:52 +07:00
Roman Khimov
9ba18b5dfa stackitem: serialize/deserialize pointers, fix #2815
They of course can't be serialized, but in protected mode we still need to
handle them somehow.
2022-11-20 16:02:50 +03:00
Roman Khimov
48140320db
Merge pull request #2812 from nspcc-dev/improve-vm-context-handling
Improve vm istack/estack handling
2022-11-20 19:42:35 +07:00
Roman Khimov
ca9fde745b
Merge pull request #2809 from nspcc-dev/fix-subs
rpcsrv: do not block blockchain events receiver by subscription requests
2022-11-18 16:16:41 +07:00
Roman Khimov
8e7f65be17 vm: use proper estack for exception handler
v.estack might be some inner invoked contract and its stack must not be used
for exception handler set up by higher-order contract.
2022-11-18 11:36:38 +03:00
Roman Khimov
cb64957af5 vm: don't use Stack for istack
We don't use all of the Stack functionality for it, so drop useless methods
and avoid some interface conversions. It increases single-node TPS by about
0.9%, so nothing really important there, but not a bad change either. Maybe it
can be reworked again with generics though.
2022-11-18 11:35:29 +03:00
Anna Shaleva
4df9a5e379 rpcsrv: refactor subscribe routine
Move shutdown check after subsCounterLock is taken at the end of
`(s *Server) subscribe` in order to avoid holding extra locks.
2022-11-18 10:54:10 +03:00
Anna Shaleva
b7f19a54d5 services: fix chain locked by WS subscriptions handlers
Blockchain's subscriptions, unsubscriptions and notifications are
handled by a single notificationDispatcher routine. Thus, on attempt
to send the subsequent event to Blockchain's subscribers, dispatcher
can't handle subscriptions\unsubscriptions. Make subscription and
unsubscription to be a non-blocking operation for blockchain on the
server side, otherwise it may cause the dispatcher locks.

To achieve this, use a separate lock for the code that makes calls
to blockchain's subscription API and for subscription counters on
the server side.
2022-11-18 09:30:12 +03:00
Roman Khimov
2bcb7bd06f compiler: don't use (*VM).Istack when it's not needed 2022-11-17 20:46:06 +03:00
Roman Khimov
b8c09f509f network: add random slight delay to connection attempts
Small (especially dockerized/virtualized) networks often start all nodes at
once and then we see a lot of connection flapping in the log. This happens
because nodes try to connect to each other simultaneously, establish two
connections, then each one finds a duplicate and drops it, but this can be
different duplicate connections on other sides, so they retry and it all
happens for some time. Eventually everything settles, but we have a lot of
garbage in the log and a lot of useless attempts.

This random waiting timeout doesn't change the logic much, adds a minimal
delay, but increases chances for both nodes to establish a proper single
connection on both sides to only then see another one and drop it on both
sides as well. It leads to almost no flapping in small networks, doesn't
affect much bigger ones. The delay is close to unnoticeable especially if
there is something in the DB for node to process during startup.
2022-11-17 18:42:43 +03:00
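The idea from b8c09f509f, a small random delay before each connection attempt, can be sketched like this. The names `maxJitter`, `connectJitter` and `dialWithJitter` and the 50ms bound are assumptions made for the illustration, not values or identifiers taken from the network package.

```go
package main

import (
	"math/rand"
	"time"
)

// maxJitter is a hypothetical upper bound for the random delay.
const maxJitter = 50 * time.Millisecond

// connectJitter returns a random delay in [0, max), or 0 if max is not
// positive.
func connectJitter(max time.Duration) time.Duration {
	if max <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(max)))
}

// dialWithJitter waits for a random delay and then calls dial. Two nodes
// starting at the same time are then less likely to dial each other at the
// very same moment.
func dialWithJitter(max time.Duration, dial func() error) error {
	time.Sleep(connectJitter(max))
	return dial()
}
```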
Roman Khimov
075a54192c network: don't try too many connections
Consider mainnet, it has an AttemptConnPeers of 20, so may already have 3
peers and request 20 more, then have 4th connected and attempt 20 more again,
this leads to a huge number of connections easily.
2022-11-17 18:03:04 +03:00
Roman Khimov
6bce973ac2 network: drop duplication check from handleAddrCmd()
It was relevant with the queue-based discoverer, now it's not, discoverer
handles this internally.
2022-11-17 17:42:36 +03:00
Roman Khimov
1c7487b8e4 network: add a timer to check for peers
Consider initial connection phase for public networks:
 * simultaneous connections to seeds
 * very quick handshakes
 * got five handshaked peers and some getaddr requests sent
 * but addr replies won't trigger new connections
 * so we can stay with just five connections until any of them breaks or a
   (long) address checking timer fires

This new timer solves the problem, it's adaptive at the same time. If we have
enough peers we won't be waking up often.
2022-11-17 17:32:05 +03:00
Anna Shaleva
e73c3c7ec4 services: adjust WS waiter test
Make it more stable.
2022-11-17 17:15:01 +03:00
Roman Khimov
23f118a1a9 network: rework discoverer/server interaction
* treat connected/handshaked peers separately in the discoverer, save
   "original" address for connected ones, it can be a name instead of IP and
   it's important to keep it to avoid reconnections
 * store name->IP mapping for seeds if and when they're connected to avoid
   reconnections
 * block seed if it's detected to be our own node (which is often the case for
   small private networks)
 * add an event for handshaked peers in the server, connected but
   non-handshaked ones are not really helpful for MinPeers or GetAddr logic

Fixes #2796.
2022-11-17 17:07:19 +03:00
Roman Khimov
6ba4afc977 network: consider handshaked peers only when comparing with MinPeers
We don't know a lot about non-handshaked ones, so it's safer to try more
connections.
2022-11-17 16:40:29 +03:00
Roman Khimov
ab0ff63ce1
Merge pull request #2804 from nspcc-dev/check-aer-sub
rpc: fix subscribers locking logic and properly drain poll-based waiter receiver
2022-11-17 04:24:35 +07:00
Anna Shaleva
1399496dfb rpcclient: refactor event-based waiting loop
Avoid receiver channels locks.
2022-11-16 23:57:00 +03:00
Anna Shaleva
95e23c8e46 actor: fix event-based tx awaiting
If VUB-th block is received, we still can't guarantee that transaction
wasn't accepted to chain. Handle this situation by falling back to a
poll-based waiter.
2022-11-16 23:44:31 +03:00
Anna Shaleva
6dbae7edc4 rpcclient: fix WS-client unsubscription process
Do not block subscribers until the unsubscription request to RPC server
is completed. Otherwise, another notification may be received from the
RPC server which will block the unsubscription process.

At the same time, fix event-based waiter. We must not block the receiver
channel during unsubscription because there's a chance that subsequent
event will be sent by the server. We need to read this event in order not
to block the WSClient's readloop.
2022-11-16 23:44:30 +03:00
Anna Shaleva
ddaba9e74d rpcsrv: fix "subscribe" parameters handling
If it's a subscription for AERs, we need to check the filter's state only
if it has been provided, otherwise filter is always valid.
2022-11-16 14:05:13 +03:00
Anna Shaleva
d043139b66 rpcsrv: adjust "subscribe" response error
Make it more detailed for better debugging experience.
2022-11-16 13:35:19 +03:00
Anna Shaleva
3f122fd591 rpcclient: adjust WS waiter error formatting
Follow the other errors formatting style.
2022-11-16 12:22:18 +03:00
Roman Khimov
822722bd2e native: ignore decoding errors during cache init
Bad contract -> no contract. Unfortunately we've got a broken
6f1837723768f27a6f6a14452977e3e0e264f2cc contract on the mainnet which can't
be decoded (even though it had been saved successfully), so this is a
temporary fix for #2801 to be able to start mainnet node after shutdown.
2022-11-16 12:00:28 +03:00
Roman Khimov
aef01bf663 vm: fix istack marshaling, fix #2799 2022-11-16 00:40:12 +03:00
Roman Khimov
90582faacd vm: save current stack slice when loading new context
v.estack is used throughout the code to work with estack, while ctx.sc.estack
is (theoretically) just a reference to it that is saved on script load and
restored to v.estack on context unload. The problem is that v.estack can grow
as we use it and can be reallocated away from its original slice (saved in the
ctx.sc.estack), so either ctx.sc.estack should be a pointer or we need to
ensure that it's correct when loading a new script. The second approach is a
bit safer for now and it fixes #2798.
2022-11-15 23:48:02 +03:00
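The slice behaviour behind 90582faacd can be reproduced in plain Go, independent of the VM. The sketch below uses hypothetical names (`staleCopy`, `refreshedCopy`) and plain `[]int` slices in place of the real stack types.

```go
package main

// staleCopy mimics the problem: a slice value is saved, the live slice then
// grows past its capacity and is reallocated, so later writes through the
// live slice are not visible via the saved copy.
func staleCopy() (saved0, live0 int) {
	live := make([]int, 1, 1) // full, the next append must reallocate
	live[0] = 1
	saved := live          // shares the backing array for now
	live = append(live, 2) // live moves to a new backing array
	live[0] = 42           // only the new array is updated
	return saved[0], live[0]
}

// refreshedCopy mimics the chosen fix: the saved slice is updated from the
// live one right before it is needed, so it points at the current array.
func refreshedCopy() int {
	live := make([]int, 1, 1)
	live[0] = 1
	saved := live
	live = append(live, 2)
	live[0] = 42
	saved = live // save the current slice again
	return saved[0]
}
```

Here `staleCopy` returns 1 and 42, while `refreshedCopy` returns 42.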
Anna Shaleva
6f3a0a6b4c network: adjust warning for deposit expiration
Provide additional info for better user experience.
2022-11-15 14:16:34 +03:00
Roman Khimov
c67ee54566
Merge pull request #2792 from nspcc-dev/rpcwrapper-arrays
RPC wrapper for simple arrays
2022-11-15 13:08:25 +07:00
Roman Khimov
82c6ce218b rpcbinding: use binding config to generate code for simple arrays
Part of #2767.
2022-11-14 13:01:13 +03:00
Roman Khimov
b5c79f4be3 unwrap: add a complete set of simple array unwrappers
Arrays of basic types should be covered completely.
2022-11-14 13:01:13 +03:00
Roman Khimov
c405092953 network: pre-filter transactions going into dbft
Drop some load from dbft loop during consensus process.
2022-11-11 15:32:51 +03:00
Roman Khimov
f78231fd9c
Merge pull request #2773 from nspcc-dev/state-reset
core: implement state reset
2022-11-10 22:26:43 +07:00