neo-go

Author	SHA1	Message	Date
Roman Khimov	cfb5058018	network: batch getdata replies This is not exactly the protocol-level batching as was tried in #1770 and proposed by neo-project/neo#2365, but it's a TCP-level change in that we now Write() a set of messages and given that Go sets up TCP sockets with TCP_NODELAY by default this is a substantial change, we have fewer packets generated with the same amount of data. It doesn't change anything on properly connected networks, but the ones with delays benefit from it a lot. This also improves queueing because we no longer generate 32 messages to deliver on transaction's GetData, it's just one stream of bytes with 32 messages inside. Do the same with GetBlocksByIndex, we can have a lot of messages there too. But don't forget about potential peer DoS attacks, if a peer is to request a lot of big blocks we need to flush them before we process the whole set.	2022-10-21 17:16:32 +03:00
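A minimal sketch of the TCP-level batching idea, assuming replies are already serialized to bytes: messages are accumulated in a buffer and handed to the connection as a single Write(), and a flush threshold keeps a peer that requests many large blocks from making the node buffer everything. The `batchWriter` type and the `maxBatch` limit are hypothetical illustrations, not neo-go's API.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// batchWriter accumulates serialized messages and passes them to the
// underlying writer in one Write() call. maxBatch is a hypothetical limit
// that forces an early flush, so a peer requesting many big blocks can't
// make us buffer an unbounded amount of data.
type batchWriter struct {
	w        io.Writer
	buf      bytes.Buffer
	maxBatch int
}

// add appends one encoded message and flushes if the batch got too big.
func (b *batchWriter) add(msg []byte) error {
	b.buf.Write(msg)
	if b.buf.Len() >= b.maxBatch {
		return b.flush()
	}
	return nil
}

// flush sends everything buffered so far in a single Write().
func (b *batchWriter) flush() error {
	if b.buf.Len() == 0 {
		return nil
	}
	_, err := b.w.Write(b.buf.Bytes())
	b.buf.Reset()
	return err
}

// countingWriter stands in for the socket and counts Write() calls.
type countingWriter struct {
	calls int
	total int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	c.total += len(p)
	return len(p), nil
}

func main() {
	cw := &countingWriter{}
	bw := &batchWriter{w: cw, maxBatch: 1 << 20}
	for i := 0; i < 32; i++ { // e.g. 32 transactions requested via getdata
		_ = bw.add(make([]byte, 300))
	}
	_ = bw.flush()
	fmt.Println(cw.calls, cw.total) // 1 9600
}
```

With TCP_NODELAY on, each Write() tends to become its own segment, so one large write instead of 32 small ones is what reduces the packet count.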
Roman Khimov	e1b5ac9b81	network: separate tx handling from msg handling This allows to naturally scale transaction processing if we have some peer that is sending a lot of them while others are mostly silent. It also can help somewhat in the event we have 50 peers that all send transactions. 4+1 scenario benefits a lot from it, while 7+2 slows down a little. Delayed scenarios don't care. Surprisingly, this also makes disconnects (#2744) much more rare, 4-node scenario almost never sees it now. Most probably this is the case where peers affect each other a lot, single-threaded transaction receiver can be slow enough to trigger some timeout in getdata handler of its peer (because it tries to push a number of replies).	2022-10-21 12:11:24 +03:00
Roman Khimov	e003b67418	network: reuse inventory hash list for request hashes Microoptimization, we can do this because we only use them in handleInvCmd().	2022-10-21 11:28:40 +03:00
Roman Khimov	0f625f04f0	Merge pull request #2748 from nspcc-dev/stop-tx-flow network/consensus: use new dbft StopTxFlow callback	2022-10-18 16:29:37 +07:00
Roman Khimov	73ce898e27	network/consensus: use new dbft StopTxFlow callback It makes sense in general (further narrowing down the time window when transactions are processed by consensus thread) and it improves block times a little too, especially in the 7+2 scenario. Related to #2744.	2022-10-18 11:06:20 +03:00
Roman Khimov	2791127ee4	network: add prometheus histogram with cmd processing time It can be useful to detect some performance issues.	2022-10-17 22:51:16 +03:00
Roman Khimov	73079745ab	Merge pull request #2746 from nspcc-dev/optimize-tx-callbacks network: only call tx callback if we're waiting for transactions	2022-10-17 16:39:41 +07:00
Roman Khimov	dce9f80585	Merge pull request #2743 from nspcc-dev/log-fan-out Logarithmic gossip fan out	2022-10-14 23:18:34 +07:00
Roman Khimov	4dd3fd4ac0	network: only call tx callback if we're waiting for transactions Until the consensus process starts for a new block and until it really needs some transactions we can spare some cycles by not delivering transactions to it. In tests this doesn't affect TPS, but makes block delays a bit more stable. Related to #2744, I think it also may cause timeouts during transaction processing (waiting on the consensus process channel while it does something dBFT-related).	2022-10-14 18:45:48 +03:00
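One way to express "only call the tx callback while consensus is waiting" is an atomic flag that consensus switches on when it starts requesting transactions and off again when it has enough (compare the StopTxFlow callback above). The `server` type and the `requestTx`/`stopTxFlow`/`onNewTx` names are hypothetical; `atomic.Bool` requires Go 1.19 or later.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// server is a hypothetical stand-in holding the consensus tx callback.
type server struct {
	txCbEnabled atomic.Bool // Go 1.19+
	txCallback  func(id int)
}

// requestTx is called when consensus starts waiting for transactions.
func (s *server) requestTx() { s.txCbEnabled.Store(true) }

// stopTxFlow is called once consensus no longer needs transactions.
func (s *server) stopTxFlow() { s.txCbEnabled.Store(false) }

// onNewTx runs for every accepted transaction; it skips the callback (and
// the channel send behind it) unless consensus is actually waiting.
func (s *server) onNewTx(id int) {
	if s.txCallback != nil && s.txCbEnabled.Load() {
		s.txCallback(id)
	}
}

func main() {
	delivered := 0
	s := &server{txCallback: func(int) { delivered++ }}
	s.onNewTx(1) // dropped: consensus isn't waiting yet
	s.requestTx()
	s.onNewTx(2) // delivered
	s.stopTxFlow()
	s.onNewTx(3) // dropped again
	fmt.Println(delivered) // 1
}
```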
Roman Khimov	65f0fadddb	network: register peer only if it's not a duplicate	2022-10-14 15:53:32 +03:00
Roman Khimov	851cbc7dab	network: implement adaptive peer requests When the network is big enough, MinPeers may be suboptimal for good network connectivity, but if we know the network size we can do some estimation on the number of sufficient peers.	2022-10-14 15:53:32 +03:00
Roman Khimov	c17b2afab5	network: add BroadcastFactor to control gossip, fix #2678	2022-10-14 15:53:32 +03:00
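Assuming BroadcastFactor works as a 0 to 100 weight between the estimated optimal fan-out and the full peer list (our reading of the setting; this log doesn't define it), the number of gossip targets can be interpolated linearly. With 20 peers and an optimal fan-out of 10, a factor of 30 gives 10 + (20 − 10) × 30 / 100 = 13. The `gossipPeers` helper is hypothetical.

```go
package main

import "fmt"

// gossipPeers returns how many of `peers` neighbours should receive a
// broadcast. fanOut is the estimated optimal fan-out; factor (0..100)
// shifts the result linearly from fanOut (0) to all peers (100).
func gossipPeers(peers, fanOut, factor int) int {
	if factor < 0 {
		factor = 0
	} else if factor > 100 {
		factor = 100
	}
	if fanOut > peers {
		fanOut = peers
	}
	return fanOut + (peers-fanOut)*factor/100
}

func main() {
	fmt.Println(gossipPeers(20, 10, 0))   // 10
	fmt.Println(gossipPeers(20, 10, 30))  // 13
	fmt.Println(gossipPeers(20, 10, 100)) // 20
}
```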
Roman Khimov	215e8704f1	network: simplify discoverer, make it almost a lib We already have two basic lists: connected and unconnected nodes, we don't need an additional channel and we don't need a goroutine to handle it.	2022-10-14 15:53:32 +03:00
Roman Khimov	c1ef326183	network: re-add addresses to the pool on UnregisterConnectedAddr That's what we do anyway, but this way we can be a bit more efficient.	2022-10-14 14:12:33 +03:00
Roman Khimov	631f166709	network: broadcast to log-dependent number of nodes Fixes #608.	2022-10-14 14:12:33 +03:00
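A sketch of a log-dependent fan-out: the number of peers a broadcast targets grows with the logarithm of the estimated network size rather than linearly. The formula c·ln(N − 1), the multiplier c = 2.5 and the `fanOut` helper are assumptions for illustration, not values taken from this log.

```go
package main

import (
	"fmt"
	"math"
)

// fanOutMultiplier is an assumed constant for this sketch.
const fanOutMultiplier = 2.5

// fanOut estimates how many peers a broadcast should reach in a network of
// netSize nodes; it grows logarithmically with the size.
func fanOut(netSize int) int {
	switch {
	case netSize <= 1:
		return 0 // nobody to talk to
	case netSize == 2:
		return 1 // ln(1) == 0, but the only peer still needs the message
	}
	f := fanOutMultiplier * math.Log(float64(netSize-1)) // netSize-1 potential peers
	return int(f + 0.5)                                  // round to the nearest integer
}

func main() {
	for _, n := range []int{2, 5, 100, 1000} {
		fmt.Println(n, fanOut(n))
	}
}
```

Going from 100 to 1000 nodes only raises the fan-out from 11 to 17 here, which is the point of the change: full broadcast cost grows with the peer count, while gossip with a logarithmic fan-out still reaches everyone with high probability.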
Roman Khimov	dc62046019	network: add network size estimation metric	2022-10-12 22:29:55 +03:00
Roman Khimov	bcf77c3c42	network: filter out not-yet-ready nodes when broadcasting They can fail right in the getPeers or they can fail later when packet send is attempted. Of course they can complete handshake in-between these events, but most likely they won't and we'll waste more resources on this attempt. So rule out bad peers immediately.	2022-10-12 16:51:01 +03:00
Roman Khimov	137f2cb192	network: deduplicate TCPPeer code a bit context.Background() is never canceled and has no deadline, so we can avoid duplicating some code.	2022-10-12 15:43:31 +03:00
Roman Khimov	104da8caff	network: broadcast messages, enqueue packets Drop EnqueueP2PPacket, replace EnqueueHPPacket with EnqueueHPMessage. We use Enqueue* when we have a specific per-peer message, it makes zero sense duplicating serialization code for it (unlike Broadcast*).	2022-10-12 15:39:20 +03:00
Roman Khimov	d5f2ad86a1	network: drop unused EnqueueMessage interface from Peer	2022-10-12 15:27:08 +03:00
Roman Khimov	b345581c72	network: pings are broadcasted, don't send them to everyone Follow the general rules of broadcasts, even though it's somewhat different from Inv, we just want to get some reply from our neighbors to see if we're behind. We don't strictly need all neighbors for it.	2022-10-12 15:25:03 +03:00
Roman Khimov	e1d5f18ff4	network: fix outdated Peer interface comments	2022-10-12 10:16:07 +03:00
Roman Khimov	8b26d9475b	network: speculatively set GetAddrSent status Otherwise we routinely get "unexpected addr received" error.	2022-10-11 18:42:40 +03:00
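The race behind "unexpected addr received" is that a fast reply can be handled before the sender records that it asked. Setting the flag speculatively, before the request goes out, and rolling it back if sending fails avoids that. The `peer` type and method names are hypothetical; `atomic.Bool` needs Go 1.19+.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

type peer struct {
	getAddrSent atomic.Bool
}

// sendGetAddr sets the flag before the request goes out, so a quick addr
// reply can't race ahead of it and be rejected as unexpected.
func (p *peer) sendGetAddr(send func() error) error {
	p.getAddrSent.Store(true)
	if err := send(); err != nil {
		p.getAddrSent.Store(false) // roll back: nothing was requested
		return err
	}
	return nil
}

// handleAddr accepts an addr message only if we asked for one, and consumes
// the flag so a second unsolicited addr is rejected.
func (p *peer) handleAddr() error {
	if !p.getAddrSent.CompareAndSwap(true, false) {
		return errors.New("unexpected addr received")
	}
	return nil
}

func main() {
	p := &peer{}
	// The reply is handled before sendGetAddr even returns.
	err := p.sendGetAddr(func() error { return p.handleAddr() })
	fmt.Println(err) // <nil>
}
```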
Roman Khimov	e80c60a3b9	network: rework broadcast logic We have a number of queues for different purposes: * regular broadcast queue * direct p2p queue * high-priority queue And two basic egress scenarios: * direct p2p messages (replies to requests in Server's handle* methods) * broadcasted messages Low priority broadcasted messages: * transaction inventories * block inventories * notary inventories * non-consensus extensibles High-priority broadcasted messages: * consensus extensibles * getdata transaction requests from consensus process * getaddr requests P2P messages are a bit more complicated, most of the time they use p2p queue, but extensible message requests/replies use HP queue. Server's handle* code is run from Peer's handleIncoming, every peer has this thread that handles incoming messages. When working with the peer it's important to reply to requests and blocking this thread until we send (queue) a reply is fine, if the peer is slow we just won't get anything new from it. The queue used is irrelevant wrt this issue. Broadcasted messages are radically different, we want them to be delivered to many peers, but we don't care about specific ones. If it's delivered to 2/3 of the peers we're fine, if it's delivered to more of them --- it's not an issue. But doing this fairly is not an easy thing, current code tries performing unblocked sends and if this doesn't yield enough results it then blocks (but has a timeout, we can't wait indefinitely). But it does so in sequential manner, once the peer is chosen the code will wait for it (and only it) until timeout happens. What can be done instead is an attempt to push the message to all of the peers simultaneously (or close to that). If they all deliver --- OK, if some block and wait then we can wait until _any_ of them pushes the message through (or global timeout happens, we still can't wait forever). 
If we have enough deliveries then we can cancel pending ones and it's again not an error if these canceled threads still do their job. This makes the system more dynamic and adds some substantial processing overhead, but it's a networking code, any of this overhead is much lower than the actual packet delivery time. It also allows to spread the load more fairly, if there is any spare queue it'll get the packet and release the broadcaster. On the next broadcast iteration another peer is more likely to be chosen just because it didn't get a message previously (and had some time to deliver already queued messages). It works perfectly in tests, with optimal networking conditions we have much better block times and TPS increases by 5-25% depending on the scenario. I'd go as far as to say that it fixes the original problem of #2678, because in this particular scenario we have empty queues in ~100% of the cases and this new logic will likely lead to 100% fan out in this case (cancelation just won't happen fast enough). But when the load grows and there is some waiting in the queue it will optimize out the slowest links.	2022-10-11 18:42:40 +03:00
Roman Khimov	dabdad20ad	network: don't wait indefinitely for packet to be sent Peers can be slow, very slow, slow enough to affect node's regular operation. We can't wait for them indefinitely, there has to be a timeout for send operations. This patch uses TimePerBlock as a reference for its timeout. It's relatively big and it doesn't affect tests much, 4+1 scenarios tend to perform a little worse while 7+2 scenarios work a little better. The difference is in some percents, but all of these tests easily have 10-15% variations from run to run. It's an important step in making our gossip better because we can't have any behavior where neighbors directly block the node forever, refs. #2678 and	2022-10-10 22:15:21 +03:00
Roman Khimov	317dd42513	: use uintSize and SignatureLen constants where appropriate	2022-10-05 10:45:52 +03:00
Roman Khimov	4f3ffe7290	golangci: enable errorlint and fix everything it found	2022-09-02 18:36:23 +03:00
Roman Khimov	779a5c070f	network: wait for exit in discoverer And synchronize other threads with channels instead of mutexes. Overall this scheme is more reliable.	2022-08-19 22:23:47 +03:00
Roman Khimov	eeeb0f6f0e	core: accept two-side channels for sub/unsub, read on unsub Blockchain's notificationDispatcher sends events to channels and these channels must be read from. Unfortunately, regular service shutdown procedure does unsubscription first (outside of the read loop) and only then drains the channel. While it waits for unsubscription request to be accepted notificationDispatcher can try pushing more data into the same channel which will lead to a deadlock. Reading in the same method solves this, any number of events can be pushed until unsub channel accepts the data.	2022-08-19 22:08:40 +03:00
Roman Khimov	dea75a4211	network: wait for the relayer thread to finish on shutdown Unsubscribe and drain first, then return from the Shutdown method. It's important wrt the subsequent chain shutdown process (normally it's closed right after the network server).	2022-08-19 22:08:40 +03:00
Roman Khimov	155089f4e5	network: drop cleanup from TestVerifyNotaryRequest It never runs the server, so `746644a4eb` was a bit wrong with this.	2022-08-19 20:54:06 +03:00
Anna Shaleva	916f2293b8	*: apply go 1.19 formatter heuristics And make manual corrections where needed. See the "Common mistakes and pitfalls" section of https://tip.golang.org/doc/comment.	2022-08-09 15:37:52 +03:00
Anna Shaleva	bb751535d3	*: bump minimum supported go version Close #2497.	2022-08-08 13:59:32 +03:00
Roman Khimov	9b0ea2c21b	network/consensus: always process dBFT messages as high priority Move category definition from consensus to payload, consensus service is the one of its kind (HP), so network.Server can be adjusted accordingly.	2022-08-02 13:07:18 +03:00
Roman Khimov	94a8784dcb	network: allow to drop services and solve concurrency issues Now that services can come and go we need to protect all of the associated fields and allow to deregister them.	2022-08-02 13:05:39 +03:00
Roman Khimov	5a7fa2d3df	cli: restart consensus service on USR2 Fix #1949. Also drop wallet from the ServerConfig since it's not used in any meaningful way after this change.	2022-08-02 13:05:07 +03:00
Roman Khimov	2e27c3d829	metrics: move package to services Where it belongs.	2022-07-21 23:38:23 +03:00
Anna Shaleva	1ae601787d	network: allow to handle GetMPTData with KeepOnlyLatestState on And adjust documentation along the way.	2022-07-14 14:33:20 +03:00
Roman Khimov	dc59dc991b	config: move metrics.Config into config.BasicService Config package should be as lightweight as possible and now it depends on the whole metrics package just to get one structure from it.	2022-07-08 23:30:30 +03:00
Roman Khimov	3fbc1331aa	Merge pull request #2582 from nspcc-dev/fix-server-sync network: adjust the way (*Server).IsInSync() works	2022-07-05 12:28:20 +03:00
Roman Khimov	9f05009d1a	Merge pull request #2580 from nspcc-dev/service-review Service review	2022-07-05 12:23:25 +03:00
Anna Shaleva	0835581fa9	network: adjust the way (*Server).IsInSync() works Always return true if sync was reached once. Fix #2564.	2022-07-05 12:20:31 +03:00
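"Always return true if sync was reached once" is a latch: after the first positive answer the result no longer depends on the current heights. In this sketch the sync condition is reduced to a height comparison, which is an assumption; the `syncTracker` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// syncTracker is a hypothetical stand-in for the server's sync state.
type syncTracker struct {
	syncReached atomic.Bool
}

// isInSync latches: once the node caught up with the best known height,
// it keeps reporting true even if it temporarily falls behind again.
func (s *syncTracker) isInSync(height, bestPeerHeight uint32) bool {
	if s.syncReached.Load() {
		return true
	}
	if height >= bestPeerHeight {
		s.syncReached.Store(true)
		return true
	}
	return false
}

func main() {
	var s syncTracker
	fmt.Println(s.isInSync(90, 100))  // false
	fmt.Println(s.isInSync(100, 100)) // true
	fmt.Println(s.isInSync(100, 105)) // true: sync was reached once
}
```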
Roman Khimov	3e2eda6752	*: add some comments to service Start/Shutdown methods	2022-07-04 23:03:50 +03:00
Roman Khimov	c26a962b55	*: use localhost address instead of 127.0.0.1, fix #2575	2022-06-30 16:19:07 +03:00
Anna Shaleva	8ab422da66	*: properly unsubscribe from Blockchain events	2022-06-28 19:09:25 +03:00
Roman Khimov	75d06d18c9	Merge pull request #2466 from nspcc-dev/rules-fixes Rules scope fixes	2022-05-06 11:09:39 +03:00
Roman Khimov	bd352daab4	transaction: fix Rules stringer, it's WitnessRules in C# See neo-project/neo#2720.	2022-05-06 10:08:09 +03:00
Elizaveta Chichindaeva	28908aa3cf	[#2442 ] English Check Signed-off-by: Elizaveta Chichindaeva <elizaveta@nspcc.ru>	2022-05-04 19:48:27 +03:00
Roman Khimov	53423b7c37	network: fix panic in blockqueue during shutdown panic: send on closed channel goroutine 116 [running]: github.com/nspcc-dev/neo-go/pkg/network.(*blockQueue).putBlock(0xc00011b650, 0xc01e371200) github.com/nspcc-dev/neo-go/pkg/network/blockqueue.go:129 +0x185 github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleBlockCmd(0xc0002d3c00, {0xf69b7f?, 0xc001520010?}, 0xc02eb44000?) github.com/nspcc-dev/neo-go/pkg/network/server.go:607 +0x6f github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0002d3c00, {0x121f4c8?, 0xc001528000?}, 0xc01e35cf80) github.com/nspcc-dev/neo-go/pkg/network/server.go:1160 +0x6c5 github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleIncoming(0xc001528000) github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:189 +0x98 created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:164 +0xcf	2022-04-26 00:31:48 +03:00
Roman Khimov	2593bb0535	network: extend Service with Name, use it to distinguish services	2022-04-26 00:31:48 +03:00
Evgeniy Stratonikov	34b1b52784	network: check compressed payload size in `decompress` Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>	2022-03-24 17:22:55 +03:00
Anna Shaleva	753d604784	network: use net.ErrClosed to check network connection was closed Close #1765.	2022-03-17 19:39:18 +03:00
Anna Shaleva	9bbd94d0fa	network: tune waiting limits in tests Some tests are failing on Windows due to slow runners with errors like the following: ``` 2022-02-09T17:11:20.3127016Z --- FAIL: TestGetData/transaction (1.82s) 2022-02-09T17:11:20.3127385Z server_test.go:500: 2022-02-09T17:11:20.3127878Z Error Trace: server_test.go:500 2022-02-09T17:11:20.3128533Z server_test.go:520 2022-02-09T17:11:20.3128978Z Error: Condition never satisfied 2022-02-09T17:11:20.3129479Z Test: TestGetData/transaction ```	2022-02-10 18:58:50 +03:00
Roman Khimov	e621f746a7	config/core: allow to change the number of validators Fixes #2320.	2022-01-31 23:14:38 +03:00
Roman Khimov	60d6fa1125	network: keep a copy of the config inside of Server Avoid copying the configuration again and again, make things a bit more efficient.	2022-01-24 18:43:01 +03:00
Roman Khimov	89d754da6f	network: don't request blocks we already have in the queue Fixes #2258.	2022-01-18 00:04:41 +03:00
Roman Khimov	03fd91e857	network: use assert.Eventually in bq test Simpler and more efficient (polls more often and completes the test sooner).	2022-01-18 00:04:29 +03:00
Roman Khimov	d52a06a82d	network: move index-position relation into helper Just to make things more clear, no functional changes.	2022-01-18 00:02:16 +03:00
Roman Khimov	bc6d6e58bc	network: always pass transactions to consensus process Consensus can require conflicting transactions and it can require more transactions than mempool can fit, all of this should work. Transactions will be checked anyway using its secondary mempool. See the scenario from #668.	2022-01-14 20:08:40 +03:00
Roman Khimov	746644a4eb	network: decouple it from blockchainer.Blockchainer We don't need all of it.	2022-01-14 19:57:16 +03:00
Roman Khimov	bf1604454c	blockchainer/network: move StateSync interface to the user Only network package cares about it.	2022-01-14 19:57:14 +03:00
Roman Khimov	af87cb082f	network: decouple Server from the notary service	2022-01-14 19:55:53 +03:00
Roman Khimov	508d36f698	network: drop consensus dependency	2022-01-14 19:55:53 +03:00
Roman Khimov	66aafd868b	network: unplug stateroot service from the Server Notice that it makes the node accept Extensible payloads with any category which is the same way C# node works. We're trusting Extensible senders, improper payloads are harmless until they DoS the network, but we have some protections against that too (and spamming with proper category doesn't differ a lot).	2022-01-14 19:55:50 +03:00
Roman Khimov	0ad3ea5944	network/cli: move Oracle service instantiation out of the network	2022-01-14 19:53:45 +03:00
Roman Khimov	5dd4db2c02	network/services: unify service lifecycle management Run with Start, Stop with Shutdown, make behavior uniform.	2022-01-14 19:53:45 +03:00
Roman Khimov	c942402957	blockchainer: drop Policer interface We never use it as a proper interface, so it makes no sense keeping it this way.	2022-01-12 00:58:03 +03:00
Roman Khimov	48de82d902	network: fix data race in TestHandleMPTData, fix #2241	2021-11-15 12:37:01 +03:00
Roman Khimov	fe50f6edc7	Merge pull request #2240 from nspcc-dev/fix-panic-in-network Fix panic on peer disconnect	2021-11-01 12:44:15 +03:00
Roman Khimov	774dee3cd4	network: fix disconnection race between handleConn() and handleIncoming() handleIncoming() winning the race for p.Disconnect() call might lead to nil error passed as the reason for peer unregistration.	2021-11-01 12:20:55 +03:00
Roman Khimov	2eeec73770	network: don't panic if there is no reason for disconnect Although error should always be there, we shouldn't fail like this if it's not: | panic: runtime error: invalid memory address or nil pointer dereference | [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc8884c] | | goroutine 113 [running]: | github.com/nspcc-dev/neo-go/pkg/network.(*Server).run(0xc000150580) | github.com/nspcc-dev/neo-go/pkg/network/server.go:396 +0x7ac | github.com/nspcc-dev/neo-go/pkg/network.(*Server).Start(0xc000150580, 0x0) | github.com/nspcc-dev/neo-go/pkg/network/server.go:294 +0x3fb | created by github.com/nspcc-dev/neo-go/cli/server.startServer | github.com/nspcc-dev/neo-go/cli/server/server.go:344 +0x56f	2021-11-01 12:19:00 +03:00
Roman Khimov	8bb1ecb45a	network: remove priority queue from block queue Use circular buffer which is a bit more appropriate. The problem is that priority queue accepts and stores equal items which wastes memory even in normal usage scenario, but it's especially dangerous if the node is stuck for some reason. In this case it'll accept from peers and put into queue the same blocks again and again leaking memory up to OOM condition. Notice that queue length calculation might be wrong in case circular buffer wraps, but it's not very likely to happen (usually blocks not coming from the queue are added by consensus and it's not very fast in doing so).	2021-11-01 11:49:01 +03:00
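A circular buffer indexed by block height makes duplicates impossible to accumulate: block N always lands in slot N mod size, so a repeated block hits an occupied slot instead of growing the queue, and blocks outside the current window are rejected outright. The `blockQueue` shape, the tiny `cacheSize` and the `indexToPosition` helper (compare the index-position helper mentioned elsewhere in this log) are hypothetical choices for the sketch.

```go
package main

import "fmt"

const cacheSize = 8 // hypothetical; a real queue would be much larger

type block struct{ index uint32 }

type blockQueue struct {
	queue  [cacheSize]*block
	height uint32 // last block added to the chain
}

// indexToPosition maps a block index to its slot in the ring.
func indexToPosition(i uint32) int { return int(i % cacheSize) }

// put stores b in its slot; old, duplicate and too-far-ahead blocks are
// dropped, so memory use stays bounded even if the node is stuck.
func (q *blockQueue) put(b *block) bool {
	if b.index <= q.height || b.index > q.height+cacheSize {
		return false
	}
	pos := indexToPosition(b.index)
	if q.queue[pos] != nil && q.queue[pos].index == b.index {
		return false // duplicate
	}
	q.queue[pos] = b
	return true
}

// pop removes and returns the next block if it is already available.
func (q *blockQueue) pop() *block {
	pos := indexToPosition(q.height + 1)
	b := q.queue[pos]
	if b == nil || b.index != q.height+1 {
		return nil
	}
	q.queue[pos] = nil
	q.height++
	return b
}

func main() {
	q := &blockQueue{height: 10}
	fmt.Println(q.put(&block{index: 12}), q.put(&block{index: 11}), q.put(&block{index: 11}))
	// true true false: the duplicate is rejected instead of piling up
	for b := q.pop(); b != nil; b = q.pop() {
		fmt.Println("add block", b.index)
	}
}
```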
AnnaShaleva	2d196b3f35	rpc: refactor `calculatenetworkfee` handler Use (Blockchainer).VerifyWitness() to calculate network fee for contract-based witnesses.	2021-10-25 19:07:25 +03:00
Evgeniy Stratonikov	4dd3a0d503	network: request headers in parallel, fix #2158 Do this similarly to how blocks are requested. See also `4aa1a37`. Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>	2021-10-06 15:25:54 +03:00
Evgeniy Stratonikov	7fa6c8dcf6	config: fix duration parameter types These parameters denote seconds and are thus unitless integers, not durations. Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>	2021-09-25 13:13:51 +03:00
Anna Shaleva	6357af0bb0	network: fix race in TestHandleGetMPTData Init server config before server start. Fixes the following data race: ``` WARNING: DATA RACE Write at 0x00c00032ef20 by goroutine 26: github.com/nspcc-dev/neo-go/pkg/network.TestHandleGetMPTData.func2() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:755 +0x10a testing.tRunner() /usr/local/go/src/testing/testing.go:1193 +0x202 Previous read at 0x00c00032ef20 by goroutine 24: github.com/nspcc-dev/neo-go/internal/fakechain.(*FakeChain).GetConfig() /go/src/github.com/nspcc-dev/neo-go/internal/fakechain/fakechain.go:167 +0x6f github.com/nspcc-dev/neo-go/pkg/network.(*Server).initStaleMemPools() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:1433 +0x89 github.com/nspcc-dev/neo-go/pkg/network.(*Server).Start() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:284 +0x288 github.com/nspcc-dev/neo-go/pkg/network.startWithChannel.func1() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:91 +0x44 Goroutine 26 (running) created at: testing.(*T).Run() /usr/local/go/src/testing/testing.go:1238 +0x5d7 github.com/nspcc-dev/neo-go/pkg/network.TestHandleGetMPTData() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:752 +0x8c testing.tRunner() /usr/local/go/src/testing/testing.go:1193 +0x202 Goroutine 24 (running) created at: github.com/nspcc-dev/neo-go/pkg/network.startWithChannel() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:90 +0x78 github.com/nspcc-dev/neo-go/pkg/network.startTestServer() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:384 +0xbd github.com/nspcc-dev/neo-go/pkg/network.TestHandleGetMPTData.func2() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:753 +0x55 testing.tRunner() /usr/local/go/src/testing/testing.go:1193 +0x202 ```	2021-09-13 11:45:48 +03:00
Anna Shaleva	29ef076f4b	network: fix race in TestTryInitStateSync Register peers properly. Fixes the following data race: ``` Read at 0x00c001184ac8 by goroutine 116: github.com/nspcc-dev/neo-go/pkg/network.(*localPeer).EnqueueHPPacket() /go/src/github.com/nspcc-dev/neo-go/pkg/network/helper_test.go:127 +0x1f2 github.com/nspcc-dev/neo-go/pkg/network.(*localPeer).EnqueuePacket() /go/src/github.com/nspcc-dev/neo-go/pkg/network/helper_test.go:114 +0xac github.com/nspcc-dev/neo-go/pkg/network.(*localPeer).EnqueueMessage() /go/src/github.com/nspcc-dev/neo-go/pkg/network/helper_test.go:111 +0xc1 github.com/nspcc-dev/neo-go/pkg/network.(*localPeer).SendPing() /go/src/github.com/nspcc-dev/neo-go/pkg/network/helper_test.go:159 +0x88 github.com/nspcc-dev/neo-go/pkg/network.(*Server).runProto() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:446 +0x409 Previous write at 0x00c001184ac8 by goroutine 102: github.com/nspcc-dev/neo-go/pkg/network.newLocalPeer() /go/src/github.com/nspcc-dev/neo-go/pkg/network/helper_test.go:83 +0x476 github.com/nspcc-dev/neo-go/pkg/network.TestTryInitStateSync.func3() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:1064 +0x40f testing.tRunner() /usr/local/go/src/testing/testing.go:1123 +0x202 Goroutine 116 (running) created at: github.com/nspcc-dev/neo-go/pkg/network.(*Server).run() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:358 +0x69 github.com/nspcc-dev/neo-go/pkg/network.(*Server).Start() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:292 +0x488 github.com/nspcc-dev/neo-go/pkg/network.startWithChannel.func1() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:91 +0x44 Goroutine 102 (running) created at: testing.(*T).Run() /usr/local/go/src/testing/testing.go:1168 +0x5bb github.com/nspcc-dev/neo-go/pkg/network.TestTryInitStateSync() /go/src/github.com/nspcc-dev/neo-go/pkg/network/server_test.go:1056 +0xbb testing.tRunner() /usr/local/go/src/testing/testing.go:1123 +0x202 ```	2021-09-13 11:45:48 +03:00
Anna Shaleva	0fa48691f7	network: do not duplicate MPT nodes in GetMPTNodes response Also tests are added.	2021-09-08 14:25:54 +03:00
Anna Shaleva	3b7807e897	network: request unknown MPT nodes In this commit: 1. Request unknown MPT nodes from peers. Note, that StateSync module itself shouldn't be responsible for nodes requests, that's a server duty. 2. Do not request the same node twice, check if it is in storage already. If so, then the only thing remaining is to update refcounter.	2021-09-07 19:43:27 +03:00
Anna Shaleva	d67ff30704	core: implement statesync module And support GetMPTData and MPTData P2P commands.	2021-09-07 19:43:27 +03:00
Roman Khimov	7808762ba0	transaction: avoid reencoding and reading what can't be read name old time/op new time/op delta DecodeFromBytes-8 1.79µs ± 2% 1.46µs ± 4% -18.44% (p=0.000 n=10+10) name old alloc/op new alloc/op delta DecodeFromBytes-8 800B ± 0% 624B ± 0% -22.00% (p=0.000 n=10+10) name old allocs/op new allocs/op delta DecodeFromBytes-8 10.0 ± 0% 8.0 ± 0% -20.00% (p=0.000 n=10+10)	2021-08-23 21:41:38 +03:00
Roman Khimov	5aff82aef4	Merge pull request #2119 from nspcc-dev/states-exchange/insole core, network: prepare basis for Insole module	2021-08-12 10:35:02 +03:00
Anna Shaleva	72e654332e	core: refactor block queue It requires only two methods from Blockchainer: AddBlock and BlockHeight. New interface will allow to easily reuse the block queue for state exchange purposes.	2021-08-10 13:47:13 +03:00
Roman Khimov	0a2bbf3c04	Merge pull request #2118 from nspcc-dev/neopt2 Networking improvements	2021-08-10 13:29:40 +03:00
Anna Shaleva	6ca7983be8	network: fix typo in error message	2021-08-10 11:00:39 +03:00
Evgeniy Stratonikov	c74de9a579	network: preallocate buffer for message ``` name old time/op new time/op delta MessageBytes-8 740ns ± 0% 684ns ± 2% -7.58% (p=0.000 n=10+10) name old alloc/op new alloc/op delta MessageBytes-8 1.39kB ± 0% 1.20kB ± 0% -13.79% (p=0.000 n=10+10) name old allocs/op new allocs/op delta MessageBytes-8 11.0 ± 0% 10.0 ± 0% -9.09% (p=0.000 n=10+10) ``` Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>	2021-08-10 09:33:52 +03:00
Roman Khimov	7bb82f1f99	network: merge two loops in iteratePeersWithSendMsg, send to 2/3 Refactor code and be fine with sending to just 2/3 of proper peers. Previously it was an edge case, but it can be a normal thing to do also as broadcasting to everyone is obviously too expensive and excessive (hi, #608). Baseline (four node, 10 workers): RPS 8180.760 8137.822 7858.358 7820.011 8051.076 ≈ 8010 ± 2.04% TPS 7819.831 7521.172 7519.023 7242.965 7426.000 ≈ 7506 ± 2.78% CPU % 41.983 38.775 40.606 39.375 35.537 ≈ 39.3 ± 6.15% Mem MB 2947.189 2743.658 2896.688 2813.276 2863.108 ≈ 2853 ± 2.74% Patched: RPS 9714.567 9676.102 9358.609 9371.408 9301.372 ≈ 9484 ± 2.05% ↑ 18.40% TPS 8809.796 8796.854 8534.754 8661.158 8426.162 ≈ 8646 ± 1.92% ↑ 15.19% CPU % 44.980 45.018 33.640 29.645 43.830 ≈ 39.4 ± 18.41% ↑ 0.25% Mem MB 2989.078 2976.577 2306.185 2351.929 2910.479 ≈ 2707 ± 12.80% ↓ 5.12% There is a nuance with this patch however. While typically it works the way outlined above, sometimes it works like this: RPS ≈ 6734.368 TPS ≈ 6299.332 CPU ≈ 25.552% Mem ≈ 2706.046MB And that's because the log looks like this: DeltaTime, TransactionsCount, TPS 5014, 44212, 8817.710 5163, 49690, 9624.249 5166, 49523, 9586.334 5189, 49693, 9576.604 5198, 49339, 9491.920 5147, 49559, 9628.716 5192, 49680, 9568.567 5163, 49750, 9635.871 5183, 49189, 9490.450 5159, 49653, 9624.540 5167, 47945, 9279.079 5179, 2051, 396.022 5015, 4, 0.798 5004, 0, 0.000 5003, 0, 0.000 5003, 0, 0.000 5003, 0, 0.000 5003, 0, 0.000 5004, 0, 0.000 5003, 2925, 584.649 5040, 49099, 9741.865 5161, 49718, 9633.404 5170, 49228, 9521.857 5179, 49773, 9610.543 5167, 47253, 9145.152 5202, 49788, 9570.934 5177, 47704, 9214.603 5209, 46610, 8947.975 5249, 49156, 9364.831 5163, 18284, 3541.352 5072, 174, 34.306 On a network with 4 CNs and 1 RPC node there is 1/256 probability that a block won't be broadcasted to RPC node, so it won't see it until ping timeout kicks in. 
While it doesn't see a block it can't accept new incoming transactions so the bench gets stuck basically. To me that's an acceptable trade-off because normal networks are much larger than that and the effect of this patch is way more important there, but still that's what we have and we need to take into account.	2021-08-06 21:10:34 +03:00
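The 1/256 figure follows from a simple independence argument. Assuming each of the 4 CNs has 4 peers (the 3 other CNs plus the RPC node) and the 2/3 rule rounds up to sending the block to 3 of them chosen uniformly at random, a CN skips the RPC node with probability 1/4, and all four skip it with probability (1/4)^4 = 1/256. The `missProbability` helper is ours.

```go
package main

import (
	"fmt"
	"math"
)

// missProbability is the chance that one particular node gets a broadcast
// from none of `senders` neighbours, when each of them independently sends
// to `fanOut` of its `peers` peers chosen uniformly at random.
func missProbability(senders, peers, fanOut int) float64 {
	skip := float64(peers-fanOut) / float64(peers) // P(a sender skips our node)
	return math.Pow(skip, float64(senders))
}

func main() {
	p := missProbability(4, 4, 3)
	fmt.Printf("%.8f = 1/%.0f\n", p, 1/p) // 0.00390625 = 1/256
}
```

In larger networks more nodes relay the block to any given peer, so the miss probability falls off exponentially in the number of relaying neighbours, which is why the trade-off was considered acceptable.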
Roman Khimov	966a16e80e	network: keep track of dead peers in iteratePeersWithSendMsg() send() can return errStateMismatch, errGone and errBusy. errGone means the peer is dead and it won't ever be active again, it doesn't make sense retrying sends to it. errStateMismatch is technically "not yet ready", but we can't wait for it either, no one knows how long it will take to complete the handshake. So only errBusy means we can retry. So keep track of dead peers and adjust tries counting appropriately.	2021-08-06 21:10:34 +03:00
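The retry policy reads naturally as an `errors.Is` switch: only errBusy earns another attempt, while errGone and errStateMismatch exclude the peer from the remaining passes. The error values below are local stand-ins named after the ones in the commit message, and `sendWithRetries` is a hypothetical helper, not iteratePeersWithSendMsg itself.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the three send() errors named in the commit message.
var (
	errBusy          = errors.New("peer is busy")
	errGone          = errors.New("peer is gone")
	errStateMismatch = errors.New("peer is not ready yet")
)

// sendWithRetries makes up to `tries` passes over peers. Only errBusy is
// retried; any other error marks the peer as dead for the rest of the
// broadcast. It returns the number of successful sends.
func sendWithRetries(peers []func() error, tries int) int {
	dead := make([]bool, len(peers))
	done := make([]bool, len(peers))
	sent := 0
	for t := 0; t < tries; t++ {
		for i, send := range peers {
			if dead[i] || done[i] {
				continue
			}
			err := send()
			switch {
			case err == nil:
				done[i] = true
				sent++
			case errors.Is(err, errBusy):
				// Queue full: worth another try on the next pass.
			default:
				// Gone or still handshaking: retrying won't help.
				dead[i] = true
			}
		}
	}
	return sent
}

func main() {
	busy := 2
	peers := []func() error{
		func() error { return nil },
		func() error { return errGone },
		func() error { return errStateMismatch },
		func() error {
			if busy > 0 {
				busy--
				return errBusy
			}
			return nil
		},
	}
	fmt.Println(sendWithRetries(peers, 3)) // 2
}
```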
Roman Khimov	80f3ec2312	network: move peer filtering to getPeers() It doesn't change much, we can't magically get more valid peers and if some die while we're iterating we'd detect that by an error returned from send().	2021-08-06 21:10:34 +03:00
Roman Khimov	de6f4987f6	network: microoptimize iteratePeersWithSendMsg() Now that s.getPeers() returns a slice we can use slice for `success` too, maps are more expensive.	2021-08-06 21:10:34 +03:00
Roman Khimov	d51db20405	network: randomize peer iteration order While iterating over map in getPeers() is non-deterministic it's not really random enough for our purposes (usually maps have 2-3 paths through them), we need to fill our peers queues more uniformly. Believe it or not, but it does affect performance metrics, baseline (four nodes, 10 workers): RPS ≈ 7791.675 7996.559 7834.504 7746.705 7891.614 ≈ 7852 ± 1.10% TPS ≈ 7241.497 7711.765 7520.211 7425.890 7334.443 ≈ 7447 ± 2.17% CPU % 29.853 39.936 39.945 36.371 39.999 ≈ 37.2 ± 10.57% Mem MB 2749.635 2791.609 2828.610 2910.431 2863.344 ≈ 2829 ± 1.97% Patched: RPS 8180.760 8137.822 7858.358 7820.011 8051.076 ≈ 8010 ± 2.04% ↑ 2.01% TPS 7819.831 7521.172 7519.023 7242.965 7426.000 ≈ 7506 ± 2.78% ↑ 0.79% CPU % 41.983 38.775 40.606 39.375 35.537 ≈ 39.3 ± 6.15% ↑ 5.65% Mem MB 2947.189 2743.658 2896.688 2813.276 2863.108 ≈ 2853 ± 2.74% ↑ 0.85%	2021-08-06 21:10:34 +03:00
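Randomizing the iteration order is a one-liner with `rand.Shuffle` over a copy of the peer list, so each broadcast starts filling queues from a different peer. The `shuffledPeers` helper is hypothetical; on Go versions before 1.20 the global `math/rand` source must be seeded explicitly, otherwise every run gets the same sequence.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffledPeers returns a randomly permuted copy of peers. Per the commit,
// map iteration order alone isn't random enough to spread load uniformly.
func shuffledPeers(peers []string) []string {
	out := make([]string, len(peers))
	copy(out, peers)
	rand.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	peers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"}
	for i := 0; i < 3; i++ {
		fmt.Println(shuffledPeers(peers))
	}
}
```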
Roman Khimov	b55c75d59d	network: hide Peers, make it return a slice Slice is a bit more efficient, we don't need a map for Peers() users and it's not really interesting to outside users, so better hide this method.	2021-08-06 21:10:34 +03:00
Roman Khimov	119b4200ac	network: add fail-fast route for tx double processing When transaction spreads through the network many nodes are likely to get it in roughly the same time. They will rebroadcast it also in roughly the same time. As we have a number of peers it's quite likely that we'd get an Inv with the same transaction from multiple peers simultaneously. We will ask them for this transaction (independently!) and again we're likely to get it in roughly the same time. So we can easily end up with multiple threads processing the same transaction. Only one will succeed, but we can actually easily avoid doing it in the first place saving some CPU cycles for other things. Notice that we can't do it _before_ receiving a transaction because nothing guarantees that the peer will respond to our transaction request, so communication overhead is unavoidable at the moment, but saving on processing already gives quite interesting results. Baseline, four nodes with 10 workers: RPS 7176.784 7014.511 6139.663 7191.280 7080.852 ≈ 6921 ± 5.72% TPS 6945.409 6562.756 5927.050 6681.187 6821.794 ≈ 6588 ± 5.38% CPU % 44.400 43.842 40.418 49.211 49.370 ≈ 45.4 ± 7.53% Mem MB 2693.414 2640.602 2472.007 2731.482 2707.879 ≈ 2649 ± 3.53% Patched: RPS ≈ 7791.675 7996.559 7834.504 7746.705 7891.614 ≈ 7852 ± 1.10% ↑ 13.45% TPS ≈ 7241.497 7711.765 7520.211 7425.890 7334.443 ≈ 7447 ± 2.17% ↑ 13.04% CPU % 29.853 39.936 39.945 36.371 39.999 ≈ 37.2 ± 10.57% ↓ 18.06% Mem MB 2749.635 2791.609 2828.610 2910.431 2863.344 ≈ 2829 ± 1.97% ↑ 6.80%	2021-08-06 21:10:25 +03:00
Roman Khimov	7fc153ed2a	network: only ask mempool for intersections with received Inv Most of the time on healthy network we see new transactions appearing that are not present in the mempool. Once they get into mempool we don't ask for them again when some other peer sends an Inv with them. Then these transactions are usually added into block, removed from mempool and no one actually sends them again to us. Some stale nodes can do that, but it's not very likely to happen. At the receiving end at the same time it's quite expensive to do full chain HasTransaction() query, so if we can avoid doing that it's always good. Here it technically allows resending old transaction that will be re-requested and an attempt to add it to mempool will be made. But it'll inevitably fail because the same HasTransaction() check is done there too. One can try to maliciously flood the node with stale transactions but it doesn't differ from flooding it with any other invalid transactions, so there is no new attack vector added. Baseline, 4 nodes with 10 workers: RPS 6902.296 6465.662 6856.044 6785.515 6157.024 ≈ 6633 ± 4.26% TPS 6468.431 6218.867 6610.565 6288.596 5790.556 ≈ 6275 ± 4.44% CPU % 50.231 42.925 49.481 48.396 42.662 ≈ 46.7 ± 7.01% Mem MB 2856.841 2684.103 2756.195 2733.485 2422.787 ≈ 2691 ± 5.40% Patched: RPS 7176.784 7014.511 6139.663 7191.280 7080.852 ≈ 6921 ± 5.72% ↑ 4.34% TPS 6945.409 6562.756 5927.050 6681.187 6821.794 ≈ 6588 ± 5.38% ↑ 4.99% CPU % 44.400 43.842 40.418 49.211 49.370 ≈ 45.4 ± 7.53% ↓ 2.78% Mem MB 2693.414 2640.602 2472.007 2731.482 2707.879 ≈ 2649 ± 3.53% ↓ 1.56%	2021-08-06 20:53:02 +03:00
Roman Khimov	f78bd6474f	network: handle incoming message in a separate goroutine Network communication takes time. Handling some messages (like transaction) also takes time. We can share this time by making handler a separate goroutine. So while message is being handled receiver can already get and parse the next one. It doesn't improve metrics a lot, but still I think it makes sense and in some scenarios this can be more beneficial than this. `e41fc2fd1b`, 4 nodes, 10 workers RPS 6732.979 6396.160 6759.624 6246.398 6589.841 ≈ 6545 ± 3.02% TPS 6491.062 5984.190 6275.652 5867.477 6360.797 ≈ 6196 ± 3.77% CPU % 42.053 43.515 44.768 40.344 44.112 ≈ 43.0 ± 3.69% Mem MB 2564.130 2744.236 2636.267 2589.505 2765.926 ≈ 2660 ± 3.06% Patched: RPS 6902.296 6465.662 6856.044 6785.515 6157.024 ≈ 6633 ± 4.26% ↑ 1.34% TPS 6468.431 6218.867 6610.565 6288.596 5790.556 ≈ 6275 ± 4.44% ↑ 1.28% CPU % 50.231 42.925 49.481 48.396 42.662 ≈ 46.7 ± 7.01% ↑ 8.60% Mem MB 2856.841 2684.103 2756.195 2733.485 2422.787 ≈ 2691 ± 5.40% ↑ 1.17%	2021-08-06 19:37:37 +03:00
Roman Khimov	f9663a97a1	network: fix Ping messages * NewPing() accepts block index first and nonce then. * Block height should be used, it'll be important for state exchanging nodes	2021-08-06 11:28:09 +03:00
Roman Khimov	1b186e046b	network: use optimized decoder for transactions NewTransactionFromBytes() works a bit faster and uses less memory.	2021-08-04 23:49:07 +03:00
Evgeniy Stratonikov	451b02122a	*: increase GAS for verification Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>	2021-07-14 10:27:09 +03:00
Roman Khimov	b8192d0958	network: optimize waiting in test require.Eventually polls more often reducing average waiting time.	2021-07-08 11:14:35 +03:00
Roman Khimov	9cc4f42a71	network: fix discoverer test Asynchronous tryAddress() routines may get dial result AFTER the switch to another test, so we need to ensure that they'll get the result intended for this particular call. Fixes: 2021-07-07T20:25:40.1624521Z === RUN TestDefaultDiscoverer 2021-07-07T20:25:40.1625316Z discovery_test.go:159: timeout expecting for transport dial; i: 2, j: 1 2021-07-07T20:25:40.1626319Z --- FAIL: TestDefaultDiscoverer (1.19s)	2021-07-08 11:03:30 +03:00
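
Commits d51db20405 and b55c75d59d make the peer list a slice and randomize its order. Go map iteration is unspecified but not uniformly random, so some peers would keep getting served first. Below is a minimal sketch of that idea, not neo-go's actual code. The `Peer` type, the map-of-peers layout, the `getPeers` signature and the `firstCounts` helper are hypothetical; `rand.Shuffle` from `math/rand` does the shuffling.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Peer is a hypothetical stand-in for a connected node.
type Peer struct {
	Addr string
}

// getPeers copies the peer set into a slice and shuffles it, so code that
// walks the result (e.g. to enqueue messages) doesn't keep favouring the
// same few peers. The signature is an assumption made for this sketch.
func getPeers(peers map[*Peer]bool) []*Peer {
	res := make([]*Peer, 0, len(peers))
	for p := range peers {
		res = append(res, p)
	}
	rand.Shuffle(len(res), func(i, j int) { res[i], res[j] = res[j], res[i] })
	return res
}

// firstCounts (hypothetical helper) calls getPeers n times and counts
// which peer ends up first in the returned slice.
func firstCounts(peers map[*Peer]bool, n int) map[string]int {
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		counts[getPeers(peers)[0].Addr]++
	}
	return counts
}

func main() {
	peers := make(map[*Peer]bool)
	for _, a := range []string{"a", "b", "c", "d"} {
		peers[&Peer{Addr: a}] = true
	}
	// With a uniform shuffle each of the four peers leads roughly n/4 times.
	fmt.Println(firstCounts(peers, 10000))
}
```

Returning a slice also lets callers index and shuffle directly, which a map cannot offer.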
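
Commit 119b4200ac describes a fail-fast route: when several peers deliver the same transaction at roughly the same time, only one goroutine should do the expensive processing. A common way to get that is an "in flight" set checked under a mutex. The sketch below assumes this approach; `Hash`, `inFlight`, `tryStart`, `finish` and `simulate` are all hypothetical names, not neo-go's API.

```go
package main

import (
	"fmt"
	"sync"
)

// Hash is a hypothetical transaction identifier.
type Hash [32]byte

// inFlight is a hypothetical set of transactions currently being processed.
type inFlight struct {
	mu  sync.Mutex
	set map[Hash]struct{}
}

func newInFlight() *inFlight {
	return &inFlight{set: make(map[Hash]struct{})}
}

// tryStart marks h as in flight and returns true if the caller should
// process it; false means another goroutine already has it, so this copy
// can be dropped without doing any verification work.
func (f *inFlight) tryStart(h Hash) bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if _, ok := f.set[h]; ok {
		return false
	}
	f.set[h] = struct{}{}
	return true
}

// finish clears h once processing is over, successful or not.
func (f *inFlight) finish(h Hash) {
	f.mu.Lock()
	delete(f.set, h)
	f.mu.Unlock()
}

// simulate has n goroutines receive the same transaction concurrently and
// returns how many of them were allowed to process it.
func simulate(n int) int {
	f := newInFlight()
	h := Hash{1}
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		started int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if f.tryStart(h) {
				mu.Lock()
				started++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return started
}

func main() {
	// finish is never called here, so exactly one goroutine wins.
	fmt.Println(simulate(8)) // prints 1
}
```

In a real handler the winner would `defer f.finish(h)` around processing. A copy arriving after that point is still rejected by the regular mempool checks, so the set only needs to cover the concurrent window.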
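
Commit 7fc153ed2a narrows the Inv check to the mempool only and skips the full-chain `HasTransaction()` lookup. A stale transaction that slips through is still rejected when it is added to the mempool. A minimal sketch of the filtering step follows. `Hash`, `toRequest` and the `inMempool` callback are hypothetical and stand in for whatever lookup the node really uses.

```go
package main

import "fmt"

// Hash is a hypothetical transaction identifier.
type Hash [32]byte

// toRequest returns the Inv hashes worth asking the peer for. Only the
// mempool is consulted; the costlier full-chain lookup is deliberately
// skipped because stale transactions fail later at mempool insertion.
func toRequest(inv []Hash, inMempool func(Hash) bool) []Hash {
	var res []Hash
	for _, h := range inv {
		if !inMempool(h) {
			res = append(res, h)
		}
	}
	return res
}

func main() {
	pool := map[Hash]bool{{1}: true}
	inv := []Hash{{1}, {2}, {3}}
	req := toRequest(inv, func(h Hash) bool { return pool[h] })
	fmt.Println(len(req)) // prints 2
}
```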
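
Commit f78bd6474f moves message handling into its own goroutine so the receiver can decode the next message while the previous one is handled. The sketch below shows that producer/consumer split over a buffered channel. Newline-separated strings stand in for real wire messages, and `Message`, `receive`, `handle`, `run` and the buffer size of 10 are assumptions for illustration only.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Message is a hypothetical decoded network message.
type Message struct{ Payload string }

// receive decodes messages (here: one per line) and hands them to a
// buffered channel, so decoding can overlap with handling.
func receive(sc *bufio.Scanner, out chan<- Message) {
	defer close(out)
	for sc.Scan() {
		out <- Message{Payload: sc.Text()}
	}
}

// handle consumes messages in arrival order in its own goroutine and
// reports what it processed once the channel is closed.
func handle(in <-chan Message, done chan<- []string) {
	var seen []string
	for m := range in {
		seen = append(seen, m.Payload)
	}
	done <- seen
}

// run wires the receiver and the handler together for a given input.
func run(input string) []string {
	msgs := make(chan Message, 10)
	done := make(chan []string)
	go handle(msgs, done)
	receive(bufio.NewScanner(strings.NewReader(input)), msgs)
	return <-done
}

func main() {
	fmt.Println(run("version\nverack\nping\n"))
}
```

Because there is a single handler goroutine per connection, message order is preserved; only decoding and handling run in parallel.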


599 commits