dstepanov-yadro/neoneo-go

Author	SHA1	Message	Date
Roman Khimov	73ce898e27	network/consensus: use new dbft StopTxFlow callback It makes sense in general (further narrowing down the time window when transactions are processed by consensus thread) and it improves block times a little too, especially in the 7+2 scenario. Related to #2744.	2022-10-18 11:06:20 +03:00
Roman Khimov	73079745ab	Merge pull request #2746 from nspcc-dev/optimize-tx-callbacks network: only call tx callback if we're waiting for transactions	2022-10-17 16:39:41 +07:00
Roman Khimov	dce9f80585	Merge pull request #2743 from nspcc-dev/log-fan-out Logarithmic gossip fan out	2022-10-14 23:18:34 +07:00
Roman Khimov	4dd3fd4ac0	network: only call tx callback if we're waiting for transactions Until the consensus process starts for a new block and until it really needs some transactions we can spare some cycles by not delivering transactions to it. In tests this doesn't affect TPS, but makes block delays a bit more stable. Related to #2744, I think it also may cause timeouts during transaction processing (waiting on the consensus process channel while it does something dBFT-related).	2022-10-14 18:45:48 +03:00
Roman Khimov	65f0fadddb	network: register peer only if it's not a duplicate	2022-10-14 15:53:32 +03:00
Roman Khimov	851cbc7dab	network: implement adaptive peer requests When the network is big enough, MinPeers may be suboptimal for good network connectivity, but if we know the network size we can do some estimation on the number of sufficient peers.	2022-10-14 15:53:32 +03:00
Roman Khimov	c17b2afab5	network: add BroadcastFactor to control gossip, fix #2678	2022-10-14 15:53:32 +03:00
Roman Khimov	215e8704f1	network: simplify discoverer, make it almost a lib We already have two basic lists: connected and unconnected nodes, we don't need an additional channel and we don't need a goroutine to handle it.	2022-10-14 15:53:32 +03:00
Roman Khimov	c1ef326183	network: re-add addresses to the pool on UnregisterConnectedAddr That's what we do anyway, but this way we can be a bit more efficient.	2022-10-14 14:12:33 +03:00
Roman Khimov	631f166709	network: broadcast to log-dependent number of nodes Fixes #608.	2022-10-14 14:12:33 +03:00
Roman Khimov	bcf77c3c42	network: filter out not-yet-ready nodes when broadcasting They can fail right in the getPeers or they can fail later when packet send is attempted. Of course they can complete handshake in-between these events, but most likely they won't and we'll waste more resources on this attempt. So rule out bad peers immediately.	2022-10-12 16:51:01 +03:00
Roman Khimov	104da8caff	network: broadcast messages, enqueue packets Drop EnqueueP2PPacket, replace EnqueueHPPacket with EnqueueHPMessage. We use Enqueue* when we have a specific per-peer message, it makes zero sense duplicating serialization code for it (unlike Broadcast*).	2022-10-12 15:39:20 +03:00
Roman Khimov	b345581c72	network: pings are broadcasted, don't send them to everyone Follow the general rules of broadcasts, even though it's somewhat different from Inv, we just want to get some reply from our neighbors to see if we're behind. We don't strictly need all neighbors for it.	2022-10-12 15:25:03 +03:00
Roman Khimov	8b26d9475b	network: speculatively set GetAddrSent status Otherwise we routinely get "unexpected addr received" error.	2022-10-11 18:42:40 +03:00
Roman Khimov	e80c60a3b9	network: rework broadcast logic We have a number of queues for different purposes: * regular broadcast queue * direct p2p queue * high-priority queue And two basic egress scenarios: * direct p2p messages (replies to requests in Server's handle* methods) * broadcasted messages Low priority broadcasted messages: * transaction inventories * block inventories * notary inventories * non-consensus extensibles High-priority broadcasted messages: * consensus extensibles * getdata transaction requests from consensus process * getaddr requests P2P messages are a bit more complicated, most of the time they use p2p queue, but extensible message requests/replies use HP queue. Server's handle* code is run from Peer's handleIncoming, every peer has this thread that handles incoming messages. When working with the peer it's important to reply to requests and blocking this thread until we send (queue) a reply is fine, if the peer is slow we just won't get anything new from it. The queue used is irrelevant wrt this issue. Broadcasted messages are radically different, we want them to be delivered to many peers, but we don't care about specific ones. If it's delivered to 2/3 of the peers we're fine, if it's delivered to more of them --- it's not an issue. But doing this fairly is not an easy thing, current code tries performing unblocked sends and if this doesn't yield enough results it then blocks (but has a timeout, we can't wait indefinitely). But it does so in sequential manner, once the peer is chosen the code will wait for it (and only it) until timeout happens. What can be done instead is an attempt to push the message to all of the peers simultaneously (or close to that). If they all deliver --- OK, if some block and wait then we can wait until _any_ of them pushes the message through (or global timeout happens, we still can't wait forever). 
If we have enough deliveries then we can cancel pending ones and it's again not an error if these canceled threads still do their job. This makes the system more dynamic and adds some substantial processing overhead, but it's networking code, any of this overhead is much lower than the actual packet delivery time. It also allows spreading the load more fairly, if there is any spare queue it'll get the packet and release the broadcaster. On the next broadcast iteration another peer is more likely to be chosen just because it didn't get a message previously (and had some time to deliver already queued messages). It works perfectly in tests, with optimal networking conditions we have much better block times and TPS increases by 5-25% depending on the scenario. I'd go as far as to say that it fixes the original problem of #2678, because in this particular scenario we have empty queues in ~100% of the cases and this new logic will likely lead to 100% fan out in this case (cancelation just won't happen fast enough). But when the load grows and there is some waiting in the queue it will optimize out the slowest links.	2022-10-11 18:42:40 +03:00
Roman Khimov	dabdad20ad	network: don't wait indefinitely for packet to be sent Peers can be slow, very slow, slow enough to affect node's regular operation. We can't wait for them indefinitely, there has to be a timeout for send operations. This patch uses TimePerBlock as a reference for its timeout. It's relatively big and it doesn't affect tests much, 4+1 scenarios tend to perform a little worse while 7+2 scenarios work a little better. The difference is in some percents, but all of these tests easily have 10-15% variations from run to run. It's an important step in making our gossip better because we can't have any behavior where neighbors directly block the node forever, refs. #2678 and	2022-10-10 22:15:21 +03:00
Roman Khimov	4f3ffe7290	golangci: enable errorlint and fix everything it found	2022-09-02 18:36:23 +03:00
Roman Khimov	eeeb0f6f0e	core: accept two-side channels for sub/unsub, read on unsub Blockchain's notificationDispatcher sends events to channels and these channels must be read from. Unfortunately, regular service shutdown procedure does unsubscription first (outside of the read loop) and only then drains the channel. While it waits for unsubscription request to be accepted notificationDispatcher can try pushing more data into the same channel which will lead to a deadlock. Reading in the same method solves this, any number of events can be pushed until unsub channel accepts the data.	2022-08-19 22:08:40 +03:00
Roman Khimov	dea75a4211	network: wait for the relayer thread to finish on shutdown Unsubscribe and drain first, then return from the Shutdown method. It's important wrt the subsequent chain shutdown process (normally it's closed right after the network server).	2022-08-19 22:08:40 +03:00
Anna Shaleva	916f2293b8	*: apply go 1.19 formatter heuristics And make manual corrections where needed. See the "Common mistakes and pitfalls" section of https://tip.golang.org/doc/comment.	2022-08-09 15:37:52 +03:00
Roman Khimov	9b0ea2c21b	network/consensus: always process dBFT messages as high priority Move category definition from consensus to payload, consensus service is the one of its kind (HP), so network.Server can be adjusted accordingly.	2022-08-02 13:07:18 +03:00
Roman Khimov	94a8784dcb	network: allow to drop services and solve concurrency issues Now that services can come and go we need to protect all of the associated fields and allow to deregister them.	2022-08-02 13:05:39 +03:00
Anna Shaleva	1ae601787d	network: allow to handle GetMPTData with KeepOnlyLatestState on And adjust documentation along the way.	2022-07-14 14:33:20 +03:00
Roman Khimov	3fbc1331aa	Merge pull request #2582 from nspcc-dev/fix-server-sync network: adjust the way (*Server).IsInSync() works	2022-07-05 12:28:20 +03:00
Anna Shaleva	0835581fa9	network: adjust the way (*Server).IsInSync() works Always return true if sync was reached once. Fix #2564.	2022-07-05 12:20:31 +03:00
Roman Khimov	3e2eda6752	*: add some comments to service Start/Shutdown methods	2022-07-04 23:03:50 +03:00
Anna Shaleva	8ab422da66	*: properly unsubscribe from Blockchain events	2022-06-28 19:09:25 +03:00
Elizaveta Chichindaeva	28908aa3cf	[#2442 ] English Check Signed-off-by: Elizaveta Chichindaeva <elizaveta@nspcc.ru>	2022-05-04 19:48:27 +03:00
Roman Khimov	2593bb0535	network: extend Service with Name, use it to distinguish services	2022-04-26 00:31:48 +03:00
Roman Khimov	e621f746a7	config/core: allow to change the number of validators Fixes #2320.	2022-01-31 23:14:38 +03:00
Roman Khimov	60d6fa1125	network: keep a copy of the config inside of Server Avoid copying the configuration again and again, make things a bit more efficient.	2022-01-24 18:43:01 +03:00
Roman Khimov	89d754da6f	network: don't request blocks we already have in the queue Fixes #2258.	2022-01-18 00:04:41 +03:00
Roman Khimov	bc6d6e58bc	network: always pass transactions to consensus process Consensus can require conflicting transactions and it can require more transactions than mempool can fit, all of this should work. Transactions will be checked anyway using its secondary mempool. See the scenario from #668.	2022-01-14 20:08:40 +03:00
Roman Khimov	746644a4eb	network: decouple it from blockchainer.Blockchainer We don't need all of it.	2022-01-14 19:57:16 +03:00
Roman Khimov	bf1604454c	blockchainer/network: move StateSync interface to the user Only network package cares about it.	2022-01-14 19:57:14 +03:00
Roman Khimov	af87cb082f	network: decouple Server from the notary service	2022-01-14 19:55:53 +03:00
Roman Khimov	508d36f698	network: drop consensus dependency	2022-01-14 19:55:53 +03:00
Roman Khimov	66aafd868b	network: unplug stateroot service from the Server Notice that it makes the node accept Extensible payloads with any category which is the same way C# node works. We're trusting Extensible senders, improper payloads are harmless until they DoS the network, but we have some protections against that too (and spamming with proper category doesn't differ a lot).	2022-01-14 19:55:50 +03:00
Roman Khimov	0ad3ea5944	network/cli: move Oracle service instantiation out of the network	2022-01-14 19:53:45 +03:00
Roman Khimov	5dd4db2c02	network/services: unify service lifecycle management Run with Start, Stop with Shutdown, make behavior uniform.	2022-01-14 19:53:45 +03:00
Roman Khimov	c942402957	blockchainer: drop Policer interface We never use it as a proper interface, so it makes no sense keeping it this way.	2022-01-12 00:58:03 +03:00
Roman Khimov	2eeec73770	network: don't panic if there is no reason for disconnect Although error should always be there, we shouldn't fail like this if it's not: \| panic: runtime error: invalid memory address or nil pointer dereference \| [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xc8884c] \| \| goroutine 113 [running]: \| github.com/nspcc-dev/neo-go/pkg/network.(Server).run(0xc000150580) \| github.com/nspcc-dev/neo-go/pkg/network/server.go:396 +0x7ac \| github.com/nspcc-dev/neo-go/pkg/network.(Server).Start(0xc000150580, 0x0) \| github.com/nspcc-dev/neo-go/pkg/network/server.go:294 +0x3fb \| created by github.com/nspcc-dev/neo-go/cli/server.startServer \| github.com/nspcc-dev/neo-go/cli/server/server.go:344 +0x56f	2021-11-01 12:19:00 +03:00
AnnaShaleva	2d196b3f35	rpc: refactor `calculatenetworkfee` handler Use (Blockchainer).VerifyWitness() to calculate network fee for contract-based witnesses.	2021-10-25 19:07:25 +03:00
Evgeniy Stratonikov	4dd3a0d503	network: request headers in parallel, fix #2158 Do this similarly to how blocks are requested. See also `4aa1a37`. Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>	2021-10-06 15:25:54 +03:00
Anna Shaleva	0fa48691f7	network: do not duplicate MPT nodes in GetMPTNodes response Also tests are added.	2021-09-08 14:25:54 +03:00
Anna Shaleva	3b7807e897	network: request unknown MPT nodes In this commit: 1. Request unknown MPT nodes from peers. Note, that StateSync module itself shouldn't be responsible for nodes requests, that's a server duty. 2. Do not request the same node twice, check if it is in storage already. If so, then the only thing remaining is to update refcounter.	2021-09-07 19:43:27 +03:00
Anna Shaleva	d67ff30704	core: implement statesync module And support GetMPTData and MPTData P2P commands.	2021-09-07 19:43:27 +03:00
Roman Khimov	5aff82aef4	Merge pull request #2119 from nspcc-dev/states-exchange/insole core, network: prepare basis for Insole module	2021-08-12 10:35:02 +03:00
Anna Shaleva	6ca7983be8	network: fix typo in error message	2021-08-10 11:00:39 +03:00
Roman Khimov	7bb82f1f99	network: merge two loops in iteratePeersWithSendMsg, send to 2/3 Refactor code and be fine with sending to just 2/3 of proper peers. Previously it was an edge case, but it can be a normal thing to do also as broadcasting to everyone is obviously too expensive and excessive (hi, #608). Baseline (four node, 10 workers): RPS 8180.760 8137.822 7858.358 7820.011 8051.076 ≈ 8010 ± 2.04% TPS 7819.831 7521.172 7519.023 7242.965 7426.000 ≈ 7506 ± 2.78% CPU % 41.983 38.775 40.606 39.375 35.537 ≈ 39.3 ± 6.15% Mem MB 2947.189 2743.658 2896.688 2813.276 2863.108 ≈ 2853 ± 2.74% Patched: RPS 9714.567 9676.102 9358.609 9371.408 9301.372 ≈ 9484 ± 2.05% ↑ 18.40% TPS 8809.796 8796.854 8534.754 8661.158 8426.162 ≈ 8646 ± 1.92% ↑ 15.19% CPU % 44.980 45.018 33.640 29.645 43.830 ≈ 39.4 ± 18.41% ↑ 0.25% Mem MB 2989.078 2976.577 2306.185 2351.929 2910.479 ≈ 2707 ± 12.80% ↓ 5.12% There is a nuance with this patch however. While typically it works the way outlined above, sometimes it works like this: RPS ≈ 6734.368 TPS ≈ 6299.332 CPU ≈ 25.552% Mem ≈ 2706.046MB And that's because the log looks like this: DeltaTime, TransactionsCount, TPS 5014, 44212, 8817.710 5163, 49690, 9624.249 5166, 49523, 9586.334 5189, 49693, 9576.604 5198, 49339, 9491.920 5147, 49559, 9628.716 5192, 49680, 9568.567 5163, 49750, 9635.871 5183, 49189, 9490.450 5159, 49653, 9624.540 5167, 47945, 9279.079 5179, 2051, 396.022 5015, 4, 0.798 5004, 0, 0.000 5003, 0, 0.000 5003, 0, 0.000 5003, 0, 0.000 5003, 0, 0.000 5004, 0, 0.000 5003, 2925, 584.649 5040, 49099, 9741.865 5161, 49718, 9633.404 5170, 49228, 9521.857 5179, 49773, 9610.543 5167, 47253, 9145.152 5202, 49788, 9570.934 5177, 47704, 9214.603 5209, 46610, 8947.975 5249, 49156, 9364.831 5163, 18284, 3541.352 5072, 174, 34.306 On a network with 4 CNs and 1 RPC node there is 1/256 probability that a block won't be broadcasted to RPC node, so it won't see it until ping timeout kicks in. 
While it doesn't see a block it can't accept new incoming transactions so the bench gets stuck basically. To me that's an acceptable trade-off because normal networks are much larger than that and the effect of this patch is way more important there, but still that's what we have and we need to take into account.	2021-08-06 21:10:34 +03:00

258 commits