neo-go

mirror of https://github.com/nspcc-dev/neo-go.git synced 2024-11-23 03:38:35 +00:00

Author	SHA1	Message	Date
Roman Khimov	9d6b18adec	network: drop minPoolCount magic constant We have AttemptConnPeers that is closely related, the more we have there the bigger the network supposedly is, so it's much better than magic minPoolCount.	2022-10-24 14:36:10 +03:00
Roman Khimov	af24051bf5	network: sleep a bit before retrying reconnects If Dial() is to exit quickly we can end up in a retry loop eating CPU.	2022-10-24 14:34:48 +03:00
Roman Khimov	f42b8e78fc	Merge pull request #2758 from nspcc-dev/check-inflight-tx-invs network: check inv against currently processed transactions	2022-10-24 14:16:33 +07:00
Roman Khimov	e26055190e	network: check inv against currently processed transactions Sometimes we already have it, but it's not yet processed, so we can save on getdata request. It only affects very high-speed networks like 4-1 scenario and it doesn't affect it a lot, but still we can do it.	2022-10-21 21:16:18 +03:00
Roman Khimov	cfb5058018	network: batch getdata replies This is not exactly the protocol-level batching as was tried in #1770 and proposed by neo-project/neo#2365, but it's a TCP-level change in that we now Write() a set of messages and given that Go sets up TCP sockets with TCP_NODELAY by default this is a substantial change, we have less packets generated with the same amount of data. It doesn't change anything on properly connected networks, but the ones with delays benefit from it a lot. This also improves queueing because we no longer generate 32 messages to deliver on transaction's GetData, it's just one stream of bytes with 32 messages inside. Do the same with GetBlocksByIndex, we can have a lot of messages there too. But don't forget about potential peer DoS attacks, if a peer is to request a lot of big blocks we need to flush them before we process the whole set.	2022-10-21 17:16:32 +03:00
Roman Khimov	e1b5ac9b81	network: separate tx handling from msg handling This allows to naturally scale transaction processing if we have some peer that is sending a lot of them while others are mostly silent. It also can help somewhat in the event we have 50 peers that all send transactions. 4+1 scenario benefits a lot from it, while 7+2 slows down a little. Delayed scenarios don't care. Surprisingly, this also makes disconnects (#2744) much more rare, 4-node scenario almost never sees it now. Most probably this is the case where peers affect each other a lot, single-threaded transaction receiver can be slow enough to trigger some timeout in getdata handler of its peer (because it tries to push a number of replies).	2022-10-21 12:11:24 +03:00
Roman Khimov	e003b67418	network: reuse inventory hash list for request hashes Microoptimization, we can do this because we only use them in handleInvCmd().	2022-10-21 11:28:40 +03:00
Roman Khimov	0f625f04f0	Merge pull request #2748 from nspcc-dev/stop-tx-flow network/consensus: use new dbft StopTxFlow callback	2022-10-18 16:29:37 +07:00
Roman Khimov	73ce898e27	network/consensus: use new dbft StopTxFlow callback It makes sense in general (further narrowing down the time window when transactions are processed by consensus thread) and it improves block times a little too, especially in the 7+2 scenario. Related to #2744.	2022-10-18 11:06:20 +03:00
Roman Khimov	2791127ee4	network: add prometheus histogram with cmd processing time It can be useful to detect some performance issues.	2022-10-17 22:51:16 +03:00
Roman Khimov	73079745ab	Merge pull request #2746 from nspcc-dev/optimize-tx-callbacks network: only call tx callback if we're waiting for transactions	2022-10-17 16:39:41 +07:00
Roman Khimov	dce9f80585	Merge pull request #2743 from nspcc-dev/log-fan-out Logarithmic gossip fan out	2022-10-14 23:18:34 +07:00
Roman Khimov	4dd3fd4ac0	network: only call tx callback if we're waiting for transactions Until the consensus process starts for a new block and until it really needs some transactions we can spare some cycles by not delivering transactions to it. In tests this doesn't affect TPS, but makes block delays a bit more stable. Related to #2744, I think it also may cause timeouts during transaction processing (waiting on the consensus process channel while it does something dBFT-related).	2022-10-14 18:45:48 +03:00
Roman Khimov	65f0fadddb	network: register peer only if it's not a duplicate	2022-10-14 15:53:32 +03:00
Roman Khimov	851cbc7dab	network: implement adaptive peer requests When the network is big enough, MinPeers may be suboptimal for good network connectivity, but if we know the network size we can do some estimation on the number of sufficient peers.	2022-10-14 15:53:32 +03:00
Roman Khimov	c17b2afab5	network: add BroadcastFactor to control gossip, fix #2678	2022-10-14 15:53:32 +03:00
Roman Khimov	215e8704f1	network: simplify discoverer, make it almost a lib We already have two basic lists: connected and unconnected nodes, we don't need an additional channel and we don't need a goroutine to handle it.	2022-10-14 15:53:32 +03:00
Roman Khimov	c1ef326183	network: re-add addresses to the pool on UnregisterConnectedAddr That's what we do anyway, but this way we can be a bit more efficient.	2022-10-14 14:12:33 +03:00
Roman Khimov	631f166709	network: broadcast to log-dependent number of nodes Fixes #608.	2022-10-14 14:12:33 +03:00
Roman Khimov	dc62046019	network: add network size estimation metric	2022-10-12 22:29:55 +03:00
Roman Khimov	bcf77c3c42	network: filter out not-yet-ready nodes when broadcasting They can fail right in the getPeers or they can fail later when packet send is attempted. Of course they can complete handshake in-between these events, but most likely they won't and we'll waste more resources on this attempt. So rule out bad peers immediately.	2022-10-12 16:51:01 +03:00
Roman Khimov	137f2cb192	network: deduplicate TCPPeer code a bit context.Background() is never canceled and has no deadline, so we can avoid duplicating some code.	2022-10-12 15:43:31 +03:00
Roman Khimov	104da8caff	network: broadcast messages, enqueue packets Drop EnqueueP2PPacket, replace EnqueueHPPacket with EnqueueHPMessage. We use Enqueue* when we have a specific per-peer message, it makes zero sense duplicating serialization code for it (unlike Broadcast*).	2022-10-12 15:39:20 +03:00
Roman Khimov	d5f2ad86a1	network: drop unused EnqueueMessage interface from Peer	2022-10-12 15:27:08 +03:00
Roman Khimov	b345581c72	network: pings are broadcasted, don't send them to everyone Follow the general rules of broadcasts, even though it's somewhat different from Inv, we just want to get some reply from our neighbors to see if we're behind. We don't strictly need all neighbors for it.	2022-10-12 15:25:03 +03:00
Roman Khimov	e1d5f18ff4	network: fix outdated Peer interface comments	2022-10-12 10:16:07 +03:00
Roman Khimov	8b26d9475b	network: speculatively set GetAddrSent status Otherwise we routinely get "unexpected addr received" error.	2022-10-11 18:42:40 +03:00
Roman Khimov	e80c60a3b9	network: rework broadcast logic We have a number of queues for different purposes: * regular broadcast queue * direct p2p queue * high-priority queue And two basic egress scenarios: * direct p2p messages (replies to requests in Server's handle* methods) * broadcasted messages Low priority broadcasted messages: * transaction inventories * block inventories * notary inventories * non-consensus extensibles High-priority broadcasted messages: * consensus extensibles * getdata transaction requests from consensus process * getaddr requests P2P messages are a bit more complicated, most of the time they use p2p queue, but extensible message requests/replies use HP queue. Server's handle* code is run from Peer's handleIncoming, every peer has this thread that handles incoming messages. When working with the peer it's important to reply to requests and blocking this thread until we send (queue) a reply is fine, if the peer is slow we just won't get anything new from it. The queue used is irrelevant wrt this issue. Broadcasted messages are radically different, we want them to be delivered to many peers, but we don't care about specific ones. If it's delivered to 2/3 of the peers we're fine, if it's delivered to more of them --- it's not an issue. But doing this fairly is not an easy thing, current code tries performing unblocked sends and if this doesn't yield enough results it then blocks (but has a timeout, we can't wait indefinitely). But it does so in sequential manner, once the peer is chosen the code will wait for it (and only it) until timeout happens. What can be done instead is an attempt to push the message to all of the peers simultaneously (or close to that). If they all deliver --- OK, if some block and wait then we can wait until _any_ of them pushes the message through (or global timeout happens, we still can't wait forever). 
If we have enough deliveries then we can cancel pending ones and it's again not an error if these canceled threads still do their job. This makes the system more dynamic and adds some substantial processing overhead, but it's a networking code, any of this overhead is much lower than the actual packet delivery time. It also allows to spread the load more fairly, if there is any spare queue it'll get the packet and release the broadcaster. On the next broadcast iteration another peer is more likely to be chosen just because it didn't get a message previously (and had some time to deliver already queued messages). It works perfectly in tests, with optimal networking conditions we have much better block times and TPS increases by 5-25% depending on the scenario. I'd go as far as to say that it fixes the original problem of #2678, because in this particular scenario we have empty queues in ~100% of the cases and this new logic will likely lead to 100% fan out in this case (cancelation just won't happen fast enough). But when the load grows and there is some waiting in the queue it will optimize out the slowest links.	2022-10-11 18:42:40 +03:00
Roman Khimov	dabdad20ad	network: don't wait indefinitely for packet to be sent Peers can be slow, very slow, slow enough to affect node's regular operation. We can't wait for them indefinitely, there has to be a timeout for send operations. This patch uses TimePerBlock as a reference for its timeout. It's relatively big and it doesn't affect tests much, 4+1 scenarios tend to perform a little worse while 7+2 scenarios work a little better. The difference is a few percent, but all of these tests easily have 10-15% variations from run to run. It's an important step in making our gossip better because we can't have any behavior where neighbors directly block the node forever, refs. #2678 and	2022-10-10 22:15:21 +03:00
Roman Khimov	317dd42513	*: use uintSize and SignatureLen constants where appropriate	2022-10-05 10:45:52 +03:00
Roman Khimov	4f3ffe7290	golangci: enable errorlint and fix everything it found	2022-09-02 18:36:23 +03:00
Roman Khimov	779a5c070f	network: wait for exit in discoverer And synchronize other threads with channels instead of mutexes. Overall this scheme is more reliable.	2022-08-19 22:23:47 +03:00
Roman Khimov	eeeb0f6f0e	core: accept two-side channels for sub/unsub, read on unsub Blockchain's notificationDispatcher sends events to channels and these channels must be read from. Unfortunately, regular service shutdown procedure does unsubscription first (outside of the read loop) and only then drains the channel. While it waits for unsubscription request to be accepted notificationDispatcher can try pushing more data into the same channel which will lead to a deadlock. Reading in the same method solves this, any number of events can be pushed until unsub channel accepts the data.	2022-08-19 22:08:40 +03:00
Roman Khimov	dea75a4211	network: wait for the relayer thread to finish on shutdown Unsubscribe and drain first, then return from the Shutdown method. It's important wrt the subsequent chain shutdown process (normally it's closed right after the network server).	2022-08-19 22:08:40 +03:00
Roman Khimov	155089f4e5	network: drop cleanup from TestVerifyNotaryRequest It never runs the server, so `746644a4eb` was a bit wrong with this.	2022-08-19 20:54:06 +03:00
Anna Shaleva	916f2293b8	*: apply go 1.19 formatter heuristics And make manual corrections where needed. See the "Common mistakes and pitfalls" section of https://tip.golang.org/doc/comment.	2022-08-09 15:37:52 +03:00
Anna Shaleva	bb751535d3	*: bump minimum supported go version Close #2497.	2022-08-08 13:59:32 +03:00
Roman Khimov	9b0ea2c21b	network/consensus: always process dBFT messages as high priority Move category definition from consensus to payload, consensus service is the one of its kind (HP), so network.Server can be adjusted accordingly.	2022-08-02 13:07:18 +03:00
Roman Khimov	94a8784dcb	network: allow to drop services and solve concurrency issues Now that services can come and go we need to protect all of the associated fields and allow to deregister them.	2022-08-02 13:05:39 +03:00
Roman Khimov	5a7fa2d3df	cli: restart consensus service on USR2 Fix #1949. Also drop wallet from the ServerConfig since it's not used in any meaningful way after this change.	2022-08-02 13:05:07 +03:00
Roman Khimov	2e27c3d829	metrics: move package to services Where it belongs.	2022-07-21 23:38:23 +03:00
Anna Shaleva	1ae601787d	network: allow to handle GetMPTData with KeepOnlyLatestState on And adjust documentation along the way.	2022-07-14 14:33:20 +03:00
Roman Khimov	dc59dc991b	config: move metrics.Config into config.BasicService Config package should be as lightweight as possible and now it depends on the whole metrics package just to get one structure from it.	2022-07-08 23:30:30 +03:00
Roman Khimov	3fbc1331aa	Merge pull request #2582 from nspcc-dev/fix-server-sync network: adjust the way (*Server).IsInSync() works	2022-07-05 12:28:20 +03:00
Roman Khimov	9f05009d1a	Merge pull request #2580 from nspcc-dev/service-review Service review	2022-07-05 12:23:25 +03:00
Anna Shaleva	0835581fa9	network: adjust the way (*Server).IsInSync() works Always return true if sync was reached once. Fix #2564.	2022-07-05 12:20:31 +03:00
Roman Khimov	3e2eda6752	*: add some comments to service Start/Shutdown methods	2022-07-04 23:03:50 +03:00
Roman Khimov	c26a962b55	*: use localhost address instead of 127.0.0.1, fix #2575	2022-06-30 16:19:07 +03:00
Anna Shaleva	8ab422da66	*: properly unsubscribe from Blockchain events	2022-06-28 19:09:25 +03:00
Roman Khimov	75d06d18c9	Merge pull request #2466 from nspcc-dev/rules-fixes Rules scope fixes	2022-05-06 11:09:39 +03:00

553 commits
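
The deadlock described in eeeb0f6f0e (the dispatcher keeps pushing events into a channel while the subscriber is blocked on its unsubscription request) is avoided by reading the event channel in the same loop that submits the request. A minimal sketch of that pattern follows. The `dispatcher` and `unsubscribe` functions and the `int` event type are hypothetical simplifications, not the actual Blockchain API.

```go
package main

import "fmt"

// dispatcher is a hypothetical stand-in for a notification dispatcher: it
// still has `pending` events to push before it gets to the unsub request.
func dispatcher(events chan<- int, unsub <-chan chan int, pending int, done chan<- struct{}) {
	for i := 0; i < pending; i++ {
		events <- i // blocks until the subscriber reads
	}
	<-unsub // only now is the unsubscription request accepted
	close(done)
}

// unsubscribe submits the unsubscription request while continuing to read
// events, so the dispatcher can never block on a send to this subscriber.
// The events channel must be bidirectional so it can be read from here.
// It returns the number of events drained while waiting.
func unsubscribe(unsub chan<- chan int, events chan int) int {
	drained := 0
	for {
		select {
		case unsub <- events:
			return drained
		case <-events:
			drained++
		}
	}
}

func main() {
	events := make(chan int)
	unsub := make(chan chan int)
	done := make(chan struct{})

	go dispatcher(events, unsub, 5, done)
	n := unsubscribe(unsub, events)
	<-done
	fmt.Println("events drained while unsubscribing:", n)
}
```

Sending on `unsub` first and draining `events` afterwards would hang in this setup, because the dispatcher is stuck on its first send and never reaches the receive on `unsub`.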