neo-go

mirror of https://github.com/nspcc-dev/neo-go.git synced 2025-02-28 03:56:26 +00:00

Author	SHA1	Message	Date
Anna Shaleva	28927228f0	*: adjust subscription-related doc Add a warning about received events modification where applicable.	2023-01-17 17:11:19 +03:00
Anna Shaleva	9b364aa7ee	network: do not allow to request invalid block count The problem is in peer disconnection due to invalid GetBlockByIndex payload (the logs are from some patched neo-go version): ``` дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.490Z INFO new peer connected {"addr": "10.78.69.115:50846", "peerCount": 3} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.490Z WARN peer disconnected {"addr": "10.78.69.115:50846", "error": "invalid block count", "peerCount": 2} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.490Z INFO started protocol {"addr": "10.78.69.115:50846", "userAgent": "/NEO-GO:1.0.0/", "startHeight": 0, "id": 1339571820} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.491Z INFO new peer connected {"addr": "10.78.69.115:50856", "peerCount": 3} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.492Z WARN peer disconnected {"addr": "10.78.69.115:50856", "error": "invalid block count", "peerCount": 2} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.492Z INFO started protocol {"addr": "10.78.69.115:50856", "userAgent": "/NEO-GO:1.0.0/", "startHeight": 0, "id": 1339571820} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.492Z INFO new peer connected {"addr": "10.78.69.115:50858", "peerCount": 3} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.493Z INFO started protocol {"addr": "10.78.69.115:50858", "userAgent": "/NEO-GO:1.0.0/", "startHeight": 0, "id": 1339571820} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.493Z WARN peer disconnected {"addr": "10.78.69.115:50858", "error": "invalid block count", "peerCount": 2} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.494Z INFO new peer connected {"addr": "10.78.69.115:50874", "peerCount": 3} дек 15 16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.494Z INFO started protocol {"addr": "10.78.69.115:50874", "userAgent": "/NEO-GO:1.0.0/", "startHeight": 0, "id": 1339571820} дек 15 
16:02:39 glagoli neo-go[928530]: 2022-12-15T16:02:39.494Z WARN peer disconnected {"addr": "10.78.69.115:50874", "error": "invalid block count", "peerCount": 2} ``` GetBlockByIndex payload can't be decoded, and the only possible cause is a zero (or <-1, but that's probably not the case) block count requested. The error is improved as well.	2022-12-28 13:04:56 +03:00
Anna Shaleva	c0a453a53b	network: adjust requestBlocks logic If the lastQueued block index is the same as the one we'd like to request in payload, then we need to increment the payload's count.	2022-12-28 12:50:30 +03:00
Roman Khimov	e79dec15f9	*: use zap.Stringer instead of zap.String where it can be used It's a bit more efficient in case we're not logging the message (mostly for debug), makes the code somewhat simpler as well.	2022-12-13 12:44:54 +03:00
Roman Khimov	7589733017	config: add a special Blockchain type to configure Blockchain And include some node-specific configurations there with backwards compatibility. Note that in the future we'll remove Ledger's fields from the ProtocolConfiguration and it'll be possible to access them in Blockchain directly (not via .Ledger). The other option tried was using two configuration types separately, but that incurs more changes to the codebase, single structure that behaves almost like the old one is better for backwards compatibility. Fixes #2676.	2022-12-07 17:35:53 +03:00
Anna Shaleva	82221b0ca7	*: fix Neo and NeoGo misuses	2022-12-07 17:29:09 +03:00
Anna Shaleva	54c2aa8582	config: move P2P options to a separate config section And convert time-related settings to a Duration format along the way.	2022-12-07 13:06:05 +03:00
Anna Shaleva	9cf6cc61f4	network: allow multiple bind addresses for server And replace Transporter.Address() with Transporter.HostPort() along the way.	2022-12-07 13:06:03 +03:00
Roman Khimov	c2adbf768b	config: add TimePerBlock to replace SecondsPerBlock It's more generic and convenient than MillisecondsPerBlock. This setting is made in backwards-compatible fashion, but it'll override SecondsPerBlock if both are used. Configurations are specifically not changed here, it's important to check compatibility. Fixes #2675.	2022-12-02 19:52:14 +03:00
Roman Khimov	0ad6e295ea	core: make GetHeaderHash accept uint32 It should've always been this way because block indexes are uint32.	2022-11-25 14:30:51 +03:00
Roman Khimov	b8c09f509f	network: add random slight delay to connection attempts Small (especially dockerized/virtualized) networks often start all nodes at once and then we see a lot of connection flapping in the log. This happens because nodes try to connect to each other simultaneously, establish two connections, then each one finds a duplicate and drops it, but this can be different duplicate connections on other sides, so they retry and it all happens for some time. Eventually everything settles, but we have a lot of garbage in the log and a lot of useless attempts. This random waiting timeout doesn't change the logic much, adds a minimal delay, but increases chances for both nodes to establish a proper single connection on both sides to only then see another one and drop it on both sides as well. It leads to almost no flapping in small networks, doesn't affect much bigger ones. The delay is close to unnoticeable especially if there is something in the DB for node to process during startup.	2022-11-17 18:42:43 +03:00
Roman Khimov	075a54192c	network: don't try too many connections Consider mainnet, it has an AttemptConnPeers of 20, so may already have 3 peers and request 20 more, then have 4th connected and attempt 20 more again, this leads to a huge number of connections easily.	2022-11-17 18:03:04 +03:00
Roman Khimov	6bce973ac2	network: drop duplication check from handleAddrCmd() It was relevant with the queue-based discoverer, now it's not, discoverer handles this internally.	2022-11-17 17:42:36 +03:00
Roman Khimov	1c7487b8e4	network: add a timer to check for peers Consider initial connection phase for public networks: * simultaneous connections to seeds * very quick handshakes * got five handshaked peers and some getaddr requests sent * but addr replies won't trigger new connections * so we can stay with just five connections until any of them breaks or a (long) address checking timer fires This new timer solves the problem and is adaptive at the same time. If we have enough peers we won't be waking up often.	2022-11-17 17:32:05 +03:00
Roman Khimov	23f118a1a9	network: rework discoverer/server interaction * treat connected/handshaked peers separately in the discoverer, save "original" address for connected ones, it can be a name instead of IP and it's important to keep it to avoid reconnections * store name->IP mapping for seeds if and when they're connected to avoid reconnections * block seed if it's detected to be our own node (which is often the case for small private networks) * add an event for handshaked peers in the server, connected but non-handshaked ones are not really helpful for MinPeers or GetAddr logic Fixes #2796.	2022-11-17 17:07:19 +03:00
Roman Khimov	6ba4afc977	network: consider handshaked peers only when comparing with MinPeers We don't know a lot about non-handshaked ones, so it's safer to try more connections.	2022-11-17 16:40:29 +03:00
Anna Shaleva	6f3a0a6b4c	network: adjust warning for deposit expiration Provide additional info for better user experience.	2022-11-15 14:16:34 +03:00
Roman Khimov	c405092953	network: pre-filter transactions going into dbft Drop some load from dbft loop during consensus process.	2022-11-11 15:32:51 +03:00
Roman Khimov	e19d867d4e	Merge pull request #2761 from nspcc-dev/fancy-getaddr Fancy getaddr	2022-10-25 16:51:38 +07:00
Roman Khimov	28f54d352a	network: do getaddr requests periodically, fix #2745 Every 1000 blocks seems to be OK for big networks (that only had done some initial requests previously and then effectively never requested addresses again because there was a sufficient number of addresses), won't hurt smaller ones as well (that effectively keep doing this on every connect/disconnect, peer changes are very rare there, but when they happen we want to have some quick reaction to these changes).	2022-10-24 15:10:51 +03:00
Roman Khimov	9efc110058	network: it is 42 32 is a very good number, but we all know 42 is a better one. And it can even be proven by tests with higher peaking TPS values. You may wonder why is it so good? Because we're using packet-switching networks mostly and a packet is a packet almost irrespectively of how big it is. Yet a packet has some maximum possible size (hi, MTU) and this size most of the time is 1500 (or a little less than that, hi VPN). Subtract IP header (20 for IPv4 or 40 for IPv6 not counting options), TCP header (another 20) and Neo message/payload headers (~8 for this case) and we have just a little more than 1400 bytes for our dear hashes. Which means that in a single packet most of the time we can have 42-44 of them, maybe 45. Choosing between these numbers is not hard then.	2022-10-24 14:44:19 +03:00
Roman Khimov	9d6b18adec	network: drop minPoolCount magic constant We have AttemptConnPeers that is closely related, the more we have there the bigger the network supposedly is, so it's much better than magic minPoolCount.	2022-10-24 14:36:10 +03:00
Roman Khimov	af24051bf5	network: sleep a bit before retrying reconnects If Dial() is to exit quickly we can end up in a retry loop eating CPU.	2022-10-24 14:34:48 +03:00
Roman Khimov	f42b8e78fc	Merge pull request #2758 from nspcc-dev/check-inflight-tx-invs network: check inv against currently processed transactions	2022-10-24 14:16:33 +07:00
Roman Khimov	e26055190e	network: check inv against currently processed transactions Sometimes we already have it, but it's not yet processed, so we can save on getdata request. It only affects very high-speed networks like 4-1 scenario and it doesn't affect it a lot, but still we can do it.	2022-10-21 21:16:18 +03:00
Roman Khimov	cfb5058018	network: batch getdata replies This is not exactly the protocol-level batching as was tried in #1770 and proposed by neo-project/neo#2365, but it's a TCP-level change in that we now Write() a set of messages and given that Go sets up TCP sockets with TCP_NODELAY by default this is a substantial change, we have fewer packets generated with the same amount of data. It doesn't change anything on properly connected networks, but the ones with delays benefit from it a lot. This also improves queueing because we no longer generate 32 messages to deliver on transaction's GetData, it's just one stream of bytes with 32 messages inside. Do the same with GetBlocksByIndex, we can have a lot of messages there too. But don't forget about potential peer DoS attacks, if a peer is to request a lot of big blocks we need to flush them before we process the whole set.	2022-10-21 17:16:32 +03:00
Roman Khimov	e1b5ac9b81	network: separate tx handling from msg handling This allows transaction processing to scale naturally if we have some peer that is sending a lot of them while others are mostly silent. It also can help somewhat in the event we have 50 peers that all send transactions. 4+1 scenario benefits a lot from it, while 7+2 slows down a little. Delayed scenarios don't care. Surprisingly, this also makes disconnects (#2744) much more rare, 4-node scenario almost never sees it now. Most probably this is the case where peers affect each other a lot, single-threaded transaction receiver can be slow enough to trigger some timeout in getdata handler of its peer (because it tries to push a number of replies).	2022-10-21 12:11:24 +03:00
Roman Khimov	e003b67418	network: reuse inventory hash list for request hashes Microoptimization, we can do this because we only use them in handleInvCmd().	2022-10-21 11:28:40 +03:00
Roman Khimov	0f625f04f0	Merge pull request #2748 from nspcc-dev/stop-tx-flow network/consensus: use new dbft StopTxFlow callback	2022-10-18 16:29:37 +07:00
Roman Khimov	73ce898e27	network/consensus: use new dbft StopTxFlow callback It makes sense in general (further narrowing down the time window when transactions are processed by consensus thread) and it improves block times a little too, especially in the 7+2 scenario. Related to #2744.	2022-10-18 11:06:20 +03:00
Roman Khimov	2791127ee4	network: add prometheus histogram with cmd processing time It can be useful to detect some performance issues.	2022-10-17 22:51:16 +03:00
Roman Khimov	73079745ab	Merge pull request #2746 from nspcc-dev/optimize-tx-callbacks network: only call tx callback if we're waiting for transactions	2022-10-17 16:39:41 +07:00
Roman Khimov	dce9f80585	Merge pull request #2743 from nspcc-dev/log-fan-out Logarithmic gossip fan out	2022-10-14 23:18:34 +07:00
Roman Khimov	4dd3fd4ac0	network: only call tx callback if we're waiting for transactions Until the consensus process starts for a new block and until it really needs some transactions we can spare some cycles by not delivering transactions to it. In tests this doesn't affect TPS, but makes block delays a bit more stable. Related to #2744, I think it also may cause timeouts during transaction processing (waiting on the consensus process channel while it does something dBFT-related).	2022-10-14 18:45:48 +03:00
Roman Khimov	65f0fadddb	network: register peer only if it's not a duplicate	2022-10-14 15:53:32 +03:00
Roman Khimov	851cbc7dab	network: implement adaptive peer requests When the network is big enough, MinPeers may be suboptimal for good network connectivity, but if we know the network size we can do some estimation on the number of sufficient peers.	2022-10-14 15:53:32 +03:00
Roman Khimov	c17b2afab5	network: add BroadcastFactor to control gossip, fix #2678	2022-10-14 15:53:32 +03:00
Roman Khimov	215e8704f1	network: simplify discoverer, make it almost a lib We already have two basic lists: connected and unconnected nodes, we don't need an additional channel and we don't need a goroutine to handle it.	2022-10-14 15:53:32 +03:00
Roman Khimov	c1ef326183	network: re-add addresses to the pool on UnregisterConnectedAddr That's what we do anyway, but this way we can be a bit more efficient.	2022-10-14 14:12:33 +03:00
Roman Khimov	631f166709	network: broadcast to log-dependent number of nodes Fixes #608.	2022-10-14 14:12:33 +03:00
Roman Khimov	dc62046019	network: add network size estimation metric	2022-10-12 22:29:55 +03:00
Roman Khimov	bcf77c3c42	network: filter out not-yet-ready nodes when broadcasting They can fail right in the getPeers or they can fail later when packet send is attempted. Of course they can complete handshake in-between these events, but most likely they won't and we'll waste more resources on this attempt. So rule out bad peers immediately.	2022-10-12 16:51:01 +03:00
Roman Khimov	137f2cb192	network: deduplicate TCPPeer code a bit context.Background() is never canceled and has no deadline, so we can avoid duplicating some code.	2022-10-12 15:43:31 +03:00
Roman Khimov	104da8caff	network: broadcast messages, enqueue packets Drop EnqueueP2PPacket, replace EnqueueHPPacket with EnqueueHPMessage. We use Enqueue* when we have a specific per-peer message, it makes zero sense duplicating serialization code for it (unlike Broadcast*).	2022-10-12 15:39:20 +03:00
Roman Khimov	d5f2ad86a1	network: drop unused EnqueueMessage interface from Peer	2022-10-12 15:27:08 +03:00
Roman Khimov	b345581c72	network: pings are broadcasted, don't send them to everyone Follow the general rules of broadcasts, even though it's somewhat different from Inv, we just want to get some reply from our neighbors to see if we're behind. We don't strictly need all neighbors for it.	2022-10-12 15:25:03 +03:00
Roman Khimov	e1d5f18ff4	network: fix outdated Peer interface comments	2022-10-12 10:16:07 +03:00
Roman Khimov	8b26d9475b	network: speculatively set GetAddrSent status Otherwise we routinely get "unexpected addr received" error.	2022-10-11 18:42:40 +03:00
Roman Khimov	e80c60a3b9	network: rework broadcast logic We have a number of queues for different purposes: * regular broadcast queue * direct p2p queue * high-priority queue And two basic egress scenarios: * direct p2p messages (replies to requests in Server's handle* methods) * broadcasted messages Low priority broadcasted messages: * transaction inventories * block inventories * notary inventories * non-consensus extensibles High-priority broadcasted messages: * consensus extensibles * getdata transaction requests from consensus process * getaddr requests P2P messages are a bit more complicated, most of the time they use p2p queue, but extensible message requests/replies use HP queue. Server's handle* code is run from Peer's handleIncoming, every peer has this thread that handles incoming messages. When working with the peer it's important to reply to requests and blocking this thread until we send (queue) a reply is fine, if the peer is slow we just won't get anything new from it. The queue used is irrelevant wrt this issue. Broadcasted messages are radically different, we want them to be delivered to many peers, but we don't care about specific ones. If it's delivered to 2/3 of the peers we're fine, if it's delivered to more of them --- it's not an issue. But doing this fairly is not an easy thing, current code tries performing unblocked sends and if this doesn't yield enough results it then blocks (but has a timeout, we can't wait indefinitely). But it does so in sequential manner, once the peer is chosen the code will wait for it (and only it) until timeout happens. What can be done instead is an attempt to push the message to all of the peers simultaneously (or close to that). If they all deliver --- OK, if some block and wait then we can wait until _any_ of them pushes the message through (or global timeout happens, we still can't wait forever). 
If we have enough deliveries then we can cancel pending ones and it's again not an error if these canceled threads still do their job. This makes the system more dynamic and adds some substantial processing overhead, but it's a networking code, any of this overhead is much lower than the actual packet delivery time. It also spreads the load more fairly, if there is any spare queue it'll get the packet and release the broadcaster. On the next broadcast iteration another peer is more likely to be chosen just because it didn't get a message previously (and had some time to deliver already queued messages). It works perfectly in tests, with optimal networking conditions we have much better block times and TPS increases by 5-25% depending on the scenario. I'd go as far as to say that it fixes the original problem of #2678, because in this particular scenario we have empty queues in ~100% of the cases and this new logic will likely lead to 100% fan out in this case (cancelation just won't happen fast enough). But when the load grows and there is some waiting in the queue it will optimize out the slowest links.	2022-10-11 18:42:40 +03:00
Roman Khimov	dabdad20ad	network: don't wait indefinitely for packet to be sent Peers can be slow, very slow, slow enough to affect node's regular operation. We can't wait for them indefinitely, there has to be a timeout for send operations. This patch uses TimePerBlock as a reference for its timeout. It's relatively big and it doesn't affect tests much, 4+1 scenarios tend to perform a little worse while 7+2 scenarios work a little better. The difference is in some percents, but all of these tests easily have 10-15% variations from run to run. It's an important step in making our gossip better because we can't have any behavior where neighbors directly block the node forever, refs. #2678 and	2022-10-10 22:15:21 +03:00

574 commits