neo-go

mirror of https://github.com/nspcc-dev/neo-go.git synced 2024-11-29 23:33:37 +00:00

Author	SHA1	Message	Date
Roman Khimov	cfb5058018	network: batch getdata replies This is not exactly the protocol-level batching as was tried in #1770 and proposed by neo-project/neo#2365, but it's a TCP-level change in that we now Write() a set of messages and given that Go sets up TCP sockets with TCP_NODELAY by default this is a substantial change, we have less packets generated with the same amount of data. It doesn't change anything on properly connected networks, but the ones with delays benefit from it a lot. This also improves queueing because we no longer generate 32 messages to deliver on transaction's GetData, it's just one stream of bytes with 32 messages inside. Do the same with GetBlocksByIndex, we can have a lot of messages there too. But don't forget about potential peer DoS attacks, if a peer is to request a lot of big blocks we need to flush them before we process the whole set.	2022-10-21 17:16:32 +03:00
Roman Khimov	137f2cb192	network: deduplicate TCPPeer code a bit context.Background() is never canceled and has no deadline, so we can avoid duplicating some code.	2022-10-12 15:43:31 +03:00
Roman Khimov	104da8caff	network: broadcast messages, enqueue packets Drop EnqueueP2PPacket, replace EnqueueHPPacket with EnqueueHPMessage. We use Enqueue* when we have a specific per-peer message, it makes zero sense duplicating serialization code for it (unlike Broadcast*).	2022-10-12 15:39:20 +03:00
Roman Khimov	d5f2ad86a1	network: drop unused EnqueueMessage interface from Peer	2022-10-12 15:27:08 +03:00
Roman Khimov	b345581c72	network: pings are broadcasted, don't send them to everyone Follow the general rules of broadcasts, even though it's somewhat different from Inv, we just want to get some reply from our neighbors to see if we're behind. We don't strictly need all neighbors for it.	2022-10-12 15:25:03 +03:00
Roman Khimov	e80c60a3b9	network: rework broadcast logic We have a number of queues for different purposes: * regular broadcast queue * direct p2p queue * high-priority queue And two basic egress scenarios: * direct p2p messages (replies to requests in Server's handle* methods) * broadcasted messages Low priority broadcasted messages: * transaction inventories * block inventories * notary inventories * non-consensus extensibles High-priority broadcasted messages: * consensus extensibles * getdata transaction requests from consensus process * getaddr requests P2P messages are a bit more complicated, most of the time they use p2p queue, but extensible message requests/replies use HP queue. Server's handle* code is run from Peer's handleIncoming, every peer has this thread that handles incoming messages. When working with the peer it's important to reply to requests and blocking this thread until we send (queue) a reply is fine, if the peer is slow we just won't get anything new from it. The queue used is irrelevant wrt this issue. Broadcasted messages are radically different, we want them to be delivered to many peers, but we don't care about specific ones. If it's delivered to 2/3 of the peers we're fine, if it's delivered to more of them --- it's not an issue. But doing this fairly is not an easy thing, current code tries performing unblocked sends and if this doesn't yield enough results it then blocks (but has a timeout, we can't wait indefinitely). But it does so in sequential manner, once the peer is chosen the code will wait for it (and only it) until timeout happens. What can be done instead is an attempt to push the message to all of the peers simultaneously (or close to that). If they all deliver --- OK, if some block and wait then we can wait until _any_ of them pushes the message through (or global timeout happens, we still can't wait forever). 
If we have enough deliveries then we can cancel pending ones and it's again not an error if these canceled threads still do their job. This makes the system more dynamic and adds some substantial processing overhead, but it's a networking code, any of this overhead is much lower than the actual packet delivery time. It also allows to spread the load more fairly, if there is any spare queue it'll get the packet and release the broadcaster. On the next broadcast iteration another peer is more likely to be chosen just because it didn't get a message previously (and had some time to deliver already queued messages). It works perfectly in tests, with optimal networking conditions we have much better block times and TPS increases by 5-25% depending on the scenario. I'd go as far as to say that it fixes the original problem of #2678, because in this particular scenario we have empty queues in ~100% of the cases and this new logic will likely lead to 100% fan out in this case (cancelation just won't happen fast enough). But when the load grows and there is some waiting in the queue it will optimize out the slowest links.	2022-10-11 18:42:40 +03:00
Roman Khimov	dabdad20ad	network: don't wait indefinitely for packet to be sent Peers can be slow, very slow, slow enough to affect node's regular operation. We can't wait for them indefinitely, there has to be a timeout for send operations. This patch uses TimePerBlock as a reference for its timeout. It's relatively big and it doesn't affect tests much, 4+1 scenarios tend to perform a little worse while 7+2 scenarios work a little better. The difference is in some percents, but all of these tests easily have 10-15% variations from run to run. It's an important step in making our gossip better because we can't have any behavior where neighbors directly block the node forever, refs. #2678 and	2022-10-10 22:15:21 +03:00
Roman Khimov	4f3ffe7290	golangci: enable errorlint and fix everything it found	2022-09-02 18:36:23 +03:00
Elizaveta Chichindaeva	28908aa3cf	[#2442 ] English Check Signed-off-by: Elizaveta Chichindaeva <elizaveta@nspcc.ru>	2022-05-04 19:48:27 +03:00
Roman Khimov	60d6fa1125	network: keep a copy of the config inside of Server Avoid copying the configuration again and again, make things a bit more efficient.	2022-01-24 18:43:01 +03:00
Roman Khimov	774dee3cd4	network: fix disconnection race between handleConn() and handleIncoming() handleIncoming() winning the race for p.Disconnect() call might lead to nil error passed as the reason for peer unregistration.	2021-11-01 12:20:55 +03:00
Anna Shaleva	d67ff30704	core: implement statesync module And support GetMPTData and MPTData P2P commands.	2021-09-07 19:43:27 +03:00
Roman Khimov	f78bd6474f	network: handle incoming message in a separate goroutine Network communication takes time. Handling some messages (like transaction) also takes time. We can share this time by making handler a separate goroutine. So while message is being handled receiver can already get and parse the next one. It doesn't improve metrics a lot, but still I think it makes sense and in some scenarios this can be more beneficial than this. `e41fc2fd1b`, 4 nodes, 10 workers RPS 6732.979 6396.160 6759.624 6246.398 6589.841 ≈ 6545 ± 3.02% TPS 6491.062 5984.190 6275.652 5867.477 6360.797 ≈ 6196 ± 3.77% CPU % 42.053 43.515 44.768 40.344 44.112 ≈ 43.0 ± 3.69% Mem MB 2564.130 2744.236 2636.267 2589.505 2765.926 ≈ 2660 ± 3.06% Patched: RPS 6902.296 6465.662 6856.044 6785.515 6157.024 ≈ 6633 ± 4.26% ↑ 1.34% TPS 6468.431 6218.867 6610.565 6288.596 5790.556 ≈ 6275 ± 4.44% ↑ 1.28% CPU % 50.231 42.925 49.481 48.396 42.662 ≈ 46.7 ± 7.01% ↑ 8.60% Mem MB 2856.841 2684.103 2756.195 2733.485 2422.787 ≈ 2691 ± 5.40% ↑ 1.17%	2021-08-06 19:37:37 +03:00
Roman Khimov	601841ef35	*: drop unused structure fields Found by structcheck: `good` is unused (structcheck) and alike.	2021-05-12 19:41:23 +03:00
Roman Khimov	0888cf9ed2	network: drop Network from Message It's not used any more.	2021-03-26 13:45:18 +03:00
Evgenii Stratonikov	84a3474fc5	network: set timeout on write Fix a bug occurring under high load when node hangs during this write.	2020-12-25 14:36:53 +03:00
Evgenii Stratonikov	0a5049658f	network: support non-blocking broadcast Right now a single slow peer can slow down whole network. Do broadcast in 2 parts: 1. Perform non-blocking send to all peers if possible. 2. Perform blocking sends until message is sent to 2/3 of good peers.	2020-12-25 14:36:52 +03:00
Roman Khimov	2ce3c8b75f	network: treat unsolicited addr commands as errors See neo-project/neo#2097.	2020-11-25 13:34:38 +03:00
Evgenii Stratonikov	1869d6d460	core: allow to use state root in header	2020-11-20 17:16:32 +03:00
Roman Khimov	c8cc91eeee	network: request blocks when there is a ping with bigger than ours height Turns out, C# node no longer broadcasts an Inv when it's creating a block, instead it sends a ping and if we're not paying attention to the height specified there we're technically missing a new block. Of course we'll get it later after ping timer expiration and regular ping/pong sequence, but that's delaying it for no good reason.	2020-08-14 16:22:15 +03:00
Roman Khimov	0e2784cd2c	always wrap errors when creating new ones with fmt.Errorf() It doesn't really change anything in most of the cases, but it's a useful habit anyway. Fix #350.	2020-08-07 12:21:52 +03:00
Anna Shaleva	6c8accf18c	core, network: request blocks instead of headers Closes #1192 1. We now have CMDGetBlockByIndex, so there's no need to request headers first when we can just ask for blocks. 2. We don't ask for headers (i.e. we don't send CMDGetHeaders), consequently, we shouldn't react on CMDHeaders. 3. But we still keep on reacting on CMDGetHeaders command as there could be a node which needs headers.	2020-08-04 17:52:34 +03:00
Roman Khimov	b483c38593	block/transaction: add network magic into the hash We make it explicit in the appropriate Block/Transaction structures, not via a singleton as C# node does. I think this approach has a bit more potential and allows better packages reuse for different purposes.	2020-06-18 12:39:50 +03:00
Anna Shaleva	8c5c248e79	protocol: add capabilities to address payload Part of #871	2020-05-27 19:02:25 +03:00
Anna Shaleva	c590cc02f4	protocol: add capabilities to version payload closes #871	2020-05-27 19:01:14 +03:00
Anna Shaleva	3bcc56bdcf	protocol: switch to binary MessageCommand closes #888	2020-05-21 13:57:49 +03:00
Roman Khimov	e41d434a49	*: move all packages from CityOfZion to nspcc-dev	2020-03-03 17:21:42 +03:00
Roman Khimov	d1a2296939	network: change the disconnect procedure We can still lock the (Server).run with dead peers: Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: goroutine 40 [select, 871 minutes]: Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).putPacketIntoQueue(0xc030ab5320, 0xc02f251f20, 0xc00af0dcc0, 0x18, 0x40, 0x100000000000000, 0xffffffffffffffff) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:82 +0xf4 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).EnqueueHPPacket(0xc030ab5320, 0xc00af0dcc0, 0x18, 0x40, 0x1367240, 0xc03090ef98) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:124 +0x52 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(Server).iteratePeersWithSendMsg(0xc0000ca000, 0xc00af35800, 0xcb2a58, 0x0) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:720 +0x12a Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(Server).broadcastHPMessage(...) 
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:731 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(Server).run(0xc0000ca000) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:203 +0xee4 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(Server).Start(0xc0000ca000, 0xc000072ba0) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:173 +0x2ec Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: created by github.com/CityOfZion/neo-go/cli/server.startServer Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/cli/server/server.go:331 +0x476 ... Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: goroutine 2199 [chan send, 870 minutes]: Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).Disconnect.func1() Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:366 +0x85 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: sync.(Once).Do(0xc030ab403c, 0xc02f262788) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/usr/local/go/src/sync/once.go:44 +0xb3 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).Disconnect(0xc030ab4000, 0xd92440, 0xc000065a00) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:365 +0x6d Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).SendPing.func1() Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:394 +0x42 Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: created by time.goFunc Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/usr/local/go/src/time/sleep.go:169 +0x44 ... 
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: goroutine 3448 [chan send, 854 minutes]: Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).handleConn(0xc01ed203f0) Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:143 +0x6c Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: created by github.com/CityOfZion/neo-go/pkg/network.(*TCPTransport).Accept Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_transport.go:62 +0x44c ... The problem is that the select in putPacketIntoQueue() only works the way it was intended to after the `close(p.done)`, but that happens only after successful unregistration request send. Thus, do disconnects the other way around, first unblock queueing and exit goroutines, then destroy the connection (if it wasn't previously destroyed) and only after that signal to the Server.	2020-02-13 16:24:46 +03:00
Roman Khimov	7ee8f9c5d8	network: fix networking stalls caused by stale peers We can leak sending goroutines and stall broadcasts because of already gone peers that happened to be cached by some s.Peers() user (more than 800 of these can be seen in nodoka log along with (Server).run blocking on CMDGetAddr send): Feb 10 16:35:15 nodoka neo-go[1563]: goroutine 41 [chan send, 3320 minutes]: Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).putPacketIntoQueue(...) Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:81 Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(TCPPeer).EnqueueHPPacket(0xc0083d57a0, 0xc017206100, 0x18, 0x40, 0x136a240, 0xc018ef9720) Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:119 +0x98 Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(Server).iteratePeersWithSendMsg(0xc0000ca000, 0xc0001848a0, 0xcb4550, 0x0) Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:720 +0x12a Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(Server).broadcastHPMessage(...) 
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:731 Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(Server).run(0xc0000ca000) Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:203 +0xee4 Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).Start(0xc0000ca000, 0xc000072c60) Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:173 +0x2ec Feb 10 16:35:15 nodoka neo-go[1563]: created by github.com/CityOfZion/neo-go/cli/server.startServer Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/cli/server/server.go:331 +0x476	2020-02-10 18:47:52 +03:00
Roman Khimov	fdbaac7a30	network: prevent broadcast queue starving, share time with p2p Blocked broadcast queue of one peer may affect broadcasting capabilities of the server, so prevent total blocking of it by p2p queue.	2020-01-30 14:03:52 +03:00
Roman Khimov	b2c4587dad	network: fix PeerAddr() for not-yet-handshaked case If we have already got Version message, we don't need the rest of handshake to complete before being able to properly answer the PeerAddr() requests. Fixes some duplicate connections between machines.	2020-01-30 14:03:52 +03:00
Roman Khimov	9eafec0d1d	network: introduce peer-to-peer message queue This one is designed to give more priority to direct nodes communication, that is that their messaging would have more priority than generic broadcasts. It should improve consensus process under TX pressure and allow to handle pings in time (preventing disconnects).	2020-01-30 14:03:52 +03:00
Roman Khimov	1c28dd2567	network: add message type to disconnect error message If it was caused by message processing, but only after the handshake to preserve errIdenticalID and other handshaking errors.	2020-01-30 14:03:52 +03:00
Roman Khimov	06c3fbe455	network: rework ping sends, fix overpinging Our node was too pingy because of wrong timer setups (that divided timeout Duration by time.Second), it also was wrong in its time calculations (using UTC time to calculate intervals). At the same time missing block is a server-wide problem, so it's better solved with server-wide protocol loop.	2020-01-28 17:39:52 +03:00
Roman Khimov	99dfdc19e7	network: drop now useless addrReq queue from the server Just broadcast a high-priority message to everyone.	2020-01-22 11:28:59 +03:00
Roman Khimov	34b863d645	network: introduce Server's MkMsg() That wraps NewMessage() for a configured network.	2020-01-21 17:31:51 +03:00
Roman Khimov	1f672e0da7	network: move SendVersion() to the Peer Only leave server-specific `getVersionMsg()` in the Server, all the other logic is peer-related.	2020-01-21 17:26:08 +03:00
Roman Khimov	2c4ace022e	network/config: redesign ping timeout handling a bit 1) Make timeout a timeout, don't do magic ping counts. 2) Drop additional timer from the main peer's protocol loop, create it dynamically and make it disconnect the peer. 3) Don't expose the ping counter to the outside, handle more logic inside the Peer. Relates to #430.	2020-01-20 19:37:17 +03:00
Roman Khimov	62092c703d	network: use local timestamp to decide when to ping We don't and we won't have synchronized clocks in the network so the only timestamp that we can compare our local time with is the one made ourselves. What this ping mechanism is used for is to recover from missing the block broadcast, thus it's appropriate for it to trigger after X seconds of the local time since the last block received. Relates to #430.	2020-01-20 19:37:17 +03:00
Roman Khimov	a8252ecc05	network: remove wrong ping condition In reality it will never be true exactly in the case where we want this ping mechanism to work --- when the node failed to get a block from the net. It won't get the header either and thus its block height will be equal to header height. The only moment when this condition is met is when the node does initial synchronization and this synchronization works just fine without any pings. Relates to #430.	2020-01-20 19:37:17 +03:00
Roman Khimov	247cfa4165	network: either request blocks or ping a peer, but not both It makes no sense to do both actions, pings are made for a different purpose. Relates to #430.	2020-01-20 19:37:17 +03:00
Roman Khimov	0ba6b2a754	network: introduce peer sending queues Two queues for high-priority and ordinary messages. Fixes #590. These queues are deliberately made small to avoid buffer bloat problem, there is gonna be another queueing layer above them to compensate for that. The queues are designed to be synchronous in enqueueing, async capabilities are to be added layer above later.	2020-01-20 17:23:26 +03:00
Roman Khimov	7f0882767c	network: remove useless Done() method from the peer It's internal state of the peer that no one should care about.	2020-01-20 17:23:26 +03:00
Roman Khimov	f39d5d5a10	network: fix unregistration on peer Disconnect It should always signal to the server, not duplicating this send and not missing it like it happened in the Server.run().	2020-01-20 17:23:26 +03:00
Roman Khimov	907a236285	network: move per-peer goroutines into the TCPPeer As they're directly tied to it.	2020-01-20 17:23:26 +03:00
Vsevolod Brekelov	4e6ed9021c	network: add ping pong processing add pingInterval same as used in ref C# implementation with the same logic add pingTimeout which is used to check whether pong received. If not -- drop the peer. add pingLimit which is hardcoded to 4 in TCPPeer. It's limit for unsuccessful ping/pong calls (where pong wasn't received in pingTimeout interval)	2020-01-17 13:24:14 +03:00
Evgenii Stratonikov	e3098ed0f8	network: write messages atomically Right now message can be written in several Write's so concurrent calls of writeMsg() can in theory interleave. This commit fixes it. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2019-11-18 09:31:00 +03:00
Roman Khimov	d7f747fa9a	network: wait for both Version messages before ACKing Otherwise the node might crash in `startProtocol` because of missing Version field in the peer. And it also keeps the sequence correct, Version MUST be sent first and ACKs can only follow it.	2019-11-06 18:05:50 +03:00
Roman Khimov	ec76ed23a5	network: rework peer handshaking, fix #458 This allows to start handshaking from both client and server (mainnet/testnet nodes were seen to not care about string ordering for it), but still maintains some sane checks in the process. It also makes functions thread-safe because we have two goroutines servicing read and write side of the Peer connection, so they can clash on access to the struct fields. Add a test for it also.	2019-11-06 15:29:58 +03:00
Roman Khimov	e859e03240	network: split Peer's NetAddr into RemoteAddr and PeerAddr As they are different things used for different purposes.	2019-11-06 15:26:24 +03:00
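
The broadcast rework described in e80c60a3b9 (push to all peers at once, return as soon as enough of them accept the message or a global timeout fires, then cancel the rest) can be illustrated roughly as follows. This is a minimal sketch, not the neo-go code: the `peer` type, `enqueue` and `broadcast` are hypothetical names made up for the illustration, and the real implementation works with prioritized per-peer queues.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// peer is a hypothetical stand-in for a network peer with a small send queue.
type peer struct {
	name  string
	queue chan []byte
}

// enqueue blocks until the message fits into the queue or ctx is done.
func (p *peer) enqueue(ctx context.Context, msg []byte) error {
	select {
	case p.queue <- msg:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// broadcast offers msg to all peers concurrently and returns once `enough`
// of them accepted it or the timeout expired. It reports the number of
// deliveries observed before returning.
func broadcast(peers []*peer, msg []byte, enough int, timeout time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel() // cancels the senders that are still waiting

	// Buffered, so late senders never block after we stop reading.
	results := make(chan error, len(peers))
	for _, p := range peers {
		go func(p *peer) { results <- p.enqueue(ctx, msg) }(p)
	}

	delivered := 0
	for range peers {
		if err := <-results; err == nil {
			delivered++
			if delivered >= enough {
				break
			}
		}
	}
	return delivered
}

func main() {
	peers := []*peer{
		{name: "fast-1", queue: make(chan []byte, 1)},
		{name: "fast-2", queue: make(chan []byte, 1)},
		{name: "stuck", queue: make(chan []byte)}, // nobody reads from it
	}
	n := broadcast(peers, []byte("inv"), 2, time.Second)
	fmt.Println("delivered to", n, "peers")
}
```

The stuck peer does not delay the broadcaster here: the call returns as soon as two queues have accepted the message, and the pending send is canceled through the context.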

61 commits
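
The TCP-level batching from cfb5058018 (several messages sent with one Write(), with a flush before the pending data grows too large) might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual change: `sendBatched`, `countingWriter` and the `flushLimit` parameter are hypothetical, and the messages are treated as already serialized byte slices.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// countingWriter is a hypothetical test double that records how many
// Write calls (and how many bytes) reach the "connection".
type countingWriter struct {
	writes int
	total  int
}

func (w *countingWriter) Write(b []byte) (int, error) {
	w.writes++
	w.total += len(b)
	return len(b), nil
}

// sendBatched coalesces msgs into as few Write calls as possible, but
// flushes whenever the pending buffer reaches flushLimit bytes so that
// a request for many big items cannot make us hold all of them in memory.
func sendBatched(w io.Writer, msgs [][]byte, flushLimit int) error {
	var buf bytes.Buffer
	for _, m := range msgs {
		buf.Write(m)
		if buf.Len() >= flushLimit {
			if _, err := w.Write(buf.Bytes()); err != nil {
				return err
			}
			buf.Reset()
		}
	}
	if buf.Len() > 0 {
		_, err := w.Write(buf.Bytes())
		return err
	}
	return nil
}

func main() {
	msgs := make([][]byte, 32)
	for i := range msgs {
		msgs[i] = make([]byte, 10)
	}
	w := &countingWriter{}
	if err := sendBatched(w, msgs, 1<<20); err != nil {
		fmt.Println("send failed:", err)
		return
	}
	fmt.Println("writes:", w.writes, "bytes:", w.total)
}
```

With TCP_NODELAY set, every Write() tends to become its own packet, so fewer and larger writes mean fewer packets for the same data.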