We can leak sending goroutines and stall broadcasts because of already gone
peers that happened to be cached by some s.Peers() user (more than 800 of
these can be seen in nodoka log along with (*Server).run blocking on
CMDGetAddr send):
Feb 10 16:35:15 nodoka neo-go[1563]: goroutine 41 [chan send, 3320 minutes]:
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).putPacketIntoQueue(...)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:81
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).EnqueueHPPacket(0xc0083d57a0, 0xc017206100, 0x18, 0x40, 0x136a240, 0xc018ef9720)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:119 +0x98
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).iteratePeersWithSendMsg(0xc0000ca000, 0xc0001848a0, 0xcb4550, 0x0)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:720 +0x12a
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).broadcastHPMessage(...)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:731
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).run(0xc0000ca000)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:203 +0xee4
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).Start(0xc0000ca000, 0xc000072c60)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:173 +0x2ec
Feb 10 16:35:15 nodoka neo-go[1563]: created by github.com/CityOfZion/neo-go/cli/server.startServer
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/cli/server/server.go:331 +0x476
Tesnet sync failed with:
Feb 07 00:04:19 nodoka neo-go[1747]: 2020-02-07T00:04:19.838+0300 WARN blockQueue: failed adding block into the blockchain {"error": "failed to store notifications: not supported", "blockHeight": 713984, "nextIndex": 713985}
because some (not so) smart contract emitted a notification with an
InteropItem inside.
Seeing some
blockQueue: failed adding block into the blockchain {"error": "failed to store notifications: not supported", "blockHeight": 713984, "nextIndex": 713985}
in logs is not very helpful.
Fix mempool and chain locking
This allows us easily make 1000 Tx/s in 4-nodes privnet, fixes potential
double spends and improves mempool testing coverage.
Fixes GolangCI:
Error return value of
(*github.com/CityOfZion/neo-go/pkg/core/mempool.Pool).Add is not checked
(from errcheck)
and allows us to almost completely forget about mempool here.
Our mempool only contains valid verified transactions all the time, it never
has any unverified ones. Unverified pool made some sense for quick unverifying
after the new block acceptance (and gradual background reverification), but
reverification needs some non-trivial locking between blockchain and mempool
and internal mempool state locking (reverifying tx and moving it between
unverified and verified pools must be atomic). But our current reverification
is fast enough (and has all the appropriate locks), so bothering with
unverified pool makes little sense.
We not only need to remove transactions stored in the block, but also
invalidate some potential double spends caused by these transactions. Usually
new block contains a substantial number of transactions from the pool, so it's
easier to make one pass over it only keeping valid items rather than remove
them one by one and make an additional pass to recheck inputs/witnesses.
It doesn't harm as we have transactions naturally ordered by fee anyway and it
makes managing them a little easier. This also makes slices store item itself
instead of pointers to it which reduces the pressure on the memory subsystem.
They shouldn't depend on the chain state and for the same transaction they
should always produce the same result. Thus, it makes no sense recalculating
them over and over again.
We can only add one block of the given height and we have two competing
goroutines to do that --- consensus and block queue. Whomever adds the block
first shouldn't trigger an error in another one.
Fix block relaying for blocks added via the block queue also, previously one
consensus-generated blocks were broadcasted.
Eliminate races between tx checks and adding them to the mempool, ensure the
chain doesn't change while we're working with the new tx. Ensure only one
block addition attempt could be in progress.