It could be the case that checks are performed simultaneosly and
peers connections goes down from 2 to 0. We must take such case into
account and register address as good in discovery.
If the node is to start with seeds unavailable it will try connecting to each
of them three times, blacklist them and then sit forever waiting for
something. It's not a good behavior, it should always try connecting to seeds
if nothing else works.
We can't lock them (or there will be a deadlock), but we need to fix this:
fatal error: concurrent map iteration and map write
goroutine 1 [running]:
runtime.throw(0xdec086, 0x26)
/usr/lib64/go/1.12/src/runtime/panic.go:617 +0x72 fp=0xc02fec2bf8 sp=0xc02fec2bc8 pc=0x42d932
runtime.mapiternext(0xc02fec2d40)
/usr/lib64/go/1.12/src/runtime/map.go:860 +0x597 fp=0xc02fec2c80 sp=0xc02fec2bf8 pc=0x40efe7
github.com/nspcc-dev/neo-go/pkg/network.(*Server).Shutdown(0xc0000fc160)
/home/rik/dev/neo-go2/pkg/network/server.go:194 +0x238 fp=0xc02fec2db0 sp=0xc02fec2c80 pc=0xa89da8
github.com/nspcc-dev/neo-go/cli/server.startServer(0xc0000fcc60, 0x0, 0x0)
/home/rik/dev/neo-go2/cli/server/server.go:399 +0x7a9 fp=0xc02fec3820 sp=0xc02fec2db0 pc=0xae2079
...
Get new blocks directly from the Blockchain. It may lead to some duplications
(as we'll also receive our own blocks), but at the same time it's more
correct, because technically we can also get blocks via other means besides
network server like RPC (submitblock call). And it simplifies network server
at the same time.
Close transport and disconnect peers right in the Shutdown(), so that no new
connections would be accepted and so that all the peers would be disconnected
correctly (avoiding the same deadlock as in e2116e4c3f).
Implement mempool and consensus block creation policies, almost the same as
SimplePolicy plugin for C# node provides with two caveats:
* HighPriorityTxType is not configured and hardcoded to ClaimType
* BlockedAccounts are not supported
Other than that it allows us to run successfuly as testnet CN, previously our
proposals were rejected because we were proposing blocks with oversized
transactions (that are rejected by PoolTx() now).
Mainnet and testnet configuration files are updated accordingly, but privnet
is left as is with no limits.
Configuration is currently attached to the Blockchain and so is the code that
does policying, it may be moved somewhere in the future, but it works for
now.
We can only add one block of the given height and we have two competing
goroutines to do that --- consensus and block queue. Whomever adds the block
first shouldn't trigger an error in another one.
Fix block relaying for blocks added via the block queue also, previously one
consensus-generated blocks were broadcasted.
Eliminate races between tx checks and adding them to the mempool, ensure the
chain doesn't change while we're working with the new tx. Ensure only one
block addition attempt could be in progress.
It can lead to some goroutine explosion, but supposedly it's better than
stalling other processing and eventually all of these goroutines should finish
their sends. Note that this doesn't change the behavior for RPC-relayed
transactions that are still waiting for the broadcast to finish ensuring
proper transaction distribution before returning the result to the client.
This one is designed to give more priority to direct nodes communication, that
is that their messaging would have more priority than generic broadcasts. It
should improve consensus process under TX pressure and allow to handle
pings in time (preventing disconnects).
They have the opposite order, height first and nonce second. It was done wrong
in 4e6ed902 and never fixed since. Fixes sending wrong peer state leading to
useless getheaders messages (and disconnects when the other side is lagging
behind).
We can have more than one connection attempt in progress and not yet completed
the handshake, so if there is a Version already received we should look it.
Our node was too pingy because of wrong timer setups (that divided timeout
Duration by time.Second), it also was wrong in its time calculations (using
UTC time to calculate intervals). At the same time missing block is a
server-wide problem, so it's better solved with server-wide protocol loop.
1) Make timeout a timeout, don't do magic ping counts.
2) Drop additional timer from the main peer's protocol loop, create it
dynamically and make it disconnect the peer.
3) Don't expose the ping counter to the outside, handle more logic inside the
Peer.
Relates to #430.
We don't and we won't have synchronized clocks in the network so the only
timestamp that we can compare our local time with is the one made
ourselves. What this ping mechanism is used for is to recover from missing the
block broadcast, thus it's appropriate for it to trigger after X seconds of
the local time since the last block received.
Relates to #430.