We can't lock them (or there will be a deadlock), but we need to fix this:
fatal error: concurrent map iteration and map write
goroutine 1 [running]:
runtime.throw(0xdec086, 0x26)
/usr/lib64/go/1.12/src/runtime/panic.go:617 +0x72 fp=0xc02fec2bf8 sp=0xc02fec2bc8 pc=0x42d932
runtime.mapiternext(0xc02fec2d40)
/usr/lib64/go/1.12/src/runtime/map.go:860 +0x597 fp=0xc02fec2c80 sp=0xc02fec2bf8 pc=0x40efe7
github.com/nspcc-dev/neo-go/pkg/network.(*Server).Shutdown(0xc0000fc160)
/home/rik/dev/neo-go2/pkg/network/server.go:194 +0x238 fp=0xc02fec2db0 sp=0xc02fec2c80 pc=0xa89da8
github.com/nspcc-dev/neo-go/cli/server.startServer(0xc0000fcc60, 0x0, 0x0)
/home/rik/dev/neo-go2/cli/server/server.go:399 +0x7a9 fp=0xc02fec3820 sp=0xc02fec2db0 pc=0xae2079
...
GetValidators without parameter is called upon DBFT initialization and it
should receive validators for the next block (that will create it),
parameterized GetValidators is used for NextConsensus calculation where we
need a list for the current state of the chain.
In order to avoid dependency cycle at the next commits:
imports github.com/nspcc-dev/neo-go/pkg/config
imports github.com/nspcc-dev/neo-go/pkg/wallet
imports github.com/nspcc-dev/neo-go/pkg/vm
imports github.com/nspcc-dev/neo-go/pkg/smartcontract/nef
imports github.com/nspcc-dev/neo-go/pkg/config
There is no such thing as high/low priority transactions, as there are
no free transactions anymore and they are ordered by fees contained
in transaction itself.
Closes#1063.
We make it explicit in the appropriate Block/Transaction structures, not via a
singleton as C# node does. I think this approach has a bit more potential and
allows better packages reuse for different purposes.
Turns out, Neo uses block compression and not a streaming lz4 format. And it
doesn't contain uncompressed payload size which makes guessing it somewhat
suboptimal.
They actually use the same types as for messages. Fixes
2020-05-29T00:06:17.593+0300 WARN peer disconnected {"addr": "168.62.167.190:20333", "reason": "handling CMDInv message: invalid inventory type", "peerCount": 3}
Get new blocks directly from the Blockchain. It may lead to some duplications
(as we'll also receive our own blocks), but at the same time it's more
correct, because technically we can also get blocks via other means besides
network server like RPC (submitblock call). And it simplifies network server
at the same time.
Frequently one needs to check if struct serializes/deserializes
properly. This commit implements helpers for such cases including:
1. JSON
2. io.Serializable interface
Why a deadlock can occur:
1. (*DefaultDiscovery).run() has a for loop over requestCh channel.
2. (*DefaultDiscovery).RequestRemote() send to this channel while
holding a mutex.
3. (*DefaultDiscovery).RegisterBadAddr() tries to take mutex for write.
4. Second select-case can't take mutex for read because of (3).
Close transport and disconnect peers right in the Shutdown(), so that no new
connections would be accepted and so that all the peers would be disconnected
correctly (avoiding the same deadlock as in e2116e4c3f).
Implement mempool and consensus block creation policies, almost the same as
SimplePolicy plugin for C# node provides with two caveats:
* HighPriorityTxType is not configured and hardcoded to ClaimType
* BlockedAccounts are not supported
Other than that it allows us to run successfuly as testnet CN, previously our
proposals were rejected because we were proposing blocks with oversized
transactions (that are rejected by PoolTx() now).
Mainnet and testnet configuration files are updated accordingly, but privnet
is left as is with no limits.
Configuration is currently attached to the Blockchain and so is the code that
does policying, it may be moved somewhere in the future, but it works for
now.
We can still lock the (*Server).run with dead peers:
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: goroutine 40 [select, 871 minutes]:
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).putPacketIntoQueue(0xc030ab5320, 0xc02f251f20, 0xc00af0dcc0, 0x18, 0x40, 0x100000000000000, 0xffffffffffffffff)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:82 +0xf4
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).EnqueueHPPacket(0xc030ab5320, 0xc00af0dcc0, 0x18, 0x40, 0x1367240, 0xc03090ef98)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:124 +0x52
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*Server).iteratePeersWithSendMsg(0xc0000ca000, 0xc00af35800, 0xcb2a58, 0x0)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:720 +0x12a
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*Server).broadcastHPMessage(...)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:731
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*Server).run(0xc0000ca000)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:203 +0xee4
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*Server).Start(0xc0000ca000, 0xc000072ba0)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:173 +0x2ec
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: created by github.com/CityOfZion/neo-go/cli/server.startServer
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/cli/server/server.go:331 +0x476
...
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: goroutine 2199 [chan send, 870 minutes]:
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).Disconnect.func1()
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:366 +0x85
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: sync.(*Once).Do(0xc030ab403c, 0xc02f262788)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/usr/local/go/src/sync/once.go:44 +0xb3
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).Disconnect(0xc030ab4000, 0xd92440, 0xc000065a00)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:365 +0x6d
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).SendPing.func1()
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:394 +0x42
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: created by time.goFunc
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/usr/local/go/src/time/sleep.go:169 +0x44
...
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: goroutine 3448 [chan send, 854 minutes]:
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).handleConn(0xc01ed203f0)
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:143 +0x6c
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: created by github.com/CityOfZion/neo-go/pkg/network.(*TCPTransport).Accept
Feb 13 16:14:50 neo-go-node-2 neo-go[9448]: #011/go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_transport.go:62 +0x44c
...
The problem is that the select in putPacketIntoQueue() only works the way it
was intended to after the `close(p.done)`, but that happens only after
successful unregistration request send. Thus, do disconnects the other way
around, first unblock queueing and exit goroutines, then destroy the
connection (if it wasn't previously destroyed) and only after that signal to
the Server.
We can leak sending goroutines and stall broadcasts because of already gone
peers that happened to be cached by some s.Peers() user (more than 800 of
these can be seen in nodoka log along with (*Server).run blocking on
CMDGetAddr send):
Feb 10 16:35:15 nodoka neo-go[1563]: goroutine 41 [chan send, 3320 minutes]:
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).putPacketIntoQueue(...)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:81
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*TCPPeer).EnqueueHPPacket(0xc0083d57a0, 0xc017206100, 0x18, 0x40, 0x136a240, 0xc018ef9720)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/tcp_peer.go:119 +0x98
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).iteratePeersWithSendMsg(0xc0000ca000, 0xc0001848a0, 0xcb4550, 0x0)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:720 +0x12a
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).broadcastHPMessage(...)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:731
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).run(0xc0000ca000)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:203 +0xee4
Feb 10 16:35:15 nodoka neo-go[1563]: github.com/CityOfZion/neo-go/pkg/network.(*Server).Start(0xc0000ca000, 0xc000072c60)
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/pkg/network/server.go:173 +0x2ec
Feb 10 16:35:15 nodoka neo-go[1563]: created by github.com/CityOfZion/neo-go/cli/server.startServer
Feb 10 16:35:15 nodoka neo-go[1563]: /go/src/github.com/CityOfZion/neo-go/cli/server/server.go:331 +0x476
We can only add one block of the given height and we have two competing
goroutines to do that --- consensus and block queue. Whomever adds the block
first shouldn't trigger an error in another one.
Fix block relaying for blocks added via the block queue also, previously one
consensus-generated blocks were broadcasted.
Eliminate races between tx checks and adding them to the mempool, ensure the
chain doesn't change while we're working with the new tx. Ensure only one
block addition attempt could be in progress.
It can lead to some goroutine explosion, but supposedly it's better than
stalling other processing and eventually all of these goroutines should finish
their sends. Note that this doesn't change the behavior for RPC-relayed
transactions that are still waiting for the broadcast to finish ensuring
proper transaction distribution before returning the result to the client.
If we have already got Version message, we don't need the rest of handshake to
complete before being able to properly answer the PeerAddr() requests. Fixes
some duplicate connections between machines.
This one is designed to give more priority to direct nodes communication, that
is that their messaging would have more priority than generic broadcasts. It
should improve consensus process under TX pressure and allow to handle
pings in time (preventing disconnects).
They have the opposite order, height first and nonce second. It was done wrong
in 4e6ed902 and never fixed since. Fixes sending wrong peer state leading to
useless getheaders messages (and disconnects when the other side is lagging
behind).
We can have more than one connection attempt in progress and not yet completed
the handshake, so if there is a Version already received we should look it.
Our node was too pingy because of wrong timer setups (that divided timeout
Duration by time.Second), it also was wrong in its time calculations (using
UTC time to calculate intervals). At the same time missing block is a
server-wide problem, so it's better solved with server-wide protocol loop.
1) Make timeout a timeout, don't do magic ping counts.
2) Drop additional timer from the main peer's protocol loop, create it
dynamically and make it disconnect the peer.
3) Don't expose the ping counter to the outside, handle more logic inside the
Peer.
Relates to #430.
We don't and we won't have synchronized clocks in the network so the only
timestamp that we can compare our local time with is the one made
ourselves. What this ping mechanism is used for is to recover from missing the
block broadcast, thus it's appropriate for it to trigger after X seconds of
the local time since the last block received.
Relates to #430.
In reality it will never be true exactly in the case where we want this ping
mechanism to work --- when the node failed to get a block from the net. It
won't get the header either and thus its block height will be equal to header
height. The only moment when this condition is met is when the node does
initial synchronization and this synchronization works just fine without any
pings.
Relates to #430.
Two queues for high-priority and ordinary messages. Fixes#590. These queues
are deliberately made small to avoid buffer bloat problem, there is gonna be
another queueing layer above them to compensate for that. The queues are
designed to be synchronous in enqueueing, async capabilities are to be added
layer above later.
add pingInterval same as used in ref C# implementation with the same logic
add pingTimeout which is used to check whether pong received. If not -- drop the peer.
add pingLimit which is hardcoded to 4 in TCPPeer. It's limit for unsuccessful ping/pong calls (where pong wasn't received in pingTimeout interval)
It wasn't actually requesting transactions but rather sending an inventory
message telling everyone that we have them which is completely wrong and
easily leads to ChangeView that could be avoided.
go vet is not happy about them:
pkg/io/binaryReader.go:92:21: method ReadByte() byte should have signature ReadByte() (byte, error)
pkg/io/binaryWriter.go:75:21: method WriteByte(u8 byte) should have signature WriteByte(byte) error
This seriously improves the serialization/deserialization performance for
several reasons:
* no time spent in `binary` reflection
* no memory allocations being made on every read/write
* uses fast ReadBytes everywhere it's appropriate
It also makes Fixed8 Serializable just for convenience.
add dao which takes care about all CRUD operations on storage
remove blockchain state since everything is stored on change
remove storage operations from structs(entities)
move structs to entities package
It reduces heap pressure a little for these elements as we don't have to
allocate/free them individually. And they're directly tied to transactions or
block, not being shared or anything like that, so it makes little sense for
them to be pointer-based. It only makes building transactions a little easier,
but that's obviously a minor usecase.
Before this patch on block import we could easily be spending more than 6
seconds out of 30 in Uint256 encoding for UnspentBalance, now it's completely
off the radar.
Only request headers from the other peer if his height is bigger than
ours. Otherwise we routinely ask 0-height newcomers for some random headers
that they know nothing about.
This one is essential for the consensus nodes as otherwise they won't give out
the blocks they generate making their generation almost useless. It also makes
our networking part more complete.
We have a race between reader and writer goroutines for the same connection
that leads to handshake failures when reader is faster to read the incoming
version (and try to reply to it) than writer is to write our own Version:
WARN[0000] peer disconnected addr="172.200.0.4:20334" peerCount=5 reason="invalid handshake: tried to send VersionAck, but didn't send Version yet
Fix it by moving Version sending before the reader loop starts.
add processing of validators while block persist;
add validator structure with decoding/encoding;
add validator get from store;
add EnrollmentTX and StateTX processing;
add pubkey decode bytes, unique and contains functions;
Right now message can be written in several Write's so
concurrent calls of writeMsg() can in theory interleave.
This commit fixes it.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
Otherwise the node might crash in `startProtocol` because of missing Version
field in the peer. And it also keeps the sequence correct, Version MUST be
sent first and ACKs can only follow it.
Our node didn't respect the MaxPeers setting, fix it with a drop of random
connection when this limit is reached (to give a chance for newcomers to
communicate), but also introduce AttemptConnPeers setting to tune the number
of attempted connections.
This also raises the default MaxPeers for testnet/mainnet to 100, because
neo-go nodes love making friends.
This allows to start handshaking from both client and server (mainnet/testnet
nodes were seen to not care about string ordering for it), but still maintains
some sane checks in the process. It also makes functions thread-safe because
we have two goroutines servicing read and write side of the Peer connection,
so they can clash on access to the struct fields.
Add a test for it also.
Extend Blockchainer with one more method to spawn a VM for test runs and use
it to run scripts. Gas consumption is not counted or limited in any way at the
moment (see #424).
Keeping run() as the owner of all maps would mean adding at least three more
channels to keep address getters with thread-safety. But then there also is a
race between requestToWork() and run() which is way harder to solve with
channels because there are lots of possibilities for deadlocks. So rework all
of this with good old mutexes.
While at it, fix `requestCh` handling in the inner select of run, it will waste
one loop to handle it, so we should add one to the `requested`.
Fixes#445.
Goreport:
neo-go/pkg/core/contract_state_test.go
Line 21: warning: "Contracto" is a misspelling of "Contraction" (misspell)
Line 64: warning: "Contracto" is a misspelling of "Contraction" (misspell)
neo-go/pkg/core/interop_neo.go
Line 420: warning: "succeedes" is a misspelling of "succeeds" (misspell)
neo-go/pkg/network/discovery.go
Line 118: warning: "succeded" is a misspelling of "succeeded" (misspell)
Line 128: warning: "successfuly" is a misspelling of "successfully" (misspell)
golint suggests:
pkg/network/payload/address.go:48:12: should omit type net.IP from declaration of var netip; it will be inferred from the right-hand side
It's a temporary stub until proper encoding/decoding is implemented. It's
useful for testnet/mainnet connections because without it consensus message
receival leads to peer disconnection.