We can't lock them (or there will be a deadlock), but we need to fix this:
fatal error: concurrent map iteration and map write
goroutine 1 [running]:
runtime.throw(0xdec086, 0x26)
/usr/lib64/go/1.12/src/runtime/panic.go:617 +0x72 fp=0xc02fec2bf8 sp=0xc02fec2bc8 pc=0x42d932
runtime.mapiternext(0xc02fec2d40)
/usr/lib64/go/1.12/src/runtime/map.go:860 +0x597 fp=0xc02fec2c80 sp=0xc02fec2bf8 pc=0x40efe7
github.com/nspcc-dev/neo-go/pkg/network.(*Server).Shutdown(0xc0000fc160)
/home/rik/dev/neo-go2/pkg/network/server.go:194 +0x238 fp=0xc02fec2db0 sp=0xc02fec2c80 pc=0xa89da8
github.com/nspcc-dev/neo-go/cli/server.startServer(0xc0000fcc60, 0x0, 0x0)
/home/rik/dev/neo-go2/cli/server/server.go:399 +0x7a9 fp=0xc02fec3820 sp=0xc02fec2db0 pc=0xae2079
...
Fixes#1144. It's quite simple approach, we just update balance info right
upon contract migration. It will slow down migration transactions, but it
takes about 1-2 seconds to Seek through balances at mainnet's 3.8M, so the
approach should still work good enough. The other idea was to make lazy
updates (maintaining contract migration map), but it's more complicated to
implement (and implies that a balance get might also do a write).
There also is a concern about memory usage, it can give a spike of some tens
of megabytes, but that also is considered to be acceptable.
When CN is not up to date with the network is synchonizes blocks first and
only then starts consensus process. But while synchronizing it receives
consensus payloads and tries to process them even though messages reader
routine is not started yet. This leads to lots of goroutines waiting to send
their messages:
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639919 [chan send, 4 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc005bd7680)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc00507d170, 0xc005bdd560, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc00507d170)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639181 [chan send, 10 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc013bb6600)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc01361ee10, 0xc01342c780, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc01361ee10)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 39454 [chan send, 32 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc014fea680)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc0140b2ea0, 0xc014fe0ed0, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc0140b2ea0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Luckily it doesn't break synchronization completely as eventually connection
timers fire, the node breaks all connections, create new ones and these new
ones request blocks successfully until another consensus payload stalls them
too. In the end the node reaches synchronization, message processing loop
starts and releases all of these waiting goroutines, but it's better for us to
avoid this happening at all.
This also makes double-starting a no-op which is a nice property.
This error message makes no sense when shutting down the server:
2020-06-25T19:29:53.251+0300 ERROR failed to start RPC server {"error": "http: Server closed"}
And ListenAndServer is documented to always return non-nil error one of which
is http.ErrServerClosed. This should also fix the following test failure:
==================
WARNING: DATA RACE
Read at 0x00c000254243 by goroutine 49:
testing.(*common).logDepth()
/usr/local/go/src/testing/testing.go:665 +0xa1
testing.(*common).Logf()
/usr/local/go/src/testing/testing.go:658 +0x8f
testing.(*T).Logf()
<autogenerated>:1 +0x75
go.uber.org/zap/zaptest.testingWriter.Write()
/go/pkg/mod/go.uber.org/zap@v1.10.0/zaptest/logger.go:130 +0x11f
go.uber.org/zap/zaptest.(*testingWriter).Write()
<autogenerated>:1 +0xa9
go.uber.org/zap/zapcore.(*ioCore).Write()
/go/pkg/mod/go.uber.org/zap@v1.10.0/zapcore/core.go:90 +0x1c3
go.uber.org/zap/zapcore.(*CheckedEntry).Write()
/go/pkg/mod/go.uber.org/zap@v1.10.0/zapcore/entry.go:215 +0x1e7
go.uber.org/zap.(*Logger).Error()
/go/pkg/mod/go.uber.org/zap@v1.10.0/logger.go:203 +0x95
github.com/nspcc-dev/neo-go/pkg/rpc/server.(*Server).Start()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server.go:179 +0x5c5
Previous write at 0x00c000254243 by goroutine 44:
testing.tRunner.func1()
/usr/local/go/src/testing/testing.go:900 +0x353
testing.tRunner()
/usr/local/go/src/testing/testing.go:913 +0x1bb
Goroutine 49 (running) created at:
github.com/nspcc-dev/neo-go/pkg/rpc/server.initClearServerWithInMemoryChain()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_helper_test.go:69 +0x305
github.com/nspcc-dev/neo-go/pkg/rpc/server.initServerWithInMemoryChain()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_helper_test.go:78 +0x3c
github.com/nspcc-dev/neo-go/pkg/rpc/server.testRPCProtocol()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_test.go:805 +0x53
github.com/nspcc-dev/neo-go/pkg/rpc/server.TestRPC.func1()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_test.go:793 +0x44
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
Goroutine 44 (finished) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:960 +0x651
github.com/nspcc-dev/neo-go/pkg/rpc/server.TestRPC()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_test.go:792 +0x5d
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
==================
Pick up the following changes:
* recovery message sending fix
* tx re-requests on sendRecoveryRequest
* proposed block checks fix
* timeout calculation fix for InitializeConsensus
* removal of TxPerBlock which is irrelevant for neo-go (policying is done by
the node, not dbft library)