When a CN is not up to date with the network, it synchronizes blocks first
and only then starts the consensus process. But while synchronizing it
receives consensus payloads and tries to process them even though the
message reader routine is not started yet. This leads to lots of goroutines
waiting to send their messages:
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639919 [chan send, 4 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc005bd7680)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc00507d170, 0xc005bdd560, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc00507d170)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 1639181 [chan send, 10 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc013bb6600)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc01361ee10, 0xc01342c780, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc01361ee10)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Jun 25 23:55:53 nodoka neo-go[32733]: goroutine 39454 [chan send, 32 minutes]:
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/consensus.(*service).OnPayload(0xc0000ecb40, 0xc014fea680)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/consensus/consensus.go:329 +0x31b
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleConsensusCmd(...)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:687
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*Server).handleMessage(0xc0000ba160, 0x1053260, 0xc0140b2ea0, 0xc014fe0ed0, 0x0, 0x0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/server.go:806 +0xd58
Jun 25 23:55:53 nodoka neo-go[32733]: github.com/nspcc-dev/neo-go/pkg/network.(*TCPPeer).handleConn(0xc0140b2ea0)
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_peer.go:160 +0x294
Jun 25 23:55:53 nodoka neo-go[32733]: created by github.com/nspcc-dev/neo-go/pkg/network.(*TCPTransport).Dial
Jun 25 23:55:53 nodoka neo-go[32733]: #011/go/src/github.com/nspcc-dev/neo-go/pkg/network/tcp_transport.go:38 +0x1ad
Luckily it doesn't break synchronization completely, as eventually
connection timers fire, the node breaks all connections and creates new
ones, and these new ones request blocks successfully until another consensus
payload stalls them too. In the end the node reaches synchronization, the
message processing loop starts and releases all of these waiting goroutines,
but it's better for us to avoid this happening at all.
This also makes double-starting a no-op, which is a nice property.
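A rough sketch of the idea (the type and field names here are illustrative
assumptions, not necessarily the actual neo-go code): payloads arriving
before the event loop is running are simply dropped instead of blocking the
network goroutine, and Start() uses a compare-and-swap so a second call does
nothing.

    package consensus

    import "sync/atomic"

    // Payload is a placeholder for the dBFT consensus payload type.
    type Payload struct{}

    type service struct {
        messages chan Payload
        started  atomic.Bool // set once the event loop is running
    }

    // OnPayload drops payloads until the service is started, so network
    // goroutines never block on the channel send while the node is syncing.
    func (s *service) OnPayload(p Payload) {
        if !s.started.Load() {
            return
        }
        s.messages <- p
    }

    // Start is idempotent: double-starting is a no-op.
    func (s *service) Start() {
        if s.started.CompareAndSwap(false, true) {
            go s.eventLoop()
        }
    }

    func (s *service) eventLoop() {
        for p := range s.messages {
            _ = p // process consensus messages
        }
    }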
This error message makes no sense when shutting down the server:
2020-06-25T19:29:53.251+0300 ERROR failed to start RPC server {"error": "http: Server closed"}
And ListenAndServe is documented to always return a non-nil error, one of
which is http.ErrServerClosed (see the sketch after the race report below).
This should also fix the following test failure:
==================
WARNING: DATA RACE
Read at 0x00c000254243 by goroutine 49:
testing.(*common).logDepth()
/usr/local/go/src/testing/testing.go:665 +0xa1
testing.(*common).Logf()
/usr/local/go/src/testing/testing.go:658 +0x8f
testing.(*T).Logf()
<autogenerated>:1 +0x75
go.uber.org/zap/zaptest.testingWriter.Write()
/go/pkg/mod/go.uber.org/zap@v1.10.0/zaptest/logger.go:130 +0x11f
go.uber.org/zap/zaptest.(*testingWriter).Write()
<autogenerated>:1 +0xa9
go.uber.org/zap/zapcore.(*ioCore).Write()
/go/pkg/mod/go.uber.org/zap@v1.10.0/zapcore/core.go:90 +0x1c3
go.uber.org/zap/zapcore.(*CheckedEntry).Write()
/go/pkg/mod/go.uber.org/zap@v1.10.0/zapcore/entry.go:215 +0x1e7
go.uber.org/zap.(*Logger).Error()
/go/pkg/mod/go.uber.org/zap@v1.10.0/logger.go:203 +0x95
github.com/nspcc-dev/neo-go/pkg/rpc/server.(*Server).Start()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server.go:179 +0x5c5
Previous write at 0x00c000254243 by goroutine 44:
testing.tRunner.func1()
/usr/local/go/src/testing/testing.go:900 +0x353
testing.tRunner()
/usr/local/go/src/testing/testing.go:913 +0x1bb
Goroutine 49 (running) created at:
github.com/nspcc-dev/neo-go/pkg/rpc/server.initClearServerWithInMemoryChain()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_helper_test.go:69 +0x305
github.com/nspcc-dev/neo-go/pkg/rpc/server.initServerWithInMemoryChain()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_helper_test.go:78 +0x3c
github.com/nspcc-dev/neo-go/pkg/rpc/server.testRPCProtocol()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_test.go:805 +0x53
github.com/nspcc-dev/neo-go/pkg/rpc/server.TestRPC.func1()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_test.go:793 +0x44
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
Goroutine 44 (finished) created at:
testing.(*T).Run()
/usr/local/go/src/testing/testing.go:960 +0x651
github.com/nspcc-dev/neo-go/pkg/rpc/server.TestRPC()
/go/src/github.com/nspcc-dev/neo-go/pkg/rpc/server/server_test.go:792 +0x5d
testing.tRunner()
/usr/local/go/src/testing/testing.go:909 +0x199
==================
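A minimal sketch of the standard library pattern involved (not the actual
neo-go code): ListenAndServe always returns a non-nil error, and
http.ErrServerClosed merely signals a graceful Shutdown/Close, so it should
not be logged as a startup failure.

    package main

    import (
        "log"
        "net/http"
    )

    func startRPC(srv *http.Server) {
        go func() {
            // ListenAndServe always returns a non-nil error; ErrServerClosed
            // only means Shutdown/Close was called and is not a failure.
            if err := srv.ListenAndServe(); err != http.ErrServerClosed {
                log.Printf("failed to start RPC server: %v", err)
            }
        }()
    }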
Pick up the following changes:
* recovery message sending fix
* tx re-requests on sendRecoveryRequest
* proposed block checks fix
* timeout calculation fix for InitializeConsensus
* removal of TxPerBlock which is irrelevant for neo-go (policing is done by
the node, not the dbft library)
NEO VM does not distinguish between empty and nil slices. Supporting
this is not easy and requires changing lots of other opcodes.
Pointers are not supported anyway and slices can be checked for
emptiness by inspecting their `len`.
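For illustration, Go's own `len` already treats nil and empty slices the
same way, which is exactly the behavior the VM exposes; only a direct nil
comparison can tell them apart:

    package main

    import "fmt"

    func main() {
        var nilSlice []int    // nil slice
        emptySlice := []int{} // non-nil, zero-length slice

        // len reports 0 for both, so emptiness checks work uniformly.
        fmt.Println(len(nilSlice) == 0, len(emptySlice) == 0) // true true

        // Only a nil comparison distinguishes them, and that is the very
        // distinction the NEO VM does not make.
        fmt.Println(nilSlice == nil, emptySlice == nil) // true false
    }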
Set default values also for complex (struct, map) types.
Note: because the VM does not distinguish an empty array from a nil one,
the comparison 'a == nil' can't be handled properly without substantial
effort. This will be fixed in NEO3.
Using the view number from the recovery message is just plain wrong: it is
going to be higher than our current view, so these messages will be treated
as coming from the future even though they have their original view number
included.
Implement (*Param).GetBoolean() for converting a parameter to a bool value.
It is used for the verbosity flag and is false iff the value is either the
number zero or an empty string.
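A sketch of that conversion rule (the Param representation here is an
assumption, not the real neo-go type): the result is false only for the
number zero or an empty string and true for everything else.

    package rpc

    // Param is a simplified stand-in for the real RPC parameter type.
    type Param struct {
        Value interface{}
    }

    // GetBoolean converts the parameter to a bool: it is false iff the value
    // is the number zero or an empty string.
    func (p *Param) GetBoolean() bool {
        switch v := p.Value.(type) {
        case int:
            return v != 0
        case float64: // numbers decoded from JSON
            return v != 0
        case string:
            return v != ""
        default:
            return true
        }
    }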
We need to compact our in-memory MPT from time to time, otherwise it quickly
fills up all available memory. This raises two obvious questions: when to do
that and to what level to do it.
As for 'when', I think it's quite easy to use our regular persistence
interval as an anchor (and it also frees up some memory), but we can't do
that in the persistence routine itself because of synchronization issues
(adding some synchronization primitives would add some cost that I'd also
like to avoid), so do it indirectly by comparing the persisted and current
heights in `storeBlock`.
Choosing the proper level is another problem, but if we roughly estimate one
full branch node to use 1K of memory (usually it's way less than that), then
we can easily store 1K of these nodes, and that gives us a depth of 10 for
our trie.
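A hedged sketch of the trigger (the type, field and method names are
assumptions, not the actual Blockchain code): when storeBlock sees that the
persisted height has advanced, the in-memory MPT is collapsed down to a
fixed depth.

    package core

    import "sync/atomic"

    // Names below are illustrative, not the real neo-go Blockchain fields.
    const mptCompactDepth = 10 // keep roughly the top 10 levels in memory

    type trie interface {
        // Collapse replaces nodes deeper than depth with hash nodes.
        Collapse(depth int)
    }

    type Blockchain struct {
        persistedHeight atomic.Uint32 // updated by the persistence routine
        lastCompacted   uint32
        mpt             trie
    }

    // storeBlock compacts the MPT indirectly: instead of synchronizing with
    // the persistence routine, it compares persisted and current heights.
    func (bc *Blockchain) storeBlock() {
        // ... store the block and update the MPT ...
        if p := bc.persistedHeight.Load(); p > bc.lastCompacted {
            bc.mpt.Collapse(mptCompactDepth)
            bc.lastCompacted = p
        }
    }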
Items were serialized several times if there were several successful
transactions in a block; prevent that by using the State field as a bitfield
(as it was almost intended to be) and adding one more bit. This also
eliminates useless duplicate MPT traversals.
Confirmed not to break storage changes up to 3.3M on testnet.
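A small sketch of the bitfield idea (flag names are illustrative, not the
exact neo-go constants): one extra bit marks items that have already been
serialized, so later successful transactions in the same block skip them.

    package storage

    // ItemState is used as a bitfield; flag names are illustrative.
    type ItemState uint8

    const (
        StAdded   ItemState = 1 << iota // item was added in this block
        StChanged                       // item was modified
        StDeleted                       // item was deleted
        StFlushed                       // item has already been serialized
    )

    // needsSerialization reports whether the item must be written out again.
    func needsSerialization(st ItemState) bool {
        return st&StFlushed == 0
    }

    // markFlushed sets the extra bit so subsequent transactions in the same
    // block skip the duplicate serialization and MPT traversal.
    func markFlushed(st ItemState) ItemState {
        return st | StFlushed
    }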
This differed from the C# notion of PrevHash. It's not the previous root,
but rather a hash of the previous serialized MPTRoot structure (the one that
is to be signed by CNs).
Implement secp256k1 and secp256r1 recover interops, closes #1003.
Note:
We have to implement Koblitz-related math to recover keys properly
with the Neo.Cryptography.Secp256k1Recover interop, since the standard
Go elliptic package only supports short-form Weierstrass curves with a=-3
(see https://github.com/golang/go/issues/26776 for details).
However, it's not the best choice to have a lot of such math in our
project, so it would be better to use a ready-made solution for
Koblitz-related cryptography.
Because the trie is rather big, it can't be stored in memory in its
entirety, so some form of caching should also be implemented. To avoid
marshaling/unmarshaling of items which are close to the root and are used
very frequently, we can save them across the persists.
This commit implements pruning items at the specified depth, replacing them
with hash nodes.
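A hedged sketch of the pruning pass (the node types are simplified
assumptions about the MPT implementation): below the cut-off depth, whole
subtrees are replaced by their hash nodes and can be reloaded from storage
on demand.

    package mpt

    // Simplified node model, not the actual neo-go MPT types.
    type Node interface {
        Hash() []byte
    }

    type HashNode struct{ hash []byte }

    func (h *HashNode) Hash() []byte { return h.hash }

    type BranchNode struct {
        hash     []byte
        Children [17]Node // 16 nibble children plus the value child
    }

    func (b *BranchNode) Hash() []byte { return b.hash }

    // prune replaces every node below maxDepth with a hash node, keeping the
    // frequently used upper part of the trie in memory across persists.
    func prune(n Node, depth, maxDepth int) Node {
        if n == nil {
            return nil
        }
        if depth > maxDepth {
            return &HashNode{hash: n.Hash()}
        }
        if b, ok := n.(*BranchNode); ok {
            for i, c := range b.Children {
                b.Children[i] = prune(c, depth+1, maxDepth)
            }
        }
        return n
    }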
There is nothing wrong with iterators being implemented in other parts of
the code (e.g. Storage.Find). In this case type assertions can prevent bugs
at compile time.
Reproduce the behavior of the reference implementation:
- if an item was Put into the cache after it was encountered during
Storage.Find, it must appear twice
- checking whether an item is in the cache must be performed in real time
during `Iterator.Next()`
The bafdb916a0 change was wrong (probably brought over from neo-vm 3.0 in
the state it existed in back then); neo-vm 2.x doesn't allow PICKITEM for
arbitrary types.
The order in which storage.Find items are returned depends on what items
were processed in previous transactions of the same block.
The easiest way to implement this sort of caching is to cache operations
with storage, flushing them only in `Persist()`.
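A rough sketch of such an operation cache (illustrative, not the actual
neo-go DAO): reads consult the accumulated operations first, and nothing
reaches the backend until Persist().

    package dao

    // op is a single pending storage operation.
    type op struct {
        value []byte
        del   bool
    }

    // CachedStore accumulates operations and flushes them only in Persist().
    type CachedStore struct {
        backend map[string][]byte // lower store (previous block state)
        ops     map[string]op     // operations accumulated within the block
    }

    func (s *CachedStore) Put(key string, value []byte) {
        s.ops[key] = op{value: value}
    }

    func (s *CachedStore) Delete(key string) {
        s.ops[key] = op{del: true}
    }

    func (s *CachedStore) Get(key string) ([]byte, bool) {
        if o, ok := s.ops[key]; ok {
            if o.del {
                return nil, false
            }
            return o.value, true
        }
        v, ok := s.backend[key]
        return v, ok
    }

    // Persist is the only place where accumulated operations hit the backend.
    func (s *CachedStore) Persist() {
        for k, o := range s.ops {
            if o.del {
                delete(s.backend, k)
            } else {
                s.backend[k] = o.value
            }
        }
        s.ops = map[string]op{}
    }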
It's useless. Even though there is a Neo.Transaction.GetUnspentCoins syscall
that can be used, its return type is an interop structure that's not
accepted by any other syscall, so you can't really do anything with it. And
there is no such interface for the .NET Framework.
Previously we could generate a dynamic appcall with a kludge of
AppCall([]byte{ /* 20 zeroes */ }, realScriptHash, args...)
Now there is a separate function for this.
This syscall should only work for contracts created by the current
transaction, and that is what is supposed to be checked here. Do so by
looking at the differences between ic.dao and the original lower DAO.
Our block.Block was JSONized in a slightly different fashion than
result.Block in its Nonce and NextConsensus fields. That's not good for
notifications, because third-party clients would probably expect to see the
same format. Also, using a completely different Block representation in
result probably makes our client a bit weaker, as this representation is
harder to use with other neo-go components.
So use the same approach we took for Transactions and wrap block.Base, which
is to be serialized in the proper way.
Getting the batch, updating Prometheus metrics and pushing events doesn't
require any locking: the batch is a local cache batch that no one outside
cares about, Prometheus metrics are not critical to be in perfect sync, and
events are asynchronous anyway.
Note that the protocol differs a bit from #895 in its notification format;
to avoid additional server-side processing we're omitting some metadata,
such as:
* block size and confirmations
* transaction fees, confirmations, block hash and timestamp
* application execution doesn't have ScriptHash populated
Some block fields may also differ in encoding compared to `getblock` results
(like the nonce field).
I think these differences are unnoticeable for most use cases, so we can
leave them as is, but it can be changed in the future.
We actually have to do that in order to answer getapplicationlog requests
for transactions that leave some interop items on the stack. It follows the
same logic our binary serializer/deserializer does, keeping the type and
stripping the value (whatever that is).
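A sketch of that rule (a simplified stack item model, not the actual neo-go
types): interop items are rendered with their type only, while regular items
keep their values.

    package rpc

    // Simplified stand-ins for the VM stack item types.
    type StackItem interface{}

    type InteropItem struct{ Value interface{} }
    type ByteArrayItem struct{ Value []byte }

    // toJSONable keeps the type and strips the value for interop items, the
    // same way the binary (de)serializer treats them.
    func toJSONable(item StackItem) map[string]interface{} {
        switch it := item.(type) {
        case *InteropItem:
            _ = it.Value // the value is deliberately dropped
            return map[string]interface{}{"type": "InteropInterface"}
        case *ByteArrayItem:
            return map[string]interface{}{"type": "ByteArray", "value": it.Value}
        default:
            return map[string]interface{}{"type": "Unknown"}
        }
    }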
It will be important for proper subscription testing and it doesn't hurt,
even though technically we've got two HTTP servers listening after this
change (one is the regular Server's http.Server and one is httptest's
Server). Reusing rpc.Server would be nice, but it requires some changes to
the Start sequence to start the Listener with net.Listen and then
communicate back its resulting Addr. It's not very convenient, especially
given that no other code needs it, so doing these changes just for a bit
cleaner testing seems like overkill.
Update config appropriately. Update the Start comment along the way.
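For reference, a minimal sketch of the pattern such a change would require
(not the actual rpc.Server code): bind the listener first, serve on it in
the background and report the resulting address back to the caller.

    package rpc

    import (
        "net"
        "net/http"
    )

    // startWithAddr listens first and reports the actual address back,
    // instead of calling ListenAndServe with a preconfigured one.
    func startWithAddr(srv *http.Server) (net.Addr, error) {
        ln, err := net.Listen("tcp", "127.0.0.1:0") // port 0 picks a free one
        if err != nil {
            return nil, err
        }
        go srv.Serve(ln)      // serve on the already-bound listener
        return ln.Addr(), nil // tests learn the real port from here
    }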
Get new blocks directly from the Blockchain. It may lead to some
duplications (as we'll also receive our own blocks), but it's more correct,
because technically we can also get blocks via means other than the network
server, like RPC (the submitblock call). And it simplifies the network
server as well.