Commit graph

1531 commits

Author SHA1 Message Date
Roman Khimov
5c65d33439 native: move required balance check to token contracts
Which duplicates the check, but deduplicates error path. This check forced
double balance deserialization which is quite costly operation, so we better
do it once.

It's hardly noticeable as of TPS metrics though, maybe some 1-2%%.
2021-08-03 17:59:38 +03:00
Roman Khimov
3c1325035e fee: use array for opcodes
Use less memory and have faster access.

name       old time/op  new time/op  delta
Opcode1-8  22.4ns ± 6%   3.0ns ± 6%  -86.63%  (p=0.000 n=10+10)
2021-08-02 20:18:33 +03:00
Roman Khimov
dfc514eda0
Merge pull request #2102 from nspcc-dev/store4
Improve (*MemCachedStore).Persist
2021-08-02 20:10:05 +03:00
Roman Khimov
82f481e143
Merge pull request #2105 from nspcc-dev/json-restrict
native/std: restrint amount of items in JSON deserialization
2021-08-02 19:41:54 +03:00
Evgeniy Stratonikov
bdb9748c1b native/std: restrict amount of items in JSON deserialization
Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-08-02 18:57:47 +03:00
Roman Khimov
f8174ca64c core: ensure data logged is from persistent store
Using bc.dao here is wrong, it can contain unpersisted data.
2021-08-02 16:33:09 +03:00
Roman Khimov
8277b7a19a core: don't spawn goroutine for persist function
It doesn't make any sense, in some situations it leads to a number of
goroutines created that will Persist one after another (as we can't Persist
concurrently). We can manage it better in a single thread.

This doesn't change performance in any way, but somewhat reduces resource
consumption. It was tested neo-bench (single node, 10 workers, LevelDB) on two
machines and block dump processing (RC4 testnet up to 62800 with VerifyBlocks
set to false) on i7-8565U.

Reference (b9be892bf9):

Ryzen 9 5950X:
RPS     27747.349 27407.726 27520.210  ≈ 27558   ± 0.63%
TPS     26992.010 26993.468 27010.966  ≈ 26999   ± 0.04%
CPU %      28.928    28.096    29.105  ≈    28.7 ± 1.88%
Mem MB    760.385   726.320   756.118  ≈   748   ± 2.48%

Core i7-8565U:
RPS     7783.229 7628.409 7542.340  ≈ 7651   ± 1.60%
TPS     7708.436 7607.397 7489.459  ≈ 7602   ± 1.44%
CPU %     74.899   71.020   72.697  ≈   72.9 ± 2.67%
Mem MB   438.047  436.967  416.350  ≈  430   ± 2.84%

DB restore:
real    0m20.838s 0m21.895s 0m21.794s  ≈ 21.51 ± 2.71%
user    0m39.091s 0m40.565s 0m41.493s  ≈ 40.38 ± 3.00%
sys      0m3.184s  0m2.923s  0m3.062s  ≈  3.06 ± 4.27%

Patched:

Ryzen 9 5950X:
RPS     27636.957 27246.911 27462.036  ≈ 27449   ±  0.71%  ↓ 0.40%
TPS     27003.672 26993.468 27011.696  ≈ 27003   ±  0.03%  ↑ 0.01%
CPU %      28.562    28.475    28.012  ≈    28.3 ±  1.04%  ↓ 1.39%
Mem MB    627.007   648.110   794.895  ≈   690   ± 13.25%  ↓ 7.75%

Core i7-8565U:
RPS     7497.210 7527.797 7897.532  ≈ 7641   ±  2.92%  ↓ 0.13%
TPS     7461.128 7482.678 7841.723  ≈ 7595   ±  2.81%  ↓ 0.09%
CPU %     71.559   73.423   69.005  ≈   71.3 ±  3.11%  ↓ 2.19%
Mem MB   393.090  395.899  482.264  ≈  424   ± 11.96%  ↓ 1.40%

DB restore:
real    0m20.773s 0m21.583s 0m20.522s  ≈ 20.96 ±  2.65%  ↓ 2.56%
user    0m39.322s 0m42.268s 0m38.626s  ≈ 40.07 ±  4.82%  ↓ 0.77%
sys      0m3.006s  0m3.597s  0m3.042s  ≈  3.22 ± 10.31%  ↑ 5.23%
2021-08-02 16:33:00 +03:00
Roman Khimov
b9be892bf9 storage: allow accessing MemCachedStore during Persist
Persist by its definition doesn't change MemCachedStore visible state, all KV
pairs that were acessible via it before Persist remain accessible after
Persist. The only thing it does is flushing of the current set of KV pairs
from memory to peristent store. To do that it needs read-only access to the
current KV pair set, but technically it then replaces maps, so we have to use
full write lock which makes MemCachedStore inaccessible for the duration of
Persist. And Persist can take a lot of time, it's about disk access for
regular DBs.

What we do here is we create new in-memory maps for MemCachedStore before
flushing old ones to the persistent store. Then a fake persistent store is
created which actually is a MemCachedStore with old maps, so it has exactly
the same visible state. This Store is never accessed for writes, so we can
read it without taking any internal locks and at the same time we no longer
need write locks for original MemCachedStore, we're not using it. All of this
makes it possible to use MemCachedStore as normally reads are handled going
down to whatever level is needed and writes are handled by new maps. So while
Persist for (*Blockchain).dao does its most time-consuming work we can process
other blocks (reading data for transactions and persisting storeBlock caches
to (*Blockchain).dao).

The change was tested for performance with neo-bench (single node, 10 workers,
LevelDB) on two machines and block dump processing (RC4 testnet up to 62800
with VerifyBlocks set to false) on i7-8565U.

Reference results (bbe4e9cd7b):

Ryzen 9 5950X:
RPS     23616.969 22817.086 23222.378  ≈ 23218   ± 1.72%
TPS     23047.316 22608.578 22735.540  ≈ 22797   ± 0.99%
CPU %      23.434    25.553    23.848  ≈    24.3 ± 4.63%
Mem MB    600.636   503.060   582.043  ≈   562   ± 9.22%

Core i7-8565U:
RPS     6594.007 6499.501 6572.902  ≈ 6555   ± 0.76%
TPS     6561.680 6444.545 6510.120  ≈ 6505   ± 0.90%
CPU %     58.452   60.568   62.474    ≈ 60.5 ± 3.33%
Mem MB   234.893  285.067  269.081   ≈ 263   ± 9.75%

DB restore:
real    0m22.237s 0m23.471s 0m23.409s  ≈ 23.04 ± 3.02%
user    0m35.435s 0m38.943s 0m39.247s  ≈ 37.88 ± 5.59%
sys      0m3.085s  0m3.360s  0m3.144s  ≈  3.20 ± 4.53%

After the change:

Ryzen 9 5950X:
RPS     27747.349 27407.726 27520.210  ≈ 27558   ± 0.63%  ↑ 18.69%
TPS     26992.010 26993.468 27010.966  ≈ 26999   ± 0.04%  ↑ 18.43%
CPU %      28.928    28.096    29.105  ≈    28.7 ± 1.88%  ↑ 18.1%
Mem MB    760.385   726.320   756.118  ≈   748   ± 2.48%  ↑ 33.10%

Core i7-8565U:
RPS     7783.229 7628.409 7542.340  ≈ 7651   ± 1.60%  ↑ 16.72%
TPS     7708.436 7607.397 7489.459  ≈ 7602   ± 1.44%  ↑ 16.85%
CPU %     74.899   71.020   72.697  ≈   72.9 ± 2.67%  ↑ 20.50%
Mem MB   438.047  436.967  416.350  ≈  430   ± 2.84%  ↑ 63.50%

DB restore:
real    0m20.838s 0m21.895s 0m21.794s  ≈ 21.51 ± 2.71%  ↓ 6.64%
user    0m39.091s 0m40.565s 0m41.493s  ≈ 40.38 ± 3.00%  ↑ 6.60%
sys      0m3.184s  0m2.923s  0m3.062s  ≈  3.06 ± 4.27%  ↓ 4.38%

It obviously uses more memory now and utilizes CPU more aggressively, but at
the same time it allows to improve all relevant metrics and finally reach a
situation where we process 50K transactions in less than second on Ryzen 9
5950X (going higher than 25K TPS). The other observation is much more stable
block time, on Ryzen 9 it's as close to 1 second as it could be.
2021-08-02 16:33:00 +03:00
Roman Khimov
3cebd2b129 interop: use non-Cached wrapped DAO
Cached only caches NEP-17 tracking data now, it makes no sense here.
2021-07-30 15:45:17 +03:00
Roman Khimov
fa7314ea90 dao: drop dropNEP17Cache from Cached
It's not used now.
2021-07-30 15:45:17 +03:00
Roman Khimov
49be753850 core: spread storeBlock actions to three goroutines
Block processing consists of:
 * saving block/transactions to the DB
 * executing blocks/transactions
 * processing notifications/saving AERs
 * updating MPT
 * atomically updating Blockchain state

Of these the first one is completely independent of others, it can be done in
a separate routine easily. The third one technically depends on the second,
it just doesn't have data until something is executed. At the same time it
doesn't affect future executions in any way, so we can offload
AER/notification processing to separate goroutine (while the main thread
proceeds with other transactions).

MPT update depends on all executions, so it can't be offloaded, but it can be
done concurrently to AER processing. And only the last thing actually needs
all previous ones to be finished, so it's a natural synchronization point.

So we spawn two additional routines and let the main one execute transactions
and update MPT as fast as it can. While technically all of these routines
could share single DAO (they are working with different KV sets) benchmarking
shows that using separate DAOs and then persisting them to lower one actually
works about 7-8%% better. At the same time we can simplify DAOs used, Cached
one is only relevant for AER processing because it caches NEP-17 tracking
data, everything else can do just fine with Simple.

The change was tested for performance with neo-bench (single node, 10 workers,
LevelDB) on two machines and block dump processing (RC4 testnet up to 50825
with VerifyBlocks set to false) on i7-8565U. neo-bench creates huge blocks
with lots of transactions while RC4 dump mostly consists of empty blocks.

Reference results (06c3dda5d1):

Ryzen 9 5950X:
RPS ≈ 20059.569   21186.328   20158.983   ≈ 20468   ±  3.05%
TPS ≈ 19544.993   20585.450   19658.338   ≈ 19930   ±  2.86%
CPU ≈    18.682%     23.877%     22.852%  ≈    21.8 ± 12.62%
Mem ≈   618.981MB   559.246MB   541.539MB ≈   573   ±  7.08%

Core i7-8565U:
RPS ≈ 5927.082   6526.739   6372.115   ≈ 6275   ± 4.96%
TPS ≈ 5899.531   6477.187   6329.515   ≈ 6235   ± 4.81%
CPU ≈   56.346%    61.955%    58.125%  ≈   58.8 ± 4.87%
Mem ≈  212.191MB  224.974MB  205.479MB ≈  214   ± 4.62%

DB restore:
real    0m12.683s 0m13.222s 0m13.382s  ≈ 13.096 ±  2.80%
user    0m18.501s 0m19.163s 0m19.489s  ≈ 19.051 ±  2.64%
sys      0m1.404s  0m1.396s  0m1.666s  ≈  1.489 ± 10.32%

After the change:

Ryzen 9 5950X:
RPS ≈ 23056.899   22822.015   23006.543   ≈ 22962   ± 0.54%
TPS ≈ 22594.785   22292.071   22800.857   ≈ 22562   ± 1.13%
CPU ≈    24.262%     23.185%     25.921%  ≈    24.5 ± 5.65%
Mem ≈   614.254MB   613.204MB   555.491MB ≈   594   ± 5.66%

Core i7-8565U:
RPS ≈ 6378.702   6423.927   6363.788      ≈ 6389   ± 0.49%
TPS ≈ 6327.072   6372.552   6311.179      ≈ 6337   ± 0.50%
CPU ≈   57.599%    58.622%    59.737%     ≈   58.7 ± 1.82%
Mem ≈  198.697MB  188.746MB  200.235MB    ≈  196   ± 3.18%

DB restore:
real    0m13.576s 0m13.334s 0m12.757s  ≈  13.222 ±  3.18%
user    0m19.113s 0m19.490s 0m20.197s  ≈  19.600 ±  2.81%
sys      0m2.211s  0m1.558s  0m1.559s  ≈   1.776 ± 21.21%

On Ryzen 9 we've got 12% better RPS, 13% better TPS with 12% CPU and 3% RAM
more used. Core i7-8565U changes don't seem to be statistically significant:
1.8% more RPS, 1.6% more TPS with about the same CPU and 8.5% less RAM
used. It also is 1% worse in DB restore time.

The result is somewhat expected, on a powerful machine with lots of spare
cores we get 10%+ better results while on average resource-constrained laptop it
doesn't change much (the machine is already saturated). Overall, this seems to
be worthwhile.
2021-07-30 15:45:17 +03:00
Roman Khimov
06c3dda5d1
Merge pull request #2093 from nspcc-dev/states-exchange/drop-nep17-balance-state
core: implement dynamic NEP17 balances tracking
2021-07-29 19:08:42 +03:00
Anna Shaleva
a30e48ff90 core: increment the DB version
DB scheme has been changed.
2021-07-29 10:23:13 +03:00
Anna Shaleva
e8bed184d5 core: implement dynamic NEP17 balances tracking
Request NEP17 balances from a set of NEP17 contracts instead of getting
them from storage. LastUpdatedBlock tracking remains untouched, because
there's no way to retrieve it dynamically.
2021-07-29 10:23:01 +03:00
Anna Shaleva
e46d76d7aa core: rename state.NEP17Balances to state.NEP17TransferInfo
Balances are to be removed from state.NEP17TransferInfo, so the remnant
fields are NextTransferBatch, NewBatch and a map of LastUpdatedBlocks.
These fields are more staff-related.

Also rename dao.[Get, Put, put]NEP17Balances and STNEP17Balances
preffix.

Also rename NEP17TransferInfo.Trackers to LastUpdatedBlockTrackers
because NEP17TransferInfo.Balances are to be removed.
2021-07-28 13:22:53 +03:00
Anna Shaleva
c0a2c74e0c core: maintain a set of NEP17-compliant contracts 2021-07-28 13:22:53 +03:00
Roman Khimov
50d99464e0
Merge pull request #2064 from nspcc-dev/fix-remove-stale-hang
mempool: send events in a separate goroutine
2021-07-23 18:16:14 +03:00
Evgeniy Stratonikov
3507f52c32 notary: process new transactions in a separate goroutine
Related #2063.

Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-23 14:48:00 +03:00
Roman Khimov
1d8ad5b84a *: simplify some error messages
Log:
2021-07-23T09:59:18.948+0300    WARN    contract invocation failed      {"tx": "de3e3c1f1d37e4528990f894dea5583fd320485ad3862a95eb0e8823eecf4a5f", "block": 9643, "error": "error encountered at instruction 1 (SYSCALL): error during call from native: error encountered at instruction 745 (CAT): invalid conversion: Map/ByteString"}

The word "error" appears 4 times here.
2021-07-23 10:08:09 +03:00
Roman Khimov
002ad9dfee go.mod: update miniredis to 2.15.1
It's only used for testing purposes and this version doesn't change anything
for us, but still better be current.
2021-07-21 23:28:26 +03:00
Roman Khimov
df07ba505a config: update mainnet magic
It's NEO3, see neo-project/neo-node#795.
2021-07-21 14:42:26 +03:00
Roman Khimov
35c2c3ae8e
Merge pull request #2078 from nspcc-dev/configurable-initial-gas
config: add InitialGASSupply, fix #2073
2021-07-20 17:10:25 +03:00
Roman Khimov
36d486a664 config: add InitialGASSupply, fix #2073
We now have 52M by default.
2021-07-20 16:59:54 +03:00
Roman Khimov
0583f252ab *: create real temporary dirs and files in tests
Improve reliability.
2021-07-20 12:51:11 +03:00
Roman Khimov
3b19b34122 storage: fix memcached test with boltdb store
Everything was wrong here, wrong file used, wrong cleanup procedure, the net
result is this (and some failing tests from time to time):

  $ ls -l /tmp/test_bolt_db* | wc -l
  30939
2021-07-20 12:35:24 +03:00
Roman Khimov
c88ebaede9
Merge pull request #2075 from nspcc-dev/small-refactoring
Array util refactoring and naming improvement
2021-07-20 11:29:59 +03:00
Roman Khimov
7477e2cd9f
Merge pull request #2074 from nspcc-dev/fix-oracle-service-behaviour
Fix oracle service behaviour
2021-07-20 11:29:39 +03:00
Roman Khimov
19717dd9a8 slice: introduce common Copy helper
It's a bit more convenient to use.
2021-07-19 22:57:55 +03:00
Roman Khimov
a54e3516d1 slice: add Reverse function, deduplicate code a bit 2021-07-19 22:57:55 +03:00
Roman Khimov
3a93977b7b oracle: only process new requests after initial sync
If an oracle node is resynchronized from the genesis the service receives all
requests from all blocks via AddRequests() invoked from the native
contract. Almost all of them are long obsolete and need to be removed, native
oracle contract will try to do that with RemoveRequests() calls, but they
won't change anything.

So queue up all "initial" requests in special map and manage it directly
before the module is Run() which happens after synchronization
completion. Then process any requests that are still active and work with new
blocks as usual.
2021-07-19 22:52:59 +03:00
Roman Khimov
588f3fbbd3 state: drop State from NEOBalance and NEP17Balance
We're in the `state` package already.
2021-07-19 22:01:07 +03:00
Roman Khimov
f4a9139a05
Merge pull request #2071 from nspcc-dev/fix-more-vm-bugs
Fix more VM bugs
2021-07-19 22:00:17 +03:00
Roman Khimov
79b1bf72aa native: check that oracle request GAS IsInt64()
We use int64 value down below, so check for IsInt64() while MinimumResponseGas
check still covers for values less than zero.
2021-07-19 15:42:42 +03:00
Roman Khimov
5f13de3a76 *: simplify some integer checks with IsUint64()
If we need a positive number we can do `IsUint64()` instead of checking that
`Int64()` is `> 0`.
2021-07-19 15:42:42 +03:00
Roman Khimov
233307aca5 stackitem: completely drop MaxArraySize
Turns out C# VM doesn't have it since preview2, so our limiting of
MaxArraySize in incompatible with it. Removing this limit shouldn't be a
problem with the reference counter we have, both APPEND and SETITEM add things
to reference counter and we can't exceed MaxStackSize. PACK on the other hand
can't get more than MaxStackSize-1 of input elements.

Unify NEWSTRUCT with NEWARRAY* and use better integer checks at the same time.

Multisig limit is still 1024.
2021-07-19 15:42:42 +03:00
Roman Khimov
aab18c3083 stackitem: introduce Convertible interface
We have a lot of native contract types that are converted to stack items
before serialization, then deserialized as stack items and converted back to
regular structures. stackitem.Convertible allows to remove a lot of repetitive
io.Serializable code.

This also introduces to/from converter in testserdes which unfortunately
required to change util tests to avoid circular references.
2021-07-19 15:42:42 +03:00
Roman Khimov
2d993d0da5 state: store notary deposit as stackitem
Which is less effective, but makes it more similar to other native contracts
that are supposed to be contracts anyway.
2021-07-19 15:42:42 +03:00
Roman Khimov
70ddbf7180 native: reuse stackitem.(De)Serialize more for data structures
Less code bloat, no functional changes.
2021-07-19 15:42:42 +03:00
Roman Khimov
4775b513f9 native: do proper error handling when deserializing user data 2021-07-19 15:42:42 +03:00
Roman Khimov
3a991abb62 fee: adjust SQRT price
See neo-project/neo#2540.
2021-07-19 15:42:41 +03:00
Roman Khimov
21e05f5779
Merge pull request #2055 from nspcc-dev/fix-gas-limits
Tune fixed GAS limits
2021-07-16 16:53:06 +03:00
Evgeniy Stratonikov
8077f9232d interop: implement System.Runtime.GetRandom
Our murmur3 implementation is architecture independent and optimized in
assembly.

Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-15 16:00:01 +03:00
Evgeniy Stratonikov
fdb54f2dc3 core/test: get rid of empty tx scripts
Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-15 15:58:49 +03:00
Evgeniy Stratonikov
3a4e0caeb8 core/block: add Nonce field to header
Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-15 15:58:49 +03:00
Evgeniy Stratonikov
2cd5b63f0b policy: fix max exec fee
Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-14 10:27:09 +03:00
Evgeniy Stratonikov
451b02122a *: increase GAS for verification
Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-14 10:27:09 +03:00
Roman Khimov
2b7abd20e7
Merge pull request #2057 from nspcc-dev/optimize-storage-find
Optimize `storageFind`
2021-07-13 12:51:10 +03:00
Roman Khimov
83a557f0eb
Merge pull request #2042 from nspcc-dev/oracle-not-supported
Check oracle response Content-Type header
2021-07-13 11:31:37 +03:00
Evgeniy Stratonikov
eb7bd7b99b core: optimize storageFind
Because `Map` stores elements in arbitrary order, addition of new
element takes linear time (`Index` iterates over all keys). Thus our
`storageFind` is actually quadratic in time. Optimize this by creating
map from sorted slice.

```
name           old time/op    new time/op    delta
StorageFind-8     157µs ± 2%     112µs ± 1%  -28.60%  (p=0.000 n=10+10)

name           old alloc/op   new alloc/op   delta
StorageFind-8    69.4kB ± 0%    60.5kB ± 0%  -12.90%  (p=0.000 n=9+10)

name           old allocs/op  new allocs/op  delta
StorageFind-8     2.21k ± 0%     2.00k ± 0%   -9.37%  (p=0.000 n=10+7)
```

Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-12 14:22:17 +03:00
Evgeniy Stratonikov
b8b272c8c4 core/test: add benchmark for storageFind
Signed-off-by: Evgeniy Stratonikov <evgeniy@nspcc.ru>
2021-07-12 14:20:45 +03:00