forked from TrueCloudLab/tzhash
c8a32b25ec
We used 6 instructions just to calculate a mask from a single bit value. Now we use only 3 and calculate multiple masks in parallel. Also, `VPSUB*` is faster than `VPBROADCAST*`, see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html .

```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8    1.83ms ± 0%    1.62ms ± 1%  -11.23%  (p=0.000 n=46+42)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  54.7MB/s ± 0%  61.6MB/s ± 1%  +12.65%  (p=0.000 n=46+42)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
avx.go
avx2.go
avx2_amd64.s
avx2_inline.go
avx_amd64.s
avx_inline.go
hash.go
hash_test.go
pure.go
sl2.go
sl2_test.go
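
The commit message above describes deriving a mask from a single bit with a subtraction instead of a broadcast. The following is a minimal scalar sketch of that idea in pure Go, not the repository's AVX2 assembly. The names `maskFromBit`, `masksFromByte` and `condXor` are hypothetical and chosen only for illustration; the loop in `masksFromByte` stands in for what vector code would compute across lanes in parallel.

```go
package main

import "fmt"

// maskFromBit returns an all-ones word when the low bit of bit is 1 and
// zero otherwise. Subtracting from zero is the scalar analogue of a
// VPSUB-style negate. Hypothetical helper, not taken from the repository.
func maskFromBit(bit uint64) uint64 {
	return 0 - (bit & 1)
}

// masksFromByte expands the 8 bits of b into 8 masks, most significant
// bit first. SIMD code would produce these lanes at once; here it is a
// plain loop.
func masksFromByte(b byte) [8]uint64 {
	var m [8]uint64
	for i := range m {
		m[i] = maskFromBit(uint64(b >> uint(7-i)))
	}
	return m
}

// condXor XORs y into x only when bit is set, without branching.
func condXor(x, y, bit uint64) uint64 {
	return x ^ (y & maskFromBit(bit))
}

func main() {
	fmt.Printf("%#x\n", maskFromBit(1)) // 0xffffffffffffffff
	fmt.Printf("%#x\n", maskFromBit(0)) // 0x0
	fmt.Println(masksFromByte(0xA0)[0] != 0, masksFromByte(0xA0)[1] != 0)
	fmt.Printf("%#x\n", condXor(0xF0, 0x0F, 1))
}
```

The branch-free form matters here because the selected bit comes from the input data, so a conditional jump would be unpredictable; a mask lets the same instruction sequence run for every bit.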