c68e38b943
Right now the AVX2 implementation loses to the C binding in speed. This is probably because of two things:

1. Go does not inline `mulBitRightx2` inside the loop, so every loop iteration pays for a function call.
2. `minmax` is loaded from memory every time.

In this PR:

1. Unroll `mulBitRightx2` manually and use `mulByteRightx2` instead.
2. Generate `minmax` in place, without `LOAD/LEA` instructions.
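Both changes can be illustrated in plain Go. The sketch below is not the repository's AVX2 assembly, and it rests on several assumptions:

- The hash is a Tillich-Zémor SL(2) hash, as the file names `sl2.go` and `tzbits_amd64.s` suggest.
- It works over GF(2^127) with modulus x^127 + x^63 + 1.
- The generators are A = [[x, 1], [1, 0]] for bit 0 and B = [[x, x+1], [1, 1]] for bit 1.
- Bits are consumed most-significant first.

All names (`gf127`, `sl2`, `mulBitRight`, `mulByteRightTable`, `mulByteRight`, `hash`) are hypothetical. They only echo the PR's `mulBitRightx2`/`mulByteRightx2`. The sketch also leaves out the two-lane AVX2 packing that the `x2` suffix suggests.

The key observation is that s·B equals s·A plus the new first column added into the second column. That lets a single all-zeros/all-ones mask select the generator without branching.

```go
package main

import "fmt"

// gf127 is an element of GF(2)[x]/(x^127 + x^63 + 1) (assumed modulus):
// [0] holds coefficients of x^0..x^63, [1] holds x^64..x^126 in its low 63 bits.
type gf127 [2]uint64

// mulX multiplies a by x, reducing x^127 to x^63 + 1.
func mulX(a gf127) gf127 {
	carry := (a[1] >> 62) & 1 // coefficient of x^126 before the shift
	hi := (a[1]<<1 | a[0]>>63) & (1<<63 - 1)
	lo := a[0]<<1 ^ carry ^ carry<<63
	return gf127{lo, hi}
}

func xor(a, b gf127) gf127 { return gf127{a[0] ^ b[0], a[1] ^ b[1]} }

func and(a gf127, m uint64) gf127 { return gf127{a[0] & m, a[1] & m} }

// sl2 is the hypothetical 2x2 state matrix [[c00, c01], [c10, c11]].
type sl2 struct{ c00, c01, c10, c11 gf127 }

// minmax maps a bit to an all-zeros or all-ones mask (table-lookup variant).
var minmax = [2]uint64{0, ^uint64(0)}

// mulBitRight sets s = s*A when mask == 0 and s = s*B when mask is all ones,
// assuming A = [[x, 1], [1, 0]] and B = [[x, x+1], [1, 1]].
func (s *sl2) mulBitRight(mask uint64) {
	n00 := xor(mulX(s.c00), s.c01) // new first column is the same for A and B
	n10 := xor(mulX(s.c10), s.c11)
	s.c01 = xor(s.c00, and(n00, mask)) // B also adds the new first column
	s.c11 = xor(s.c10, and(n10, mask))
	s.c00, s.c10 = n00, n10
}

// mulByteRightTable walks the bits MSB first and loads each mask from minmax.
func (s *sl2) mulByteRightTable(b byte) {
	for i := 7; i >= 0; i-- {
		s.mulBitRight(minmax[(b>>i)&1])
	}
}

// mulByteRight is the unrolled form: each mask is generated in place as
// -(bit), which is 0 or all ones, with no table lookup.
func (s *sl2) mulByteRight(b byte) {
	m := uint64(b)
	s.mulBitRight(-(m >> 7 & 1))
	s.mulBitRight(-(m >> 6 & 1))
	s.mulBitRight(-(m >> 5 & 1))
	s.mulBitRight(-(m >> 4 & 1))
	s.mulBitRight(-(m >> 3 & 1))
	s.mulBitRight(-(m >> 2 & 1))
	s.mulBitRight(-(m >> 1 & 1))
	s.mulBitRight(-(m & 1))
}

func identity() sl2 { return sl2{c00: gf127{1, 0}, c11: gf127{1, 0}} }

// hash runs data through one byte-step variant, starting from the identity.
func hash(data []byte, step func(*sl2, byte)) sl2 {
	s := identity()
	for _, b := range data {
		step(&s, b)
	}
	return s
}

func main() {
	data := []byte("The quick brown fox jumps over the lazy dog")
	a := hash(data, (*sl2).mulByteRightTable)
	b := hash(data, (*sl2).mulByteRight)
	fmt.Println("variants agree:", a == b)
}
```

Running it prints `variants agree: true`. The two variants differ only in where the mask comes from. In the unrolled form it is a shift, an AND and a negation on a value already in a register, not an indexed load from `minmax`. The PR applies this idea at the assembly level rather than in Go.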
Files:

- `avx2_unroll_amd64.s`
- `hash.go`
- `hash_avx2.go`
- `hash_avx2_inline.go`
- `hash_test.go`
- `sl2.go`
- `sl2_test.go`
- `tzbits_amd64.s`