4 commits

Author   SHA1        Message                                                Date
Evgenii  c68e38b943  Inline asm function in loop for AVX2 implementation    2019-07-19 16:11:06 +03:00

    Right now the AVX2 implementation loses to the C binding in speed.
    This is probably because of two things:
    1. Go does not inline `mulBitRightx2` into the loop body (the Go
       compiler never inlines functions written in assembly).
    2. `minmax` is loaded from memory on every iteration.

    In this PR:
    1. Manually unroll `mulBitRightx2` into the loop and use `mulByteRightx2` instead.
    2. Generate `minmax` in place without `LOAD`/`LEA` instructions.

Evgenii  ad8c7bce1b  Fix type assertions                                    2019-06-24 10:07:16 +03:00
Evgenii  4b11f50264  Fix error in AVX2 implementation                       2019-06-21 23:10:08 +03:00
Evgenii  9485f49f3b  Get rid of unsafe usage and add tests                  2019-06-21 22:32:32 +03:00