Commit graph

33 commits

Author SHA1 Message Date
Evgenii Stratonikov
c8a32b25ec Optimize AVX2 implementation
We use 6 instructions just to calculate a mask based on a single bit value.
Use only 3 now and calculate multiple masks in parallel.

Also, `VPSUB*` is faster than `VPBROADCAST*`,
see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html .

```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8    1.83ms ± 0%    1.62ms ± 1%  -11.23%  (p=0.000 n=46+42)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  54.7MB/s ± 0%  61.6MB/s ± 1%  +12.65%  (p=0.000 n=46+42)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
a370c525ba Replace all SSE instructions with AVX ones
Also use integer MOV* variant instead of floating-point one.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgeniy Kulikov
77b7d87549 Use golang.org/x/sys instead of self-implemented detector 2020-01-16 11:30:46 +03:00
Evgenii Stratonikov
a8357fda0e Change default AVX implementation 2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
5f74bbc979 Update benchmark result in README.md
Also simplify test and benchmark names.
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
4b7f39cd1d Move mulBitRightx2 to avx2 assembly file 2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
3191f1b3fd Add AVX implementation with inlined multiplication
Perform multiplication by byte instead of by bit, as
in the AVX2Inline implementation.
2019-10-16 15:11:53 +03:00
Evgenii Stratonikov
63834fe8c1 Remove non-AVX parts from avx package
Remove Inv(), Mul1(), And() because right now
they have no AVX optimizations.
2019-10-15 13:22:36 +03:00
Evgenii Stratonikov
0f8b498b58 Alias gf127.GF127 2019-10-15 13:22:36 +03:00
Evgenii Stratonikov
d891a9c591 Restructure code layout
Provide default implementations in gf127 package and
all optimizations in subpackages. This way it will be easier
to use from a client.
2019-10-15 13:22:31 +03:00
Evgenii Stratonikov
1d4e7550fc Use macros in AVX hash implementation 2019-10-10 11:29:40 +03:00
Evgenii Stratonikov
f296adb043 Remove usage of unsafe 2019-10-10 11:04:15 +03:00
Evgenii Stratonikov
782ed7554b Use macros in asm code 2019-10-09 18:11:53 +03:00
Evgenii Stratonikov
fc059cac87 Use AVX2 only if AVX is also present 2019-10-09 18:03:39 +03:00
Evgenii Stratonikov
648b1deca7 Move cpuid facility to separate package 2019-10-09 18:03:35 +03:00
Evgenii Stratonikov
f613ab2c25 Implement matrix multiplication with pure Go
Set suitable backend for GF127 arithmetic for Concat(), Sum() etc.
2019-10-09 12:31:47 +03:00
Evgenii Stratonikov
38df9b2c63 Detect CPU features in Sum() 2019-10-04 17:58:42 +03:00
Evgenii
63e8eeac86 Determine available features through CPUID 2019-09-04 11:47:44 +03:00
Evgenii
7c12188650 Perform allocation outside of mulBitRightPure 2019-07-19 19:04:44 +03:00
Evgenii
6c75cc0871 Add pure Go hash implementation 2019-07-19 18:59:43 +03:00
Evgenii
c3cfe63e64 Add possibility to use different implementations in cli
Also make API smaller and more consistent and fix typos in documentation.
2019-07-19 18:24:30 +03:00
Evgenii
c68e38b943 Inline asm function in loop for AVX2 implementation
Right now the AVX2 implementation loses to the C binding in speed.
This is probably because of two things:
1. Go does not inline `mulBitRightx2` in loop iteration.
2. `minmax` is loaded every time from memory.

In this PR:
1. Unroll `mulBitRightx2` manually and use `mulByteRightx2` instead.
2. Generate `minmax` in place without `LOAD/LEA` instructions.
2019-07-19 16:11:06 +03:00
Evgenii
bd43de6056 Report benchmark results in MB/s 2019-07-10 12:07:54 +03:00
Evgenii
ad8c7bce1b Fix type assertions 2019-06-24 10:07:16 +03:00
Evgenii
e1d9fc8058 Use testify in tests 2019-06-21 23:18:16 +03:00
Evgenii
4b11f50264 Fix error in AVX2 implementation 2019-06-21 23:10:08 +03:00
Evgenii
eaeceead2f Add benchmarks 2019-06-21 22:40:17 +03:00
Evgenii
9485f49f3b Get rid of unsafe usage and add tests 2019-06-21 22:32:32 +03:00
Evgenii
a967cc9d3d Make use of AVX2 in Sum() by default 2019-06-21 18:47:01 +03:00
Evgeniy Kulikov
6b644651fa Rewrite tests (#3)
- rewrite tests
- remove gomega from deps
2019-05-29 14:10:17 +03:00
Evgenii
d5efd8bdce add SubtractR/L operation on hashes
- add Inverse operation to sl2
- fix a bug in xN()
2019-01-29 16:11:50 +03:00
Evgeniy Kulikov
42499b9eb0 Fix formatting 2019-01-03 11:04:43 +03:00
Evgeniy Kulikov
5cf44c62ac Initial 2018-12-29 16:04:17 +03:00
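
Commit c8a32b25ec mentions calculating a mask based on a single bit value. As a rough scalar illustration of that idea (not the repository's assembly; `bitMask` and `condXor` are hypothetical names, and whether the AVX2 code derives its masks this way is an assumption), a branch-free mask can be built in Go by subtracting the isolated bit from zero:

```go
package main

import "fmt"

// bitMask is a hypothetical helper (not taken from the repository).
// It returns all ones if bit i of x is set and zero otherwise,
// without branching.
func bitMask(x uint64, i uint) uint64 {
	bit := (x >> i) & 1 // isolate the bit: 0 or 1
	return 0 - bit      // 0 stays 0; 1 wraps around to all ones
}

// condXor XORs b into a only when bit i of x is set, using the mask
// in place of an if statement.
func condXor(a, b, x uint64, i uint) uint64 {
	return a ^ (b & bitMask(x, i))
}

func main() {
	fmt.Printf("%#x\n", bitMask(0b101, 0))
	fmt.Printf("%#x\n", bitMask(0b101, 1))
	fmt.Printf("%#x\n", condXor(0xF0, 0x0F, 0b101, 2))
}
```

A vectorized version would apply the same subtraction to several lanes at once, which is one way to read "calculate multiple masks in parallel" in the commit message.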