Evgenii Stratonikov
73d978c31e
Rewrite AVX2 loop in assembly
...
Helps get rid of a MOV and of generating constants on each iteration.
```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8  1.57ms ± 2%    1.41ms ± 0%    -10.52%  (p=0.000 n=9+9)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  63.6MB/s ± 1%  71.1MB/s ± 0%  +11.76%  (p=0.000 n=9+9)
```
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
d7c96f5d2e
Fix comments
...
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
8dd24d0195
Interleave carry registers for successive bits
...
8 fewer instructions per byte.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
921f8b0579
Optimize AVX implementation
...
1. Do the same mask trick as with AVX2.
2. Get rid of load, generate constant on the fly.
```
name                    old time/op    new time/op    delta
Sum/AVXInline_digest-8  2.26ms ± 4%    2.17ms ± 5%    -4.05%  (p=0.000 n=19+17)

name                    old speed      new speed      delta
Sum/AVXInline_digest-8  44.3MB/s ± 4%  46.2MB/s ± 5%  +4.25%  (p=0.000 n=19+17)
```
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
d4cb61e470
Replace two shifts with a single AND
...
We need to isolate the HSB in every quad-word; this can be done with a
simple mask.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
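
The mask trick from commit d4cb61e470 can be illustrated on a single 64-bit word. This is a minimal scalar sketch in Go, not the project's assembly. It assumes that "HSB" means the most significant bit of each quad-word and that the two replaced shifts were a right shift followed by a left shift; the function names are hypothetical.

```go
package main

import "fmt"

// highBitMask has only the most significant bit of a 64-bit word set.
const highBitMask = uint64(1) << 63

// isolateWithShifts keeps only the top bit using two shifts
// (assumed shape of the older approach).
func isolateWithShifts(x uint64) uint64 {
	return (x >> 63) << 63
}

// isolateWithMask keeps only the top bit using a single AND.
func isolateWithMask(x uint64) uint64 {
	return x & highBitMask
}

func main() {
	inputs := []uint64{0, 1, 0x7FFFFFFFFFFFFFFF, 0x8000000000000001, ^uint64(0)}
	for _, x := range inputs {
		fmt.Printf("%016x -> shifts %016x, mask %016x\n",
			x, isolateWithShifts(x), isolateWithMask(x))
	}
}
```

Both functions return the same value for every input; the AND form simply needs one operation instead of two.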
Evgenii Stratonikov
a7201418ab
Fix linter issues
...
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
2e922115d8
Replace CircleCI with GitHub Actions
...
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
bbbcf3fa5c
Use unaligned move in AVX2 implementation
...
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
c8a32b25ec
Optimize AVX2 implementation
...
We used 6 instructions just to calculate a mask from a single bit value.
Use only 3 now and calculate multiple masks in parallel.
Also, `VPSUB*` is faster than `VPBROADCAST*`,
see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html .
```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8  1.83ms ± 0%    1.62ms ± 1%    -11.23%  (p=0.000 n=46+42)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  54.7MB/s ± 0%  61.6MB/s ± 1%  +12.65%  (p=0.000 n=46+42)
```
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
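
Commit c8a32b25ec turns a single bit into a mask and mentions `VPSUB*`. One common way to do this with a subtraction is to compute `0 - bit`, which yields all zeros or all ones. The Go sketch below shows that idea on scalar values; it is an assumption about the general technique, not a transcription of the assembly, and the helper names are hypothetical.

```go
package main

import "fmt"

// maskFromBit expands bit 0 of b into a full 64-bit mask:
// 0 becomes 0x0000000000000000, 1 becomes 0xFFFFFFFFFFFFFFFF.
func maskFromBit(b uint64) uint64 {
	return 0 - (b & 1)
}

// masksFromByte builds one mask per bit of v, most significant bit first.
// The assembly computes several masks in parallel; here it is a plain loop.
func masksFromByte(v byte) [8]uint64 {
	var m [8]uint64
	for i := range m {
		m[i] = maskFromBit(uint64(v >> (7 - i)))
	}
	return m
}

func main() {
	for i, m := range masksFromByte(0xA0) {
		fmt.Printf("bit %d: %016x\n", 7-i, m)
	}
}
```

A mask built this way can be ANDed with a value to select it or zero it out without a conditional jump.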
Evgenii Stratonikov
a370c525ba
Replace all SSE instructions with AVX ones
...
Also use the integer MOV* variant instead of the floating-point one.
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
9b3f45993f
gf127: remove branch in pure Go operations
...
```
name                 old time/op    new time/op    delta
Sum/PureGo_digest-8  16.1ms ± 3%    10.4ms ± 3%    -35.53%  (p=0.000 n=10+10)

name                 old speed      new speed      delta
Sum/PureGo_digest-8  6.22MB/s ± 3%  9.65MB/s ± 3%  +55.12%  (p=0.000 n=10+10)
```
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 11:00:27 +03:00
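
Commit 9b3f45993f does not show which branch was removed. As an illustration of the general idea, the sketch below multiplies a field element by x with and without an `if`. It assumes the reduction polynomial x^127 + x^63 + 1 and a two-word little-endian layout; the `elem` type and both function names are hypothetical.

```go
package main

import "fmt"

// elem is a hypothetical 127-bit field element: e[0] holds bits 0..63,
// e[1] holds bits 64..126 (its top bit is always zero).
type elem [2]uint64

const (
	msb    = uint64(1) << 63
	reduce = msb | 1 // x^63 + 1, the low-word part of the assumed polynomial
)

// mulXBranch multiplies by x and reduces with an explicit branch.
func mulXBranch(a elem) elem {
	c := elem{a[0] << 1, a[1]<<1 | a[0]>>63}
	if c[1]&msb != 0 {
		c[0] ^= reduce
		c[1] &^= msb
	}
	return c
}

// mulXBranchless does the same, replacing the branch with a mask that is
// all ones when bit 127 overflowed and zero otherwise.
func mulXBranchless(a elem) elem {
	c := elem{a[0] << 1, a[1]<<1 | a[0]>>63}
	mask := 0 - (c[1] >> 63)
	c[0] ^= mask & reduce
	c[1] &^= msb // no-op when the bit is already clear
	return c
}

func main() {
	a := elem{0, 1 << 62} // x^126
	fmt.Printf("%016x\n", mulXBranch(a))
	fmt.Printf("%016x\n", mulXBranchless(a))
}
```

The branchless form does the same amount of arithmetic on every call, which avoids mispredicted jumps on data-dependent bits.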
fyrchik
33bf778066
Merge pull request #20 from nspcc-dev/use-x-sys-cpu-instead-of-self-implemented
...
Use golang.org/x/sys instead of self-implemented detector
2020-01-16 11:32:58 +03:00
Evgeniy Kulikov
77b7d87549
Use golang.org/x/sys instead of self-implemented detector
2020-01-16 11:30:46 +03:00
Evgeniy Kulikov
d4b45131cd
Update alpine image, fixup for Makefile, fixup for benchmark
2020-01-16 11:30:46 +03:00
Evgeniy Kulikov
9789dcb2b6
Ignore vendor and binary
2020-01-16 11:30:45 +03:00
fyrchik
3d96a71c03
Merge pull request #19 from nspcc-dev/feat/avx_inline
...
Speed up AVX implementation
2019-10-17 17:53:41 +03:00
Evgenii Stratonikov
a8357fda0e
Change default AVX implementation
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
5f74bbc979
Update benchmark result in README.md
...
Also simplify test and benchmark names.
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
4b7f39cd1d
Move mulBitRightx2 to avx2 assembly file
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
3191f1b3fd
Add AVX implementation with inlined multiplication
...
Perform multiplication by-byte instead of by-bit as
in AVX2Inline implementation.
2019-10-16 15:11:53 +03:00
fyrchik
702d2553ba
Merge pull request #18 from nspcc-dev/feat/interface
...
Restructure code layout in gf127/
2019-10-15 14:19:13 +03:00
Evgenii Stratonikov
63834fe8c1
Remove non-AVX parts from avx package
...
Remove Inv(), Mul1(), And() because right now
they have no AVX optimizations.
2019-10-15 13:22:36 +03:00
Evgenii Stratonikov
0f8b498b58
Alias gf127.GF127
2019-10-15 13:22:36 +03:00
Evgenii Stratonikov
d891a9c591
Restructure code layout
...
Provide default implementations in gf127 package and
all optimizations in subpackages. This way it will be easier
to use from a client.
2019-10-15 13:22:31 +03:00
Evgenii Stratonikov
c5fb08aece
Speed up gogf127.Mul()
...
Cache results of the shift. Also add a test that checks the
implementation works when the result is one of the arguments.
2019-10-11 11:50:33 +03:00
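
The aliasing property tested in commit c5fb08aece can be sketched as follows. This is a simplified bit-by-bit multiplication, not the project's `gogf127.Mul()`. It assumes the polynomial x^127 + x^63 + 1, and the `elem` type, `mulX` and `mul` are hypothetical names. The result is accumulated in a local and written out only at the end, so the output pointer may equal either input.

```go
package main

import "fmt"

// elem is a hypothetical 127-bit element: e[0] is the low word.
type elem [2]uint64

// mulX multiplies by x modulo x^127 + x^63 + 1 (assumed polynomial).
func mulX(a elem) elem {
	c := elem{a[0] << 1, a[1]<<1 | a[0]>>63}
	mask := 0 - (c[1] >> 63)
	c[0] ^= mask & (1<<63 | 1)
	c[1] &^= 1 << 63
	return c
}

// mul stores a*b into c. Nothing is written to c until the loop is done,
// so c may point to the same element as a or b.
func mul(a, b, c *elem) {
	shifted := *a // running value a * x^i, kept between iterations
	var acc elem
	for i := 0; i < 127; i++ {
		mask := 0 - ((b[i/64] >> (i % 64)) & 1)
		acc[0] ^= shifted[0] & mask
		acc[1] ^= shifted[1] & mask
		shifted = mulX(shifted)
	}
	*c = acc
}

func main() {
	a, b := elem{3, 0}, elem{2, 0} // (x + 1) and x
	var c elem
	mul(&a, &b, &c)
	mul(&a, &b, &a) // result overwrites the first argument
	fmt.Println(c == a)
}
```

A test for this property runs the operation once into a fresh output and once into one of the arguments, then compares the two results.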
fyrchik
b27c17ce19
Merge pull request #17 from nspcc-dev/fix/refactoring
...
Remove `unsafe` from code
2019-10-10 12:48:58 +03:00
Evgenii Stratonikov
1d4e7550fc
Use macros in AVX hash implementation
2019-10-10 11:29:40 +03:00
Evgenii Stratonikov
f296adb043
Remove usage of unsafe
2019-10-10 11:04:15 +03:00
fyrchik
5142f695cf
Merge pull request #16 from nspcc-dev/feat/cpuid
...
Move cpu id to a separate package
2019-10-09 18:18:41 +03:00
Evgenii Stratonikov
782ed7554b
Use macros in asm code
2019-10-09 18:11:53 +03:00
Evgenii Stratonikov
43033eedb1
Provide minimum go version in go.mod
2019-10-09 18:06:26 +03:00
Evgenii Stratonikov
fc059cac87
Use AVX2 only if AVX is also present
2019-10-09 18:03:39 +03:00
Evgenii Stratonikov
648b1deca7
Move cpuid facility to separate package
2019-10-09 18:03:35 +03:00
fyrchik
2470efda43
Merge pull request #15 from nspcc-dev/fix/cpu_features
...
Implement matrix multiplication with pure Go
2019-10-09 17:42:04 +03:00
Evgenii Stratonikov
f613ab2c25
Implement matrix multiplication with pure Go
...
Set suitable backend for GF127 arithmetic for Concat(), Sum() etc.
2019-10-09 12:31:47 +03:00
Evgeniy Kulikov
06362477ed
Merge pull request #13 from nspcc-dev/fix/cpuid
...
Detect CPU features in Sum()
2019-10-04 18:00:24 +03:00
Evgenii Stratonikov
38df9b2c63
Detect CPU features in Sum()
2019-10-04 17:58:42 +03:00
fyrchik
083d0ff054
Merge pull request #12 from nspcc-dev/feat/cpu_features
...
Determine available features through CPUID
2019-09-04 12:01:42 +03:00
Evgenii
63e8eeac86
Determine available features through CPUID
2019-09-04 11:47:44 +03:00
fyrchik
16d4da0a1d
Merge pull request #11 from nspcc-dev/feature/pure_go
...
Implement hashing in pure go
2019-09-04 10:52:02 +03:00
Evgenii
7c12188650
Perform allocation outside of mulBitRightPure
2019-07-19 19:04:44 +03:00
Evgenii
6c75cc0871
Add pure Go hash implementation
2019-07-19 18:59:43 +03:00
fyrchik
33f1403c28
Merge pull request #10 from nspcc-dev/feature/API_refactor
...
Add possibility to use different implementations in cli
2019-07-19 18:26:26 +03:00
Evgenii
c3cfe63e64
Add possibility to use different implementations in cli
...
Also make API smaller and more consistent and fix typos in documentation.
2019-07-19 18:24:30 +03:00
fyrchik
826ed77561
Merge pull request #9 from nspcc-dev/feature/AVX2_inline
...
Inline asm function in loop for AVX2 implementation
2019-07-19 17:54:25 +03:00
Evgenii
c68e38b943
Inline asm function in loop for AVX2 implementation
...
Right now the AVX2 implementation loses to the C binding in speed.
This is probably because of two things:
1. Go does not inline `mulBitRightx2` in loop iteration.
2. `minmax` is loaded every time from memory.
In this PR:
1. Unroll `mulBitRightx2` manually and use `mulByteRightx2` instead.
2. Generate `minmax` in place without `LOAD/LEA` instructions.
2019-07-19 16:11:06 +03:00
fyrchik
dd15c90530
Merge pull request #8 from nspcc-dev/pureGo
...
Add pure-go GF(2^127) implementation
2019-07-19 12:06:24 +03:00
Evgenii
5c2544cf3b
Add pure-go GF(2^127) implementation
2019-07-19 12:04:16 +03:00
fyrchik
5c06a9fa8f
Merge pull request #7 from nspcc-dev/feat/mbpers
...
Report benchmark results in MB/s
2019-07-10 13:15:00 +03:00
Evgenii
bd43de6056
Report benchmark results in MB/s
2019-07-10 12:07:54 +03:00
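
In Go, a benchmark reports throughput in MB/s once it calls `b.SetBytes` with the number of bytes processed per iteration. The sketch below shows that mechanism with a stand-in workload; `checksum`, `benchChecksum`, and `payloadSize` are hypothetical and unrelated to the project's hash.

```go
package main

import (
	"fmt"
	"testing"
)

// payloadSize is the number of bytes processed per benchmark iteration.
const payloadSize = 1 << 16

// checksum is a stand-in workload (hypothetical, not the project's hash).
func checksum(data []byte) (sum byte) {
	for _, v := range data {
		sum ^= v
	}
	return sum
}

// sink keeps the compiler from discarding the benchmarked call.
var sink byte

func benchChecksum(b *testing.B) {
	data := make([]byte, payloadSize)
	b.SetBytes(int64(len(data))) // enables the MB/s figure in the result
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		sink = checksum(data)
	}
}

func main() {
	// testing.Benchmark runs a benchmark outside of "go test".
	res := testing.Benchmark(benchChecksum)
	fmt.Println(res)
}
```

In a regular `_test.go` file the same function would be named `BenchmarkChecksum` and run with `go test -bench .`.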