Commit graph

7 commits

Author SHA1 Message Date
Evgenii Stratonikov
0fa6b1314e *: format assembly code with asmfmt
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
921f8b0579 Optimize AVX implementation
1. Do the same mask trick as with AVX2.
2. Get rid of load, generate constant on the fly.

```
name                    old time/op    new time/op    delta
Sum/AVXInline_digest-8    2.26ms ± 4%    2.17ms ± 5%  -4.05%  (p=0.000 n=19+17)

name                    old speed      new speed      delta
Sum/AVXInline_digest-8  44.3MB/s ± 4%  46.2MB/s ± 5%  +4.25%  (p=0.000 n=19+17)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
a370c525ba Replace all SSE instructions with AVX ones
Also use integer MOV* variant instead of floating-point one.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
4b7f39cd1d Move mulBitRightx2 to avx2 assembly file 2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
3191f1b3fd Add AVX implementation with inlined multiplication
Perform multiplication by-byte instead of by-bit as
in AVX2Inline implementation.
2019-10-16 15:11:53 +03:00
Evgenii Stratonikov
1d4e7550fc Use macros in AVX hash implementation 2019-10-10 11:29:40 +03:00
Evgenii
c3cfe63e64 Add possibility to use different implementations in cli
Also make API smaller and more consistent and fix typos in documentation.
2019-07-19 18:24:30 +03:00
Renamed from tz/tzbits_amd64.s (Browse further)