90 commits

Author SHA1 Message Date
489dcf4d8b [#3] Change license to Apache 2.0
Written approvals from all original authors and contributors were
received.

Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
2023-02-13 15:16:25 +03:00
3a7bdcc020 [#3] Simplify demo and benchmark
No need to build everything every time and no need to do it in docker.

Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
2023-02-13 15:16:25 +03:00
a5347ee68e [#3] Simplify Makefile
Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
2023-02-13 15:16:25 +03:00
9f80f99aed [#3] Update README and Contributing guide
Preparing to start accepting PRs from everybody.

Signed-off-by: Stanislav Bogatyrev <s.bogatyrev@yadro.com>
2023-02-13 15:16:25 +03:00
4d1b95c926 Move from nspcc-dev to TrueCloudLab
Signed-off-by: Evgenii Stratonikov <e.stratonikov@yadro.com>
2022-12-12 21:40:06 +03:00
Evgenii Stratonikov
85abb43253 tz: initialize digest in Sum
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-07-06 18:05:21 +03:00
Evgenii Stratonikov
3de3046074 tz: optimize AVX2 implementation
1. Perform masking with 2 instructions instead of 3 (use arithmetic
   shift).
2. Broadcast data byte in one instruction at the start of byte-processing.
3. Reorder instructions to reduce the number of data hazards and resource
   contention.

```
name               old time/op    new time/op    delta
Sum/AVX2_digest-8    1.39ms ± 0%    1.22ms ± 0%  -12.18%  (p=0.000 n=9+7)

name               old speed      new speed      delta
Sum/AVX2_digest-8  71.7MB/s ± 0%  81.7MB/s ± 0%  +13.87%  (p=0.000 n=9+7)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-22 12:25:13 +03:00
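The arithmetic-shift masking mentioned in item 1 of the commit above can be illustrated in scalar Go. This is only a sketch of the general idea, not the repository's AVX2 assembly; the helper name `bitMask` is hypothetical. One left shift moves the chosen bit into the sign position, and one arithmetic right shift replicates it across the word.

```go
package main

import "fmt"

// bitMask (hypothetical helper) returns all ones if bit i of b is set
// (0 = least significant, i must be in 0..7) and zero otherwise.
func bitMask(b byte, i uint) uint64 {
	// Shift bit i into the sign position; higher bits fall off the word.
	// The arithmetic right shift on int64 then copies the sign bit into
	// every position.
	return uint64(int64(uint64(b)<<(63-i)) >> 63)
}

func main() {
	for i := uint(0); i < 8; i++ {
		fmt.Printf("bit %d of 0xA5: %016x\n", i, bitMask(0xA5, i))
	}
}
```

The resulting mask can be ANDed with an operand to select it conditionally without a branch.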
Evgenii Stratonikov
defa61ce8f go.mod: bump to go1.16
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
f4cc7726e9 benchmark: fix shellcheck issues
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
83ba541725 tz: add comments to public functions
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
0d764a51b7 *: rename ByteArray to Bytes
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
0e0d28e82f tz: use build tags for different implementations
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
3491e7c5ea Makefile: add target for testing generic implementation
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
026731b260 gf127: use build tags for different implementations
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
0fa6b1314e *: format assembly code with asmfmt
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-03-21 12:30:08 +03:00
Evgenii Stratonikov
1520cde665 tz: fix package comments
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-02-25 14:53:59 +03:00
Evgenii Stratonikov
337819d130 tz: export checksum size
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-02-25 14:53:59 +03:00
Evgenii Stratonikov
0078ce6e1d go.mod: update dependencies
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-24 13:58:13 +03:00
Evgenii Stratonikov
3a90fa7a76 Add more tests
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
73d978c31e Rewrite AVX2 loop in assembly
Helps to get rid of MOV and generating constants for each iteration.

```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8    1.57ms ± 2%    1.41ms ± 0%  -10.52%  (p=0.000 n=9+9)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  63.6MB/s ± 1%  71.1MB/s ± 0%  +11.76%  (p=0.000 n=9+9)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
d7c96f5d2e Fix comments
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
8dd24d0195 Interleave carry registers for successive bits
8 fewer instructions per byte.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
921f8b0579 Optimize AVX implementation
1. Do the same mask trick as with AVX2.
2. Get rid of load, generate constant on the fly.

```
name                    old time/op    new time/op    delta
Sum/AVXInline_digest-8    2.26ms ± 4%    2.17ms ± 5%  -4.05%  (p=0.000 n=19+17)

name                    old speed      new speed      delta
Sum/AVXInline_digest-8  44.3MB/s ± 4%  46.2MB/s ± 5%  +4.25%  (p=0.000 n=19+17)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
Evgenii Stratonikov
d4cb61e470 Replace two shifts with a single AND
We need to isolate HSB in every quad-word, this can be done with a
simple mask.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2022-01-17 17:18:36 +03:00
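The equivalence behind the commit above (isolating the highest bit of a 64-bit word with one AND instead of two shifts) can be checked in plain Go. This is a scalar sketch under the assumption that "HSB" means the most significant bit of each quad-word; the function names are hypothetical and the repository does this in vector assembly.

```go
package main

import "fmt"

// hsbMask has only the most significant bit of a 64-bit word set.
const hsbMask = uint64(1) << 63

// isolateHSBShifts keeps only the top bit using two shifts.
func isolateHSBShifts(x uint64) uint64 {
	return (x >> 63) << 63
}

// isolateHSBMask keeps only the top bit using a single AND.
func isolateHSBMask(x uint64) uint64 {
	return x & hsbMask
}

func main() {
	for _, x := range []uint64{0, 1, hsbMask, hsbMask | 0x1234, ^uint64(0)} {
		fmt.Printf("%016x -> %016x %016x\n", x, isolateHSBShifts(x), isolateHSBMask(x))
	}
}
```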
Evgenii Stratonikov
a7201418ab Fix linter issues
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
2e922115d8 Replace CircleCI with Github actions
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
bbbcf3fa5c Use unaligned move in AVX2 implementation
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
c8a32b25ec Optimize AVX2 implementation
We use 6 instructions only to calculate mask based on single bit value.
Use only 3 now and calculate multiple masks in parallel.

Also `VPSUB*` is faster than `VPBROADCAST*`,
see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html.

```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8    1.83ms ± 0%    1.62ms ± 1%  -11.23%  (p=0.000 n=46+42)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  54.7MB/s ± 0%  61.6MB/s ± 1%  +12.65%  (p=0.000 n=46+42)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
a370c525ba Replace all SSE instructions with AVX ones
Also use integer MOV* variant instead of floating-point one.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 13:23:05 +03:00
Evgenii Stratonikov
9b3f45993f gf127: remove branch in pure Go operations
```
name                 old time/op    new time/op    delta
Sum/PureGo_digest-8    16.1ms ± 3%    10.4ms ± 3%  -35.53%  (p=0.000 n=10+10)

name                 old speed      new speed      delta
Sum/PureGo_digest-8  6.22MB/s ± 3%  9.65MB/s ± 3%  +55.12%  (p=0.000 n=10+10)
```

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
2021-12-29 11:00:27 +03:00
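The commit above does not show how the branch was removed. One common way to do it, sketched here as an assumption rather than as the repository's code, is to turn the condition bit into an all-ones or all-zero mask and apply the operation unconditionally. The function names are hypothetical.

```go
package main

import "fmt"

// xorIfBranch XORs b into a only when the lowest bit of bit is set.
func xorIfBranch(a, b, bit uint64) uint64 {
	if bit&1 == 1 {
		return a ^ b
	}
	return a
}

// xorIfMask does the same without a branch: negating 0 or 1 in
// unsigned arithmetic yields 0 or all ones, which gates b.
func xorIfMask(a, b, bit uint64) uint64 {
	mask := -(bit & 1)
	return a ^ (b & mask)
}

func main() {
	a, b := uint64(0xF0F0), uint64(0x0FF0)
	for bit := uint64(0); bit < 2; bit++ {
		fmt.Printf("bit=%d branch=%04x mask=%04x\n", bit, xorIfBranch(a, b, bit), xorIfMask(a, b, bit))
	}
}
```

Whether the branchless form is faster depends on how predictable the branch is; the benchmark in the commit message is the only measurement given for this repository.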
fyrchik
33bf778066 Merge pull request #20 from nspcc-dev/use-x-sys-cpu-instead-of-self-implemented
Use golang.org/x/sys instead of self-implemented detector
2020-01-16 11:32:58 +03:00
Evgeniy Kulikov
77b7d87549 Use golang.org/x/sys instead of self-implemented detector
2020-01-16 11:30:46 +03:00
Evgeniy Kulikov
d4b45131cd Update alpine image, fixup for Makefile, fixup for benchmark
2020-01-16 11:30:46 +03:00
Evgeniy Kulikov
9789dcb2b6 Ignore vendor and binary
2020-01-16 11:30:45 +03:00
fyrchik
3d96a71c03 Merge pull request #19 from nspcc-dev/feat/avx_inline
Speed up AVX implementation
2019-10-17 17:53:41 +03:00
Evgenii Stratonikov
a8357fda0e Change default AVX implementation
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
5f74bbc979 Update benchmark result in README.md
Also simplify test and benchmark names.
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
4b7f39cd1d Move mulBitRightx2 to avx2 assembly file
2019-10-16 15:11:57 +03:00
Evgenii Stratonikov
3191f1b3fd Add AVX implementation with inlined multiplication
Perform multiplication by-byte instead of by-bit as
in AVX2Inline implementation.
2019-10-16 15:11:53 +03:00
fyrchik
702d2553ba Merge pull request #18 from nspcc-dev/feat/interface
Restructure code layout in gf127/
2019-10-15 14:19:13 +03:00
Evgenii Stratonikov
63834fe8c1 Remove non-AVX parts from avx package
Remove Inv(), Mul1(), And() because right now
they have no AVX optimizations.
2019-10-15 13:22:36 +03:00
Evgenii Stratonikov
0f8b498b58 Alias gf127.GF127
2019-10-15 13:22:36 +03:00
Evgenii Stratonikov
d891a9c591 Restructure code layout
Provide default implementations in gf127 package and
all optimizations in subpackages. This way it will be easier
to use from a client.
2019-10-15 13:22:31 +03:00
Evgenii Stratonikov
c5fb08aece Speed up gogf127.Mul()
Cache results of the shift. Also add test for checking if
implementation can work when result is one of the arguments.
2019-10-11 11:50:33 +03:00
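The two points in the commit above (caching a shift result, and testing that the result may be one of the arguments) are related: if an output overwrites an input before a value derived from that input is read, aliasing breaks the computation. The sketch below uses a hypothetical two-word type `elem` and a hypothetical `shl1` helper, not the repository's `Mul()`, to show the pattern and the kind of aliasing check the commit describes.

```go
package main

import "fmt"

// elem is a hypothetical 128-bit value stored as two 64-bit words,
// low word first.
type elem [2]uint64

// shl1 stores a shifted left by one bit into c. The carry out of the
// low word is saved before any write, so c may point to the same
// value as a.
func shl1(a, c *elem) {
	carry := a[0] >> 63
	c[0] = a[0] << 1
	c[1] = a[1]<<1 | carry
}

func main() {
	a := elem{1<<63 | 1, 1}

	var out elem
	shl1(&a, &out) // separate destination

	shl1(&a, &a) // destination aliases the argument

	fmt.Println(out == a)
}
```

Without the saved `carry`, the second statement would read `a[0]` after it had already been overwritten in the aliased call.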
fyrchik
b27c17ce19 Merge pull request #17 from nspcc-dev/fix/refactoring
Remove `unsafe` from code
2019-10-10 12:48:58 +03:00
Evgenii Stratonikov
1d4e7550fc Use macros in AVX hash implementation
2019-10-10 11:29:40 +03:00
Evgenii Stratonikov
f296adb043 Remove usage of unsafe
2019-10-10 11:04:15 +03:00
fyrchik
5142f695cf Merge pull request #16 from nspcc-dev/feat/cpuid
Move cpu id to a separate package
2019-10-09 18:18:41 +03:00
Evgenii Stratonikov
782ed7554b Use macros in asm code
2019-10-09 18:11:53 +03:00
Evgenii Stratonikov
43033eedb1 Provide minimum go version in go.mod
2019-10-09 18:06:26 +03:00