tzhash

Author	SHA1	Message	Date
Evgenii Stratonikov	3a90fa7a76	Add more tests Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-01-17 17:18:36 +03:00
Evgenii Stratonikov	73d978c31e	Rewrite AVX2 loop in assembly Helps to get rid of MOV and generating constants for each iteration. ``` name old time/op new time/op delta Sum/AVX2Inline_digest-8 1.57ms ± 2% 1.41ms ± 0% -10.52% (p=0.000 n=9+9) name old speed new speed delta Sum/AVX2Inline_digest-8 63.6MB/s ± 1% 71.1MB/s ± 0% +11.76% (p=0.000 n=9+9) ``` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-01-17 17:18:36 +03:00
Evgenii Stratonikov	d7c96f5d2e	Fix comments Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-01-17 17:18:36 +03:00
Evgenii Stratonikov	8dd24d0195	Interleave carry registers for successive bits 8 instructions less per byte. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-01-17 17:18:36 +03:00
Evgenii Stratonikov	921f8b0579	Optimize AVX implementation 1. Do the same mask trick as with AVX2. 2. Get rid of load, generate constant on the fly. ``` name old time/op new time/op delta Sum/AVXInline_digest-8 2.26ms ± 4% 2.17ms ± 5% -4.05% (p=0.000 n=19+17) name old speed new speed delta Sum/AVXInline_digest-8 44.3MB/s ± 4% 46.2MB/s ± 5% +4.25% (p=0.000 n=19+17) ``` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-01-17 17:18:36 +03:00
Evgenii Stratonikov	d4cb61e470	Replace two shifts with a single AND We need to isolate HSB in every quad-word, this can be done with a simple mask. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2022-01-17 17:18:36 +03:00
Evgenii Stratonikov	a7201418ab	Fix linter issues Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2021-12-29 13:23:05 +03:00
Evgenii Stratonikov	2e922115d8	Replace CircleCI with Github actions Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2021-12-29 13:23:05 +03:00
Evgenii Stratonikov	bbbcf3fa5c	Use unaligned move in AVX2 implementation Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2021-12-29 13:23:05 +03:00
Evgenii Stratonikov	c8a32b25ec	Optimize AVX2 implementation We use 6 instructions only to calculate mask based on single bit value. Use only 3 now and calculate multiple masks in parallel. Also `VPSUB` is faster than VPBROADCAST, see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html . ``` name old time/op new time/op delta Sum/AVX2Inline_digest-8 1.83ms ± 0% 1.62ms ± 1% -11.23% (p=0.000 n=46+42) name old speed new speed delta Sum/AVX2Inline_digest-8 54.7MB/s ± 0% 61.6MB/s ± 1% +12.65% (p=0.000 n=46+42) ``` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2021-12-29 13:23:05 +03:00
Evgenii Stratonikov	a370c525ba	Replace all SSE instructions with AVX ones Also use integer MOV* variant instead of floating-point one. Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2021-12-29 13:23:05 +03:00
Evgenii Stratonikov	9b3f45993f	gf127: remove branch in pure Go operations ``` name old time/op new time/op delta Sum/PureGo_digest-8 16.1ms ± 3% 10.4ms ± 3% -35.53% (p=0.000 n=10+10) name old speed new speed delta Sum/PureGo_digest-8 6.22MB/s ± 3% 9.65MB/s ± 3% +55.12% (p=0.000 n=10+10) ``` Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>	2021-12-29 11:00:27 +03:00
fyrchik	33bf778066	Merge pull request #20 from nspcc-dev/use-x-sys-cpu-instead-of-self-implemented Use golang.org/x/sys instead of self-implemented detector	2020-01-16 11:32:58 +03:00
Evgeniy Kulikov	77b7d87549	Use golang.org/x/sys instead of self-implemented detector	2020-01-16 11:30:46 +03:00
Evgeniy Kulikov	d4b45131cd	Update alpine image, fixup for Makefile, fixup for benchmark	2020-01-16 11:30:46 +03:00
Evgeniy Kulikov	9789dcb2b6	Ignore vendor and binary	2020-01-16 11:30:45 +03:00
fyrchik	3d96a71c03	Merge pull request #19 from nspcc-dev/feat/avx_inline Speed up AVX implementation	2019-10-17 17:53:41 +03:00
Evgenii Stratonikov	a8357fda0e	Change default AVX implementation	2019-10-16 15:11:57 +03:00
Evgenii Stratonikov	5f74bbc979	Update benchmark result in README.md Also simplify test and benchmark names.	2019-10-16 15:11:57 +03:00
Evgenii Stratonikov	4b7f39cd1d	Move mulBitRightx2 to avx2 assembly file	2019-10-16 15:11:57 +03:00
Evgenii Stratonikov	3191f1b3fd	Add AVX implementation with inlined multiplication Perform multiplication by-byte instead of by-bit as in AVX2Inline implementation.	2019-10-16 15:11:53 +03:00
fyrchik	702d2553ba	Merge pull request #18 from nspcc-dev/feat/interface Restructure code layout in gf127/	2019-10-15 14:19:13 +03:00
Evgenii Stratonikov	63834fe8c1	Remove non-AVX parts from avx package Remove Inv(), Mul1(), And() because right now they have no AVX optimizations.	2019-10-15 13:22:36 +03:00
Evgenii Stratonikov	0f8b498b58	Alias gf127.GF127	2019-10-15 13:22:36 +03:00
Evgenii Stratonikov	d891a9c591	Restructure code layout Provide default implementations in gf127 package and all optimizations in subpackages. This way it will be easier to use from a client.	2019-10-15 13:22:31 +03:00
Evgenii Stratonikov	c5fb08aece	Speed up gogf127.Mul() Cache results of the shift. Also add test for checking if implementation can work when result is one of the arguments.	2019-10-11 11:50:33 +03:00
fyrchik	b27c17ce19	Merge pull request #17 from nspcc-dev/fix/refactoring Remove `unsafe` from code	2019-10-10 12:48:58 +03:00
Evgenii Stratonikov	1d4e7550fc	Use macros in AVX hash implementation	2019-10-10 11:29:40 +03:00
Evgenii Stratonikov	f296adb043	Remove usage of unsafe	2019-10-10 11:04:15 +03:00
fyrchik	5142f695cf	Merge pull request #16 from nspcc-dev/feat/cpuid Move cpu id to a separate package	2019-10-09 18:18:41 +03:00
Evgenii Stratonikov	782ed7554b	Use macros in asm code	2019-10-09 18:11:53 +03:00
Evgenii Stratonikov	43033eedb1	Provide minimum go version in go.mod	2019-10-09 18:06:26 +03:00
Evgenii Stratonikov	fc059cac87	Use AVX2 only if AVX is also present	2019-10-09 18:03:39 +03:00
Evgenii Stratonikov	648b1deca7	Move cpuid facility to separate package	2019-10-09 18:03:35 +03:00
fyrchik	2470efda43	Merge pull request #15 from nspcc-dev/fix/cpu_features Implement matrix multiplication with pure Go	2019-10-09 17:42:04 +03:00
Evgenii Stratonikov	f613ab2c25	Implement matrix multiplication with pure Go Set suitable backend for GF127 arithmetic for Concat(), Sum() etc.	2019-10-09 12:31:47 +03:00
Evgeniy Kulikov	06362477ed	Merge pull request #13 from nspcc-dev/fix/cpuid Detect CPU features in Sum()	2019-10-04 18:00:24 +03:00
Evgenii Stratonikov	38df9b2c63	Detect CPU features in Sum()	2019-10-04 17:58:42 +03:00
fyrchik	083d0ff054	Merge pull request #12 from nspcc-dev/feat/cpu_features Determine available features through CPUID	2019-09-04 12:01:42 +03:00
Evgenii	63e8eeac86	Determine available features through CPUID	2019-09-04 11:47:44 +03:00
fyrchik	16d4da0a1d	Merge pull request #11 from nspcc-dev/feature/pure_go Implement hashing in pure go	2019-09-04 10:52:02 +03:00
Evgenii	7c12188650	Perform allocation outside of mulBitRightPure	2019-07-19 19:04:44 +03:00
Evgenii	6c75cc0871	Add pure Go hash implementation	2019-07-19 18:59:43 +03:00
fyrchik	33f1403c28	Merge pull request #10 from nspcc-dev/feature/API_refactor Add possibility to use different implementations in cli	2019-07-19 18:26:26 +03:00
Evgenii	c3cfe63e64	Add possibility to use different implementations in cli Also make API smaller and more consistent and fix typos in documentation.	2019-07-19 18:24:30 +03:00
fyrchik	826ed77561	Merge pull request #9 from nspcc-dev/feature/AVX2_inline Inline asm function in loop for AVX2 implementation	2019-07-19 17:54:25 +03:00
Evgenii	c68e38b943	Inline asm function in loop for AVX2 implementation Right now the AVX2 implementation loses to the C binding in speed. This is probably because of 2 things: 1. Go does not inline `mulBitRightx2` in loop iteration. 2. `minmax` is loaded every time from memory. In this PR: 1. Unroll `mulBitRightx2` manually and use `mulByteRightx2` instead. 2. Generate `minmax` in place without `LOAD/LEA` instructions.	2019-07-19 16:11:06 +03:00
fyrchik	dd15c90530	Merge pull request #8 from nspcc-dev/pureGo Add pure-go GF(2^127) implementation	2019-07-19 12:06:24 +03:00
Evgenii	5c2544cf3b	Add pure-go GF(2^127) implementation	2019-07-19 12:04:16 +03:00
fyrchik	5c06a9fa8f	Merge pull request #7 from nspcc-dev/feat/mbpers Report benchmark results in MB/s	2019-07-10 13:15:00 +03:00
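
Two of the commit messages above name bit-level tricks: d4cb61e470 isolates the highest bit of every quad-word with a single AND instead of two shifts, and 9b3f45993f removes a branch from the pure Go operations. The following is a minimal Go sketch of both ideas on a single 64-bit word. It is not taken from the repository: every function name here is hypothetical, and the actual code (in particular the assembly) may be structured quite differently.

```go
package main

import "fmt"

// hsbTwoShifts isolates the highest bit of a 64-bit word in place
// with two shifts: everything below bit 63 is shifted out and back in as zeros.
// (Hypothetical helper, for illustration only.)
func hsbTwoShifts(x uint64) uint64 {
	return (x >> 63) << 63
}

// hsbMask produces the same result with a single AND against a constant mask.
func hsbMask(x uint64) uint64 {
	return x & (1 << 63)
}

// condXorBranch XORs y into x when the lowest bit of bit is set, using a branch.
func condXorBranch(x, y, bit uint64) uint64 {
	if bit&1 == 1 {
		return x ^ y
	}
	return x
}

// condXorBranchless does the same without a branch.
// For unsigned values, -(bit & 1) wraps around to all ones when the bit is 1
// and stays zero when the bit is 0, so it can be used as an AND mask.
func condXorBranchless(x, y, bit uint64) uint64 {
	mask := -(bit & 1)
	return x ^ (y & mask)
}

func main() {
	x := uint64(0x8000000000000001)
	fmt.Printf("%#x %#x\n", hsbTwoShifts(x), hsbMask(x))
	fmt.Printf("%#x %#x\n", condXorBranch(0xf0, 0x0f, 1), condXorBranchless(0xf0, 0x0f, 1))
}
```

Both pairs of functions are meant to be interchangeable. The mask variants trade a data-dependent shift chain or jump for plain arithmetic, which is the kind of change the benchmark deltas in the commit messages are measuring.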

72 commits