forked from TrueCloudLab/tzhash
Replace two shifts with a single AND
We need to isolate the HSB (highest bit) in every quad-word; this can be done with a simple mask.

Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
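To illustrate why the replacement is safe, here is a minimal Go sketch of the same idea on a single 64-bit lane. The function names and the `hsbMask` constant are hypothetical and exist only for this illustration; the actual change operates on four lanes at once in AVX2 registers.

```go
package main

import "fmt"

// hsbMask has only the highest bit of a 64-bit word set.
// (Hypothetical name; the assembly keeps the equivalent mask in a register.)
const hsbMask uint64 = 1 << 63

// isolateWithShifts models the old two-instruction sequence for one lane:
// shift right by 63 to drop the low bits, then shift left by 63 to put
// the surviving bit back in the top position.
func isolateWithShifts(x uint64) uint64 {
	return (x >> 63) << 63
}

// isolateWithMask models the new single AND against a precomputed mask.
func isolateWithMask(x uint64) uint64 {
	return x & hsbMask
}

func main() {
	samples := []uint64{0, 1, 1 << 63, 0x8000000000000001, ^uint64(0)}
	for _, x := range samples {
		fmt.Printf("x=%#x shifts=%#x mask=%#x\n",
			x, isolateWithShifts(x), isolateWithMask(x))
	}
}
```

Both functions return the same value for every input, so the AND can stand in for the shift pair as long as the mask is prepared once up front.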
parent a7201418ab
commit d4cb61e470
1 changed file with 5 additions and 4 deletions
@@ -10,8 +10,7 @@
 	VPALIGNR $8, Y1, Y0, Y2 \
 	VPSRLQ $63, Y2, Y2 \
 	VPXOR Y1, Y2, Y2 \
-	VPSRLQ $63, Y1, Y3 \
-	VPSLLQ $63, Y3, Y3 \
+	VPAND Y1, Y14, Y3 \
 	VPUNPCKHQDQ Y3, Y3, Y3 \
 	VPXOR Y2, Y3, Y3 \
 	mask(bit, Y11, Y2) \
@@ -28,8 +27,10 @@ TEXT ·mulByteRightx2(SB),NOSPLIT,$0
 	VMOVDQU (BX), Y8
 
 	VPXOR Y13, Y13, Y13 // Y13 = 0x0000...
-	VPCMPEQB Y12, Y12, Y12 // Y12 = 0xFFFF...
-	VPSUBW Y12, Y13, Y12 // Y12 = 0x00010001... (packed words of 1)
+	VPCMPEQB Y14, Y14, Y14 // Y14 = 0xFFFF...
+	VPSUBQ Y14, Y13, Y10
+	VPSUBW Y14, Y13, Y12 // Y12 = 0x00010001... (packed words of 1)
+	VPSLLQ $63, Y10, Y14 // Y14 = 0x80000000... (packed quad-words with HSB set)
 
 	VPBROADCASTB b+16(FP), X10 // X10 = packed bytes of b.
 	VPMOVZXBW X10, Y10 // Extend with zeroes to packed words.
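The constant setup in the second hunk can be followed step by step on a single lane. The Go sketch below is an illustration under that simplification, not the project's code: `laneConstants` and `wordOne` are hypothetical names, and the register names appear only in comments to tie each line back to the diff.

```go
package main

import "fmt"

// laneConstants reproduces, for one 64-bit lane, how the mask is built
// without loading any constant from memory.
func laneConstants() (ones, one, hsb uint64) {
	var zero uint64   // VPXOR Y13, Y13, Y13: all bits clear
	ones = ^zero      // VPCMPEQB Y14, Y14, Y14: all bits set
	one = zero - ones // VPSUBQ Y14, Y13, Y10: 0 - (-1) wraps around to 1
	hsb = one << 63   // VPSLLQ $63, Y10, Y14: only the top bit remains
	return ones, one, hsb
}

// wordOne is the 16-bit analogue used for Y12 (VPSUBW):
// 0 - 0xFFFF wraps around to 1 in every word.
func wordOne() uint16 {
	var zero uint16
	ones := ^zero
	return zero - ones
}

func main() {
	ones, one, hsb := laneConstants()
	fmt.Printf("ones=%#x one=%#x hsb=%#x word=%#x\n", ones, one, hsb, wordOne())
}
```

Here `hsb` comes out as 0x8000000000000000 per lane. Note the ordering in the hunk: both subtractions read Y14 while it still holds all ones, and only afterwards does the shift overwrite Y14 with the mask.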