Helps get rid of a MOV and of regenerating constants on each iteration.
```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8    1.57ms ± 2%    1.41ms ± 0%  -10.52%  (p=0.000 n=9+9)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  63.6MB/s ± 1%  71.1MB/s ± 0%  +11.76%  (p=0.000 n=9+9)
```
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>
We used 6 instructions just to calculate a mask from a single bit value.
Now use only 3 and calculate multiple masks in parallel.
Also, `VPSUB*` is faster than `VPBROADCAST*`;
see https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html .
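
A scalar Go sketch of the idea, for illustration only. It assumes the three
steps are shift, AND and subtract-from-zero, which this message does not spell
out. The names `maskFromBit` and `masks4` are hypothetical and do not appear in
the assembly.

```go
package main

import "fmt"

// maskFromBit returns all ones when bit i of x is set and zero otherwise.
// Subtracting the isolated bit from zero wraps around to all ones for 1 and
// stays 0 for 0, so a subtract can expand one bit into a full-width mask.
// Assumption: this mirrors the shift/AND/subtract sequence only in spirit.
func maskFromBit(x uint64, i uint) uint64 {
	bit := (x >> i) & 1 // shift + AND: isolate the bit
	return 0 - bit      // subtract: expand it to a 64-bit mask
}

// masks4 is a scalar stand-in for computing four masks at once, one per
// 64-bit lane of a 256-bit register.
func masks4(x uint64, first uint) [4]uint64 {
	var m [4]uint64
	for lane := range m {
		m[lane] = maskFromBit(x, first+uint(lane))
	}
	return m
}

func main() {
	// Masks for bits 0..3 of 0b0101, printed as hex.
	fmt.Printf("%016x\n", masks4(0b0101, 0))
}
```

In a vector version, the loop in `masks4` would collapse into single packed
instructions that operate on all lanes together.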
```
name                     old time/op    new time/op    delta
Sum/AVX2Inline_digest-8    1.83ms ± 0%    1.62ms ± 1%  -11.23%  (p=0.000 n=46+42)

name                     old speed      new speed      delta
Sum/AVX2Inline_digest-8  54.7MB/s ± 0%  61.6MB/s ± 1%  +12.65%  (p=0.000 n=46+42)
```
Signed-off-by: Evgenii Stratonikov <evgeniy@nspcc.ru>