Provide default implementations in the gf127 package and move all optimizations into subpackages. This makes the package easier to use from client code.
AVX2 provides 256-bit registers, so two GF(2^127) elements can be multiplied in parallel. This commit adds two such functions, for multiplication by 10 and by 11.
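
For reference, a portable sketch of what the paired functions compute. It rests on assumptions not stated in this message: 10 and 11 are read as the binary polynomials x and x+1, the reduction polynomial is taken to be x^127 + x^63 + 1, and the names GF127, Pair, Mul10, Mul11, Mul10x2 and Mul11x2 are hypothetical, not the package's actual API. The loop-free pair helpers only illustrate the result; the AVX2 code would do both lanes in a single 256-bit register.

```go
package main

import "fmt"

// GF127 is a hypothetical layout: word 0 holds bits 0..63,
// word 1 holds bits 64..126 (bit 127 is always zero).
type GF127 [2]uint64

// Pair holds the two elements that would share one 256-bit register.
type Pair [2]GF127

// Mul10 multiplies a by x, reducing modulo the assumed
// polynomial x^127 + x^63 + 1.
func Mul10(a GF127) GF127 {
	carry := a[0] >> 63 // bit 63 moves into the high word
	lo := a[0] << 1
	hi := a[1]<<1 | carry

	// If bit 127 is now set, replace x^127 with x^63 + 1.
	mask := hi >> 63 // 0 or 1
	lo ^= mask | mask<<63
	hi &^= 1 << 63
	return GF127{lo, hi}
}

// Mul11 multiplies a by x+1, i.e. a*x + a (addition is XOR).
func Mul11(a GF127) GF127 {
	m := Mul10(a)
	return GF127{m[0] ^ a[0], m[1] ^ a[1]}
}

// Mul10x2 applies Mul10 to both elements of the pair.
func Mul10x2(p Pair) Pair { return Pair{Mul10(p[0]), Mul10(p[1])} }

// Mul11x2 applies Mul11 to both elements of the pair.
func Mul11x2(p Pair) Pair { return Pair{Mul11(p[0]), Mul11(p[1])} }

func main() {
	p := Pair{{1, 0}, {0, 1 << 62}} // the elements 1 and x^126
	fmt.Printf("%#x\n", Mul10x2(p))
	fmt.Printf("%#x\n", Mul11x2(p))
}
```

The second element in main is x^126, so multiplying it by x exercises the reduction step.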