NEON intrinsics are broken on big-endian

These are currently broken because the order of elements inside vectors is reversed on big-endian systems: the ARM ABI requires that element 0 is located at the highest address of the vector type. However LLVM intrinsics expect element 0 to be located at the lowest address.

See https://llvm.org/docs/BigEndianNEON.html and `arm_neon.h` in Clang for more details.