Provides native Go SIMD intrinsics for the x86/amd64 platform.
package main

import (
	"fmt"

	"github.com/mengzhuo/intrinsic/sse2"
)

func main() {
	src := []float32{3.14, 2.17}
	dst := []float32{2.17, 3.15}
	sse2.MAXSDm64float32(src, dst)
	fmt.Print(src, dst) // [2.17 3.15] [2.17 3.15]
}
With SSE2, the intrinsics run roughly 6x faster than the general (pure Go) versions in the benchmarks below: about 2.6 ns/op versus 15-16 ns/op.
BenchmarkPMINUBByte-4 1000000000 2.65 ns/op 0 B/op 0 allocs/op
BenchmarkGeneralPMINUBByte-4 100000000 15.8 ns/op 0 B/op 0 allocs/op
BenchmarkPAND-4 1000000000 2.61 ns/op 0 B/op 0 allocs/op
BenchmarkGeneralAND-4 100000000 15.4 ns/op 0 B/op 0 allocs/op
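For context, the "General" benchmarks presumably measure a plain Go loop doing the same work one element at a time. The sketch below is an assumption about what such a baseline looks like, not code from this repository: `minBytes` and `andBytes` are hypothetical names, the (dst, src) argument order is chosen for illustration, and both slices are assumed to have the same length.

```go
package main

import "fmt"

// minBytes is a hypothetical pure-Go stand-in for PMINUB:
// it stores the unsigned byte-wise minimum of dst and src into dst.
// Assumes len(src) >= len(dst).
func minBytes(dst, src []byte) {
	for i := range dst {
		if src[i] < dst[i] {
			dst[i] = src[i]
		}
	}
}

// andBytes is a hypothetical pure-Go stand-in for PAND:
// it stores the bitwise AND of dst and src into dst.
// Assumes len(src) >= len(dst).
func andBytes(dst, src []byte) {
	for i := range dst {
		dst[i] &= src[i]
	}
}

func main() {
	a := []byte{1, 200, 3, 40}
	b := []byte{2, 100, 3, 4}
	minBytes(a, b)
	fmt.Println(a) // [1 100 3 4]

	x := []byte{0xF0, 0x0F}
	y := []byte{0x3C, 0x3C}
	andBytes(x, y)
	fmt.Println(x) // [48 12]
}
```

A SIMD instruction handles 16 bytes of an XMM register at once, while a loop like this processes one byte per iteration, which is consistent with the gap shown in the benchmark numbers.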
All code in the subdirectories is generated by scanner.go; see the Makefile for details.
x86.csv and x86desc.csv come from a separate repository: https://github.com/mengzhuo/x86data
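As a rough illustration of the first step a CSV-driven generator might take, the sketch below groups instructions by instruction-set extension. It does not reproduce scanner.go. The two-column layout (mnemonic, extension), the `sample` data and the `countByFeature` helper are all assumptions made for this example; the real x86.csv has a different and richer layout.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strings"
)

// sample is made-up input in a hypothetical two-column layout:
// mnemonic, instruction-set extension. It is not the real x86.csv format.
const sample = `PAND,SSE2
PMINUB,SSE2
MAXSD,SSE2
HADDPD,SSE3
PSHUFB,SSSE3
`

// countByFeature parses the CSV text and counts how many
// instructions belong to each extension.
func countByFeature(data string) (map[string]int, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, row := range rows {
		if len(row) < 2 {
			continue // skip rows without an extension column
		}
		counts[row[1]]++
	}
	return counts, nil
}

func main() {
	counts, err := countByFeature(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Sort the extension names so the output order is stable.
	features := make([]string, 0, len(counts))
	for f := range counts {
		features = append(features, f)
	}
	sort.Strings(features)
	for _, f := range features {
		fmt.Printf("%s total=%d\n", f, counts[f])
	}
}
```

A real generator would go on to emit a Go wrapper and an assembly stub per row; that part depends on details of scanner.go that are not described here.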
- Resolve code generation for opcodes that take an immediate operand
- SSE2 gen=80, total=141, ratio=56.74%
- SSE3 gen=6, total=10, ratio=60.00%
- SSSE3 gen=15, total=32, ratio=46.88%
- SSE4_1 gen=26, total=49, ratio=53.06%
- SSE4_2 gen=1, total=5, ratio=20.00%
- AVX gen=66, total=378, ratio=17.46%
- AVX2 gen=8, total=159, ratio=5.03%
- FMA