Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: code generates unnecessary mov-s #65192

Open
egonelbre opened this issue Jan 21, 2024 · 0 comments
Open

cmd/compile: code generates unnecessary mov-s #65192

egonelbre opened this issue Jan 21, 2024 · 0 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@egonelbre
Copy link
Contributor

egonelbre commented Jan 21, 2024

Go version

go1.21.6

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/egon/Library/Caches/go-build'
GOENV='/Users/egon/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/egon/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/egon/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.21.6'
GCCGO='gccgo'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/egon/tmp/opt/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/yr/rzc9gn3d1mddybrx9v7220x80000gn/T/go-build390599184=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Compiling the following code leads to assembly that's suboptimal assembly:

func Axpy(alpha float32, xs *float32, incx uintptr, ys *float32, incy uintptr, n uintptr) {
	xp := unsafe.Pointer(xs)
	yp := unsafe.Pointer(ys)
	xn := unsafe.Add(xp, 4*n*incx)
	for uintptr(xp) < uintptr(xn) {
		*(*float32)(yp) += alpha * *(*float32)(xp)
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
	}
}

The optimization seems to be missing both on arm64 and amd64. I'm currently only showing the output from arm64, because the amd64 is similar.

What did you see happen?

The code gets compiled into this:

TEXT main.Axpy(SB) /Users/egon/tmp/opt/main.go
	xn := unsafe.Add(xp, 4*n*incx)
  0x100056880		d37ef484		LSL $2, R4, R4		
  0x100056884		9b040024		MADD R4, R0, R1, R4	
	for uintptr(xp) < uintptr(xn) {
  0x100056888		14000008		JMP 8(PC)		
		*(*float32)(yp) += alpha * *(*float32)(xp)
  0x10005688c		bd400041		FMOVS (R2), F1		
  0x100056890		bd4000a2		FMOVS (R5), F2		
  0x100056894		1f000441		FMADDS F0, F1, F2, F1	
  0x100056898		bd000041		FMOVS F1, (R2)		
		xp, yp = unsafe.Add(xp, 4*incx), unsafe.Add(yp, 4*incy)
  0x10005689c		8b0108a0		ADD R1<<2, R5, R0	// <-- related to R0, R5 juggling
  0x1000568a0		8b030842		ADD R3<<2, R2, R2	
	for uintptr(xp) < uintptr(xn) {
  0x1000568a4		aa0603e4		MOVD R6, R4		// <------------------
  0x1000568a8		aa0003e5		MOVD R0, R5		// <------------------
  0x1000568ac		aa0403e6		MOVD R4, R6		// <------------------
  0x1000568b0		eb00009f		CMP R0, R4		
  0x1000568b4		54fffec8		BHI -10(PC)		
}
  0x1000568b8		d65f03c0		RET			
  0x1000568bc		00000000		?			

What did you expect to see?

I would've expected code more in the lines of:

        LSL     $2, R4, R4
        MADD    R4, R0, R1, R4
        JMP     check_boundary
loop:
        FMOVS   (R2), F1
        FMOVS   (R5), F2
        FMADDS  F0, F1, F2, F1
        FMOVS   F1, (R2)
        ADD     R1<<2, R5, R5
        ADD     R3<<2, R2, R2
check_boundary:
        CMP     R5, R4
        BHI     loop
        RET

PS: I just realized that maybe that's happening because it's trying to preserve the register state for returning from the func... if that's the case, the whole logic could happen on registers that don't need to be preserved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

4 participants