Replace store of freeze in allocop and in emit new struct with memset since aggregate stores are bad #55879

gbaraldi · 2024-09-25T20:10:48Z

This fixes the issues found in slack in the reinterprets of

julia> split128_v2(x::UInt128) = (first(reinterpret(NTuple{2, UInt}, x)), last(reinterpret(NTuple{2, UInt}, x)))
split128_v2 (generic function with 1 method)

julia> split128(x::UInt128) = reinterpret(NTuple{2, UInt}, x)
split128 (generic function with 1 method)

@code_native  split128(UInt128(5))
	push	rbp
	mov	rbp, rsp
	mov	rax, rdi
	mov	qword ptr [rdi + 8], rdx
	mov	qword ptr [rdi], rsi
	pop	rbp
	ret

@code_native  split128_v2(UInt128(5))
	push	rbp
	mov	rbp, rsp
	mov	rax, rdi
	mov	qword ptr [rdi], rsi
	mov	qword ptr [rdi + 8], rdx
	pop	rbp
	ret

vs on master where

julia> @code_native  split128(UInt128(5))

	push	rbp
	mov	rbp, rsp
	mov	eax, esi
	shr	eax, 8
	mov	ecx, esi
	shr	ecx, 16
	mov	r8, rsi
	mov	r9, rsi
	vmovd	xmm0, esi
	vpinsrb	xmm0, xmm0, eax, 1
	mov	rax, rsi
	vpinsrb	xmm0, xmm0, ecx, 2
	mov	rcx, rsi
	shr	esi, 24
	vpinsrb	xmm0, xmm0, esi, 3
	shr	r8, 32
	vpinsrb	xmm0, xmm0, r8d, 4
	shr	r9, 40
	vpinsrb	xmm0, xmm0, r9d, 5
	shr	rax, 48
	vpinsrb	xmm0, xmm0, eax, 6
	shr	rcx, 56
	vpinsrb	xmm0, xmm0, ecx, 7
	vpinsrb	xmm0, xmm0, edx, 8
	mov	eax, edx
	shr	eax, 8
	vpinsrb	xmm0, xmm0, eax, 9
	mov	eax, edx
	shr	eax, 16
	vpinsrb	xmm0, xmm0, eax, 10
	mov	eax, edx
	shr	eax, 24
	vpinsrb	xmm0, xmm0, eax, 11
	mov	rax, rdx
	shr	rax, 32
	vpinsrb	xmm0, xmm0, eax, 12
	mov	rax, rdx
	shr	rax, 40
	vpinsrb	xmm0, xmm0, eax, 13
	mov	rax, rdx
	shr	rax, 48
	vpinsrb	xmm0, xmm0, eax, 14
	mov	rax, rdi
	shr	rdx, 56
	vpinsrb	xmm0, xmm0, edx, 15
	vmovdqu	xmmword ptr [rdi], xmm0
	pop	rbp
	ret

… since aggregate stores are bad

src/llvm-alloc-opt.cpp

src/cgutils.cpp

…th memset since aggregate stores are bad (#55879) This fixes the issues found in slack in the reinterprets of ```julia julia> split128_v2(x::UInt128) = (first(reinterpret(NTuple{2, UInt}, x)), last(reinterpret(NTuple{2, UInt}, x))) split128_v2 (generic function with 1 method) julia> split128(x::UInt128) = reinterpret(NTuple{2, UInt}, x) split128 (generic function with 1 method) @code_native split128(UInt128(5)) push rbp mov rbp, rsp mov rax, rdi mov qword ptr [rdi + 8], rdx mov qword ptr [rdi], rsi pop rbp ret @code_native split128_v2(UInt128(5)) push rbp mov rbp, rsp mov rax, rdi mov qword ptr [rdi], rsi mov qword ptr [rdi + 8], rdx pop rbp ret ``` vs on master where ```julia julia> @code_native split128(UInt128(5)) push rbp mov rbp, rsp mov eax, esi shr eax, 8 mov ecx, esi shr ecx, 16 mov r8, rsi mov r9, rsi vmovd xmm0, esi vpinsrb xmm0, xmm0, eax, 1 mov rax, rsi vpinsrb xmm0, xmm0, ecx, 2 mov rcx, rsi shr esi, 24 vpinsrb xmm0, xmm0, esi, 3 shr r8, 32 vpinsrb xmm0, xmm0, r8d, 4 shr r9, 40 vpinsrb xmm0, xmm0, r9d, 5 shr rax, 48 vpinsrb xmm0, xmm0, eax, 6 shr rcx, 56 vpinsrb xmm0, xmm0, ecx, 7 vpinsrb xmm0, xmm0, edx, 8 mov eax, edx shr eax, 8 vpinsrb xmm0, xmm0, eax, 9 mov eax, edx shr eax, 16 vpinsrb xmm0, xmm0, eax, 10 mov eax, edx shr eax, 24 vpinsrb xmm0, xmm0, eax, 11 mov rax, rdx shr rax, 32 vpinsrb xmm0, xmm0, eax, 12 mov rax, rdx shr rax, 40 vpinsrb xmm0, xmm0, eax, 13 mov rax, rdx shr rax, 48 vpinsrb xmm0, xmm0, eax, 14 mov rax, rdi shr rdx, 56 vpinsrb xmm0, xmm0, edx, 15 vmovdqu xmmword ptr [rdi], xmm0 pop rbp ret ```

Replace store of freeze in allocop and in emit new struct with memset…

2cce1bd

… since aggregate stores are bad

gbaraldi requested a review from vtjnash September 25, 2024 20:23

vtjnash reviewed Sep 25, 2024

View reviewed changes

src/llvm-alloc-opt.cpp Show resolved Hide resolved

vtjnash reviewed Sep 25, 2024

View reviewed changes

src/cgutils.cpp Show resolved Hide resolved

gbaraldi added 5 commits September 26, 2024 14:30

Apply suggestions from code review

a891be2

Merge branch 'master' into gb/memset

a3df253

Merge branch 'master' into gb/memset

6ea886d

Fix promotion_point mishap

dd89975

Fix llvmpasses tests

9c6875c

gbaraldi added performance Must go faster merge me PR is reviewed. Merge when all tests are passing and removed merge me PR is reviewed. Merge when all tests are passing labels Oct 16, 2024

gbaraldi requested a review from vtjnash October 18, 2024 14:15

vtjnash merged commit e5aff12 into master Oct 18, 2024
10 checks passed

vtjnash deleted the gb/memset branch October 18, 2024 15:01

oscardssmith added compiler:codegen Generation of LLVM IR and native code compiler:llvm For issues that relate to LLVM labels Oct 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Replace store of freeze in allocop and in emit new struct with memset since aggregate stores are bad #55879

Replace store of freeze in allocop and in emit new struct with memset since aggregate stores are bad #55879

gbaraldi commented Sep 25, 2024

Replace store of freeze in allocop and in emit new struct with memset since aggregate stores are bad #55879

Replace store of freeze in allocop and in emit new struct with memset since aggregate stores are bad #55879

Conversation

gbaraldi commented Sep 25, 2024