Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tbaa_gcframe #13463

Merged
merged 2 commits into from
Apr 2, 2016
Merged

tbaa_gcframe #13463

merged 2 commits into from
Apr 2, 2016

Conversation

yuyichao
Copy link
Contributor

@yuyichao yuyichao commented Oct 6, 2015

Add tbaa_gcframe and use it to decorate all stores and loads to/from the GC frame.
This doesn't rely on the the codegen rewrite so it should be easily back ported (probably after 0.4.0).

This is also yet another way to fix the performance issue in #13459 and #13461 .

Close #13301 (Load from gc frame is still bad but it doesn't prevent vectorization anymore)

@vtjnash @simonster .

@carnaval , you also mentioned this during JuliaCon.

@yuyichao yuyichao added performance Must go faster compiler:codegen Generation of LLVM IR and native code labels Oct 6, 2015
@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 6, 2015

..... hopefully compatible with LLVM 3.3 this time....

Edit: OK, that failed, so there's basically no way around ifdef's....

@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 6, 2015

Finally make it compatible with LLVM 3.3 and all CI passes...

@KristofferC
Copy link
Member

Can confirm that for example fill!(A, 0.) and A[:] = 0. has the same performance on this branch.

@vtjnash
Copy link
Member

vtjnash commented Oct 6, 2015

I'm concerned this can manage to walk too far through a gcframe and mark a load of a field of a variable. We don't usually emit code of that form currently, but I don't think such a scenario is too unlikely. For the PR, I think marking the load/store in the emit_var / emit_assignment sections should be sufficient. The temporary variables slot usages will always be closely preceding a function call anyways.

For testing against multiple llvm versions locally, what I do is make configure O=build-llvm37 && cd build-llvm37 && echo 'LLVM_VER = 3.7.0' >> Make.user && make. It'll inherit any settings from your global Make.user file, then override those with the settings in the subfolder.

@yuyichao
Copy link
Contributor Author

yuyichao commented Oct 6, 2015

I'm concerned this can manage to walk too far through a gcframe and mark a load of a field of a variable.

Will this matter? If we endup storing random structures (as oppose to just the pointer to them) to the GC frame, those load are still going to be from the GC frame since AFAIK the instructions I'm following getelementptr and bitcast only does pointer algorithms and not load. (i.e. gcframe[offset]->field won't be marked).

@vtjnash
Copy link
Member

vtjnash commented Oct 6, 2015

it matters for alias analysis, since you could have another load/store to the same location that doesn't get marked the same way

@yuyichao
Copy link
Contributor Author

yuyichao commented Apr 1, 2016

I rebased this on top of current master. It is still tracing the gep's for some instructions but should be more fine grain than before. Depending on how we add stack allocated object, in the worst case we can add a special gc root allocation intrinsics that is ignored by this marking (if we simply allocate it on the stack with a tag we might not need to do anything special).

With this patch the issue in #15717 still exists (seems that there's too many stores for llvm to naively hoist out of the loop) but SIMD finally works with -O3.

@vtjnash

@yuyichao
Copy link
Contributor Author

yuyichao commented Apr 1, 2016

runbenchmarks("simd", vs = "JuliaLang/julia:master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Load from GC frame preventing vectorization
4 participants