RFC/WIP: Try to run gcroot lowering after some LLVM optimizations #17363

Merged Jul 21, 2016 (3 commits)

Conversation

yuyichao
Contributor

I believe some constant propagation (mostly store-to-load forwarding) would help the GC frame optimizations identify possible duplicate roots. Instead of writing our own for special cases, it would be nice to let LLVM do that. This requires running some LLVM passes before the GC frame lowering.
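To illustrate why store-to-load forwarding helps here, the following is a minimal sketch on a toy straight-line IR. The `Inst` type and `forwardLoads` function are hypothetical and are not LLVM's representation or the code in this PR; the sketch only shows how forwarding exposes that several loaded values are really the same value, so a single root would be enough.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical straight-line toy IR, not LLVM's representation.
struct Inst {
    enum Op { Store, Load } op;
    int slot;   // stack slot being accessed
    int value;  // Store: id of the value stored; Load: id of the result
};

// Forward each load to the value most recently stored to the same slot.
// Returns a map from load-result id to the value id it is equivalent to.
std::map<int, int> forwardLoads(const std::vector<Inst> &code)
{
    std::map<int, int> lastStore; // slot -> value id currently held
    std::map<int, int> alias;     // load result id -> forwarded value id
    for (const Inst &i : code) {
        if (i.op == Inst::Store) {
            // Resolve the stored value first so chains collapse to one id.
            auto it = alias.find(i.value);
            lastStore[i.slot] = (it == alias.end()) ? i.value : it->second;
        } else {
            auto it = lastStore.find(i.slot);
            if (it != lastStore.end())
                alias[i.value] = it->second;
        }
    }
    return alias;
}
```

For the sequence "store v1 to slot 0, load slot 0 as v2, store v2 to slot 1, load slot 1 as v3", both v2 and v3 resolve to v1, so a root analysis that runs afterwards can see they are duplicates.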

This PR solves the easy part, i.e. making the lowering a true LLVM pass and making sure the optimized IR retains enough information for liveness analysis (mostly making sure that `julia.gc_kill` isn't deleted as part of a dead branch).

The goal is to at least be able to run the lowering after mem2reg (and the instcombine that follows it). A lot of issues still need to be solved.

  1. We need to trace back through bitcast and GEP instructions when identifying the address of gcframe stores/loads
  2. We need to make sure we can still identify the uses of the value after LLVM optimizes the IR (possibly by adding more explicit marks in the IR)
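The first item can be sketched on a toy model as follows. The `Value` struct, `traceBase`, and `addressesSlot` are hypothetical stand-ins, not the LLVM API and not the code in this PR. LLVM itself offers `Value::stripPointerCasts()`, which looks through casts and all-zero GEPs; a real pass would also have to decide how to treat GEPs with non-zero offsets, which this sketch ignores.

```cpp
#include <cassert>

// Toy stand-in for LLVM values; these types are hypothetical.
enum class Kind { Alloca, BitCast, GEP, Other };

struct Value {
    Kind kind;
    const Value *operand; // pointer operand for BitCast/GEP, else nullptr
};

// Walk back through bitcasts and GEPs until reaching the value that
// actually defines the address.
const Value *traceBase(const Value *v)
{
    while (v && (v->kind == Kind::BitCast || v->kind == Kind::GEP)) {
        assert(v->operand && "cast/GEP must have a pointer operand");
        v = v->operand;
    }
    return v;
}

// A store/load is treated as touching a GC root slot if its address
// traces back to that slot's alloca.
bool addressesSlot(const Value *addr, const Value *slot)
{
    return traceBase(addr) == slot;
}
```

With this, an address computed as a GEP of a bitcast of the slot is still recognized as referring to the slot, which is the property the lowering needs once optimizations have rewritten the address computation.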

If this approach looks good and the tests pass, we can probably merge this first and improve the algorithm later.

Based on #17352 to reduce conflicts.

c.c. @vtjnash

@yuyichao yuyichao added compiler:codegen Generation of LLVM IR and native code GC Garbage collector labels Jul 10, 2016
@yuyichao yuyichao force-pushed the yyc/codegen/gc-pass branch 2 times, most recently from dad001b to 1455659 Compare July 10, 2016 23:02
}
// Mark GC use before **and** after the llvmcall to make sure the arguments
// are alive during the llvmcall even if the llvmcall has `unreachable`.
// If the llvmcall generates GC safepoint, it might need to emit it's own
Contributor

its own

typedef unsigned id;
enum {
// an assignment to a gcroot exists in the basic-block
// (potentially no live-in from the predecessor basic-blocks)
Contributor

what does "live-in" mean?

Contributor Author

It means the value held in the slot on entry to this basic block is live at that point, i.e. it may still be read before the slot is overwritten.

Contributor

could that be explained in a handful more words? having confusing jargon in the comments sort of defeats the explanatory purpose of them

Contributor Author

I was just reading some docs about liveness analysis and I've seen a few uses of it.

Contributor

a reference link would work too
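For readers unfamiliar with the term discussed in this thread, here is a minimal sketch of textbook backward liveness analysis that computes the live-in set of each basic block. The `Block` struct and `computeLiveIn` are hypothetical and are not the data structures used in the PR; slots are represented as plain integers for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG, not the data structures used in the PR.
struct Block {
    std::set<int> use;       // slots read before any write in this block
    std::set<int> def;       // slots written in this block
    std::vector<int> succs;  // indices of successor blocks
};

// Backward dataflow, iterated to a fixed point:
//   liveOut[b] = union of liveIn[s] over all successors s
//   liveIn[b]  = use[b] + (liveOut[b] - def[b])
std::vector<std::set<int>> computeLiveIn(const std::vector<Block> &cfg)
{
    std::vector<std::set<int>> liveIn(cfg.size());
    bool changed = true;
    while (changed) {
        changed = false;
        // Visiting blocks in reverse index order tends to converge faster.
        for (std::size_t i = cfg.size(); i-- > 0;) {
            const Block &b = cfg[i];
            std::set<int> in = b.use;
            for (int s : b.succs) {
                assert(s >= 0 && static_cast<std::size_t>(s) < cfg.size());
                for (int slot : liveIn[s])
                    if (!b.def.count(slot))
                        in.insert(slot);
            }
            if (in != liveIn[i]) {
                liveIn[i] = std::move(in);
                changed = true;
            }
        }
    }
    return liveIn;
}
```

A block that assigns a slot before any read of it has no live-in for that slot from its predecessors, which matches the "potentially no live-in" wording in the enum comment under review.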

@JeffBezanson
Member

Does this fix #17342?

@yuyichao
Contributor Author

> Does this fix #17342?

Unfortunately no. This PR is only about code rearrangement and fixing incompatibilities rather than improving the optimizations. With load forwarding done before the GC frame lowering, it should be much easier to fix that issue, though.

@@ -52,6 +54,9 @@ static void addOptimizationPasses(T *PM)
#ifndef INSTCOMBINE_BUG
PM->add(createInstructionCombiningPass()); // Cleanup for scalarrepl.
#endif
// Let the InstCombine pass remove the unnecessary load of
// safepoint address first
PM->add(createLowerPTLSPass(imaging_mode, tbaa_const));
Member

@vtjnash vtjnash Jul 11, 2016

this is a module pass, but addOptimizationPasses was templated to accept the FunctionPassManager too. I think we only need to make module pass managers now (which might also make the LowerGCFramePass easier, since it can walk the uses list instead of walking the Function Instructions), so it might be good to update the function signature.

fixed upstream

@yuyichao yuyichao force-pushed the yyc/codegen/gc-pass branch 4 times, most recently from 8a53e5f to 1c114f5 Compare July 16, 2016 01:35

// This file defines two entry points:
// global function annotateSimdLoop: mark a loop as a SIMD loop.
// createLowerSimdLoopPass: construct LLVM for lowering a marked loop later.
Contributor

these comments don't seem accurate

// DLLImport only needs to be set for the shadow module
// it just gets annoying in the JIT
proto->setDLLStorageClass(GlobalValue::DefaultStorageClass);
#endif
Member

this can go away (it isn't needed unless you call addComdat)

Contributor

but if we bring addComdat back we'd need to add this back?

Member

@vtjnash vtjnash Jul 18, 2016

no, it's permanently gone. besides, we don't need comdat on variables or dllimports.

but that does bring up the point that this declaration will be wrong for a no-threads build on windows, since at the end of the pass, it'll be missing the DLLImport attribute (usually would have been fixed up by the merge-module pass).

Contributor Author

So what needs to be fixed here? I have no idea how DLLImport works on windows....

Member

You just have to mark ahead of time (AoT) which symbols will be statically linked into the binary and which will be found via the dynamic linker (dllimport).

@vtjnash
Member

vtjnash commented Jul 18, 2016

lgtm. can you rebase & update, to make sure there aren't logical merge conflicts with the codegen / jitlayers split, & merge.

@yuyichao yuyichao force-pushed the yyc/codegen/gc-pass branch 2 times, most recently from 265e955 to 2e26033 Compare July 19, 2016 13:14
proto->setDLLStorageClass(GlobalValue::DLLImportStorageClass);
#else
proto->setLinkage(GlobalValue::DLLImportLinkage);
#endif
Contributor Author

@vtjnash like this?

Member

lgtm

* Make the pass more robust against dead code elimination
* Remove module level setup code (so that the GC frame lowering can be
  a `FunctionPass`)
* Create PTLS lowering (module) pass since it needs to be run after the GC frame
  lowering
* Set DLLImport attribute for `jl_tls_states` on windows,
  since it is not handled by module merger anymore
@vtjnash vtjnash merged commit 1fcb81c into master Jul 21, 2016
@vtjnash vtjnash deleted the yyc/codegen/gc-pass branch July 21, 2016 19:16
maleadt added a commit to JuliaGPU/CUDAdrv.jl that referenced this pull request Jul 22, 2016