
WIP: Add a compile-on-demand layer #44575

Closed
wants to merge 4 commits into from

Conversation

@pchintalapudi
Member

Compile on demand comes with some very nice theoretical benefits:

  • We only optimize methods when they are first called (so we don't waste time in LLVM optimization on methods we never actually jump to)
  • We drop the codegen lock before any LLVM optimization
    • This will probably help the most, as it decreases contention between threads doing codegen in parallel.
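The lazy-compilation idea above — optimize a method only on its first call instead of eagerly at codegen time — can be illustrated with a small toy sketch. This is a Python stand-in for the concept only, not Julia's actual ORC/LLVM machinery; all names here are made up for illustration:

```python
import functools

# Toy model of compile-on-demand: each method gets a cheap stub, and the
# expensive "optimization" step runs only on the first real call.
optimized = []  # track which methods actually got optimized

def compile_on_demand(func):
    """Return a stub that optimizes `func` lazily, on first invocation."""
    state = {"compiled": None}

    @functools.wraps(func)
    def stub(*args, **kwargs):
        if state["compiled"] is None:
            # Stand-in for the expensive LLVM optimization + emission work.
            optimized.append(func.__name__)
            state["compiled"] = func
        return state["compiled"](*args, **kwargs)

    return stub

@compile_on_demand
def hot(x):
    return x * 2

@compile_on_demand
def cold(x):
    return x + 1

hot(21)  # first call triggers optimization of `hot`
hot(21)  # already compiled; no further work
# `cold` is never called, so it is never optimized.
print(optimized)  # prints ['hot']
```

The payoff is the `cold` case: under eager codegen its optimization cost is always paid, while under compile-on-demand it is paid only if the method is ever reached.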

But it comes with some fairly steep requirements:

  • JITLink must be enabled (our codegen memory manager is too restrictive for a workload of arbitrary memory allocations)
  • It therefore needs LLVM 14 or higher (JITLink ELF support on x86 only arrives in LLVM 14)

It's also currently broken: it fails with an extra-symbol error during the sysimage build.

I'm guessing there are a lot of implicit dependencies on other pull requests, mostly the refactors that reduce the number of codegen global dependencies and thread-safety bugs (#44573, #44454, and #44440 come to mind as a start), because we no longer hold the codegen lock during optimization.

@pchintalapudi pchintalapudi added compiler:codegen Generation of LLVM IR and native code compiler:latency Compiler latency compiler:llvm For issues that relate to LLVM labels Mar 12, 2022
@gbaraldi
Member

On the M1 Mac this seems to work very well. I'm running my PR (basically this one, but without the memory-manager changes) through the OrdinaryDiffEq tests to see what happens, and so far it seems to be doing just fine.

@gbaraldi
Member

gbaraldi commented Mar 25, 2022

Lazy Compilation:

```
Test Summary:            | Pass  Total     Time
Discrete Algorithm Tests |   14     14  1m02.1s
 62.305803 seconds (194.20 M allocations: 13.563 GiB, 3.46% gc time, 55.94% compilation time)
Test Summary: | Pass  Total  Time
Tstops Tests  |   24     24  5.0s
  5.046192 seconds (12.58 M allocations: 897.542 MiB, 3.43% gc time, 39.43% compilation time)
Test Summary:   | Pass  Total  Time
Backwards Tests |    3      3  7.3s
  7.346019 seconds (15.73 M allocations: 1.063 GiB, 3.14% gc time, 74.90% compilation time)
```

Master:

```
     Testing Running tests...
Test Summary:            | Pass  Total     Time
Discrete Algorithm Tests |   14     14  1m03.8s
 63.971843 seconds (202.77 M allocations: 14.236 GiB, 3.51% gc time, 84.91% compilation time)
Test Summary: | Pass  Total  Time
Tstops Tests  |   24     24  5.4s
  5.422227 seconds (13.55 M allocations: 983.036 MiB, 2.80% gc time, 99.22% compilation time)
Test Summary:   | Pass  Total  Time
Backwards Tests |    3      3  7.8s
  7.773433 seconds (15.85 M allocations: 1.080 GiB, 2.93% gc time, 99.82% compilation time)
```

It seems to have broken the compilation time measurement.

@oscardssmith
Member

Is this able to test TTFP yet?

@pchintalapudi
Member Author

> Is this able to test TTFP yet?

This particular branch fails to build, but I have a different branch that does build (though it's a few hundred commits behind master and doesn't pass tests).

@pchintalapudi
Member Author

I'm also not being very careful about maintaining the compilation timers, because they assume single-threaded codegen and need to be redesigned for a multithreaded environment with concurrent codegen. (In this case, the compilation-time difference could be due to the optimization pipeline no longer running underneath the compilation timers.)
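The problem with timers that assume single-threaded codegen can be shown with a small sketch (a Python stand-in, not Julia's actual timer code): once optimization runs concurrently, naively summing per-thread compilation spans over-reports relative to the wall-clock window they occurred in.

```python
import threading
import time

# Toy model: two workers each "compile" for ~0.2 s in parallel. Summing
# their measured spans, as a single-threaded accounting scheme would,
# reports ~0.4 s of compilation time inside a ~0.2 s wall-clock window.
spans = []
lock = threading.Lock()

def compile_job():
    t0 = time.perf_counter()
    time.sleep(0.2)  # stand-in for LLVM optimization work
    dt = time.perf_counter() - t0
    with lock:
        spans.append(dt)

wall0 = time.perf_counter()
threads = [threading.Thread(target=compile_job) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
wall = time.perf_counter() - wall0

summed = sum(spans)
print(f"wall: {wall:.2f}s, naive summed compilation time: {summed:.2f}s")
# The naive sum over-reports: summed is roughly 2x the wall-clock time.
```

A redesigned scheme would need to decide what "compilation time" even means under concurrency, e.g. union of busy intervals versus total CPU time spent compiling.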

@gbaraldi
Member

I tested TTFP locally and it seems to be about the same, though I don't think it was expected to change much. This should help in cases where we compile a lot of code and then never call it; I'm not sure Plots is such a case, however.

@ancapdev
Contributor

Would there be any way to force compilation before calling a method? I work on (soft) real-time systems, and this change sounds like it will make it harder to avoid unexpected JIT pauses.

@gbaraldi
Member

I could be wrong, but there is. This change probably wouldn't affect that too much (it's something to be wary of, however); it's more relevant to cases where Julia compiles a bunch of code that is never called.

@ancapdev
Contributor

> I could be wrong, but there is. This change probably wouldn't affect that too much (it's something to be wary of, however); it's more relevant to cases where Julia compiles a bunch of code that is never called.

I see, so `precompile()` should still trigger full compilation.
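In toy terms, an eager escape hatch of the kind `precompile()` provides — forcing compilation before the first call so a real-time path never hits a JIT pause — could be sketched like this (a hypothetical Python illustration of the idea, not Julia's actual mechanism; the `force`/`is_compiled` names are invented):

```python
import functools

def compile_on_demand(func):
    """Stub that optimizes `func` on first call, plus an eager escape hatch."""
    state = {"compiled": None}

    def ensure_compiled():
        if state["compiled"] is None:
            # Stand-in for the expensive LLVM optimization step.
            state["compiled"] = func

    @functools.wraps(func)
    def stub(*args, **kwargs):
        ensure_compiled()
        return state["compiled"](*args, **kwargs)

    # Analogue of Julia's precompile(): force compilation before first use.
    stub.force = ensure_compiled
    stub.is_compiled = lambda: state["compiled"] is not None
    return stub

@compile_on_demand
def step(x):
    return x + 1

assert not step.is_compiled()
step.force()       # pay the compilation cost up front...
assert step.is_compiled()
result = step(41)  # ...so the first real call has no JIT pause
print(result)  # prints 42
```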
