Compilation time regression on nightly due to #41099 #50082
Unfortunately I won't be a lot of help on my own, though happy to do what I can with some pointers. Is there a minimal test you could distill from this that I could use? At a minimum I can isolate which part of the PR may be responsible. Though I'd guess it's from storing MethodInstances in LineInfoNodes, somehow. But I may be a bit lost trying to figure out where in the compiler that's causing a problem.
I tried to find a minimal test case, but the effect seems to disappear once I disable too many other tests. It somehow seems to depend on the number of tests that run before, and the effect is a lot worse for the later tests. For my previous test runs I had our container test groups ordered like this, with SparseMatrix at the end taking very long:
with reverts:
While trying to reduce this I tried reordering the test groups and got this,
with reverts:
Is it possible that there are some caches, lookup tables, ... that are somehow getting large and slow?
I suppose we know all the slowdown is in here: Lines 767 to 858 in c4d162e
I wonder if the timing mechanism could be extended to narrow down which specific portions are slower? Though an educated guess would be that the linetables have more information in them and, therefore, take longer to compress? (But I may be reading this wrong...) Lines 830 to 842 in c4d162e
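If it helps narrow things down, the compression step can also be timed by hand from the REPL. A minimal sketch, assuming the internal jl_compress_ir entry point (as far as I can tell, the same code path the AST_COMPRESS timer covers) is still callable via ccall; it is an internal API and may change between versions:

```julia
# Compress one method's IR by hand and time it (internal API; sketch only).
m  = first(methods(sum, (Vector{Int},)))
ci = Base.uncompressed_ir(m)   # an uncompressed CodeInfo for that method
@time ccall(:jl_compress_ir, Any, (Any, Any), m, ci)
```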
That does fit my observations; it looks like it is spending quite some time compressing the LineInfoNodes that now store the Methods / MethodInstances instead of the name. I repeatedly hit the running process with
I also managed to capture such a backtrace with gdb, which gives a bit more information:
Frame 7 is the
The next frame just unpacks that array. Frame 5 is for encoding the LineInfoNode, and we are at
And then for this encoding (via
This second loop seems to take quite a while; there are also quite a lot of roots.
Looks like it can; I added
So if this works as expected, almost all the time inside AST_COMPRESS seems to be spent in this
PS: Line numbers correspond to commit f407a4c.
Edit:
That's awesome, thanks for poking into it! Love identifying a good quadratic-time algorithm to optimize. Does Julia need to look up the root each time it's being compressed? If there are many copies of the same method instances, that's a lot of unnecessary repeated work. Can the root be stored with the method instance at an earlier point, so the lookup is only performed once? Can the lookup operation be turned into a dictionary access rather than array iteration, so the lookup itself is constant-time?
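To make the two strategies concrete, here is a purely illustrative Julia sketch (the real code lives in the C runtime; the names and types below are made up for the example):

```julia
# Linear scan: O(length(roots)) per lookup, so roughly quadratic overall when
# every statement being compressed has to locate (or append) its root.
function find_or_add_root_linear!(roots::Vector{Any}, x)
    i = findfirst(r -> r === x, roots)
    i === nothing || return i
    push!(roots, x)
    return length(roots)
end

# Side-table variant: an IdDict gives amortized O(1) lookups, at the cost of
# keeping an extra index in sync with the roots array.
function find_or_add_root_indexed!(roots::Vector{Any}, index::IdDict{Any,Int}, x)
    get!(index, x) do
        push!(roots, x)
        length(roots)
    end
end
```

With many repeated lookups against a growing roots array, the first form is where the quadratic behaviour described above would come from.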
@JeffBezanson @timholy - Since you've both most recently (2-3 years ago :-p) touched
Trying to gauge if this is feasible for me to tackle, or if it's a much bigger problem to untangle.
That array should never be long, since that can cause a myriad of other problems. How is that happening and how can that be avoided?
I added a little bit of code to print the maximal size for that array:
On master (more precisely f407a4c, including #41099) our testsuite ends with:
On my other build where I reverted #41099 (see my earlier posts for the precise commits) I get:
13894 seems quite large as well? But I guess it is a lot better than 111575. I really don't understand the details; I am just trying to debug what is happening here. So adding the Methods / MethodInstances to those LineInfoNodes seems to trigger a lot more roots?
Was just exploring the same kind of testing. When Julia is started for the first time after building with that PR (
With the commit just before the PR merge, this count is 870,750.
EDIT: (The fact that it's almost exactly double seems suspicious.)
It appears that
So the
Some more details from running our test suite, inspired by the plots from #14556: the number of times a specific roots array length is encountered in
Total number of calls to
I also noticed that running the tests of a package of mine on c58e508 takes noticeably longer than on v1.9, but I haven't investigated the cause of the longer runtime.
Thanks for that plot - some other observations from Julia on first run:
The Base method with the most roots after the PR is
Before the PR it is
Again with a fresh Julia build/start:
Before the PR,
After the PR,
So... is adding these roots incorrect in some way, or just inconvenient for the current implementation? And particularly bad for certain generic methods that have numerous, possibly single-use, specializations? Is there documentation of what
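For anyone who wants to poke at these numbers themselves, something along these lines works at the REPL (a rough sketch; Method.roots is an internal field that may be undefined and whose meaning may change):

```julia
# Walk the methods reachable from a module's names, total up their roots, and
# report the method with the largest roots array. Internal fields; sketch only.
function root_stats(mod::Module)
    total, worst, worst_len = 0, nothing, 0
    for name in names(mod; all=true)
        isdefined(mod, name) || continue
        f = getfield(mod, name)
        f isa Function || continue
        for m in methods(f)
            isdefined(m, :roots) || continue
            n = length(m.roots)
            total += n
            n > worst_len && ((worst_len, worst) = (n, m))
        end
    end
    return (; total, worst, worst_len)
end

root_stats(Base)
```

The counts it produces won't match the instrumented numbers above exactly (it only sees methods reachable by name, and only after the fact), but it is enough to spot the worst offenders.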
Generally speaking, it is incorrect, as these roots are supposed to be for syntax, not compiler-generated content. Do we need to revert that PR, or can we partly disable whatever part is breaking this array?
When you say the roots are for syntax, what does that mean? The types of the roots of
Should Core.MethodInstance not be in there? (If that's what you mean by compiler-generated content?)
I'm looking back up the stack, in
The method roots get added in the fallback
We could add a case inside the
Looking more:
CodeInfo objects contain the linetable, and a
(For the
Looking at the roots of
Here, a MethodInstance contained in a specialization of
Though this doesn't show up in the linetable:
Is one issue that
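For reference, the linetable and roots being discussed can be inspected from the REPL roughly like this (internal fields whose layout differs between Julia versions; sketch only):

```julia
# Define a throwaway method and look at its lowered IR's linetable and its roots.
g(x) = sin(x) + 1
m  = only(methods(g))
ci = Base.uncompressed_ir(m)
ci.linetable                            # Vector of Core.LineInfoNode entries
isdefined(m, :roots) ? m.roots : Any[]  # roots referenced by the compressed IR
```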
Instructive to compare with the roots of this
Observations:
What if...
The names of the methods of
Then the question would be where
EDIT: After trying to implement something like this, I learned of a flaw: in
I've hit this again; we spend almost 10% of the time in some workloads on the test suite just doing
Ugh, yeah, that might be a good idea - the part that would need to be reverted is minimal, and the stacktrace lookup methods could be left in and ready. But hopefully it can be restored in 1.11.
@BioTurboNick could I bother you to open a PR to temporarily revert and mark it as a 1.10 backport?
I have been trying to debug a regression in the compilation time for many tests of Polymake.jl; it started with this PkgEval report https://s3.amazonaws.com/julialang-reports/nanosoldier/pkgeval/by_date/2023-04/27/report.html where the tests jumped from about 13 minutes to a timeout (>45 min). The tests mostly check compatibility of our own vector and matrix types with the corresponding Julia types (for different element types). These types are mostly defined via CxxWrap and a corresponding libcxxwrap module.
In our own CI we see similar effects with nightly taking more than twice as long for the tests as 1.8 or 1.9.
I added a @time to several of the affected test groups and the output looks like this:
So this is indeed compilation time and not related to the actual computations. That timing was for this code block.
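For context, the per-group timing comes from simply wrapping the test groups, roughly like this (a sketch, not the actual Polymake.jl test file; the testset name is made up):

```julia
using Test

# Wrap a test group in @time so compilation + execution time shows up per group.
@time @testset "Matrix compatibility (sparse)" begin
    # ... the real tests exercise our CxxWrap-backed vector/matrix types
    # against the corresponding Julia types ...
    @test true
end
```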
I bisected the regression to #41099 by @BioTurboNick but I don't really know how to debug this further.
In the timings below I noticed that most of the extra time seems to be spent in AST_COMPRESS, which went from 4% to 51%. CODEGEN_LLVM also went up a bit. But I don't really know what to make of this now, especially why or how our code is triggering this in combination with that PR.
Timings
I ran some tests on a nightly build with WITH_TIMING_COUNTS=1. I added some extra verbosity to the test sets to see more timings and disabled several test sets which do not exhibit this problem.
This is for the current master (commit f407a4c):
And from another build where I reverted #41099 (and three other commits which I needed to revert to avoid conflicts; see the end of this issue [1]):
Here are the corresponding test timings for one test group which illustrates the problem quite well. This is for master:
And with the reverted commits:
Other packages
While looking at the PkgEval report mentioned earlier I noticed that MultivariatePolynomials.jl might have similar issues, at least for some of its tests:
before and after, from the second block with MutableArithmetics, the last one before the tests were killed.
Additional notes: