Compilation bottleneck #14556
we could put them in an ObjectIdDict, but that may make it slower in the expected usage. that table is supposed to be nearly empty, but it can get populated if code is interpolating non-ast objects into the ast. perhaps we should switch to using a per-linfo list of literals? the list could re-use the codegen roots list -- it seems likely that these same objects are getting copied there by |
At one point we switched to per-module literal lists since many values ended up being shared by several functions. But I agree it seems silly to have both that and linfo->roots. 10k is way more than I'd expect. I wonder if we can cut that down (e.g. by serializing more things). Or just fully switch to a bytecode representation. |
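To make the trade-off between a linear list and a dictionary concrete, here is a minimal sketch of an identity-keyed table that maps an object pointer to a stable index. It is not Julia's ObjectIdDict and not code from src/dump.c: the names `literal_table_t`, `literal_table_init`, `literal_table_id`, and `literal_table_free` are hypothetical, and keys are compared by pointer identity only, whereas `jl_egal` also treats structurally identical immutable values as equal. A real replacement would have to hash with something compatible with `jl_egal`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical open-addressing table: object pointer -> literal index.
   Assumption: pointer identity is enough (jl_egal is stricter than this). */
typedef struct {
    const void **keys; /* NULL marks an empty slot */
    int *ids;          /* index assigned to keys[i] */
    size_t cap;        /* always a power of two */
    size_t len;        /* number of stored keys */
} literal_table_t;

static size_t hash_ptr(const void *p, size_t cap)
{
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 16;
    x *= (uintptr_t)0x9E3779B1u; /* simple multiplicative mixing */
    x ^= x >> 13;
    return (size_t)x & (cap - 1);
}

int literal_table_init(literal_table_t *t, size_t cap)
{
    assert(cap >= 2 && (cap & (cap - 1)) == 0);
    t->keys = calloc(cap, sizeof *t->keys);
    t->ids = malloc(cap * sizeof *t->ids);
    t->cap = cap;
    t->len = 0;
    return (t->keys && t->ids) ? 0 : -1;
}

void literal_table_free(literal_table_t *t)
{
    free(t->keys);
    free(t->ids);
    t->keys = NULL;
    t->ids = NULL;
    t->cap = t->len = 0;
}

/* Double the capacity and reinsert every key, keeping its index. */
static int literal_table_grow(literal_table_t *t)
{
    literal_table_t bigger;
    if (literal_table_init(&bigger, t->cap * 2) != 0) {
        literal_table_free(&bigger);
        return -1;
    }
    for (size_t i = 0; i < t->cap; i++) {
        if (t->keys[i] == NULL)
            continue;
        size_t j = hash_ptr(t->keys[i], bigger.cap);
        while (bigger.keys[j] != NULL)
            j = (j + 1) & (bigger.cap - 1);
        bigger.keys[j] = t->keys[i];
        bigger.ids[j] = t->ids[i];
    }
    bigger.len = t->len;
    free(t->keys);
    free(t->ids);
    *t = bigger;
    return 0;
}

/* Return the index of v, assigning the next free index on first sight.
   Expected O(1) per call instead of one comparison per stored literal. */
int literal_table_id(literal_table_t *t, const void *v)
{
    assert(v != NULL);
    if ((t->len + 1) * 2 > t->cap && literal_table_grow(t) != 0)
        return -1;
    size_t i = hash_ptr(v, t->cap);
    while (t->keys[i] != NULL) {
        if (t->keys[i] == v)
            return t->ids[i];
        i = (i + 1) & (t->cap - 1);
    }
    int id = (int)t->len;
    t->keys[i] = v;
    t->ids[i] = id;
    t->len++;
    return id;
}
```

For a table that really is nearly empty, the hashing and probing overhead can exceed the cost of a short scan, which is the concern raised above; the dictionary only wins once the table grows to thousands of entries.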
It will be a little while, but to save anyone else time, I'm collecting the histogram of counts with this patch:
diff --git a/src/dump.c b/src/dump.c
index e11bff5..50ab040 100644
--- a/src/dump.c
+++ b/src/dump.c
@@ -6,6 +6,7 @@
#include <stdlib.h>
#include <string.h>
#include <assert.h>
+#include <stdio.h>
#include "julia.h"
#include "julia_internal.h"
@@ -616,8 +617,21 @@ static int is_ast_node(jl_value_t *v)
jl_is_labelnode(v) || jl_is_linenode(v) || jl_is_globalref(v);
}
+int tlvlen[50000] = { 0 };
+
+JL_DLLEXPORT void jl_write_tlvlen(void)
+{
+ FILE *f = fopen("/tmp/tlvlen.bin", "w+");
+ fwrite(tlvlen, sizeof(int), 50000, f);
+ fclose(f);
+}
+
static int literal_val_id(jl_value_t *v)
{
+ int n = jl_array_len(tree_literal_values);
+ if (n >= 50000)
+ n = 49999;
+ tlvlen[n] += 1;
for(int i=0; i < jl_array_len(tree_literal_values); i++) {
if (jl_egal(jl_cellref(tree_literal_values,i), v))
return i;
diff --git a/src/julia.h b/src/julia.h
index b93c268..03545fc 100644
--- a/src/julia.h
+++ b/src/julia.h
@@ -1678,6 +1678,8 @@ JL_DLLEXPORT extern const char* jl_ver_string(void);
JL_DLLEXPORT const char *jl_git_branch(void);
JL_DLLEXPORT const char *jl_git_commit(void);
+ JL_DLLEXPORT void jl_write_tlvlen(void);
+
// nullable struct representations
typedef struct {
uint8_t isnull;
diff --git a/test/runtests.jl b/test/runtests.jl
index 969c72e..6b7cb00 100644
--- a/test/runtests.jl
+++ b/test/runtests.jl
@@ -72,3 +72,5 @@ cd(dirname(@__FILE__)) do
@unix_only n > 1 && rmprocs(workers(), waitfor=5.0)
println(" \033[32;1mSUCCESS\033[0m")
end
+
+ccall(:jl_write_tlvlen, Void, ())
Will report back when it finishes. |
It might also make sense to count the total iterations spent in that loop (# of jl_egal calls). Or maybe you have that already? |
Wait, I was totally borked on those statistics. Median value is 10024, mean value is 11305. |
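As a rough way to turn the histogram into the numbers discussed here (call count, median and mean table length, and an upper bound on `jl_egal` calls), the following sketch works on an array with the same layout as `tlvlen` in the patch above: `hist[n]` is the number of `literal_val_id` calls made while the table held `n` entries. The helper names are hypothetical. The comparison count is only an upper bound, because a lookup that finds its value returns before scanning the whole table.

```c
#include <assert.h>
#include <stddef.h>

/* Total number of literal_val_id calls recorded in the histogram. */
long long hist_total_calls(const int *hist, size_t nbins)
{
    assert(hist != NULL);
    long long total = 0;
    for (size_t n = 0; n < nbins; n++)
        total += hist[n];
    return total;
}

/* Upper bound on comparisons: a call on a table of length n scans at
   most n entries, so sum n * hist[n] over all bins. */
long long hist_max_comparisons(const int *hist, size_t nbins)
{
    assert(hist != NULL);
    long long total = 0;
    for (size_t n = 0; n < nbins; n++)
        total += (long long)n * hist[n];
    return total;
}

/* Mean table length per call (0.0 if no calls were recorded). */
double hist_mean_length(const int *hist, size_t nbins)
{
    long long calls = hist_total_calls(hist, nbins);
    if (calls == 0)
        return 0.0;
    return (double)hist_max_comparisons(hist, nbins) / (double)calls;
}

/* Median table length: smallest n at which at least half of all calls
   have been counted (0 if no calls were recorded). */
size_t hist_median_length(const int *hist, size_t nbins)
{
    long long calls = hist_total_calls(hist, nbins);
    long long seen = 0;
    for (size_t n = 0; n < nbins; n++) {
        seen += hist[n];
        if (calls > 0 && 2 * seen >= calls)
            return n;
    }
    return 0;
}
```

To apply this to the dump, read 50000 `int` values from /tmp/tlvlen.bin with `fread` and pass them in with `nbins = 50000`. The last bin is clamped in the patch, so any lengths of 49999 or more are lumped together there.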
I decided to profile the test suite. I had to exclude a few tests ( Analysis script (whose purpose was to show "flat" results that don't suffer from the duplicates-due-to-recursion problem as illustrated by While it's a shame that so many important lines lack good backtrace info (here's hoping that @Keno's work will help), to me it looks as if |
Unfortunately, I'm probably not as much help here. I briefly implemented a more efficient/accurate profiler on top of Gallium, but it only worked on OS X and had a number of bugs that made it essentially unusable. That said, I do plan to revisit the topic of profiling once I've made some progress on the debugger front. I can say however, that I concur with your analysis of jl_egal being a significant chunk of our compile time. I don't have the traces anymore, but I remember pointing this fact out to @JeffBezanson |
Bummer about the backtraces. Hopefully we'll get there some day. I also just remembered that long ago I convinced myself (rightly or wrongly) that most of the borked backtraces are almost surely from our many C functions that are declared |
Well, the backtraces shouldn't be super borked. What OS are you on? |
See more info here. |
Kubuntu 14.04.3 LTS, Trusty Tahr |
Hmm, should be fine then. On OS X you have to jump through some hoops to make the dSYM available such that we can find it to get at the debug line info. |
This affects every computer I've tested that's running Kubuntu. I seem to remember noticing a CentOS machine did better, however (can't test anymore, that machine was wiped clean and replaced with Kubuntu). I could try to set up an account for you on my machine---it won't be trivial because here they're pretty draconian about the firewall, but I suspect I can find a way. |
I did a full run of the tests (on latest master) with VTune XE 2016. If anyone wants to download the traces, I put them up here (80M file, maybe I should have used a longer trace interval): https://dl.dropboxusercontent.com/u/14340581/julia_tests_vtune.tar.gz. Note, if too many people download them, Dropbox will likely disable the file download. Here are a few screenshots: |
I get a 403 from the last two links---I didn't try the first link. But that's what I was guessing from your email to julia-dev (I take it you fixed the line numbers?). Does that mean that 11.2% of the total test-suite time is that one line?? Surprising that the line above it doesn't take any appreciable time, given that this is a pointer comparison and the line above it also calls Also, can you tell how many calls are on each line? If it usually exits at line 294, that might explain why the later lines aren't so expensive despite the fact that they involve more operations. Another oddity is that the fact this appears at all in the backtraces leads me to guess that |
Updated screenshot links, do they work now? |
Nope. |
@timholy After refreshing they worked for me. |
Thanks, @vchuravy. Wow, that is a standout. |
CC @ArchRobison, who I suspect is the local expert on VTune. |
I am actually looking at profiles myself right now. Will see what I can
Overhead Command Shared Object Symbol
If the issue is similar to last time it should be easy to figure out. |
Thanks for the explanation. Also, good that two different tools seem to agree. |
There was indeed again one function call in jl_egal which caused a stack
I'll prepare a formal pull request in a bit but first I want to see whether
The question about this is: do people think it's time to do these types of
diff --git a/src/builtins.c b/src/builtins.c
// primitives
-static int bits_equal(void *a, void *b, int sz)
|
Guess it depends just how ugly, but there's a fair amount of platform specific pieces in the C runtime already so I'd think platform-specific optimizations would be fair game as long as the changes are maintainable going forward. |
I used The output is sorted by L2 cache misses, and recursive calls to As an outsider I'm wondering: There is a function that compares types for equality, but no function for a "less than" relation or hash function? Does this mean that types are stored mostly in lists and arrays, and not in trees or heaps or hash tables? |
See If you read from the top of this issue, you'll see that the number of calls to |
@eschnett There is jl_object_id, which is a hash function that pairs with jl_egal. However for types these are structural, and do not implement type equality (in the sense of |
@drepper Yes jl_egal should be quite stable now and I think we can handle some ugliness there. |
The ugliness will spread, unfortunately. The jl_egal function cannot be |
Would that be equivalent to making it a |
I agree with restricting interposition for jl_egal; replacing it would not be a good idea anyway. However it's also true that the bigger problem is us just calling it too often. |
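For readers unfamiliar with the term, here is a minimal sketch of one common way to restrict interposition with GCC or Clang, not the change actually proposed in this thread. It assumes the code is built into a shared library with -fPIC. The names `my_egal` and `my_egal_internal` are hypothetical stand-ins, and the `memcmp` body is a placeholder rather than the logic of `jl_egal`. The idea is that calls inside the library target a hidden-visibility symbol, which the linker binds locally, while a thin exported wrapper keeps the public entry point.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hidden visibility: the symbol is not exported from the shared library,
   so calls to it from inside the library cannot be interposed and do not
   need to go through the PLT. (Hypothetical name, placeholder logic.) */
__attribute__((visibility("hidden")))
int my_egal_internal(const void *a, const void *b, size_t sz)
{
    assert(a != NULL && b != NULL);
    if (a == b)
        return 1; /* identical pointers are trivially equal */
    return memcmp(a, b, sz) == 0;
}

/* Exported wrapper for external callers. Hot paths inside the library
   would call my_egal_internal directly. */
__attribute__((visibility("default")))
int my_egal(const void *a, const void *b, size_t sz)
{
    return my_egal_internal(a, b, sz);
}
```

The cost is the ugliness mentioned above: every internal call site has to use the hidden name, and the annotations are compiler-specific.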
#14656 might fix this. |
Exciting! Looking forward to testing it. Thanks so much for tackling this. |
I don't understand the compile toolchain very well, and I haven't yet gotten good profile info (see https://groups.google.com/d/msg/julia-dev/RKAahOwppNs/Kg0TFx_SBwAJ), but at least for the SubArray tests I'm guessing that this line, via jl_compress_ast, is an enormous bottleneck in compilation. I inserted a debugging line, and saw that most of the lengths were around 10k or so. That's a heck of a lot of calls to jl_egal. Is there anything that can be done to reduce this number?