-
Notifications
You must be signed in to change notification settings - Fork 519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rebar3 Erlang compiler: performance improvements #2322
Conversation
This commit makes rebar_compiler_erl to start a gen_server that acts as a filesystem cache, avoiding re-scanning it over and over.
543a1b7
to
725a732
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this PR as something we need for large setups. I'm currently not noticing any performance drop on smaller repos (i.e. rebar3's own), though there's one bit I highlighted where error handling needs improving.
If you're short on time for these, I can do the changes myself.
src/rebar_compiler_dag.erl
Outdated
not lists:member(Path, AbsIncls)], | ||
%% Add the rest | ||
[digraph:add_edge(G, Source, Incl) || Incl <- AbsIncls], | ||
%% admittedly, mark_dirty does not need to happen every time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It now needs to happen every time, since it has been taken out of the call-site doing the spawning in most cases. Since this function is now expected to be called on a change, the comment is outdated and mark_dirty
must happen every time to cover for the parent site no longer doing it. It's a bit of a messy split in responsibility now so updating the comment to mention that would make snese.
src/rebar_compiler_dag.erl
Outdated
end, | ||
Control ! {deps, self(), AbsIncls} | ||
end | ||
). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be safer to just have a spawn_link
to propagate crashes or a spawn_monitor
to catch and report them in the finalise call. The custom compilers here can be user-defined plugins and I would expect Compiler:dependencies/3-4
call to be possibly failure prone.
The reporting does not necessarily need to be crystal clear (just the stacktrace is what we had before), but as of this patch, we would instead likely just hang indefinitely in case of a compiler plugin crash.
Ah good, @ferd, my main question was going to be, "what is the effect on small and medium size projects". |
Most of the high cost in general would come from copying top-level The coordination cost is also likely to be rather small there, and offset by the DAG loading and cache for this specific PR. I'm always at around 12s for a full rebar3 rebuild from scratch (was at ~11s here) and ~1s for partial builds (including erts start as an escript), which is on par with current master on my VPS where I did the previous benchmarks. |
Yes, I tested this on smaller codebases, there are no regressions (other than memory, for cases when a very large/deep source tree is cached). |
Instead of sequentially running *.erl (or any other) files one-by-one, spawn a separate process, in a scatter-gather way, then wait for all processes to complete. Combined with the previous change it introduces a significant speed-up for large repositories, up to a number of cores that the host has.
Saves a few seconds when loading DAG with thousands of files. Unfortunately, uses internal knowledge about digraph implementation.
725a732
to
8fca720
Compare
Addressed review comments. |
Yeah we can always refactor at a later point, but get this in 3.14 first. |
{NewFs, Files} = list_directory(Dir, Fs), | ||
case lists:member(Name, Files) of | ||
true -> | ||
{{true, filename:join(Dir, Name)}, NewFs}; | ||
false -> | ||
resolve(Name, NewFs, Tail) | ||
end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently writing a fix for this; this search needs to be recursive in order to work since we support directory recursion. Not having it is currently causing issues for some builds in 3.14, but I'll try to re-shape it such that it doesn't cause problems in terms of performance if you don't need recursion to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems there's already some recursion below, so there might just be an issue with properly returning sub-paths rather than the top directory.
This was due to a subtle mistake in the new code in erlang#2322 which optimized the search behaviour of files not found by EPP. The patch mistakenly always kept the top-level src/ directory for an app even if it did a lookup in recursive ones. This patch tracks and maintains the proper subdirectory hierarchy. To keep performance adequate, the lists:keyfind/3 function is used, which is also now a NIF to the same extent as lists:member/2; manual lookups would likely end up reducing performance, particularly in deep hierarchies. A test case has been added to track the regression. Reported by @elbrujohalcon
This PR contains 3 commits, that work best together, but could also be applied independently.
First, for large projects, rebar_compiler_erl spends vast majority of time re-scanning file system in order to find parse_transform/behaviour file by the name (e.g. "cth_readable" parse_transform that is applied to all files for "TEST" profile). When there are several hundred directories to scan, it becomes a bottleneck.
Caching results of this scan helps to bring down analysis of a large repo from 11 minutes to just 3, on 8-core intel CPU.
Second improvement runs EPP concurrently within an application, saving considerable amount of time for large applications.
Last change circumvents documented interfaces for digraph save/load, and uses internal undocumented structures. Better approach would be to make a PR to digraph module in OTP, but it will take several years, and a dangerous switch "don't check if the graph is acyclic".