@juj
Collaborator

@juj juj commented Feb 14, 2021

This PR adds support to ccache for Emscripten. The corresponding PR for emsdk is at emscripten-core/emsdk#711 and the corresponding changes to ccache itself are at https://github.com/juj/ccache/tree/emscripten.

The general mechanism is that after activating ccache in Emscripten SDK, emcc and em++ will automatically make use of the installed ccache tool.

Usage:

emsdk install sdk-upstream-master-64bit ccache-git-emscripten-64bit
emsdk activate sdk-upstream-master-64bit  ccache-git-emscripten-64bit
cd %EMSCRIPTEN%
git remote add juj https://github.com/juj/emscripten.git
git fetch juj
git checkout ccache

# Example 1: rebuilding libc is fast from cache:
ccache -s # check initial ccache statistics: zero compiled files in cache
emcc --clear-cache

python embuilder.py build libc # cache miss
ccache -s # will show 862 files missed cache

ccache -z # zero ccache statistics

emcc --clear-cache
python embuilder.py build libc # cache hit!
ccache -s # now shows 862 files found from cache

# Example 2: rebuilding custom code is fast from cache:
ccache -z # zero statistics
emcc -c tests\hello_world.c -o a.o # cache miss
ccache -s # shows one file missed cache
emcc -c tests\hello_world.c -o a.o # cache hit!
ccache -s # shows one file hit cache

@juj juj mentioned this pull request Feb 14, 2021
@kripken
Member

kripken commented Feb 16, 2021

Where in the example in the first comment is emcc told to use ccache?

Skimming the code I see EMCC_CCACHE as an env var, is that the way to opt in? Alternatively, could ccache not wrap around emcc from the outside?

exec "$PYTHON" "$0.py" "$@"
else
unset EMCC_CCACHE
exec ccache "$0" "$@"
Collaborator

Why do we need to handle injecting ccache like this when other compilers such as clang and gcc don't? Can't we ask folks to use ccache in the normal way (which I believe is to prepend it to the compile command)?

Collaborator Author

@juj juj Mar 13, 2021

Can't we ask folks to use ccache in the normal way

This is the/a normal way. Ccache supports two methods of installation: 1) explicit invocation with ccache gcc ..., and 2) automatic invocation with gcc .... See the documentation of ccache.

(which I believe is to prepend it to the compile command?)

That is not the full picture. See the docs above. There are two recommended ways to install ccache, and this integration ensures that both usages work.

E.g. man documentation: https://www.freebsd.org/cgi/man.cgi?query=ccache&sektion=1&n=1

RUN MODES
       There are two ways to use ccache. You can either prefix your
       compilation commands with ccache or you can let ccache masquerade as
       the compiler by creating a symbolic link (named as the compiler) to
       ccache. The first method is most convenient if you just want to try out
       ccache or wish to use it for some specific projects. The second method
       is most useful for when you wish to use ccache for all your
       compilations.

Why do we need to handle injecting ccache like this when other compilers such as clang and gcc don't?

The reason that symbolic links are not used by this integration is that they would interfere with the emsdk installation of the ccache or emscripten packages (e.g. installing the ccache package would have to modify the installed emscripten package).

Also while Linux and macOS could use symbolic links to inject ccache, that is not well supported on Windows.
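
For reference, the symlink ("masquerade") run mode from the man page excerpt above could be set up roughly as follows. This is a minimal sketch, assuming a POSIX system where creating symlinks needs no special privileges; the function name `masquerade` and the directory layout are hypothetical and not part of ccache or emsdk. The link directory would also have to come before the real compiler in PATH.

```python
import os

def masquerade(ccache_path, link_dir, compiler_names):
    """Create `link_dir/<name> -> ccache_path` symlinks and return their paths."""
    os.makedirs(link_dir, exist_ok=True)
    links = []
    for name in compiler_names:
        link = os.path.join(link_dir, name)
        if os.path.lexists(link):
            os.remove(link)  # replace a stale link left by an earlier run
        os.symlink(ccache_path, link)
        links.append(link)
    return links
```

On Windows, `os.symlink` typically requires elevated privileges or developer mode, which is one way to see why this mode is awkward there.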

Collaborator

My thoughts about this in general: if neither gcc nor clang require explicit in-tree modification to make ccache work I would hope that we do not need this either.

It seems that we are adding a third RUN MODE here that is emscripten-specific:

  1. Run explicitly via ccache <compiler>
  2. Run implicitly via having a <compiler> -> ccache symlink in your PATH
  3. Run implicitly via adding EMCC_CCACHE to the environment.

Given that mode (3) does not exist today with existing compilers I don't see why we in emscripten want to be special and add this third mode. I understand that mode (2) is hard/impossible on Windows but that is a pre-existing condition unrelated to emscripten, right?

Collaborator Author

I don't agree here. Given that we can offer an easy "emsdk activate ccache" feature that just does the full integration, I don't understand why we should not do it. It gives a straightforward "fully set up out of the box" experience without manual hassles. Option 2 is a complex burden on Windows, but also on non-Windows platforms.

Also, the step (3) does not need to be something public, but an internal implementation aid to carry the integration together. I.e. we are not offering a "third run mode" here.

Collaborator

Ok, I get that we want this to work out-of-the-box for all emsdk users including windows users.

I'd still like to find a solution that doesn't involve modifying emscripten. How about this solution:

  1. create emsdk/ccache/bin/emcc and emsdk/ccache/bin/em++ (these are bat/sh wrappers rather than symlinks)
  2. Have emsdk activate ccache put emsdk/ccache/bin first in the PATH before emscripten/bin.

This avoids changing emscripten at all and works more like the symlink approach that exists today for ccache (but also happens to work on Windows). It also works with older/current versions of emscripten too.

This means figuring out how to chain emsdk/ccache/bin/emcc -> emscripten/bin/emcc in the wrapper script. I don't know off the top of my head how to do that. If it's too tricky I'm not going to resist this emscripten-side change too much, but it does seem wrong to me that we are hacking the compiler to be specifically aware of ccache (unlike gcc or clang).
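
One possible way to do the chaining mentioned above is for the wrapper to search PATH for the next tool of the same name while skipping its own directory. The sketch below only illustrates that idea; `find_next_in_path` is a hypothetical helper, not something that exists in emsdk or emscripten, and on Windows the lookup would additionally have to consider extensions such as `.bat`.

```python
import os

def find_next_in_path(name, skip_dir, path=None):
    """Return the first executable `name` on PATH outside `skip_dir`, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    skip = os.path.realpath(skip_dir)
    for entry in path.split(os.pathsep):
        # Ignore empty entries and the wrapper's own directory.
        if not entry or os.path.realpath(entry) == skip:
            continue
        candidate = os.path.join(entry, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A wrapper living in emsdk/ccache/bin could then run something like `ccache <result> <args>`, with `<result>` being the emcc found further down PATH.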

Base automatically changed from master to main March 8, 2021 23:50
@juj
Collaborator Author

juj commented Mar 13, 2021

Where in the example in the first comment is emcc told to use ccache?

This is done on the line

emsdk activate ... ccache-git-emscripten-64bit

Skimming the code I see EMCC_CCACHE as an env var, is that the way to opt in?

It is a way to opt in. The other method is to manually invoke with ccache emcc ....

Alternatively, could ccache not wrap around emcc from the outside?

Yes.

:: Remove the ccache env. var, invoke ccache and re-enter this script to take the above branch.
set EMCC_CCACHE=
ccache "%~dp0\%~n0.bat" %*
)
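
The re-entry logic in the sh and bat hunks above can be modelled in Python as follows. This is only an illustration of the control flow: the real launchers are shell/batch scripts generated by tools/create_entry_points.py, `next_command` is a hypothetical name, and treating an unset, empty or "0" value as "disabled" is an assumption based on the EMCC_CCACHE=0/1 usage mentioned in this thread.

```python
def next_command(launcher, args, env, python="python3"):
    """Return (argv, env) for the process the launcher would start next."""
    env = dict(env)  # work on a copy; do not mutate the caller's environment
    if env.get("EMCC_CCACHE") in (None, "", "0"):
        # Normal path: run the Python implementation that sits next to the launcher.
        return [python, launcher + ".py", *args], env
    # ccache requested: clear the variable so that the launcher, when re-entered
    # by ccache on a cache miss, takes the normal path above instead of looping.
    env.pop("EMCC_CCACHE")
    return ["ccache", launcher, *args], env
```

Calling it twice, the second time with the environment returned by the first call, shows the two hops: first `ccache emcc ...`, then `python emcc.py ...`.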
Collaborator

These launcher scripts are tool-generated (by tools/create_entry_points.py), so if we want to move forward with this I think we would want to modify that script.

Or as an alternative, could we inject the ccache command around the inner call to clang rather than the outer call to python emcc?

Collaborator Author

Injecting to the inner clang call is considerably weaker in terms of performance than injecting to the outer python emcc call.

Collaborator Author

Updated the create_entry_points.py script.

@sbc100
Collaborator

sbc100 commented Mar 14, 2021

Oh wait, sorry, I forgot we already added support for essentially this via the COMPILER_WRAPPER config option:
https://emscripten.org/docs/compiling/Building-Projects.html#using-a-compiler-wrapper. See #12380

This can be set in the config file as COMPILER_WRAPPER or in the environment as EM_COMPILER_WRAPPER.

This option always wraps the inner clang command. I think this is preferable as it doesn't run into the STB_IMAGE issue.

@juj
Collaborator Author

juj commented Mar 15, 2021

The COMPILER_WRAPPER option is not useful, since it injects to clang and not to emcc. Injecting to emcc is much better since it avoids needing to spawn a python interpreter altogether, which is much faster as opposed to injecting just the backend compiler.
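
The difference between the two injection points can be pictured with a small model of which processes start for a single `emcc -c` call. This is a deliberately simplified sketch: it assumes a ccache hit that needs no preprocessor run, ignores the thin sh/bat launcher, and `process_chain` is a hypothetical helper rather than anything in emscripten.

```python
def process_chain(mode, cache_hit):
    """List the processes started for one `emcc -c` call (illustrative model)."""
    if mode == "outer":
        # ccache sits in front of emcc; on a hit nothing else needs to run.
        chain = ["ccache"]
        if not cache_hit:
            chain += ["python emcc.py", "clang"]
    elif mode == "inner":
        # emcc always runs; ccache only guards the clang invocation.
        chain = ["python emcc.py", "ccache"]
        if not cache_hit:
            chain.append("clang")
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return chain
```

In this model only the outer mode avoids starting the Python interpreter on a cache hit, which is the saving being argued for here.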

@sbc100
Collaborator

sbc100 commented Mar 15, 2021

The COMPILER_WRAPPER option is not useful, since it injects to clang and not to emcc. Injecting to emcc is much better since it avoids needing to spawn a python interpreter altogether, which is much faster as opposed to injecting just the backend compiler.

Very interesting. In that case maybe COMPILER_WRAPPER was the wrong direction. @pfaffe @bmeurer does this match your experience? I guess goma and ccache are somewhat different approaches, so they might not match here.

Collaborator

@sbc100 sbc100 left a comment

Assuming we do land this change, which is kind of like an outer version of EM_COMPILER_WRAPPER, are there other tools (such as distcc or icecream) that might also want to use this pattern? Should we at least make it non-ccache-specific?
How about EM_WRAPPER=ccache?

shutil.copy2(bat_file, dst)

generate_entry_points(entry_points, os.path.join(tools_dir, 'run_python'))
generate_entry_points(compiler_entry_points, os.path.join(tools_dir, 'run_python_compiler'))
Collaborator

I guess there are two new files here that need to be added?

Collaborator Author

Thanks, indeed missed adding those.

@sbc100
Collaborator

sbc100 commented Mar 15, 2021

Have you actually tested the difference between using EM_COMPILER_WRAPPER=ccache emcc and ccache emcc? I'm curious if it really makes a difference given that emcc does very little work when it's just in compile mode (-c). In both cases the cache hit eliminates the actual compilation part, which I would hope is massively dominant. It seems like unless the difference is significant it would be better to use the existing documented solution rather than create a new one.

It should be really easy to enable too: I think just adding EM_COMPILER_WRAPPER=ccache to the activated_env in emsdk_manifest.json should work?

@juj
Collaborator Author

juj commented Mar 16, 2021

I'd still like to find a solution that doesn't involve modifying emscripten. How about this solution:

create emsdk/ccache/bin/emcc and emsdk/ccache/bin/em++ (these are bat/sh wrappers rather than symlinks)
Have emsdk activate ccache put emsdk/ccache/bin first in the PATH before emscripten/bin.

I did implement it like that first, but then revised it in the current form to avoid needing to co-maintain emcc and em++ scripts in two places at once, which would be more brittle (make a change in emscripten repo, and ccache repo needs a matching commit). Also this kind of integration is even better since now it will work without needing to resort to shell path lookup order expansion on subprocess spawns, which is very useful for build systems.

I'd still like to find a solution that doesn't involve modifying emscripten.

Quite a bit surprised at the friction to landing these wrappers. #12380 landed without much restraint, and arguably it modifies emscripten way more compared to this. With modifying these wrappers on the outside, the complexity remains unchanged inside the toolchain. That should be much preferable over #12380?

Have you actually tested the difference between using EM_COMPILER_WRAPPER=ccache emcc and ccache emcc?

No, I haven't. It does not make sense to me to even compare them, it is just a waste of time. When one is utilizing a tool like ccache, it is a milliseconds competition against the added ccache operation overhead vs the time saved not invoking the compiler, and a fight to keep the time tradeoff a net positive.

In such a scenario, if you have a more optimal solution (outer wrapping) vs a directly suboptimal solution (inner wrapping), it does not make sense to even look at the suboptimal solution when that optimal choice exists. The difference could be -10%, -1% or even -0.1%, but still, when using ccache is all about competing to keep the tradeoff low, it makes sense to take the optimal path without considering worse alternative. (you wouldn't ever go "meh, the time saving is only -2%, I guess I won't take it" since there are no downsides to having that -2%)

The only reason why one would want to wrap clang instead of emcc is if whatever is doing the wrapping does not understand emcc, but does understand clang. That may warrant the existence of #12380.

@sbc100
Collaborator

sbc100 commented Mar 16, 2021

The only reason why one would want to wrap clang instead of emcc is if whatever is doing the wrapping does not understand emcc, but does understand clang. That may warrant the existence of #12380.

Indeed that was the motivation for #12380. With the goma distributed compiler that build farm knows about clang but does not know about emcc or python.

For the record I really didn't want to add #12380 either :) I tried to push back at the time: #12340 (comment)

I'm sorry for all the push back, I just want to avoid adding two mechanisms for doing basically the same thing.

At least will you consider making it a bit more generic to match the existing EM_COMPILER_WRAPPER? i.e. give it a generic name that doesn't include CCACHE itself, to allow other potential wrappers at the outer level too? And maybe add it to the documentation alongside EM_COMPILER_WRAPPER?

Perhaps we should even consider renaming EM_COMPILER_WRAPPER to EM_CLANG_WRAPPER and calling this new setting EM_COMPILER_WRAPPER?

@juj
Collaborator Author

juj commented Mar 16, 2021

At least will you consider making it a bit more generic to match the existing EM_COMPILER_WRAPPER?

Would it be good if I renamed this to EM_OUTER_COMPILER_WRAPPER and made it generic? I am not sure if any tool other than ccache will be able to use this though.

Renaming does lose the benefit that with the current EMCC_CCACHE=0/1 I can quickly do set EMCC_CCACHE= to disable and set EMCC_CCACHE=1 to enable ccache, to test back and forth (instead of having to pass the path to ccache each time).

@juj
Collaborator Author

juj commented Mar 16, 2021

Ran the numbers nevertheless with inner and outer compiler wrapping on timing python embuilder.py build ALL:

(cache was nuked with rmdir /s /q ports-builds sysroot && del sysroot_install.stamp rather than emcc --clear-cache to avoid network downloads accumulating in the time)

no ccache at all:

cold build: 315 seconds

set EM_COMPILER_WRAPPER=ccache inner wrapping:

cold build: 402 seconds (+27.6% slower than no ccache)
warm build: 205.81 seconds (-34.9% reduction in time spent compared to no ccache)

set EMCC_CCACHE=1 outer wrapping:

cold build: 549 seconds (+74.2% slower than no ccache)
warm build: 107 seconds (-65.6% reduction in time spent compared to no ccache)

-> outer wrapping is twice as fast in warm case (or -50% time reduction) compared to inner wrapping.

The reason that outer wrapping has a +36.6% performance impact compared to inner wrapping that can be seen in the cold case is due to extra overhead of having to support dedicated emcc flags, versioning and environment variables (https://github.com/juj/ccache/blob/emscripten/src/emcc.cpp).
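
The relative figures above can be recomputed from the posted timings with a few lines of Python. This is just an arithmetic check on the numbers as written in the comment; since the posted timings are presumably rounded (an assumption), the recomputed percentages are close to the quoted ones but differ in a few cases by some tenths of a percentage point.

```python
def pct_change(new, baseline):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new / baseline - 1.0) * 100.0

# Timings in seconds, copied from the comment above.
NO_CCACHE_COLD = 315.0
INNER = {"cold": 402.0, "warm": 205.81}
OUTER = {"cold": 549.0, "warm": 107.0}

for name, timings in (("inner", INNER), ("outer", OUTER)):
    for kind in ("cold", "warm"):
        change = pct_change(timings[kind], NO_CCACHE_COLD)
        print(f"{name} {kind}: {change:+.1f}% vs no ccache")

# Outer wrapping compared directly against inner wrapping.
print(f"outer vs inner, cold: {pct_change(OUTER['cold'], INNER['cold']):+.1f}%")
print(f"outer vs inner, warm: {pct_change(OUTER['warm'], INNER['warm']):+.1f}%")
```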

@sbc100
Collaborator

sbc100 commented Mar 16, 2021

Thanks for taking the time to run the numbers.

I guess that also means the outer wrapping only works with the custom ccache fork? I think that is probably worth documenting too.

I'm still not very happy with this change but it does seem like you are seeing real benefits (especially for Windows users who can't use symlinks). In particular, I don't like that it has ccache baked in, so it's not generic (it can't be used with other compiler wrappers such as distcc, icecream or goma), and that it forces us to maintain two different (mostly identical) versions of our Python running scripts.

But I'm not going to block it any more as I think we have been back and forth on this stuff enough.

lgtm with some documentation.

@juj juj merged commit 0786812 into emscripten-core:main Mar 16, 2021
@juj
Collaborator Author

juj commented Mar 16, 2021

I guess that also means the outer wrapping only works with the custom ccache fork? I think that is probably worth documenting too.

Yeah, anything wrapping emcc outside will need to understand emcc, or otherwise it will have a hard time actually doing the wrapping.

it can't be used with other compiler wrappers such as distcc, icecream or goma

If those wrappers are emcc flags aware, then they could be used to wrap outside, but I presume currently they are not? https://github.com/juj/ccache/blob/emscripten/src/emcc.cpp contains a quite comprehensive run-through of the kinds of things a wrapper needs to be aware of. (EM_CONFIG, emscripten-version.txt, contents of .emscripten file, -s flags, etc.)

A benefit of the current EMCC_CCACHE env. var is that it can still be treated as an internal implementation detail, since users are not expected to touch it. It serves as a contract between emsdk and Emscripten. However an env. var EM_OUTER_COMPILER_WRAPPER would be something to publicize. We can move EMCC_CCACHE to a EM_OUTER_COMPILER_WRAPPER later if there is another tool that can take advantage of outer wrapping, but we could not go the other way around and take such a public var private later on.

Do you think from that perspective we should publicly document EMCC_CCACHE env. var for users? I would prefer to keep it internal, but I can also add docs if that's more preferable, and/or follow up with a refactor to a more general EM_OUTER_COMPILER_WRAPPER if that's something that other tools can take advantage of?
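
To make concrete why an outer wrapper has to be "emcc aware", here is a rough sketch of a cache key that folds in the inputs listed above (the Emscripten version, the config file contents and the command-line flags, including -s settings). It is not how the linked emcc.cpp is implemented; `cache_key` and its parameters are hypothetical and chosen only to illustrate that a change in any of these inputs must invalidate the cached object.

```python
import hashlib

def cache_key(preprocessed_source, args, em_config_text, emscripten_version):
    """Hash everything that can influence the compiled output of one emcc call."""
    h = hashlib.sha256()
    for part in (emscripten_version, em_config_text, "\0".join(args)):
        data = part.encode("utf-8")
        # Length-prefix each field so adjacent fields cannot run together.
        h.update(len(data).to_bytes(8, "little"))
        h.update(data)
    h.update(preprocessed_source)  # bytes of the preprocessed translation unit
    return h.hexdigest()
```

With such a key, two invocations that differ only in a -s setting or in the contents of the .emscripten file map to different cache entries.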

@sbc100
Collaborator

sbc100 commented Mar 16, 2021

If this is an internal contract between emsdk and emscripten then I'm much happier with it.

Normal non-emsdk users can always use one of the existing ccache modes of operation, and we can leave this third one mostly as a convenience for emsdk+Windows users. I'm happier with that.

If that is the case perhaps we can prefix it with an _ so folks don't start using it directly?

@juj
Collaborator Author

juj commented Mar 16, 2021

Yeah, that works for me.

aheejin added a commit to aheejin/emscripten that referenced this pull request Mar 18, 2021
After emscripten-core#13498, `test_asyncify_onlylist_a` and `test_asyncify_onlylist_b`
started failing. emscripten-core#13498 introduced `if ( ... ) else ( ... )` in
emcc/em++.bat, and it turns out the parentheses inside the command line
can be mixed up with those `if`-`else`'s parentheses themselves. This does
not cause errors for every command line that contains parentheses, but
when there are both a comma and parentheses, this can trigger errors.

For example, `test_asyncify_onlylist_a`'s argument contains something
like this:
```
'ASYNCIFY_ONLY=["main",...,"foo(int,double)",...]'
```
Here the comma between `int` and `double` is mistaken as an item
separator within the list, and `)` after `double` is consequently
mistaken as ending `if`, because the whole body is within an `if`.

This PR fixes the problem by assigning the arguments `%*` to a
variable `ARGS` and, within `if` and `else`, using it with a substitution
of `)` with the escaped end-paren `^)`.
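
The substitution described in this commit message is a plain string transformation, modelled below in Python. The actual fix lives in the .bat launchers; `escape_close_parens` is a hypothetical name used only for illustration.

```python
def escape_close_parens(arg_string):
    """Escape every `)` as `^)` so it cannot terminate an enclosing `if ( ... )` block."""
    return arg_string.replace(")", "^)")

example = '-sASYNCIFY_ONLY=["main","foo(int,double)"]'
print(escape_close_parens(example))
```

In batch syntax the same replacement can be expressed with variable substitution of the form `%ARGS:)=^)%`.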
aheejin added a commit that referenced this pull request Mar 20, 2021
@vadimkantorov

Does it also solve #12256 ?

@juj
Collaborator Author

juj commented May 9, 2021

Does it also solve #12256 ?

It does not; the plan was not to go with an external EM_COMPILER_WRAPPER approach.

@vadimkantorov

vadimkantorov commented May 9, 2021

I mean, does it solve the usecase for #12556, i.e. specify some custom script in place of emcc for emmake invocation? In theory ccache is also such a compiler wrapper script.

Do I understand correctly that only EMCC_CCACHE=1 was implemented that would force emmake to use ccache prefix when running the compiler? So in theory one could dynamically create a local ccache symlink and trick emmake to call it?

@juj
Collaborator Author

juj commented May 9, 2021

As per the review comments, the conclusion was to go with an internal env. var _EMCC_CCACHE=0/1 that makes emcc invoke itself with ccache prefix, and not a public facing env. var.

So in theory one could dynamically create a local ccache symlink and trick emmake to call it?

Yeah, one could I suppose.

@mszhanyi

mszhanyi commented Dec 24, 2022

@juj, I like this PR. It looks like https://github.com/juj/ccache/tree/emscripten hasn't been merged into ccache, so do we have to build ccache from your repo?

And shimgen could be used for symlink on Windows, will it make it easier to use?

Also while Linux and macOS could use symbolic links to inject ccache, that is not well supported on Windows.

@kleisauke
Collaborator

@mszhanyi I cannot answer your questions, but if you want to use ccache in combination with Emscripten on Windows, you might be interested in the OCI-compliant container image provided by emsdk (which can also be used when running in WSL). See for example commit kleisauke/wasm-vips@6428d47.
