kripken edited this page Nov 14, 2012 · 37 revisions

Building Projects

The Tutorial shows how emcc, the drop-in replacement for gcc, can be used to compile single files into JavaScript. Building large projects with Emscripten is also simple: you use emcc instead of gcc in your makefiles. This can usually be done by setting CC to emcc or by passing a flag to configure, but it can be even easier than that. For example, if you normally build with

   ./configure
   make

then the process with Emscripten looks like

   emconfigure ./configure
   emmake make
   emcc [-Ox] project.bc -o project.js

where project.bc is the linked bitcode generated by make; replace it with the file name your project actually produces (the output bitcode may have the suffix .o or .so, depending on the details of the build system).

  • The first change is to run emconfigure, with the normal configure command as an argument. emconfigure runs configure but tells it to use emcc instead of gcc, along with a few other useful things (for details, see the docs inside emcc). Similarly, emmake sets some helpful environment variables. Typically, if you use configure, cmake or similar, you don't need emmake, because all the information is already in the configure-generated files. If not, emmake sets default environment variables so that the compiler points to Emscripten.
  • The second change is, once the project is built, to add a command that converts the compiled project into JavaScript: emcc is run on the compiled project bitcode and told to generate JavaScript output (we will see further down why [-Ox] appears there). This additional command is necessary for two main reasons. First, emcc, when called from the makefile, does not automatically generate JavaScript during linking; if it did, many projects would generate a lot of JavaScript in intermediate steps, which is unnecessary and inefficient to link. Second, various options and optimizations must apply to the entire program being compiled. We cannot compile file A with options X and file B with options Y and link them into one program, because they can literally have different memory structures (different typed array modes, for example), so all of these options and optimizations are applied in the final conversion from bitcode to JavaScript. So, when called from the makefile, emcc generates bitcode, and a single line, as shown above, then converts the bitcode into JavaScript.
  • In other words, a conventional native code build system will generate native code in object files as an intermediate form, while building with Emscripten uses LLVM bitcode as an intermediate form.
  • In general you don't need to care about this, except for needing one extra line for the last transformation to JavaScript. However, one potentially confusing situation can occur with optimization. Assume that you compile individual files to bitcode, then link them, then compile the result to JavaScript. In that case, the stage at which you optimize matters. Optimizations specified when compiling individual files to bitcode do not affect the bitcode to JavaScript compilation process, since that doesn't happen at that stage. Optimizations specified during the last stage do affect it, and those optimizations are crucial for good performance. Therefore, when building projects, you should specify -O2 or some other optimization level (see Optimizing Code) in the final additional command so that the code is fully optimized. Note that because optimization must be specified in the last stage anyhow, bitcode is not optimized until then either. In other words, optimization flags are ignored until the last stage, which avoids unneeded work on bitcode optimizations that must be done later anyhow.
  • Note that the output of the build system can be a static library (.a), shared library (.so) or just object files (.o or .bc). In all of these cases using emcc in the build system will cause these files to contain LLVM bitcode, even though the suffix looks the same as if gcc ran. emcc can then be used to compile the .a, .so, .o or .bc file into JavaScript.
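
The flow described in the list above can be sketched as a small shell script. This is only an illustration under assumptions: the helper names run and build_to_js and the DRY_RUN variable are made up for this example, and project.bc stands for whatever bitcode file your build actually produces. By default the script only prints the commands, so it can be tried without Emscripten installed.

```shell
#!/bin/sh
# Hypothetical helper: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# build_to_js BITCODE OUTPUT [OPT]
#   BITCODE: linked bitcode produced by make (project-specific name)
#   OUTPUT:  JavaScript file to generate
#   OPT:     optimization flag for the final stage (defaults to -O2)
# The optimization flag is passed only to the last command, because that is
# the stage where it takes effect.
build_to_js() {
    bitcode=$1
    output=$2
    opt=${3:--O2}
    run emconfigure ./configure &&
    run emmake make &&
    run emcc "$opt" "$bitcode" -o "$output"
}
```

Calling build_to_js project.bc project.js prints the three commands in order; with DRY_RUN=0 set in the environment, it runs them as well.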

Notes

  • It is better to generate .so files and not .a. Archives (.a) have some odd behaviors when linked with other files: the linker tries to be 'clever' and discards code it thinks is not needed. Shared libraries (.so) are simpler, and unneeded code is eliminated later anyhow, so they are recommended. This is generally a simple change in your project's build system.
  • Make sure to use bitcode-aware llvm-ar instead of ar. ar may discard code.
  • If you get multiply defined symbol errors, try --remove-duplicates in emcc. This tries to emulate ld's permissive behavior that llvm-link lacks.

Manually Using emcc

As a drop-in replacement for gcc, emcc can be used in all the normal ways you would expect:

    emcc src.cpp
    # Generates a.out.js from C++. Can also take as input .ll (LLVM assembly) or .bc (LLVM bitcode)

    emcc src.cpp -c
    # Generates src.o containing LLVM bitcode.

    emcc src.cpp -o result.js
    # Generates result.js containing JavaScript.

    emcc src.cpp -o result.bc
    # Generates result.bc containing LLVM bitcode (the suffix matters).

    emcc src1.cpp src2.cpp
    # Generates a.out.js from two C++ sources.

    emcc src1.cpp src2.cpp -c
    # Generates src1.o and src2.o, containing LLVM bitcode

    emcc src1.o src2.o
    # Combine two LLVM bitcode files into a.out.js

    emcc src1.o src2.o -o combined.o
    # Combine two LLVM bitcode files into another LLVM bitcode file

For more on emcc's capabilities, run emcc --help (it can also optimize, change parameters that control how Emscripten generates code, generate HTML instead of JavaScript, etc.).

Using Libraries

If your project needs a standard system library, for example zlib or glib, and Emscripten has no built-in support for it, you will need to link it in manually. Built-in support exists for libc, libc++ and SDL; for those you do not even need to add -lSDL or similar, as they will just work. For other libraries, you need to build and link them yourself.

  • To build them, build them normally using emcc. Build them into bitcode, not JavaScript, which is easier: just run make using emcc as described above, and do not do anything additional to generate JavaScript from the bitcode.
  • In your main project, as mentioned earlier in this document you need to add a command to go from bitcode to JavaScript. You should tell that command to also link in the library you built into bitcode. For example, if you built libstuff.bc, and your final build command was emcc project.bc -o final.html, then you should write emcc project.bc libstuff.bc -o final.html. (Alternatively, you could use llvm-link to link the library with your other bitcode, etc.)
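
The final link step with an extra library can be wrapped as follows. This is a minimal sketch: the function name link_with_libs and the EMCC override variable are assumptions made for this example, while the file names come from the example above.

```shell
#!/bin/sh
# Compiler command; override EMCC to try the script without Emscripten installed.
EMCC=${EMCC:-emcc}

# link_with_libs OUTPUT MAIN_BITCODE [LIB_BITCODE...]
# Converts the main project bitcode, plus any separately built library
# bitcode, into a single output file (.js or .html) in one final step.
link_with_libs() {
    output=$1
    shift
    "$EMCC" "$@" -o "$output"
}
```

With this, link_with_libs final.html project.bc libstuff.bc runs emcc project.bc libstuff.bc -o final.html.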

Issues

Build System Self-Execution

Some large projects, as part of their build procedure, generate executables and run them in order to generate input for later parts of the build system (for example, a parser may be built and then run on a grammar, which generates C/C++ code that implements that grammar). This is a problem when cross-compiling, including with Emscripten, since you cannot directly run the code you are generating.

The simplest solution is usually to build the project twice: once natively, and once to JavaScript. When the JavaScript build then fails because it cannot run a generated executable, copy that executable from the native build and continue to build normally. This works for Python, for example (for more details, see tests/python/readme.txt).
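
The copy step can be sketched as a small helper. Everything named here is hypothetical: the function copy_native_tool, the two build directories, and the tool path all depend on your project's layout.

```shell
#!/bin/sh
# copy_native_tool NATIVE_DIR JS_DIR TOOL
# Copies a build-time executable (for example a generated parser) from a
# finished native build tree into the Emscripten build tree, so that the
# interrupted Emscripten build can be resumed.
copy_native_tool() {
    native_dir=$1
    js_dir=$2
    tool=$3
    if [ ! -f "$native_dir/$tool" ]; then
        echo "error: $native_dir/$tool not found; run the native build first" >&2
        return 1
    fi
    # Create the destination directory if needed, then copy the file while
    # preserving its mode so it stays executable.
    mkdir -p "$(dirname "$js_dir/$tool")" &&
    cp -p "$native_dir/$tool" "$js_dir/$tool"
}
```

After the Emscripten build stops at the generated executable, a call such as copy_native_tool ../build-native . tools/generator (illustrative paths) followed by emmake make would let the build continue.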

Another possible solution that makes sense in some cases is to modify the build scripts so that they build the generated executable natively. For example, this can be done by specifying two compilers in the build scripts, emcc and gcc, and using gcc just for generated executables. However, this can be more complicated than the previous solution because you need to modify the project build scripts, and also you need to work around cases where code is compiled and used both for the final result and for a generated executable (so you need to make sure it is built both natively and for JS).

Dynamic Linking

Emscripten's goal is to generate the fastest and smallest possible code, and for that reason it focuses on generating a single JavaScript file for an entire project. It is possible to link files at runtime (see Dynamic linking), but it isn't recommended.

Because of this, emcc treats dynamic linking like static linking. That is, when you dynamically link in a library, it will be statically linked. This lets you have all the necessary code at the end of the build process when you convert to JavaScript.

A potential pitfall, however, is linking the same dynamic library in twice. For example, if files A and B both dynamically link in library C, and then A and B are linked together statically, you will get an error about multiply-defined symbols because the contents of library C will appear twice. If you encounter this problem, there are a few ways to avoid it:

  • Modify your build system so it doesn't dynamically link the same library twice. In the example above, you can avoid linking library C to file B.
  • Run emcc with --ignore-dynamic-linking. With that flag, dynamically-linked libraries will be ignored (whereas normally emcc would link them statically). In the example above, library C would not actually be linked with either file A or B, so linking A and B together would succeed. You would then need to manually link in library C. Since an additional manual step is often required at the end anyhow (to pick JavaScript-specific options like whether to use closure compiler, stuff you wouldn't see in a normal native build system), this might be preferable to the previous option, which requires changes in the middle of the build system.
  • Another possible option is the approach in https://github.com/krasin/bitlink, a project meant for PNaCl that extends the LLVM linker to support dynamic linking.
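
The --ignore-dynamic-linking approach from the list above might look like the sketch below. The function names, the EMCC override variable and all file names are assumptions for illustration. How the flag actually reaches emcc in a real build (for example through the linker flags of the build system) depends on the project.

```shell
#!/bin/sh
# Compiler command; override EMCC to try the script without Emscripten installed.
EMCC=${EMCC:-emcc}

# link_intermediate OUTPUT INPUT...
# Intermediate link step for one component. With --ignore-dynamic-linking,
# shared libraries named among the inputs are expected to be left out
# rather than linked statically.
link_intermediate() {
    output=$1
    shift
    "$EMCC" --ignore-dynamic-linking "$@" -o "$output"
}

# link_final OUTPUT INPUT...
# Final step: link every component, plus each shared library exactly once,
# and generate JavaScript.
link_final() {
    output=$1
    shift
    "$EMCC" "$@" -o "$output"
}
```

For the example with files A and B and library C, this would be used roughly as link_intermediate libA.so a.o libC.so, then link_intermediate libB.so b.o libC.so, and finally link_final project.js libA.so libB.so libC.so, so that the contents of library C appear only once.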

Configure

If your project uses configure, cmake or some other portable configuration method, it may do a lot of checks during the configure phase. emcc tries to get those to pass as much as it can, but in general it may not succeed. If you encounter such a case, you may need to disable checks in configure. Often the checks are just to verify that things will work, but things will actually work even though the checks fail.

If configure does checks that help determine important paths etc. for later in the build system, you may need to manually add those paths later and so forth.

Note that in general something like configure is not a good match for a cross-compiler like Emscripten. configure works very hard to get code to build natively for whatever local setup you have. With a cross-compiler, you are ignoring the native build system and the local system headers, and instead targeting a single standard target, so just writing out the values relevant for that target makes sense.

Alternatives to emcc

You can in theory call clang, llvm-ld, etc. yourself. However, not using emcc is dangerous. One reason is that emcc will use the Emscripten bundled headers, while using Clang by itself will not, by default. This can lead to various errors. Also, using things like llvm-ld will result in unsafe/unportable LLVM optimizations being done by default. When you use emcc, it automatically handles all of that for you so that things work properly.

Examples

You can see how the large tests in tests/runner.py are built - the C/C++ projects there are built using their normal build systems, using emcc as detailed on this page. Specifically, the large tests include: freetype, openjpeg, zlib, bullet and poppler.

It is also worth looking at the build scripts in the following projects, although several have not yet been updated to use the new emcc tool: