kripken edited this page Oct 24, 2012 · 29 revisions

The original emscripten compiler was written in JavaScript, which was very useful for quickly prototyping new ideas during development of the various new methods needed for effective compilation to JavaScript (the relooper, longjmp tricks, C++ exceptions in JS, etc.). It is also quite stable at this point and generates very good code. However, it has a few downsides:

  • Compiler speed. The generated code is fast, but generating it is not. Especially with full optimizations on, builds can be quite slow. This is not an issue for tens of thousands of lines of code, and is annoying but not horrible for hundreds of thousands, but it is a serious problem for millions.
  • LLVM backends integrate more closely with LLVM, and can leverage LLVM's internal code analysis and optimization. The original compiler just parses LLVM bitcode externally, so it cannot benefit from internal capabilities of LLVM.
  • An upstream LLVM backend is easier to use for people than a separate project. Compiling to JS should, as much as possible, be just another backend in a compiler.

The plan is to start experimenting with an LLVM backend during Summer 2012.

Status: Investigation and experimentation were successful overall, but this is now lower priority because of (1) a lack of resources, and (2) significant potential improvements that have turned up in the current compiler.

Guidelines and issues

  • We will use the C++ Relooper implementation https://github.com/kripken/Relooper
  • Focus on the C-style memory layout method. Other approaches (no typed arrays, unaliasing typed arrays) will only be done by the original compiler.
  • When possible, use native JS function calls f(x,y,z) rather than reading and writing arguments on the C stack. This is tricky with varargs, but perhaps possible even there with internal LLVM changes.
  • It is far better to emit x = (a+b)/z than t = a+b ; x = t/z, but it is unclear how easy that is to do in an LLVM backend.
  • More advanced C++ static analysis than the current compiler should allow removal of a lot of unnecessary address shifting
  • See https://bugzilla.mozilla.org/show_bug.cgi?id=771106 for some optimizations we should implement. Also https://bugzilla.mozilla.org/show_bug.cgi?id=771285#c5
  • To get started, we will not create an object format for JavaScript. We can continue to use the emcc wrapper, which uses clang with LLVM bitcode as the intermediate object format. So the initial goal is just to generate JS in the backend directly, that is, from LLVM IR in memory.
  • We still need to support linking with JS libraries (src/library*.js in current emscripten). The reason is that JS is unique compared to other backends: no one normally writes system libraries in low-level x86 or ARM; at most they add some inline assembly for those CPUs to a C library. JS, however, is a high-level language, and people do want to write system libraries in it (we have written libc, SDL, etc. in JS in emscripten). So while, as per the previous point, we do not want to invent a JS object format for linking, we do want to link in symbols in a simple way, as the current emscripten compiler does.
  • Some initial work by Ehsan on Emscripten support in LLVM and clang is in:
  • https://github.com/ehsan/llvm/commit/ad4c8c52f68a1694cbb66fe861f325928ca04d7c
  • https://github.com/ehsan/clang/commit/3a8eff2f5646605d949222032422a12967b34790
  • LLVM already has a target triple ArchType of le32, with the comment "generic little-endian 32-bit CPU (PNaCl / Emscripten)". We should presumably use that?
  • Of the existing backends, the simplest is CppBackend, but it might be too simple. Sparc seems to be the smallest "real" backend.
  • Should we call this backend plus Emscripten "Emscripten 2.0"?
  • Should we call the LLVM backend itself "JS" or "Emscripten" internally in LLVM?
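
As a rough illustration of two of the guidelines above (the C-style memory layout with address shifting, and folding temporaries into expressions), here is a minimal hand-written sketch. It is not output from either compiler. The view names HEAP8, HEAP32 and HEAPF64 follow emscripten's convention, while the helper functions store32, load32, divideWithTemp and divideFolded are hypothetical names made up for this example.

```javascript
// Hand-written sketch, not real compiler output. All helper names below
// are hypothetical and exist only for illustration.

// C-style memory layout: one buffer, with several typed-array views
// that alias the same underlying bytes.
var buffer = new ArrayBuffer(64);
var HEAP8 = new Int8Array(buffer);
var HEAP32 = new Int32Array(buffer);
var HEAPF64 = new Float64Array(buffer);

// A 32-bit access at byte address `ptr` shifts the address right by 2
// to index the Int32Array view. This is the "address shifting" that
// better static analysis might be able to remove in some cases.
function store32(ptr, value) {
  HEAP32[ptr >> 2] = value;
}
function load32(ptr) {
  return HEAP32[ptr >> 2];
}

// One temporary per instruction, roughly mirroring the IR one-to-one.
function divideWithTemp(a, b, z) {
  var t = a + b;
  var x = t / z;
  return x;
}

// Folded form: the single-use temporary is inlined into its consumer.
function divideFolded(a, b, z) {
  var x = (a + b) / z;
  return x;
}

// Write through the 32-bit view; the bytes are visible through HEAP8 too.
store32(8, 0x01020304);
```

Both divide functions compute the same value; the folded form is simply shorter and leaves less work for the JS engine's own optimizer. How such folding would be expressed inside an LLVM backend is left open here, as it is in the guideline itself.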

FAQ

  • Will this replace Emscripten?
  • No. If this succeeds, it could replace the core compiler part of emscripten, but that is actually a small part of the emscripten project (the toolchain, libraries, runtime support, etc. are much larger in lines of code). Even then, we would not remove the old core compiler: it would still be useful for testing new ideas quickly, and it would support more options (for example, the new compiler would focus on code generation using typed arrays in aliasing alignment, while the old compiler has several other modes). That said, the old compiler is basically complete, and if the new backend works as we hope, new development would focus on it.

First Steps

  • Get emcc to generate human-readable sparc assembly or cpp using the sparc or cpp backends (done)
  • Modify the sparc and cpp backends to generate something resembling JavaScript

Setting up and testing

This is still in a very, very early experimental stage, but if you want to see the current state, first get and build LLVM:

  • git clone git://github.com/kripken/llvm-dcpu16.git
  • cd llvm-dcpu16
  • cd tools
  • git clone git://github.com/kripken/clang.git
  • cd ..
  • ./configure --enable-targets=x86,dcpu16
  • make

Then get emscripten's llvm-js branch:

  • Go to the emscripten directory
  • git checkout llvm-js

You can now try to run emcc, but nothing will fully work yet.

Feel free to get in touch with us through the usual channels (IRC or the mailing list; see the main wiki page) if this interests you.