Rembulan compiles Lua sources directly to Java bytecode. In contrast to PUC-Lua, which first compiles Lua sources to the Lua bytecode (that is then interpreted), the intermediate representation used in the Rembulan compiler is not exposed to the users, and has no serialisable format.
This means that on the binary level, Rembulan is not compatible with PUC-Lua. On the other hand, however, it allows the compiler to perform some optimisations static analysis more easily.
The main class of the compiler is
net.sandius.rembulan.compiler.LuaCompiler
in the rembulan-compiler
module.
Every Lua chunk is compiled to one or more classes that extend
LuaFunction
,
i.e., every resulting class corresponds to a Lua function in the source. Nested functions
are treated analogously to static nested classes in Java.
Function upvalues are represented as fields of the type
Variable
; the number
of upvalues a function has determines the form of its constructor.
Rembulan performs basic static analysis of the Lua programs in order to simplify the generated code, and eliminate unnecessary guards against possible metamethod calls. This is important since the coroutine implementation used in Rembulan requires Lua functions that perform calls to be ready to be paused, and generating such code makes the bytecode longer and makes its control flow graph more complex. This in turn makes the JVM less likely to optimise the JIT-ed bytecode.
What follows is a description of optimisation steps performed by the compiler, done in a loop until a fixed point is reached.
Note: CPU accounting is not influenced by the optimisation steps. An optimised version of a compiled function will still clock in the same number as a plain, unoptimised one.
In the majority of cases, whether a Lua operation op may involve a metamethod call depends on the types of the operands of op. (See §2.4 of the Lua Reference Manual.) For instance, the expression
x + y
may trigger a metamethod call, unless x
and y
are number
s or string
s that can
be converted to numbers. It then follows that if we know that both x
and y
are number
s,
we can generate bytecode for numerical addition immediately, and we also know
that the type of the expression (x + y)
will be number
.
Type analysis of course also works in conjunction with control flow analysis. Suppose we have the following snippet:
local x
if condition then
x = a + b
else
x = 0
end
Here, the type of the value of the local variable x
after the if
statement will depend
on the types of a
and b
(and the value of condition
, see below). Let us look at the possible
cases:
- if
a
andb
areinteger
s (a subtype ofnumber
), thenx
will be aninteger
; - otherwise, if
a
andb
arenumber
s, thenx
will be anumber
; - otherwise,
x
will beany
(a supertype of all types).
In the first two cases, we do not have to emit the code that may need to handle
metamethods (and, by extension, coroutine switches), so this code does not introduce an
additional entry point. Additionally, in the first case, we even have enough information to keep
a
, b
and x
unboxed.
(Note: this step is not done yet.)
Relevant class: net.sandius.rembulan.compiler.analysis.Typer
Now, suppose that condition
is always false
or nil
. In that case, we can skip
the then
-branch entirely, and know that the type of x
after the if
statement
is integer
(with the value 0
).
Relevant class: net.sandius.rembulan.compiler.tf.BranchInliner
Constant values are treated as literal types. I.e., constant propagation is treated as part of the type inference step.
In the example above, if we know statically the values of a
and b
, we can compute
the result of (a + b)
at compile-time, and use it as the type of the expression.
The actual "addition" can then be safely removed from the IR representation of the function.
(This is only true for operations that do not have any side-effects.)
Relevant class: net.sandius.rembulan.compiler.tf.ConstFolder
Suppose we execute the snippet above, but never use the value of x
. In that case,
we do not need to keep x
around, and if the expression (a + b)
cannot trigger a metamethod
call, we do not even need to evaluate it.
This is done by computing the liveness of all variables (and temporary values),
and pruning out dead code based on this information.
Relevant classes:
net.sandius.rembulan.compiler.analysis.LivenessAnalyser
net.sandius.rembulan.compiler.tf.DeadCodePruner
Once a function has been optimised, it is compiled into Java bytecode. This involves several steps, and takes into account the type and liveness information computed during optimisation passes.
Slot allocation uses the liveness information to come up with a mapping from variables
and temporary values to local variables in the resulting bytecode, similar to registers
in PUC-Lua. The slots are untyped (i.e, their type is java.lang.Object
) in order to
enable slot sharing. As a consequence, values that have equivalent primitive JVM types
(i.e., Lua integers are long
s, Lua floats are doubles
, and Lua booleans are boolean
s)
need to be boxed.
Note: this is likely to change in future versions, since boxing is a significant performance drag, and can in fact be removed rather easily at this point.
Relevant class: net.sandius.rembulan.compiler.analysis.SlotAllocator
Code generation takes the slot information along with type information, and walks the
IR graph, generating Java bytecode in the process. Long methods may need to be segmented
into multiple nested methods (to avoid the 64kB limit on the size of a method in Java
bytecode), wires up the methods so as to implement LuaFunction
, and generates the
resulting Java classfile as a byte array.
This step uses the ASM framework.
Relevant classes: