Bytecode transformers for CPython inspired by the ast
module's
NodeTransformer
.
codetransformer
is a library that allows us to work with CPython's bytecode
representation at runtime. codetransformer
provides a level of abstraction
between the programmer and the raw bytes read by the eval loop so that we can
more easily inspect and modify bytecode.
codetransformer
is motivated by the need to override parts of the python
language that are not already hooked into through data model methods. For example:
- Override the
is
andnot
operators. - Custom data structure literals.
- Syntax features that cannot be represented with valid python AST or source.
- Run without a modified CPython interpreter.
codetransformer
was originally developed as part of lazy to implement
the transformations needed to override the code objects at runtime.
While this can be done as an AST transformation, we will often need to execute
the constructor for the literal multiple times. Also, we need to be sure that
any additional names required to run our code are provided when we run. With
codetransformer
, we can pre compute our new literals and emit code that is
as fast as loading our unmodified literals without requiring any additional
names be available implicitly.
In the following block we demonstrate overloading dictionary syntax to result in
collections.OrderedDict
objects. OrderedDict
is like a dict
;
however, the order of the keys is preserved.
>>> from codetransformer.transformers.literals import ordereddict_literals
>>> @ordereddict_literals
... def f():
... return {'a': 1, 'b': 2, 'c': 3}
>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
This also supports dictionary comprehensions:
>>> @ordereddict_literals
... def f():
... return {k: v for k, v in zip('abc', (1, 2, 3))}
>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
The next block overrides float
literals with decimal.Decimal
objects. These objects support arbitrary precision arithmetic.
>>> from codetransformer.transformers.literals import decimal_literals
>>> @decimal_literals
... def f():
... return 1.5
>>> f()
Decimal('1.5')
Pattern matched exceptions are a good example of a CodeTransformer
that
would be very complicated to implement at the AST level. This transformation
extends the try/except
syntax to accept instances of BaseException
as
well subclasses of BaseException
. When excepting an instance, the args
of the exception will be compared for equality to determine which exception
handler should be invoked. For example:
>>> @pattern_matched_exceptions()
... def foo():
... try:
... raise ValueError('bar')
... except ValueError('buzz'):
... return 'buzz'
... except ValueError('bar'):
... return 'bar'
>>> foo()
'bar'
This function raises an instance of ValueError
and attempts to catch it. The
first check looks for instances of ValueError
that were constructed with an
argument of 'buzz'
. Because our custom exception is raised with 'bar'
,
these are not equal and we do not enter this handler. The next handler looks for
ValueError('bar')
which does match the exception we raised. We then enter
this block and normal python rules take over.
We may also pass their own exception matching function:
>>> def match_greater(match_expr, exc_type, exc_value, exc_traceback):
... return math_expr > exc_value.args[0]
>>> @pattern_matched_exceptions(match_greater)
... def foo():
... try:
... raise ValueError(5)
... except 4:
... return 4
... except 5:
... return 5
... except 6:
... return 6
>>> foo()
6
This matches on when the match expression is greater in value than the first argument of any exception type that is raised. This particular behavior would be very hard to mimic through AST level transformations.
The three core abstractions of codetransformer
are:
- The
Instruction
object which represents an opcode which may be paired with some argument. - The
Code
object which represents a collection ofInstruction
s. - The
CodeTransformer
object which represents a set of rules for manipulatingCode
objects.
The Instruction
object represents an atomic operation that can be performed
by the CPython virtual machine. These are things like LOAD_NAME
which loads
a name onto the stack, or ROT_TWO
which rotates the top two stack elements.
Some instructions accept an argument, for example LOAD_NAME
, which modifies
the behavior of the instruction. This is much like a function call where some
functions accept arguments. Because the bytecode is always packed as raw bytes,
the argument must be some integer (CPython stores all arguments two in bytes).
This means that things that need a more rich argument system (like LOAD_NAME
which needs the actual name to look up) must carry around the actual arguments
in some table and use the integer as an offset into this array. One of the key
abstractions of the Instruction
object is that the argument is always some
python object that represents the actual argument. Any lookup table management
is handled for the user. This is helpful because some arguments share this table
so we don't want to add extra entries or forget to add them at all.
Another annoyance is that the instructions that handle control flow use their
argument to say what bytecode offset to jump to. Some jumps use the absolute
index, others use a relative index. This also makes it hard if you want to add
or remove instructions because all of the offsets must be recomputed. In
codetransformer
, the jump instructions all accept another Instruction
as
the argument so that the assembler can manage this for the user. We also provide
an easy way for new instructions to "steal" jumps that targeted another
instruction so that can manage altering the bytecode around jump targets.
Code
objects are a nice abstraction over python's
types.CodeType
. Quoting the CodeType
constructor docstring:
code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring, constants, names, varnames, filename, name, firstlineno, lnotab[, freevars[, cellvars]]) Create a code object. Not for the faint of heart.
The codetransformer
abstraction is designed to make it easy to dynamically
construct and inspect these objects. This allows us to easy set things like the
argument names, and manipulate the line number mappings.
The Code
object provides methods for converting to and from Python's code
representation:
from_pycode
to_pycode
.
This allows us to take an existing function, parse the meaning from it, modify it, and then assemble this back into a new python code object.
Note
Code
objects are immutable. When we say "modify", we mean create a copy
with different values.
This is the set of rules that are used to actually modify the Code
objects. These rules are defined as a set of patterns
which are a DSL used
to define a DFA for matching against sequences of Instruction
objects. Once
we have matched a segment, we yield new instructions to replace what we have
matched. A simple codetransformer looks like:
from codetransformer import CodeTransformer, instructions
class FoldNames(CodeTransformer):
@pattern(
instructions.LOAD_GLOBAL,
instructions.LOAD_GLOBAL,
instructions.BINARY_ADD,
)
def _load_fast(self, a, b, add):
yield instructions.LOAD_FAST(a.arg + b.arg).steal(a)
This CodeTransformer
uses the +
operator to implement something like
CPP
s token pasting for local variables. We read this pattern as a sequence
of two LOAD_GLOBAL
(global name lookups) followed by a BINARY_ADD
instruction (+
operator call). This will then call the function with the
three instructions passed positionally. This handler replaces this sequence with
a single instruction that emits a LOAD_FAST
(local name lookup) that is the
result of adding the two names together. We then steal any jumps that used to
target the first LOAD_GLOBAL
.
We can execute this transformer by calling an instance of it on a function object, or using it like a decorator. For example:
>>> @FoldNames()
... def f():
... ab = 3
... return a + b
>>> f()
3
codetransformer
is free software, licensed under the GNU General Public
License, version 2. For more information see the LICENSE
file.
Source code is hosted on github at https://github.com/llllllllll/codetransformer.