codetransformer

Bytecode transformers for CPython inspired by the ast module's NodeTransformer.

What is codetransformer?

codetransformer is a library that allows us to work with CPython's bytecode representation at runtime. codetransformer provides a level of abstraction between the programmer and the raw bytes read by the eval loop so that we can more easily inspect and modify bytecode.

codetransformer is motivated by the need to override parts of the python language that are not already hooked into through data model methods. For example:

  • Override the is and not operators.
  • Custom data structure literals.
  • Syntax features that cannot be represented with valid python AST or source.
  • Run without a modified CPython interpreter.

codetransformer was originally developed as part of lazy to implement the transformations needed to override the code objects at runtime.

Example Uses

Overloading Literals

While this can be done as an AST transformation, we will often need to execute the constructor for the literal multiple times. We also need to be sure that any additional names required to run our code are provided when we run. With codetransformer, we can precompute our new literals and emit code that is as fast as loading our unmodified literals, without requiring that any additional names be available implicitly.

In the following block we demonstrate overloading dictionary syntax to result in collections.OrderedDict objects. OrderedDict is like a dict; however, the order of the keys is preserved.

>>> from codetransformer.transformers.literals import ordereddict_literals
>>> @ordereddict_literals
... def f():
...     return {'a': 1, 'b': 2, 'c': 3}
>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

This also supports dictionary comprehensions:

>>> @ordereddict_literals
... def f():
...     return {k: v for k, v in zip('abc', (1, 2, 3))}
>>> f()
OrderedDict([('a', 1), ('b', 2), ('c', 3)])

The next block overrides float literals with decimal.Decimal objects. These objects support arbitrary precision arithmetic.

>>> from codetransformer.transformers.literals import decimal_literals
>>> @decimal_literals
... def f():
...     return 1.5
>>> f()
Decimal('1.5')
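
For contrast, here is a minimal sketch of the AST-level approach mentioned at the start of this section, using only the standard library. It is not part of codetransformer; the FloatToDecimal class and the namespace dict are names made up for this illustration. Note that the rewritten function calls the Decimal constructor each time it runs, and that the name Decimal must be supplied in the namespace by hand.

```python
import ast
from decimal import Decimal


class FloatToDecimal(ast.NodeTransformer):
    """Hypothetical AST transformer: rewrite float literals as Decimal(...) calls."""

    def visit_Constant(self, node):
        if isinstance(node.value, float):
            # Replace the literal 1.5 with the call Decimal('1.5').
            call = ast.Call(
                func=ast.Name(id="Decimal", ctx=ast.Load()),
                args=[ast.Constant(value=repr(node.value))],
                keywords=[],
            )
            return ast.copy_location(call, node)
        return node


tree = ast.parse("def f():\n    return 1.5\n")
tree = ast.fix_missing_locations(FloatToDecimal().visit(tree))

# The generated code looks up the name Decimal at runtime, so we must
# provide it explicitly in the namespace the function is defined in.
namespace = {"Decimal": Decimal}
exec(compile(tree, "<ast-example>", "exec"), namespace)

result = namespace["f"]()
print(repr(result))
```

If Decimal is left out of the namespace, calling f raises a NameError, which is the "additional names" problem described above.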

Pattern Matched Exceptions

Pattern matched exceptions are a good example of a CodeTransformer that would be very complicated to implement at the AST level. This transformation extends the try/except syntax to accept instances of BaseException as well as subclasses of BaseException. When excepting an instance, the args of the exception will be compared for equality to determine which exception handler should be invoked. For example:

>>> from codetransformer.transformers import pattern_matched_exceptions
>>> @pattern_matched_exceptions()
... def foo():
...     try:
...         raise ValueError('bar')
...     except ValueError('buzz'):
...         return 'buzz'
...     except ValueError('bar'):
...         return 'bar'
>>> foo()
'bar'

This function raises an instance of ValueError and attempts to catch it. The first check looks for instances of ValueError that were constructed with an argument of 'buzz'. Because our exception was raised with 'bar', these are not equal and we do not enter this handler. The next handler looks for ValueError('bar'), which does match the exception we raised. We then enter this block and normal Python rules take over.
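
As a rough point of comparison, the matching described above can be written out by hand in plain Python. This is only a sketch of the behavior as described here (match on type, then compare args for equality), not the code the transformer actually emits; the name foo_plain is made up for this illustration.

```python
def foo_plain():
    try:
        raise ValueError('bar')
    except ValueError as exc:
        # Compare the raised exception's args against each "pattern" in order.
        if exc.args == ValueError('buzz').args:
            return 'buzz'
        if exc.args == ValueError('bar').args:
            return 'bar'
        # No pattern matched: let the exception propagate.
        raise


print(foo_plain())
```

The hand-written version has to collapse the handlers into a single except block, which is the kind of restructuring the transformer spares us.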

We may also pass our own exception matching function:

>>> def match_greater(match_expr, exc_type, exc_value, exc_traceback):
...     return match_expr > exc_value.args[0]

>>> @pattern_matched_exceptions(match_greater)
... def foo():
...     try:
...         raise ValueError(5)
...     except 4:
...         return 4
...     except 5:
...         return 5
...     except 6:
...         return 6
>>> foo()
6

This matches when the match expression is greater in value than the first argument of the raised exception, regardless of the exception's type. This particular behavior would be very hard to mimic through AST level transformations.

Core Abstractions

The three core abstractions of codetransformer are:

  1. The Instruction object which represents an opcode which may be paired with some argument.
  2. The Code object which represents a collection of Instructions.
  3. The CodeTransformer object which represents a set of rules for manipulating Code objects.

Instructions

The Instruction object represents an atomic operation that can be performed by the CPython virtual machine. These are things like LOAD_NAME which loads a name onto the stack, or ROT_TWO which rotates the top two stack elements.

Some instructions accept an argument, for example LOAD_NAME, which modifies the behavior of the instruction. This is much like a function call where some functions accept arguments. Because the bytecode is always packed as raw bytes, the argument must be some integer (CPython stores all arguments in two bytes). This means that instructions that need a richer argument system (like LOAD_NAME, which needs the actual name to look up) must carry the actual arguments in some table and use the integer as an offset into this table. One of the key abstractions of the Instruction object is that the argument is always some Python object that represents the actual argument. Any lookup table management is handled for the user. This is helpful because some arguments share a table, so we don't want to add duplicate entries or forget to add them at all.
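
The integer-plus-table encoding can be seen with the standard library's dis module. This sketch does not use codetransformer; it only shows the raw representation that the Instruction object hides. The source string and the name_ops list are made up for this illustration.

```python
import dis

# Module-level code uses LOAD_NAME / STORE_NAME for its variables.
code = compile("result = value", "<example>", "exec")

name_ops = []
for instr in dis.get_instructions(code):
    if instr.opname in ("LOAD_NAME", "STORE_NAME"):
        # instr.arg is the raw integer stored in the bytecode;
        # the real name lives in the code object's co_names table.
        name_ops.append((instr.opname, instr.arg, instr.argval))
        print(instr.opname, instr.arg, "->", code.co_names[instr.arg])
```

Here dis resolves the index for display (argval), but anyone editing the bytecode by hand would have to keep co_names and the indices in sync themselves.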

Another annoyance is that the instructions that handle control flow use their argument to say what bytecode offset to jump to. Some jumps use an absolute index, others use a relative index. This makes it hard to add or remove instructions because all of the offsets must be recomputed. In codetransformer, the jump instructions all accept another Instruction as the argument so that the assembler can manage this for the user. We also provide an easy way for new instructions to "steal" jumps that targeted another instruction so that we can manage altering the bytecode around jump targets.
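
To make the offset problem concrete, the following sketch lists the jump instructions of a small function using the standard library's dis module (again, not codetransformer itself). The function sign and the variable names are made up for this illustration, and the exact opcodes and offsets vary between CPython versions.

```python
import dis


def sign(x):
    if x < 0:
        return -1
    return 1


instrs = list(dis.get_instructions(sign))
offsets = {instr.offset for instr in instrs}

# Jump opcodes are listed in dis.hasjrel (relative) and dis.hasjabs (absolute).
jumps = [
    instr for instr in instrs
    if instr.opcode in dis.hasjrel or instr.opcode in dis.hasjabs
]

for jump in jumps:
    kind = "relative" if jump.opcode in dis.hasjrel else "absolute"
    # jump.arg is the raw encoded argument; jump.argval is the byte offset
    # of the target instruction as resolved by dis.
    print(jump.offset, jump.opname, jump.arg, "->", jump.argval, kind)
```

Every argval printed here is the offset of some other instruction, so inserting or removing an instruction before a target would invalidate the raw argument. Holding a reference to the target Instruction instead avoids that bookkeeping.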

Code

Code objects are a nice abstraction over python's types.CodeType. Quoting the CodeType constructor docstring:

code(argcount, kwonlyargcount, nlocals, stacksize, flags, codestring,
      constants, names, varnames, filename, name, firstlineno,
      lnotab[, freevars[, cellvars]])

Create a code object.  Not for the faint of heart.

The codetransformer abstraction is designed to make it easy to dynamically construct and inspect these objects. This allows us to easily set things like the argument names, and manipulate the line number mappings.

The Code object provides methods for converting to and from Python's code representation:

  1. from_pycode
  2. to_pycode

This allows us to take an existing function, parse the meaning from it, modify it, and then assemble this back into a new python code object.

Note

Code objects are immutable. When we say "modify", we mean create a copy with different values.
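
The same "modify by copying" idea applies to Python's own types.CodeType, which is also immutable. The sketch below uses only the standard library (CodeType.replace, available since Python 3.8) rather than codetransformer's Code class; the functions greet and farewell are made up for this illustration.

```python
import types


def greet():
    return "hello"


old = greet.__code__

# Code objects are read-only: attribute assignment fails.
try:
    old.co_consts = ()
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# "Modifying" therefore means building a copy with different values.
new_consts = tuple("goodbye" if c == "hello" else c for c in old.co_consts)
new = old.replace(co_consts=new_consts)

# Wrap the new code object in a function so that we can call it.
farewell = types.FunctionType(new, greet.__globals__, "farewell")

print(greet(), farewell())
```

The original function is untouched; only the copy sees the new constant.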

CodeTransformers

This is the set of rules that are used to actually modify the Code objects. These rules are defined as a set of patterns which are a DSL used to define a DFA for matching against sequences of Instruction objects. Once we have matched a segment, we yield new instructions to replace what we have matched. A simple codetransformer looks like:

from codetransformer import CodeTransformer, instructions
from codetransformer.patterns import pattern

class FoldNames(CodeTransformer):
    @pattern(
        instructions.LOAD_GLOBAL,
        instructions.LOAD_GLOBAL,
        instructions.BINARY_ADD,
    )
    def _load_fast(self, a, b, add):
        yield instructions.LOAD_FAST(a.arg + b.arg).steal(a)

This CodeTransformer uses the + operator to implement something like the C preprocessor's token pasting for local variables. We read this pattern as a sequence of two LOAD_GLOBAL (global name lookups) followed by a BINARY_ADD instruction (+ operator call). When the pattern matches, the decorated function is called with the three instructions passed positionally. This handler replaces the sequence with a single LOAD_FAST instruction (local name lookup) whose name is the result of adding the two names together. We then steal any jumps that used to target the first LOAD_GLOBAL.

We can execute this transformer by calling an instance of it on a function object, or using it like a decorator. For example:

>>> @FoldNames()
... def f():
...     ab = 3
...     return a + b
>>> f()
3
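
The match-and-replace idea behind patterns can be sketched without the library at all. The toy below works on plain (opname, argument) tuples and a fixed sequence of opnames. It is an illustration only and makes no claim about how codetransformer implements its pattern DSL; the names rewrite, fold_names, and program are hypothetical.

```python
def rewrite(instrs, pattern, handler):
    """Replace each run of instructions whose opnames equal `pattern`
    with whatever `handler` yields for that run."""
    out = []
    i = 0
    n = len(pattern)
    while i < len(instrs):
        window = instrs[i:i + n]
        matched = len(window) == n and all(
            instr[0] == opname for instr, opname in zip(window, pattern)
        )
        if matched:
            # Pass the matched instructions positionally, like the handler above.
            out.extend(handler(*window))
            i += n
        else:
            out.append(instrs[i])
            i += 1
    return out


def fold_names(a, b, add):
    # Paste the two global names together into one local name.
    yield ("LOAD_FAST", a[1] + b[1])


program = [
    ("LOAD_GLOBAL", "a"),
    ("LOAD_GLOBAL", "b"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

folded = rewrite(program, ("LOAD_GLOBAL", "LOAD_GLOBAL", "BINARY_ADD"), fold_names)
print(folded)
```

This toy ignores jump targets entirely, which is exactly the part that steal handles in the real transformer.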

License

codetransformer is free software, licensed under the GNU General Public License, version 2. For more information see the LICENSE file.

Source

Source code is hosted on GitHub at https://github.com/llllllllll/codetransformer.