An EVM-bytecode to machine-bytecode compiler using MLIR and LLVM.
Implemented opcodes
- (0x00) STOP
- (0x01) ADD
- (0x02) MUL
- (0x03) SUB
- (0x04) DIV
- (0x05) SDIV
- (0x06) MOD
- (0x07) SMOD
- (0x08) ADDMOD
- (0x09) MULMOD
- (0x0A) EXP
- (0x0B) SIGNEXTEND
- (0x10) LT
- (0x11) GT
- (0x12) SLT
- (0x13) SGT
- (0x14) EQ
- (0x15) ISZERO
- (0x16) AND
- (0x17) OR
- (0x18) XOR
- (0x1A) BYTE
- (0x1B) SHL
- (0x1C) SHR
- (0x1D) SAR
- (0x50) POP
- (0x52) MSTORE
- (0x53) MSTORE8
- (0x56) JUMP
- (0x57) JUMPI
- (0x58) PC
- (0x5A) GAS
- (0x5B) JUMPDEST
- (0x5F) PUSH0
- (0x60) PUSH1
- (0x61) PUSH2
- (0x62) PUSH3
- (0x63) PUSH4
- (0x64) PUSH5
- (0x65) PUSH6
- (0x66) PUSH7
- (0x67) PUSH8
- (0x68) PUSH9
- (0x69) PUSH10
- (0x6A) PUSH11
- (0x6B) PUSH12
- (0x6C) PUSH13
- (0x6D) PUSH14
- (0x6E) PUSH15
- (0x6F) PUSH16
- (0x70) PUSH17
- (0x71) PUSH18
- (0x72) PUSH19
- (0x73) PUSH20
- (0x74) PUSH21
- (0x75) PUSH22
- (0x76) PUSH23
- (0x77) PUSH24
- (0x78) PUSH25
- (0x79) PUSH26
- (0x7A) PUSH27
- (0x7B) PUSH28
- (0x7C) PUSH29
- (0x7D) PUSH30
- (0x7E) PUSH31
- (0x7F) PUSH32
- (0x80) DUP1
- (0x81) DUP2
- (0x82) DUP3
- (0x83) DUP4
- (0x84) DUP5
- (0x85) DUP6
- (0x86) DUP7
- (0x87) DUP8
- (0x88) DUP9
- (0x89) DUP10
- (0x8A) DUP11
- (0x8B) DUP12
- (0x8C) DUP13
- (0x8D) DUP14
- (0x8E) DUP15
- (0x8F) DUP16
- (0x90) SWAP1
- (0x91) SWAP2
- (0x92) SWAP3
- (0x93) SWAP4
- (0x94) SWAP5
- (0x95) SWAP6
- (0x96) SWAP7
- (0x97) SWAP8
- (0x98) SWAP9
- (0x99) SWAP10
- (0x9A) SWAP11
- (0x9B) SWAP12
- (0x9C) SWAP13
- (0x9D) SWAP14
- (0x9E) SWAP15
- (0x9F) SWAP16
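The PUSH, DUP and SWAP rows above follow a regular numbering, so the mnemonic can be derived from the opcode byte. The sketch below illustrates that pattern; family_mnemonic and push_immediate_len are hypothetical helpers written for this example and are not taken from the compiler's source.

```rust
/// Returns the mnemonic for an opcode in the PUSH, DUP or SWAP families,
/// or `None` for any other byte. Hypothetical helper for illustration only.
fn family_mnemonic(opcode: u8) -> Option<String> {
    match opcode {
        // 0x5F is PUSH0, 0x60 is PUSH1, ..., 0x7F is PUSH32.
        0x5F..=0x7F => Some(format!("PUSH{}", opcode - 0x5F)),
        // 0x80 is DUP1, ..., 0x8F is DUP16.
        0x80..=0x8F => Some(format!("DUP{}", opcode - 0x80 + 1)),
        // 0x90 is SWAP1, ..., 0x9F is SWAP16.
        0x90..=0x9F => Some(format!("SWAP{}", opcode - 0x90 + 1)),
        _ => None,
    }
}

/// Number of immediate bytes that follow a PUSHn opcode in the bytecode.
/// PUSH0 and every non-PUSH opcode carry no immediate bytes.
fn push_immediate_len(opcode: u8) -> usize {
    match opcode {
        0x60..=0x7F => (opcode - 0x5F) as usize,
        _ => 0,
    }
}

fn main() {
    for op in [0x5F_u8, 0x60, 0x7F, 0x80, 0x8F, 0x90, 0x9F, 0x01] {
        println!(
            "{:#04x} -> {:?} ({} immediate bytes)",
            op,
            family_mnemonic(op),
            push_immediate_len(op)
        );
    }
}
```

Opcodes outside these three families (ADD, JUMP and so on) have no such pattern and would need an explicit table.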
Not yet implemented opcodes
- (0x19) NOT
- (0x20) KECCAK256
- (0x30) ADDRESS
- (0x31) BALANCE
- (0x32) ORIGIN
- (0x33) CALLER
- (0x34) CALLVALUE
- (0x35) CALLDATALOAD
- (0x36) CALLDATASIZE
- (0x37) CALLDATACOPY
- (0x38) CODESIZE
- (0x39) CODECOPY
- (0x3A) GASPRICE
- (0x3B) EXTCODESIZE
- (0x3C) EXTCODECOPY
- (0x3D) RETURNDATASIZE
- (0x3E) RETURNDATACOPY
- (0x3F) EXTCODEHASH
- (0x40) BLOCKHASH
- (0x41) COINBASE
- (0x42) TIMESTAMP
- (0x43) NUMBER
- (0x44) DIFFICULTY
- (0x45) GASLIMIT
- (0x46) CHAINID
- (0x47) SELFBALANCE
- (0x48) BASEFEE
- (0x49) BLOBHASH
- (0x4A) BLOBBASEFEE
- (0x51) MLOAD
- (0x54) SLOAD
- (0x55) SSTORE
- (0x59) MSIZE
- (0x5C) TLOAD
- (0x5D) TSTORE
- (0x5E) MCOPY
- (0xA0) LOG0
- (0xA1) LOG1
- (0xA2) LOG2
- (0xA3) LOG3
- (0xA4) LOG4
- (0xF0) CREATE
- (0xF1) CALL
- (0xF2) CALLCODE
- (0xF3) RETURN
- (0xF4) DELEGATECALL
- (0xF5) CREATE2
- (0xFA) STATICCALL
- (0xFD) REVERT
- (0xFE) INVALID
- (0xFF) SELFDESTRUCT
- Linux or macOS (aarch64 included) only for now
- LLVM 18 with MLIR: on Debian you can use apt.llvm.org, on macOS you can use brew
- Rust
- Git
This step applies to all operating systems.
Run the following make target to install the dependencies (both Linux and macOS):
make deps
Since Linux distributions vary widely, you need to install LLVM 18 via your package manager, compile it from source, or check whether the current release ships a Linux binary.
If you are on Debian/Ubuntu, check out the repository https://apt.llvm.org/. Then you can install with:
sudo apt-get install llvm-18 llvm-18-dev llvm-18-runtime clang-18 clang-tools-18 lld-18 libpolly-18-dev libmlir-18-dev mlir-18-tools
If you decide to build from source, here are some pointers:
Install LLVM from source instructions
# Go to https://github.com/llvm/llvm-project/releases
# Download the latest LLVM 18 release:
# The blob to download is called llvm-project-18.x.x.src.tar.xz
# For example
wget https://github.com/llvm/llvm-project/releases/download/llvmorg-18.1.4/llvm-project-18.1.4.src.tar.xz
tar xf llvm-project-18.1.4.src.tar.xz
cd llvm-project-18.1.4.src
mkdir build
cd build
# The following cmake command configures the build to be installed to /opt/llvm-18
cmake -G Ninja ../llvm \
-DLLVM_ENABLE_PROJECTS="mlir;clang;clang-tools-extra;lld;polly" \
-DLLVM_BUILD_EXAMPLES=OFF \
-DLLVM_TARGETS_TO_BUILD="Native" \
-DCMAKE_INSTALL_PREFIX=/opt/llvm-18 \
-DCMAKE_BUILD_TYPE=RelWithDebInfo \
-DLLVM_PARALLEL_LINK_JOBS=4 \
-DLLVM_ENABLE_BINDINGS=OFF \
-DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DLLVM_ENABLE_LLD=ON \
-DLLVM_ENABLE_ASSERTIONS=OFF
ninja install
Set up the environment variables MLIR_SYS_180_PREFIX, LLVM_SYS_180_PREFIX and TABLEGEN_180_PREFIX, each pointing to the LLVM directory:
# For Debian/Ubuntu using the repository, the path will be /usr/lib/llvm-18
export MLIR_SYS_180_PREFIX=/usr/lib/llvm-18
export LLVM_SYS_180_PREFIX=/usr/lib/llvm-18
export TABLEGEN_180_PREFIX=/usr/lib/llvm-18
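A build that cannot find LLVM usually means one of these variables is unset or points somewhere that does not exist. As a quick sanity check, a small standalone Rust program along the following lines could verify them. This is a sketch, not part of the repository; missing_prefixes is a hypothetical helper, and it only checks that each path is a directory, not that it contains a working LLVM 18 install.

```rust
use std::env;
use std::path::Path;

/// The three variables named in the setup instructions.
const PREFIX_VARS: [&str; 3] = [
    "MLIR_SYS_180_PREFIX",
    "LLVM_SYS_180_PREFIX",
    "TABLEGEN_180_PREFIX",
];

/// Returns the names of the variables that are unset or do not point to an
/// existing directory. The lookup is passed in so it can be swapped in tests.
fn missing_prefixes(lookup: impl Fn(&str) -> Option<String>) -> Vec<&'static str> {
    PREFIX_VARS
        .iter()
        .copied()
        .filter(|&name| match lookup(name) {
            Some(path) => !Path::new(&path).is_dir(),
            None => true,
        })
        .collect()
}

fn main() {
    // Read the real process environment.
    let missing = missing_prefixes(|name| env::var(name).ok());
    if missing.is_empty() {
        println!("all LLVM prefix variables point to a directory");
    } else {
        eprintln!("unset or invalid: {}", missing.join(", "));
    }
}
```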
Run the deps target to install the other dependencies.
make deps
The Makefile deps target (which you should have run before) installs LLVM 18 with brew for you. Afterwards, you need to execute the env-macos.sh script to set up the environment.
source scripts/env-macos.sh
To run the compiler, call cargo run while passing it a file with the EVM bytecode to compile.
There are some example files under programs/, for example:
cargo run programs/push32.bytecode
To generate the necessary artifacts, you need to run cargo run <filepath>, with <filepath> being the path to a file containing the EVM bytecode to compile.
Writing EVM bytecode directly can be a bit difficult, so you can instead edit src/main.rs, modifying the program variable with the structure of your EVM program. After that you just run cargo run.
An example edit would look like this:
fn main() {
    let program = vec![
        Operation::Push32([0; 32]),
        Operation::Push32([42; 32]),
        Operation::Add,
    ];
    let output_file = "some_other_filename";
    compile_binary(program, output_file).unwrap();
}
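For comparison, the example program above corresponds to 67 bytes of EVM bytecode: each PUSH32 is the opcode 0x7F followed by its 32-byte value, and ADD is the single byte 0x01. The sketch below encodes it by hand. The Op enum is a hypothetical stand-in for the three operations used above, not the compiler's own Operation type, and the hex output is only one possible representation; whether the files under programs/ hold hex text or raw bytes is not stated here.

```rust
/// Hypothetical stand-in for the operations used in the example program.
enum Op {
    Push32([u8; 32]),
    Add,
}

/// Encodes a program as raw EVM bytecode.
fn encode(program: &[Op]) -> Vec<u8> {
    let mut bytes = Vec::new();
    for op in program {
        match op {
            Op::Push32(value) => {
                bytes.push(0x7F); // PUSH32 opcode
                bytes.extend_from_slice(value); // 32-byte immediate
            }
            Op::Add => bytes.push(0x01), // ADD opcode
        }
    }
    bytes
}

/// Renders bytes as a lowercase hex string without separators.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

fn main() {
    let program = [Op::Push32([0; 32]), Op::Push32([42; 32]), Op::Add];
    let bytes = encode(&program);
    println!("{} bytes: {}", bytes.len(), to_hex(&bytes));
}
```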
The most useful artifacts to inspect are the MLIR-IR (<name>.mlir) and Assembly (<name>.asm) files. The first has a one-to-one mapping with the operations added in the compiler, while the second contains the instructions that are executed by your machine.
The other generated artifacts are:
- Semi-optimized MLIR-IR (<name>.after-pass.mlir)
- LLVM-IR (<name>.ll)
- Object file (<name>.o)
- Executable (<name>)
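To check which artifacts a run produced, the file names listed above can be derived from the output name. The following is a minimal sketch of that mapping; artifact_paths is a hypothetical helper, and it assumes the artifacts are written relative to the current directory, which is not stated here.

```rust
use std::path::Path;

/// Maps an output name to the artifact files described above.
/// Hypothetical helper; the labels mirror the list in this section.
fn artifact_paths(name: &str) -> Vec<(&'static str, String)> {
    vec![
        ("MLIR-IR", format!("{}.mlir", name)),
        ("Semi-optimized MLIR-IR", format!("{}.after-pass.mlir", name)),
        ("LLVM-IR", format!("{}.ll", name)),
        ("Assembly", format!("{}.asm", name)),
        ("Object file", format!("{}.o", name)),
        ("Executable", name.to_string()),
    ]
}

fn main() {
    // Assumption: artifacts live in the current working directory.
    for (label, path) in artifact_paths("some_other_filename") {
        let status = if Path::new(&path).exists() { "found" } else { "missing" };
        println!("{:<24} {:<36} {}", label, path, status);
    }
}
```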
Once we have the executable, we can run it with a debugger (here we use lldb, but you can use others). To run with lldb, use lldb <name>.
To run until we reach our main function, we can use:
br set -n main
run
thread step-inst
All registers: register read
The x0 register: register read x0
To inspect the memory at <address>: memory read <address>
To inspect the memory at the address given by the register x0: memory read $x0
To pretty-print the EVM stack at address X: memory read -s32 -fu -c4 X
Reference:
- The -s32 flag groups the bytes in 32-byte chunks.
- The -fu flag interprets the chunks as unsigned integers.
- The -c4 flag includes 4 chunks: the one at the given address plus the next three chunks.
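The grouping these flags perform can be mimicked outside the debugger, which may help when reading a raw memory dump. The sketch below splits a byte buffer into 32-byte chunks and renders each as one unsigned integer. It is an illustration only: format_chunks is a hypothetical helper, little-endian byte order is an assumption about the host, and the values are printed in hexadecimal because Rust's standard library has no 256-bit integer type.

```rust
/// Formats up to `count` chunks of `size` bytes as unsigned integers in hex.
/// Assumes little-endian byte order; `size` must be non-zero.
fn format_chunks(memory: &[u8], size: usize, count: usize) -> Vec<String> {
    memory
        .chunks_exact(size)
        .take(count)
        .map(|chunk| {
            // In little-endian order the most significant byte comes last,
            // so the bytes are printed in reverse.
            let hex: String = chunk.iter().rev().map(|b| format!("{:02x}", b)).collect();
            format!("0x{}", hex)
        })
        .collect()
}

fn main() {
    // Fake 128-byte stack region holding the values 1 and 42 in its first two slots.
    let mut memory = [0u8; 128];
    memory[0] = 1;
    memory[32] = 42;

    // Same grouping as the -s32 and -c4 flags: four chunks of 32 bytes.
    for (index, value) in format_chunks(&memory, 32, 4).iter().enumerate() {
        println!("slot {}: {}", index, value);
    }
}
```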
To restart the program, just use run again.