compiler

Assume that X = A.new has type A in # typed: true files (sorbet…

Dec 8, 2022

f3043ad · Dec 8, 2022

This branch is 1340 commits behind sorbet/sorbet:master.

Name	Name	Last commit message	Last commit date
parent directory ..
Core	Core	make dependencies in `common/` finer-grained (sorbet#6528 )	Nov 1, 2022
Errors	Errors	fail harder when lowering sorbet_i_getRuby{Class,Constant} (sorbet#4449 )	Aug 3, 2021
IREmitter	IREmitter	Assume that `X = A.new` has type `A` in `# typed: true` files (sorbet…	Dec 8, 2022
Linker	Linker	Improve document formatting performance with `posix_spawn` (sorbet#6003 )	Jul 13, 2022
ObjectFileEmitter	ObjectFileEmitter	Initialize Timer::endTime and log for >1w (sorbet#5196 )	Jan 31, 2022
Passes	Passes	make fewer assumptions when lowering `sorbet_i_getRuby{Class,Constant…	Apr 4, 2022
ruby-static-exports	ruby-static-exports	Add an intrinsic for Hash#dig (sorbet#4953 )	Dec 2, 2021
BUILD	BUILD	Avoid building llvm in test-static-sanitized (sorbet#4437 )	Aug 10, 2021
README.md	README.md	Fix doc link for compiler (sorbet#5293 )	Feb 14, 2022
main.cc	main.cc	Add the sorbet compiler	Jul 28, 2021

README.md

The Sorbet Compiler

This repository contains the Sorbet compiler. It uses Sorbet and LLVM to emit shared objects that conform to Ruby's C extension API given Ruby source files.

It also contains Sorbet Ruby, a set of scripts that build Ruby in a specific way so that it can run shared objects that the Sorbet Compiler emits.

This README contains documentation specifically for contributing to the compiler portion of Sorbet.

If you are at Stripe, you might also want to see http://go/types/internals for docs about Stripe-specific development workflows and historical Stripe context.

Quickstart

For the best chance of success:

Run sudo apt-get install libncurses-dev libssl-dev

Then run:

./bazel test //test:compiler

This will take a long time, as it's downloading everything and building everything (including LLVM) from scratch.

The Sorbet Compiler is tested on a very narrow set of machines and rarely built from scratch. It's likely that the first time you try to build it on a new machine you will encounter problems, and need to either install certain system-wide dependencies, or switch operating systems.

Learning how the Sorbet Compiler works

TODO(jez) compiler internals

Running Sorbet Ruby and the Sorbet Compiler

For a more complete walk through, see docs/running-compiled-code.md.

There's a fair amount of set up before you can run Sorbet Ruby on a file or load a compiled Ruby file:

Using sig and T depends on sorbet-runtime
Compiling one or more Ruby files requires a folder for the LLVM IR and *.so files.
Running a compiled Ruby file requires passing that folder and also including some Ruby monkeypatches to require find the artifacts.

There are some short scripts that handle all this bookkeeping:

# Interpret a file with Sorbet Ruby
test/run_ruby.sh     [-d] test/testdata/compiler/hello.rb

# Compile a file with the Sorbet compiler
test/run_sorbet.sh   [-d] test/testdata/compiler/hello.rb

# Compile and run a file
test/run_compiled.sh [-d] test/testdata/compiler/hello.rb

The optional -d flag to these scripts will launch the important command for that script in lldb with all the right args, so you can set breakpoints before running it.

Running the tests

To run all the tests:

bazel test //test:compiler //test/cli/compiler --config=dbg

By default, all test output goes into files. To also print it to the screen:

bazel test //test:compiler //test/cli/compiler --config=dbg --test_output=errors

If any test failed, you will see two pieces of information printed:

1.  //test:test_testdata/compiler/hello                                      FAILED in 0.1s
2.    /home/jez/.cache/bazel/.../test/test_testdata/compiler/hellow/test.log

the test's target (in case you want to run just this test again with bazel test <target>)
a (runnable) file containing the test's output

To see the failing output, either:

Re-run bazel test with the --test_output=errors flag
Copy/paste the *.log file and run it (the output will open in less)

Writing tests

We write tests by adding files to subfolders of the test/ directory. Every compiler test works like this:

Write a Ruby file
Interpret the Ruby file with Sorbet Ruby
Compile and run the compiled output
Compare the resulting exit codes, stdouts, and stderrs (they must match to pass)

These Ruby file tests live in test/testdata/.

Thus, it's very important to print things in tests. If a test doesn't print anything, then both the interpreted and compiled output will be empty and the test will pass even if the compiler did something completely wrong.

Verifying LLVM IR

We rely on LLVM's FileCheck to verify that the structure of the generated IR conforms to our expectations. If you are writing a test where you would like to verify the structure of the generated code, you need to add a # run_filecheck: <pass> comment to the file. This comment will typically be added in the top header block of comments.

<pass> in the above comment can be one of three values:

INITIAL, for verifying the LLVM IR immediately after it is generated by the Sorbet compiler;
LOWERED, for verifying the LLVM IR after the Sorbet compiler's custom lowerings have been run;
OPT, for verifying the LLVM IR after LLVM's suite of optimizations has been run.

<pass> corresponds to the value given to the -check-prefix option of FileCheck.

Please note that FileCheck will verify that your checks occur in the same order in the LLVM IR file as they are written in the test file. So, as an example, if a file contains:

# OPT-LABEL: "func_Object#foo"
# OPT-NOT: call i64 @sorbet_getMethodBlockAsProc
# OPT: call i64 @rb_yield_values_kw
# OPT-NOT: call i64 @sorbet_getMethodBlockAsProc
# OPT{LITERAL}: }

The following checks will be run in order on the optimized LLVM IR:

"func_Object#foo" appears exactly once in the file;
call i64 @sorbet_getMethodBlockAsProc does not appear between the above line and the next check;
call i64 @rb_yield_values_kw appears;
call i64 @sorbet_getMethodBlockAsProc does not appear between the previous line and the next check;
a literal } appears on a line.

The first check and the last check are intended to limit the scope of the contained checks to a single function definition. The quotes in the first check are intended to limit the check to the actual function definition and avoid matching ancilliary data associated with the function. If we used INITIAL-LABEL:, INITIAL-NOT:, INITIAL:, and so forth, the checks would be run against the initial LLVM IR generated by the Sorbet compiler.

It is generally not a good idea to run checks for only a single pass: running checks against two passes enables you to check whether the initial structure was what you expected (and make things robust against e.g. function renames) and whether the transformation you expected to take place actually did take place.

Comparing LLVM IR (deprecated)

A test can also have a *.llo.exp file. If this file is present, the test runner will use the llvm-diff command line tool to assert that the optimized LLVM IR output for a given *.rb file matches what's in the *.llo.exp file. For example:

The llvm-diff command is not a textual diff. It is slightly aware of functionally equivalent files (e.g., names of variables are different but they're initialized and used the same way means they match).

The LLVM IR diff tests fail are expected to fail for most new changes. The idea is that you should skim the diffs, make sure they look appropriate given what was meant to change, and then record the new exp files with this command:

# Can only be run on Linux.
tools/scripts/update_compiler_exp.sh

The expectations files are somewhat platform dependent, so they must be recorded on a Linux machine.

After the expectation files have been edited on disk, use

git diff --color-words

for a better git diff that ignores irrelevant parts that have changed.

Comparing stderr

Frequently, a test fails because the output on stderr is different. For example, interpreted vs compiled backtraces usually are slightly different.

Rather than letting the test chunder an exception backtrace to stderr, consider catching the expected exception and only printing the message.

If the stderr difference is not due to exceptions (or the behavior of the test changes when introducing exception handling), add this special comment to the top of the test:

# skip_stderr_check

With this comment, only the exit code and stdout of a test must match.

Debugging

It is hard to debug something inside the Bazel test sandbox directly. Instead, use the test/run_*.sh scripts (discussed above). These scripts have a -d option that will drop into lldb before running.

`.lldbinit`

We have a custom .lldbinit file with some helper commands. The new commands it makes available inside lldb are:

rubysourcemaps
- set up source maps Ruby VM source files (e.g., when looking at a ruby C-level backtrace)
llvmsourcemaps
- set up source maps for LLVM source files (e.g., when looking at exceptions thrown when compiling a Ruby file)
rubystack
- dump the current Ruby stack if stopped in a Ruby VM breakpoint

If you see a message about local .lldbinit files when running lldb, you can either:

ignore all local .lldbinit files:

# ~/.lldbinit
settings set target.load-cwd-lldbinit false

run all local .lldbinit files:

# ~/.lldbinit
settings set target.load-cwd-lldbinit true

make the choice run-by-run:

lldb (--local-lldbinit|--no-lldbinit) <normal lldb args...>

Debugging the Ruby VM

Ruby code compiled by Sorbet is run by the Ruby VM. It helps to have an idea of what functions are available in the Ruby VM to break on or call when stopped:

Suggested breakpoints:

rb_longjmp - break on nearly any Ruby exception being raised
- There are a handful of cases where breaking on rb_longjmp won't stop when a Ruby exception is raised, but I don't remember what they are.
Find something in insns.def
- All Ruby VM bytecode instructions' implementations are in insns.def
- You can break on like insns.def:762

Suggested functions to call while stopped:

rb_backtrace()
- Print a Ruby-level backtrace right now
rb_id2name
- Convert an ID (Ruby interned string) to a C string
sorbet_dbg_p(...)
- Print the given Ruby VALUE like you had called p ... in Ruby

Other:

To debug Ruby running via Rbenv:

lldb -- $(rbenv which ruby) [<args>...]

You can use the stop_in_debugger gem to set a C-level debugger breakpoint from Ruby source code.

C++ in the compiler

In Sorbet, any exception is a crash (Sorbet never raises exceptions for expected behavior).

But this is not true in the Sorbet compiler. Certain user-expected outcomes are powered by exceptions (see AbortCompilation). This means that an exception might be thrown to interrupt you during normal execution and be recovered from higher up. For example:

unique_ptr<int> foo(unique_ptr<int> x) {
    auto *y = x.release();
    codeThatRaisesAbortCompilation();
    auto z = unique_ptr<int>(y);
    return z;
}

In this case, ownership of x is released, but an exception is thrown before ownership of y is transfered to z. This means that the memory associated with y will never be freed, but the AbortCompilation will be caught and execution will continue, and this memory will have leaked.

It's best to just never use unique_ptr::release in the compiler.

Editor and environment

Nearly all of the Editor and environment tips from the main README apply here too.

Especially the --disk_cache suggestions. If you're working on a Stripe devbox, you'll want to make the .bazelrc.local file on the devbox--it's gitignored by default, so you'll want to make it directly on the devbox.

Sorbet compiler-specific recommendations:

Get yourself a syntax highlighting plugin for LLVM IR. For example:
- https://github.com/rhysd/vim-llvm

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Files

compiler

compiler

README.md

The Sorbet Compiler

Table of Contents

Quickstart

Learning how the Sorbet Compiler works

Running Sorbet Ruby and the Sorbet Compiler

Running the tests

Writing tests

Verifying LLVM IR

Comparing LLVM IR (deprecated)

Comparing stderr

Debugging

`.lldbinit`

Debugging the Ruby VM

C++ in the compiler

Editor and environment

Files

compiler

Directory actions

More options

Directory actions

More options

Latest commit

History

compiler

Folders and files

parent directory

README.md

The Sorbet Compiler

Table of Contents

Quickstart

Learning how the Sorbet Compiler works

Running Sorbet Ruby and the Sorbet Compiler

Running the tests

Writing tests

Verifying LLVM IR

Comparing LLVM IR (deprecated)

Comparing stderr

Debugging

.lldbinit

Debugging the Ruby VM

C++ in the compiler

Editor and environment

`.lldbinit`