dbc aims to replicate of Ken Thompson's implementation of B for the PDP-11 (see here), with a few differences:
-
Instead of constructing a "reverse Polish threaded code interpreter" as described in section 12.0 of the manual,
dbc
generates LLVM bitcode. This is for two reasons: First, so I could get my feet wet with LLVM; second, sodbc
can leverage the existing optimizers for LLVM. -
dbc
targets x86 and x86-64, so values may be larger than on the 16-bit PDP-11. This also means character literals may contain as many as 8 characters instead of 2.
This project is still rough around the edges, but it should be functional enough to play around with the B language. The example program from section 9.2, which calculates 4000 digits of e, compiles and runs flawlessly (well, it does for me). Run make test
to try it out.
-
All operators and statements are implemented, although they haven't all been thoroughly tested. I suspect some edge cases with oddly placed labels may generate invalid LLVM IR.
-
Every library function has been implemented, although
gtty()
andstty()
differ slightly from their descriptions in the manual: they both require a 4-word vector as a second argument. They are equivalent totcgetattr
andtcsetattr
, respectively, but use a vector instead ofstruct termios
to hold the four flag fields. -
The error diagnostics still need some work—several different errors may be printed for the same line, and the line number is not always correct.
-
Indirectly assigning to a global variable will have strange results (e.g.
foo = 40
is fine, but*&foo = 40
will actually set foo to 5, not 40). This issue does not affect local variable assignment, nor does it affect assignment to array indices (i.e. iffoo
is a global vector,foo[3] = 40
works as expected). The problem stems from a kludge which is necessary to achieve correct pointer arithmetic semantics. -
A simple definition may contain at most one value in its initializer list. I have not yet found a reasonable way to implement the semantics described in section 7.1 of the manual; use a vector definition instead (e.g.
foo[5] 1, 2, 3, 4, 5;
instead offoo 1 2 3 4 5;
). Incidentally, this same restriction seemed to be present in the H6070 implementation of B. -
Global initializer lists may not contain strings or names of other globals (yet).
You'll need a few things to get started:
- A computer running Linux or FreeBSD¹ (x86 or x86-64)²
- LLVM libraries and header files
- Your favorite C compiler (tested with clang 3.8 and gcc 6.1)
- A C++ compiler of your choice (for linking with LLVM; clang++ and g++ should both work)
- yacc (tested with bison and byacc)
- lex (tested with flex)
- GNU binutils (ld, ar, as, and objcopy)
- GNU make
- This source code
Check your distribution's repositories for any missing components (Ubuntu, for example, also requires the zlib1g-dev
package)
Once you have everything installed, cd
to the directory containing the source code for dbc
and run make
. If all goes well, you should now be the proud owner of a binary named dbc
. While you're at it, also run make libb.a
to generate the B standard library.
Now that you've compiled the compiler, you're just a few steps away from compiling your first B program. The observant reader will notice that these steps closely resemble the procedure from section 10.0 of the B manual:
./dbc sample.b sample.bc
- Compile your B source code into an intermediate language (LLVM bitcode)llc sample.bc
- Convert the bitcode into assembly languageas sample.s -o sample.o
- Compile the assembly source into an object fileld sample.o libb.a
- Link your object file against the B library./a.out
- Run your first B program!
To make the process less tedious, the dbrc
script may be used to execute the same sequence of commands: ./dbrc sample.b
(Don't expect anything fancy—it runs that exact sequence of commands, nothing more, nothing less).
¹FreeBSD and Linux use different conventions for making syscalls on x86; run brandelf -t Linux a.out
before executing a binary compiled by dbrc
on FreeBSD (note that this is only necessary for x86, not x86_64)
²ARM is also supported, but compiled programs must be linked against libc due to the lack of an integer division instruction on ARM
- Reversed assignment operators: B uses
x =+ 2
, notx += 2
(although the former style was legal in older C programs, according to K&R) - B also has assignment operators for comparisons:
x =< 2
stores the result ofx < 2
in x;x === 2
(that's right, triple=
) stores the result ofx == 2
in x - The escape character is
*
, not\
:'*n'
is a newline character - No types: all values are the same width
- As a consequence of the previous point, character literals may contain more than one character, so
putchar('ab')
is perfectly fine; interestingly, so-called "multichars" are also valid C, since character literals in C are actually integers (althoughputchar
on my system only prints the last character) - Variables must be declared with either
auto
orextrn
, unlike in C where the storage class specifier is usually implicit (since B doesn't have types, those keywords are necessary to indicate a declaration) - No floating point numbers
- No structs or unions
- The PDP-11 implementation of B lacks the keywords
for
,do
,break
,continue
, anddefault
, as well as the~
and^
operators, although they do make an appearance in the Honeywell 6070 version - No
&&
and||
operators, only bitwise&
and|
; the H6070 documentation mentions that&
and|
were overloaded to perform logical operations when used inside ofif
,while
, or a conditional expression, but the PDP-11 manual does not assign any additional behavior to the bitwise operators - Function definitions don't require braces for a single statement:
main() return(0);
is a valid B program - A
return
statement requires parentheses:return(x)
- ...but
switch
does not:switch x { /* do stuff */ }
argv
is a global variable, not an argument tomain()
;argv[0]
contains the number of command line arguments, not the program name
-
Users' Reference to B - An HTML version of the B Manual for PDP-11, transcribed and edited by none other than Dennis Ritchie
-
The Programming Language B - Additional B references written by Steve Johnson and Brian Kernighan; they describe a later version of B for the Honeywell 6070 (also noteworthy: Kernighan's tutorial contains what may be the first "hello world")
-
B man page - The
man
page for B from Version 1 Unix -
LLVM Tutorial - A handy guide for building your own compiler for a simple programming language
-
LLVM-C API - These just look like auto-generated docs, so it may be more convenient to read the headers in
/usr/include/llvm-c