A place to jot down ideas and/or commentary about the work in progress.
The single pass straight code generation from parsing approach was working but feeling increasingly brittle in the face of trying to get basic optimizations going, etc. Decided to pause that and shift to compiler2.c to start fresh with the existing lexer and rebuilding the parser to generate an Abstract Syntax Tree instead.
This allows for multiple passes, simplifies some optimizations, and paves the way for generating three-argument IR, SSA, etc.
FUNCDEF 'fib'
PARAM 'n' i32
RETURNS i32
BLOCK
IF
BINOP <
NAME 'n'
U32 0x2
BLOCK
RETURN
NAME 'n'
The new compiler can parse all the existing test and demo code, but does not yet do any codegen.
Implemented basic global scope enum support and changed remaining #defines in compiler.c to enums.
Yanked out the lexer and built up rewriter.c that does a pretty limit C to whatever language this is transform by mostly passing code through but rewriting simple patterns like:
- void name(at an, bt bn) { -> func name(an at, bn bt) {
- type name = ... -> var name types = ...
- struct name { ... -> type name struct { ...
Discovered that demo/life.src bitrotted. Git bisect identifies the guilty commit as cfe2d815bea0dd47d0ebeae933d673f4bf16cd6d "compiler: logical and/or improvements.
Also d30153ccc6a1f5e61d13094999e703504153aa94 "copmiler: scanner fetches input data in chunks" seems to have broken the correct printing of error lines.
Considering replacing varargs error io... error("file %s has %u typos",fn,n) em("file "); es(fn); es(" has "); eu(n); ee(" typos"); kinda gross.
1030-flow-control shows inefficient branch patterns around if-x-continue usage:
000000ac: 88e00004 ldw r8, [sp, 4]
000000b0: 48840001 and r8, r8, 1
000000b4: 49000001 mov r9, 1
000000b8: 08890009 sub r8, r8, r9
000000bc: e9000002 bne 0xc8
if ((n & 1) == 1) {
000000c0: e7fffff3 b 0x90
continue;
}
000000c4: e7000000 b 0xc8
This also gets me thinking that adopting the C convention that integer types are treated as bools (0 = false, othewise true) would be more efficent in a number of places,
All of these from demo/life.src
What's up with this mov'ing the constant to r9 instead of just sub r8, r8, 5
?
00000098: f7ffffdb bl 0x8
0000009c: 08000000 mov r8, r0
000000a0: 4884000f and r8, r8, 15
000000a4: 49000005 mov r9, 5
000000a8: 08890009 sub r8, r8, r9
000000ac: ed00000a bge 0xd8
if ((xorshift32() & 15) < 5) {
Hmm...
gen_load_reg 0x8, <Reg: r=0,a=8,b=0,t=int32>
gen_code '0000009c: 08000000 mov r8, r0'
gen_load_reg<<< <Reg: r=8,a=0,b=0,t=int32>
gen_mul_op 0x3, <Reg: r=8,a=0,b=0,t=int32>, <Const: r=0,a=15,b=0,t=int32>
gen_code '000000a0: 4884000f and r8, r8, 15'
gen_rel_op 0x2, <Reg: r=8,a=0,b=0,t=int32>, <Const: r=0,a=5,b=0,t=int32>
gen_load_reg 0x9, <Const: r=0,a=5,b=0,t=int32>
gen_code '000000a4: 49000005 mov r9, 5'
gen_load_reg<<< <Reg: r=9,a=0,b=0,t=int32>
gen_branch_cond 0x0, <Comp: r=2,a=8,b=9,t=int32>
gen_code '000000a8: 08890009 sub r8, r8, r9'
gen_code '000000ac: ed000000 bge 0xb0'
Hm. RHS of the rel-op is insufficiently lazy, it looks.
It loads 5 into a register ahead of gen_branch_cond
And the add r8, r8, 4
at 0xc0 looks fishy too...
000000b0: 48d80004 add r8, sb, 4
000000b4: 89e00018 ldw r9, [sp, 24]
000000b8: 499a0050 mul r9, r9, 80
000000bc: 08880009 add r8, r8, r9
000000c0: 48880004 add r8, r8, 4
000000c4: 89e00014 ldw r9, [sp, 20]
000000c8: 08880009 add r8, r8, r9
000000cc: 49000001 mov r9, 1
000000d0: b9800004 stb r9, [r8, 4]
grid[y][x] = 1;
gen_item_from_obj<<< <RegInd: r=13,a=4,b=0,t=[25][80]byte>
gen_item_from_obj<<< <RegInd: r=14,a=24,b=0,t=int32>
gen_index <RegInd: r=13,a=4,b=0,t=[25][80]byte>, <RegInd: r=14,a=24,b=0,t=int32>
gen_address_reg 0x8, <RegInd: r=13,a=4,b=0,t=[25][80]byte>
gen_code '000000b0: 48d80004 add r8, sb, 4'
gen_load_reg 0x9, <RegInd: r=14,a=24,b=0,t=int32>
gen_code '000000b4: 89e00018 ldw r9, [sp, 24]'
gen_load_reg<<< <Reg: r=9,a=0,b=0,t=int32>
gen_code '000000b8: 499a0050 mul r9, r9, 80'
gen_code '000000bc: 08880009 add r8, r8, r9'
gen_index<<< <RegInd: r=8,a=4,b=0,t=[80]byte>
gen_item_from_obj<<< <RegInd: r=14,a=20,b=0,t=int32>
gen_index <RegInd: r=8,a=4,b=0,t=[80]byte>, <RegInd: r=14,a=20,b=0,t=int32>
gen_address_reg 0x8, <RegInd: r=8,a=4,b=0,t=[80]byte>
gen_code '000000c0: 48880004 add r8, r8, 4'
gen_load_reg 0x9, <RegInd: r=14,a=20,b=0,t=int32>
gen_code '000000c4: 89e00014 ldw r9, [sp, 20]'
gen_load_reg<<< <Reg: r=9,a=0,b=0,t=int32>
gen_code '000000c8: 08880009 add r8, r8, r9'
gen_index<<< <RegInd: r=8,a=4,b=0,t=byte>
gen_gen_store <Const: r=0,a=1,b=0,t=int32>, <RegInd: r=8,a=4,b=0,t=byte>
gen_load_reg 0x9, <Const: r=0,a=1,b=0,t=int32>
gen_code '000000cc: 49000001 mov r9, 1'
gen_load_reg<<< <Reg: r=9,a=0,b=0,t=int32>
gen_code '000000d0: b9800004 stb r9, [r8, 4]'
Quick check to see how much libc the compiler is depending on...
Not too bad.
unistd/syscall: open close read write exit
buffered: fopen fclose fgets fputs fwrite
formatted: printf fprintf vfprintf
memory: malloc memcmp memcpy memset strlen
misc: putchar puts exit abort\
When a break / continue / return takes control out of a scope, we're not smart enough to discard stuff like the jump-around-else emitted at the end of an if block, resulting in a dead branch op:
if (n < 2) {
return n;
00000020: 80e00004 ldw r0, [sp, 4]
00000024: e700000d b 0x5c
00000028: e700000c b 0x5c
We move return values from r0 to a temp register to cover cases where
we may immediately need r0 again (another call, etc) but that means
extra instructions and copies in other cases like return fib(n - 1) + fib(n - 2);
0000002c: 88e00004 ldw r8, [sp, 4] # get n
00000030: 40890001 sub r0, r8, 1 # r0 = n - 1
00000034: f7fffff3 bl 0x4 # r0 = fib(r0)
00000038: 08000000 mov r8, r0 # r0 -> tmp0
0000003c: 89e00004 ldw r9, [sp, 4] # get n
00000040: 40990002 sub r0, r9, 2 # r0 = n - 2
00000044: a8e00008 stw r8, [sp, 8] # save tmp0
00000048: f7ffffee bl 0x4 # r0 = fib(r0)
0000004c: 88e00008 ldw r8, [sp, 8] # restore tmp0
00000050: 09000000 mov r9, r0 # r0 -> tmp1
00000054: 00880009 add r0, r8, r9 # r0 = tmp0 + tmp1
00000058: e7000000 b 0x5c # jump to epilogue (return)
Up til now I had been passing around a "compiler context" pointer, so
almost every method looked like parse_xyz(Ctx ctx, ...)
with a sort of
explicit this pointer as the first argument.
This caused a lot of clutter, and it will not help performance in any useful way once it's self-hosting (since globals are loaded relative to a global pointer register, whereas pointers first have to be grabbed relative to sp and then further dereferenced).
It is useful from a namespace clutter standpoint to keep the struct rather than 20-some stray globals.
See: commit 1086c2aa1
As a big goal of the project is for the compiler to be self-hosted, able to compile itself, and the bigger and more complex it and the language gets the more work converting the source base over is likely to be, I've been thinking about the simplest path to a MVP self-hosting compiler.
Expressions and simple flow control (if/else, while, break, return) will be nearly identical to C. Basic function and structure declarations will be very similar (mostly flipping the name/type order and some slight punctuation changes).
My thinking is by writing the compiler in a strict subset of C that is very regular it should be pretty simple to mechanically translate the compiler with a script. Ideally entirely by script so I could maintain the compiler in both languages long enough to do some "co-simulation" style testing to ensure I'm getting identical compilation from both.
The remaining pieces needed to hit the MVP target are, I believe:
- finish if/while flow control
- support for enums, or convert enums to consts
- support for structs
- support for arrays
- support for pointers
- support for globals, init'ing gp
- support for open/close/read/write "syscalls"
- mini std library (malloc-analogue, alloc-only heap, string utils)
- rewrite lexer to use if/else instead of case
- emulator support for commandline args passing
Okay, that's still a decent chunk of work. But most of it is straight forward for a base implementation.
We can survive for a while without floating point, structure extension, whatever case/match mechanism ends up looking like, etc, etc. Data structures are all about dumb, single-linked lists for now. Hashtables or Trees can come much later in the process.
I need to make some decisions on pointers vs references.
I should just make ctx global -- passing it like a this pointer clutters up the code and doesn't buy much. And it'll be slower for quite a while, since for now param -> ldw -> pointer-in-reg -> ldw -> value in struct is more work than just loading relative to the global pointer.