In this assignment you'll implement a small language called Cobra,
which implements an en-Co-ded B-inary R-epresent-a-tion
of different values. It also uses C
function calls to implement some
user-facing operations, like printing and reporting run-time errors.
-
Use the fork link from the class website to create your private clone of the starter code.
-
Do
git clone https://github.com/ucsd-cse131-fa18/00-warmup-XXX
whereXXX
is your private repo. -
Link your clone to the "upstream" to get any updates
$ make upstream
after this you can get "updates" (in case we modify the starter code), with
$ make update
- Save (and submit) your work with:
$ git commit -a -m MESSAGE
$ git push
As usual, there are a few pieces that go into defining a language for us to compile.
-
A description of the concrete syntax – the text the programmer writes
-
A description of the abstract syntax – how to express what the programmer wrote in a data structure our compiler uses.
-
A description of the semantics — or behavior —of the abstract syntax, so our compiler knows what the code it generates should evaluate.
The concrete syntax of cobra
is:
<expr> :=
| let <bindings> in <expr>
| if <expr>: <expr> else: <expr>
| <binop-expr>
<binop-expr> :=
| <identifier>
| <number>
| true
| false
| add1(<expr>)
| sub1(<expr>)
| isNum(<expr>)
| isBool(<expr>)
| print(<expr>)
| <expr> + <expr>
| <expr> - <expr>
| <expr> * <expr>
| <expr> < <expr>
| <expr> > <expr>
| <expr> == <expr>
| ( <expr> )
<bindings> :=
| <identifier> = <expr>
| <identifier> = <expr>, <bindings>
}
The abstract syntax is defined by the data type Expr
:
data Prim1
= Add1
| Sub1
| Print
| IsNum
| IsBool
data Prim2
= Plus
| Minus
| Times
| Less
| Greater
| Equal
data Expr a
= Number !Integer a
| Id !Id a
| Prim1 !Prim1 !(Expr a) a
| Prim2 !Prim2 !(Expr a) !(Expr a) a
| Let !(Bind a) !(Expr a) !(Expr a) a
| If !(Expr a) !(Expr a) !(Expr a) a
data Bind a
= Bind !Id a
An immediate expression is a
- constant
Number n
, or - identifier
Id x
.
{-@ measure isImm @-}
isImm :: Expr a -> Bool
isImm (Number _ _) = True
isImm (Id _ _) = True
isImm _ = False
{-@ type ImmExpr a = {v:Expr a | isImm v} @-}
An A-Normal Form (ANF) expression is one where every argument
for a Prim1
or Prim2
or conditional (in an If
) is immediate
{-@ measure isAnf @-}
isAnf :: Expr a -> Bool
isAnf (Number _ _) = True
isAnf (Id _ _) = True
isAnf (Prim1 _ e _) = isImm e
isAnf (Prim2 _ e e' _) = isImm e && isImm e'
isAnf (If c t e _) = isImm c && isAnf t && isAnf e
isAnf (Let _ e e' _) = isAnf e && isAnf e'
{-@ type AnfExpr a = {v:Expr a| isAnf v} @-}
The above is very similar to the constructs for boa
-- the main additions are
the new primitives.
With the addition of two types to the language, there are two main changes that ripple through the implementation:
- Representation of Values
- Possibility of Errors
There is one other major addition, which is the print
primitive, discussed
more below.
The representation of values requires a definition. We'll use the following representations for the Cobra runtime:
true
will be represented as the constant0xFFFFFFFF
false
will be represented as the constant0x7FFFFFFF
- numbers will be represented with a zero in the rightmost bit, as in class.
So, for example,
2
is represented as0x00000004
.
You will fix the provided print
function in main.c
to print these
values correctly: true
and false
should print as those words, and numbers
should print out as the underlying number being represented.
// c-bits/main.c
int print(int val) {
printf("Unknown value: %#010x\n", val);
return val;
}
A print
expression in cobra
should pass the value of its subexpression to
the print
function in main.c
, and evaluate to the same value (print
in
main.c
helps out here by returning its argument). The main work you need
to do here is similar to when calling an error function; evaluate the argument, push it onto the stack with push
, and then call
into print
.
You should raise errors in the following cases:
-
-
,+
,*
,==
,<
, and>
should raise an error (by printing it out) with the substring"expected a number"
if the operation doesn't get two numbers (you can print more than this if you like, but it must be a substring) -
add1
andsub1
should raise an error with the substring"expected a number"
if the argument isn't a number -
+
,-
, and*
should raise an error with the substring"arithmetic overflow"
if the result overflows, and falls outside the range representable in 31 bits. Thejo
instruction (not to be confused with the Joe Instructor) which jumps if the last instruction overflowed, is helpful here. -
if
should raise an error with the substring"expected a boolean"
if the conditional value is not a boolean.
The code for the above will be part of your implementation of:
-- lib/Language/Cobra/Compiler.hs
-- | TBD: Implement code for `Prim1` with appropriate type checking
compilePrim1 :: Tag -> Env -> Prim1 -> IExp -> [Instruction]
compilePrim1 l env op v = error "TBD:compilePrim1"
-- | TBD: Implement code for `Prim2` with appropriate type checking
compilePrim2 :: Tag -> Env -> Prim2 -> IExp -> IExp -> [Instruction]
compilePrim2 l env op = error "TBD:compilePrim2"
-- | TBD: Implement code for `If` with appropriate type checking
compileIf :: Tag -> Env -> IExp -> AExp -> AExp -> [Instruction]
compileIf l env v e1 e2 = error "TBD:compileIf"
These error messages should be printed on standard error, so use a call like:
fprintf(stderr, "Error: expected a number")
I recommend raising an error by
- Adding fixed code to the end of your generated assembly that
- Calls into
error
functions that you will implement inmain.c
. - Also don't forget to declare the functions from
main.c
that you call asextern
in theheader
variable inAsm.hs
.
For example, you might insert code like:
internal_error_non_number:
push eax
call error_non_number
Which will store the value in eax
on the top of the stack, move esp
appropriately, and perform a jump into error_non_number
function,
which you will write in main.c
as a function of one argument.
The other operators, isNum
, isBool
, and print
, cannot raise
errors, and always succeed.
let x = 1 in
let y = print(x + 1) in
print(y + 2)
# will output
2
4
4
The first 2 comes from the first print expression. The first 4 comes from the second print expression. The final line prints the answer of the program as usual, so there's an “extra” 4.
if 54: true else: false
# prints (on standard error) something like:
Error: expected a boolean in if, got 54
In order to set up the stack properly to call C functions, like print
and
your error functions, it's necessary to make a few changes to what we
had in Boa.
-
Allocating stack space ahead of time: At the start of our generated code (which we now recognize is a function body participating in a C runtime stack), we need to make sure we make enough stack space for all variables we create, and reserve that space ahead of time. To do this, we move
esp
to point to a location that is N words away (so N * 4 bytes for us), where N is the greatest number of variables we need at once. This is actually tricky to compute to be fully optimal (teaser for later in the semester: by “tricky” I mean NP-hard), but it's easy to get an OK heuristic – we can compute the maximum depth of nested definitions.To do this, we need the
countVars
function, which we implemented in class, so I've provided it. You need to add instructions to the provided spot infunEntry
in order to make sure the correct space is allocated on the stack by subtracting the right amount fromesp
. -
Using the Base Pointer: In addition, this means that all variable references need to happen from
ebp
rather than fromesp
. This is becauseesp
can change while we are pushing arguments onto the stack for other function calls, soebp
is the place we should trust for consistent offsets. -
Participating in the C stack: As a C function callee (from
main
) and caller (of error functions andprint
), our code has some responsibilities. First, we need to store the old base pointer upon entry, and update the base pointer to hold the current top of the stack (which includes the return pointer into main, for example). This is why the typical top two lines of most C functions are:push ebp mov ebp, esp
Similarly, when we're done with the function, we need to restore the stack pointer to its old location, and put the old base pointer value back. This is why the last lines before a
ret
in a C function are often:mov esp, ebp pop ebp
-
Other Responsibilities: If we were using registers beyond
eax
,ebp
, andesp
, we'd be responsible for storing some of them as callee, and some as caller. But we're not going to go into those details for this assignment. Since we aren't using those registers, it has no effect on our code's behavior.
You need to account for the above by filling in appropriate implementations for
-- lib/Language/Cobra/Compiler.hs
-- | TBD: insert instructions for setting up stack-frame for `n` local vars
funEntry :: Int -> [Instruction]
funEntry n = error "TBD:funEntry"
-- | TBD: cleaning up stack-frame after function finishes
funExit :: [Instruction]
funExit = error "TBD:funExit"
-
Sized
You may run into errors that report that the size of an operation is ambiguous. This could happen if you write, for example:
mov [ebp-8], 0
Because the assembler doesn't know if the program should move a four-byte zero, a one-byte zero, or something in between into memory starting at
[ebp-8]
. To solve this, you can supply a size:mov [ebp-8], DWORD 0
This tells the assembler to use the “double word” size for 0, which corresponds to 32 bits. A
WORD
corresponds to 16 bits, and aBYTE
corresponds to 8 bits. To get a sized argument, you can use theSized
constructor fromarg
. -
HexConst
Sometimes it's nice to read things in hex notation as opposed to decimal constants. I've provided a new
HexConst
arg
that's useful for this case. -
IPush
,IPop
These two instructions manage values on the stack.
push
adds a value at the current location ofesp
, and incrementsesp
to point past the added value.pop
decrementsesp
and moves the value at the locationesp
was pointing to into the provided arg. -
ICall
A call does two things:
- Pushes the next _code_ location onto the stack (just like a `push`), which becomes the return pointer - Performs an unconditional `jmp` to the provided label
call
does not affectebp
, which the program must maintain on its own. -
IShr
,IShl
: Bit shifting operations -
IAnd
,IOr
,IXor
: Bit masking operations -
IJo
,IJno
: Jump to the provided label if the last arithmetic operation did/did not overflow
As usual, full summaries of the instructions we use are at this assembly guide.
Make sure your error messages match the error messages from the instructions and/or tests exactly!
The testing code is the same as for boa
. ANF conversion is provided,
and hasn't changed aside from the addition of new primitives,
so your tests should focus on the compilation-and-execution tests.
As usual, fill in new tests in yourTests.json
following the earlier format.
An old friend is helpful here, too: valgrind
. You can run
$ valgrind output/some_test.run
in order to get a little more feedback on tests that fail with -10
as their
exit code (which usually indicates a SEGFAULT). This can sometimes tip you off
quite well as to how memory is off, since sometimes you'll see code trying to
jump to a constant that's in your code, or other obvious tells that there's
something off in the stack. Also, if you've done all your stack management
correctly, valgrind
will report a clean run for your program!
The creators of the 5 hardest tests get 5% of total points as extra credit
-
You can add up to 10 new tests in
tests/yourTests.json
-
Feel free to add more, we'll just take the first 10.
-
A test's "score" is the total number of compilers (submitted by the entire class) on which the test produces the wrong output.
-
That is, the harder a test, the higher its score.
Here's one order in which you could consider tackling the implementation:
-
Fix the
print
function inmain.c
so that it prints out the right output. It will need to check for the tag using C bitwise operators, and useprintf
to print the right value. -
Take a first shot at figuring out how to increase the stack appropriately by using
countVars
and implementingfunEntry
andfunExit
-
Fill in the
Prim1
case for everything butprint
, and figure out how to check for errors, and call the "non-number" error reporting function. Test as you go. Be aware that if the function call SEGFAULTS, it may be because you need to refine step 2. -
Implement
print
by compiling it to with a call to theprint
after pushing appropriate arguments. Be aware that if the call doesn't work, it may be because of step 2 again. Test as you go; be aware that you should test interesting sequences ofprint
expressions and let-bindings to make sure your stack integrity is good before and after calls. -
Fill in all of the
Prim2
cases, using the error-reporting from the last step. Test as you go. -
Complete the
If
case and test as you go.
First, make sure that your code works with the provided test cases by running the command
make test
When you're ready, run the following command to submit your homework:
make turnin
Before submitting your code, you have to fill this form to register your groups. You need to fill it in for each assignment.
We will use this form to match your github account username to your student id, so you must fill it even if you have worked on the assignment individually.
To submit your code, just do:
$ make turnin
This will simply do a git commit
followed by a git push
to send us your code.
We will use the most recent commit of your code (on master
branch) as your submission.