Skip to content

Latest commit

 

History

History
31 lines (22 loc) · 2.5 KB

README.md

File metadata and controls

31 lines (22 loc) · 2.5 KB

C Compiler from Scratch

This is a basic C compiler that I wrote over the summer after my sophomore year. The goal of this project was to gain a deeper understanding of how compilers work. The only part that I offloaded to a library is the tokenizer which uses lex. Eventually I would like to replace it with the tokenizer that I wrote more recently https://github.com/nickeb96/tokenizer.

The compiler was written and tested on a mac, but since it outputs x86_64 assembly in GAS syntax it should work on GNU/Linux as well. The only caveat is that on Darwin all C symbols are prepended with an underscore once they are assembled, which this compiler respects. This doesn't happen on Linux which may cause linking issues on that platform.

This is not a fully compliant C compiler, there are several features that I did not have time to complete while I was working on it. Here's a list of some of the most important ones that are or aren't in.

  • while loops
  • for loops - I just emulate them with a while loop in the examples.
  • do while loops - Emulate with a while loop and an if statement with a break at the end.
  • arithmetic operators
  • comparison operators
  • bitwise operators - These would be relatively simple to add.
  • dereference and address of operators
  • expression grouping with parenthesis, for instance (x + 1) * y
  • if statements with optional else clause
  • switch statements - A chain of if else statements is a temporary alternative.
  • break and continue statements
  • global variables/constants
  • subscript operator - Emulate with pointer arithmetic and a dereference.
  • logical and/or operators with proper short circuiting
  • compound assignment (+=, /=, etc.) - Just do the operation and then the assignment.

Another incomplete part is the type system. Unfortunately the only type available is a 64-bit quadword that can serve as either an integer or a pointer. The type system was the last thing that I worked on, but it was not fully integrated into the rest of the compiler. A side effect of this is that pointer arithmetic is not automatically scaled by the size of the type. For example (array + i) would need to be written as (array + i * 8), assuming array is an array of quadwords.

This compiler does not strip comments or perform any preprocessor directives, however I wrote a C preprocessor here https://github.com/nickeb96/preprocessor that does this and could be run on files before passing them to the compiler.