Alessio2405/C-Compiler

C Compiler (from scratch)

A simple C compiler written in C that generates x86-64 assembly code. This compiler supports a subset of the C language including functions, variables, control flow, and basic expressions.

Features

  • Data Types: int, char, void, pointers, and arrays
  • Operators: Arithmetic (+, -, *, /, %), comparison (==, !=, <, >, <=, >=), logical (&&, ||, !), and bitwise operators
  • Control Flow: if/else, while, for, break, continue
  • Functions: Function definitions and calls with parameters
  • Variables: Local and global variable declarations with initializers
  • Arrays: Array declarations and access
  • Pointers: Basic pointer operations (address-of &, dereference *)
  • String Literals: Basic string literal support
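
As a rough illustration, the snippet below combines several of the listed features (a global with an initializer, a pointer parameter, an array, address-of and dereference, and break/continue-style control flow). The function names are made up for this sketch, and whether the compiler accepts every construct exactly as written, including the comments, is an assumption.

```c
int counter = 3;                       /* global variable with an initializer */

/* Sums the odd entries of an array passed as a pointer. */
int sum_odd(int *values, int n) {
    int total = 0;
    int i;
    for (i = 0; i < n; i = i + 1) {
        if (values[i] % 2 == 0) {
            continue;                  /* skip even entries */
        }
        total = total + values[i];
    }
    return total;
}

int demo() {
    int data[4];                       /* local array */
    int *p = &counter;                 /* address-of */
    int i = 0;
    while (i < 4) {
        data[i] = i + *p;              /* dereference: stores 3, 4, 5, 6 */
        i = i + 1;
    }
    return sum_odd(data, 4);           /* 3 + 5 */
}
```

Calling demo() from main and returning its result would make the value visible as the process exit status.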

Project Structure

├── src/
│   ├── main.c          # Main compiler entry point
│   ├── lexer.c         # Lexical analyzer (tokenizer)
│   ├── parser.c        # Parser (creates AST)
│   ├── codegen.c       # Code generation (x86-64 assembly)
│   ├── symbol.c        # Symbol table management
│   └── types.c         # Type system utilities
├── include/
│   ├── compiler.h      # Common definitions and structures
│   ├── lexer.h         # Lexer interface
│   ├── parser.h        # Parser interface
│   ├── codegen.h       # Code generation interface
│   ├── symbol.h        # Symbol table interface
│   └── types.h         # Type system interface
├── examples/
│   └── factorial.c     # Factorial function example
└── README.md           # This file

Building

Prerequisites

  • GCC compiler
  • Make utility

Build Instructions

Build the Compiler (using GCC):

gcc -std=c99 -Wall -Wextra -Iinclude \
    src/main.c src/lexer.c src/parser.c src/codegen.c src/symbol.c src/types.c \
    -o compiler

Example

Compile factorial.c example:

./compiler examples/factorial.c factorial.s && gcc -o factorial factorial.s

Grammar

Program Structure

program = (function | declaration)*
function = type identifier '(' parameter_list? ')' block
parameter_list = type identifier (',' type identifier)*

Statements

statement = block
          | declaration
          | expression_statement
          | if_statement
          | while_statement
          | for_statement
          | return_statement
          | break_statement
          | continue_statement

block = '{' statement* '}'
declaration = type identifier ('[' number ']')? ('=' expression)? ';'
expression_statement = expression ';'
if_statement = 'if' '(' expression ')' statement ('else' statement)?
while_statement = 'while' '(' expression ')' statement
for_statement = 'for' '(' expression? ';' expression? ';' expression? ')' statement
return_statement = 'return' expression? ';'
break_statement = 'break' ';'
continue_statement = 'continue' ';'

Expressions

expression = assignment
assignment = logical_or ('=' assignment)?
logical_or = logical_and ('||' logical_and)*
logical_and = equality ('&&' equality)*
equality = comparison (('==' | '!=') comparison)*
comparison = term (('<' | '>' | '<=' | '>=') term)*
term = factor (('+' | '-') factor)*
factor = unary (('*' | '/' | '%') unary)*
unary = ('!' | '-' | '+' | '~' | '*' | '&')? primary
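primary = number | string | identifier | function_call | array_access | '(' expression ')'
function_call = identifier '(' argument_list? ')'
argument_list = expression (',' expression)*
array_access = identifier '[' expression ']'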
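
Each grammar rule above maps naturally onto one function in a recursive-descent parser, with lower-precedence rules calling higher-precedence ones. The sketch below is not taken from the repository's parser.c; it covers only the term, factor, unary, and primary rules, restricted to integer literals, and it evaluates the expression directly instead of building an AST. All function names are hypothetical, and error handling (including division by zero) is omitted.

```c
#include <ctype.h>
#include <stdlib.h>

static const char *cur;                /* current position in the input */

static long parse_term(void);

static void skip_spaces(void) {
    while (isspace((unsigned char)*cur)) cur++;
}

/* primary = number | '(' expression ')'   (other alternatives omitted) */
static long parse_primary(void) {
    skip_spaces();
    if (*cur == '(') {
        cur++;                         /* consume '(' */
        long inner = parse_term();     /* term is the lowest rule in this sketch */
        skip_spaces();
        if (*cur == ')') cur++;        /* consume ')' */
        return inner;
    }
    char *end;
    long value = strtol(cur, &end, 10);
    cur = end;
    return value;
}

/* unary = ('!' | '-' | '+' | '~')? primary   ('*' and '&' omitted) */
static long parse_unary(void) {
    skip_spaces();
    char op = *cur;
    if (op == '!' || op == '-' || op == '+' || op == '~') {
        cur++;
        long v = parse_primary();
        if (op == '!') return !v;
        if (op == '-') return -v;
        if (op == '~') return ~v;
        return v;
    }
    return parse_primary();
}

/* factor = unary (('*' | '/' | '%') unary)* */
static long parse_factor(void) {
    long v = parse_unary();
    for (;;) {
        skip_spaces();
        char op = *cur;
        if (op != '*' && op != '/' && op != '%') return v;
        cur++;
        long rhs = parse_unary();
        if (op == '*') v *= rhs;
        else if (op == '/') v /= rhs;  /* no division-by-zero check here */
        else v %= rhs;
    }
}

/* term = factor (('+' | '-') factor)* */
static long parse_term(void) {
    long v = parse_factor();
    for (;;) {
        skip_spaces();
        char op = *cur;
        if (op != '+' && op != '-') return v;
        cur++;
        long rhs = parse_factor();
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Entry point: evaluates an arithmetic expression held in a string. */
long eval_expression(const char *src) {
    cur = src;
    return parse_term();
}
```

The loops in parse_term and parse_factor are what make the binary operators left-associative, so "10 - 4 - 3" is read as (10 - 4) - 3. A real parser would return AST nodes from each function instead of computed values.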

Examples

Hello World

int main() {
    printf("Hello, World!\n");
    return 0;
}

Factorial Function

int factorial(int n) {
    if (n <= 1) {
        return 1;
    }
    return n * factorial(n - 1);
}

int main() {
    int result = factorial(5);
    printf("5! = %d\n", result);
    return 0;
}

Array Example

int main() {
    int arr[5];
    int i;
    
    for (i = 0; i < 5; i = i + 1) {
        arr[i] = i * i;
    }
    
    for (i = 0; i < 5; i = i + 1) {
        printf("%d ", arr[i]);
    }
    printf("\n");
    
    return 0;
}

Current Limitations

  • No preprocessor support
  • No struct or union types
  • No float or double types
  • No standard library support beyond basic functions such as printf (used in the examples above)
  • No dynamic memory allocation
  • Limited error recovery
  • No optimization passes
  • No debugging information generation

Architecture

The compiler follows a traditional multi-pass design:

  1. Lexical Analysis: Converts source code into tokens
  2. Syntax Analysis: Builds an Abstract Syntax Tree (AST)
  3. Semantic Analysis: Type checking and symbol resolution
  4. Code Generation: Generates x86-64 assembly code
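
To make the first stage concrete, here is a minimal tokenizer sketch. It is an illustration under assumptions, not the repository's lexer.c: the Tok and TokKind names and the next_token function are hypothetical, and string literals, character literals, and comments are left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef enum { TOK_NUMBER, TOK_IDENT, TOK_KEYWORD, TOK_PUNCT, TOK_EOF } TokKind;

typedef struct {
    TokKind kind;
    const char *start;   /* first character of the token in the source */
    size_t length;       /* number of characters in the token */
} Tok;

/* Returns 1 if the n characters at s spell a keyword of the supported subset. */
static int is_keyword(const char *s, size_t n) {
    static const char *const keywords[] = {
        "int", "char", "void", "if", "else",
        "while", "for", "return", "break", "continue"
    };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        if (strlen(keywords[i]) == n && strncmp(s, keywords[i], n) == 0)
            return 1;
    }
    return 0;
}

/* Reads one token starting at *src and advances *src past it. */
Tok next_token(const char **src) {
    const char *s = *src;
    while (isspace((unsigned char)*s)) s++;

    Tok tok = { TOK_EOF, s, 0 };
    if (*s == '\0') {
        /* end of input: keep TOK_EOF with length 0 */
    } else if (isdigit((unsigned char)*s)) {
        tok.kind = TOK_NUMBER;
        while (isdigit((unsigned char)s[tok.length])) tok.length++;
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        while (isalnum((unsigned char)s[tok.length]) || s[tok.length] == '_')
            tok.length++;
        tok.kind = is_keyword(s, tok.length) ? TOK_KEYWORD : TOK_IDENT;
    } else {
        /* Two-character operators that appear in the grammar. */
        static const char *const two_char[] = { "==", "!=", "<=", ">=", "&&", "||" };
        tok.kind = TOK_PUNCT;
        tok.length = 1;
        for (size_t i = 0; i < sizeof two_char / sizeof two_char[0]; i++) {
            if (strncmp(s, two_char[i], 2) == 0) tok.length = 2;
        }
    }
    *src = s + tok.length;
    return tok;
}
```

A parser would call next_token repeatedly until it sees TOK_EOF. Storing a pointer and a length instead of copying the text keeps the sketch free of allocation.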

Key Data Structures

  • Token: Represents lexical elements (keywords, operators, literals)
  • ASTNode: Represents syntax tree nodes
  • Type: Represents data types in the type system
  • Symbol: Represents identifiers in the symbol table
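
The sketch below shows one way an ASTNode could feed the code generation stage. It is an assumption-laden illustration and not the layout used in compiler.h: the field names, the NodeKind values, the AsmBuf helper, and the stack-based emission strategy (AT&T syntax, result left in %rax) are all chosen for this example. Only integer literals and three binary operators are covered, and freeing the tree is omitted.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

typedef enum { NODE_NUM, NODE_ADD, NODE_SUB, NODE_MUL } NodeKind;

typedef struct ASTNode {
    NodeKind kind;
    long value;                /* used only by NODE_NUM */
    struct ASTNode *lhs;       /* left operand of a binary node */
    struct ASTNode *rhs;       /* right operand of a binary node */
} ASTNode;

/* Fixed-size output buffer so the sketch needs no file handling. */
typedef struct {
    char text[1024];
    size_t len;
} AsmBuf;

ASTNode *new_node(NodeKind kind, long value, ASTNode *lhs, ASTNode *rhs) {
    ASTNode *n = malloc(sizeof *n);
    if (n == NULL) abort();
    n->kind = kind;
    n->value = value;
    n->lhs = lhs;
    n->rhs = rhs;
    return n;
}

/* Appends formatted text to the buffer, truncating if it is full. */
static void emit(AsmBuf *b, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(b->text + b->len, sizeof b->text - b->len, fmt, ap);
    va_end(ap);
    if (n > 0) {
        size_t room = sizeof b->text - b->len - 1;
        b->len += ((size_t)n < room) ? (size_t)n : room;
    }
}

/* Emits code that leaves the value of the expression in %rax. */
void gen_expr(AsmBuf *b, const ASTNode *n) {
    if (n->kind == NODE_NUM) {
        emit(b, "    movq $%ld, %%rax\n", n->value);
        return;
    }
    gen_expr(b, n->rhs);               /* right operand first ... */
    emit(b, "    pushq %%rax\n");      /* ... saved on the stack */
    gen_expr(b, n->lhs);               /* left operand ends up in %rax */
    emit(b, "    popq %%rcx\n");       /* right operand restored into %rcx */
    switch (n->kind) {
    case NODE_ADD: emit(b, "    addq %%rcx, %%rax\n");  break;
    case NODE_SUB: emit(b, "    subq %%rcx, %%rax\n");  break;
    case NODE_MUL: emit(b, "    imulq %%rcx, %%rax\n"); break;
    default: break;
    }
}
```

For the tree of 1 + 2 * 3, the multiplication is emitted before the addition because the traversal is post-order. Evaluating the right operand first means that subq %rcx, %rax computes left minus right without an extra swap.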

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests for new features
  5. Submit a pull request

TODO

  • Add more comprehensive error messages
  • Implement struct and union types
  • Add floating-point support
  • Implement preprocessor
  • Add optimization passes
  • Improve debugging information
  • Add more built-in functions
  • Implement static analysis warnings
