Skip to content

A compiler with Lexical and Parser generators implemented with C++ and tested on Java language rules.

Notifications You must be signed in to change notification settings

AhmedMaghawry/Compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 

Repository files navigation

CompilerProject

  • A compiler with Lexical and Parser generators implemented with C++ and tested on Java language rules.
  • This project id divided into 3 phases :
  1. Lexical Analyzer Generator.
  2. Parser Generator.
  3. Java Byte Code Generation.

1. Lexical Analyzer Generator :

This phase of the assignment aims to practice techniques for building automatic lexical analyzer generator tools.

Description :

  • The lexical analyzer generator is required to automatically construct a lexical analyzer from a regular expression description of a set of tokens. The tool is required to construct a nondeterministic finite automata (NFA) for the given regular expressions, combine these NFAs together with a new starting state, convert the resulting NFA to a DFA, minimize it and emit the transition table for the reduced DFA together with a lexical analyzer program that simulates the resulting DFA machine.

  • The generated lexical analyzer has to read its input one character at a time, until it finds the longest prefix of the input, which matches one of the given regular expressions. It should create a symbol table and insert each identifier in the table. If more than one regular expression matches some longest prefix of the input, the lexical analyzer should break the tie in favor of the regular expression listed first in the regular specifications. If a match exists, the lexical analyzer should produce the token class and the attribute value. If none of the regular expressions matches any input prefix, an error recovery routine is to be called to print an error message and to continue looking for tokens.

Lexical Rules:

-The tokens of the given language are: identifiers, numbers, keywords, operators and punctuation symbols.

-The token id matches a letter followed by zero or more letters or digits.

  • The token num matches an unsigned integer or a floating- point number. The numberconsists of one or more decimal digits, an optional decimal point followed by one or more digits and an optional exponent consisting of an E followed by one or more digits.

  • Keywords are reserved. The given keywords are: int, float, boolean, if, else, while.

  • Operators are: +, -, *, /, =, <=, <, >, >=, !=, ==

  • Punctuation symbols are parentheses, curly brackets, commas and semicolons.

  • Blanks between tokens are optional.

Lexical Rules Input File Format:

  • Lexical rules input file is a text file.

  • Regular definitions are lines in the form LHS = RHS

  • Regular expressions are lines in the form LHS: RHS

  • Keywords are enclosed by { } in separate lines.

  • Punctuations are enclosed by [ ] in separate lines

  • \L represents Lambda symbol.

  • The following symbols are used in regular definitions and regular expressions with the meaning discussed in class: - | + * ( )

  • Any reserved symbol needed to be used within the language, is preceded by an escape backslash character.


2. Parser Generator :

This phase of the assignment aims to practice techniques for building automatic parsergenerator tools.

Description :

  • The parser generator expects an LL (1) grammar as input. It should compute First and Follow sets and uses them to construct a predictive parsing table for the grammar.

  • The table is to be used to drive a predictive top-down parser. If the input grammar is not LL (1), an appropriate error message should be produced.

  • The generated parser is required to produce some representation of the leftmost derivation for a correct input. If an error is encountered, a panic-mode error recovery routine is to be called to print an error message and to resume parsing.

  • The parser generator is required to be tested using the given context free grammar of a small subset of Java. Of course, you have to modify the grammar to allow predictive parsing.

  • Combine the lexical analyzer generated in phase1 and parser such that the lexical analyzer is to be called by the parser to find the next token.

  • Automatically eliminating grammar left recursion and performing left factoring before generating the parser.

CFG Input File Format:

  • CFG input file is a text file.

  • Production rules are lines in the form LHS ::= RHS

  • Production rule can be expanded over many lines.

  • Terminal symbols are enclosed in single quotes.

  • \L represents Lambda symbol.

  • The symbol | is used in RHS of production rules with the meaning discussed in class.

  • Any reserved symbol needed to be used within the language, is preceded by an escape backslash character.


3.Java Byte Code Generation :

This phase of the assignment aims to practice techniques of constructing semantics rules to generate intermediate code.

Description :

  • Generated bytecode must follow Standard bytecode instructions defined in Java Virtual Machine Specification

http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html

http://en.wikipedia.org/wiki/Java_bytecode

Proposed grammars are required to cover the following features:

• Primitive types (int, float) with operations on them (+, - , *, / )

• Boolean Expressions (Bonus marks)

• Arithmetic Expressions

• Assignment statements

• If-else statements

• for loops (Bonus marks)

• while loops

About

A compiler with Lexical and Parser generators implemented with C++ and tested on Java language rules.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published