1. IMPLEMENTATION (en)
This document describes how the Tamgu (탐구) interpreter is implemented. It is intended for anyone who wants to understand the inner workings of the interpreter. Reading it may shed light on Tamgu's implementation choices, but it is not required in order to use the language.
The interpreter of a computer language is generally divided into three parts:
- Parsing: building the syntactic tree from the code.
- Compiling: transforming the syntactic tree into an executable structure.
- Executing the code.
The syntactic tree is built by the class bnf_tamgu, whose code is stored in the files codeparse.cxx and codeparse.h.
Warning: These two files were not written by hand. They were generated automatically from the tamgu file in the bnf directory. The bnf directory also contains Python scripts to regenerate codeparse if necessary.
The description of the BNF language used is recorded in the tamgu file itself.
This file contains the classes to save the syntactic tree: x_node.
An x_node contains its name (token), its value (value), its position in the initial text (start, end) and its child nodes.
The x_nodes are produced by codeparse.
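To make the description of a node more concrete, here is a minimal sketch of what such a structure could look like. It is not the actual x_node declaration: the type name SketchNode, the member children and the helpers add and countNodes are assumptions made for this illustration. Only token, value, start and end come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for x_node (not the real declaration).
struct SketchNode {
    std::string token;   // name of the node
    std::string value;   // text associated with the node
    long start = 0;      // start position in the initial text
    long end = 0;        // end position in the initial text
    std::vector<std::unique_ptr<SketchNode>> children;

    SketchNode(std::string t, std::string v, long s, long e)
        : token(std::move(t)), value(std::move(v)), start(s), end(e) {
        assert(start <= end);  // a node cannot end before it starts
    }

    // Appends a child node and returns a non-owning pointer to it.
    SketchNode* add(std::string t, std::string v, long s, long e) {
        children.push_back(
            std::make_unique<SketchNode>(std::move(t), std::move(v), s, e));
        return children.back().get();
    }
};

// Counts a node together with all of its descendants.
inline std::size_t countNodes(const SketchNode& n) {
    std::size_t total = 1;
    for (const auto& child : n.children) total += countNodes(*child);
    return total;
}
```

Owning the children through std::unique_ptr means that destroying the root releases the whole tree, which keeps the sketch free of manual cleanup.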
The input of codeparse, in turn, is the result of a tokenization step. The file contains the class x_rules and its derived classes x_reading and x_wreading. The class x_rules records tokenization rules and applies them.
These tokenization rules divide the input code into tokens while maintaining both their position in the file and their identity as defined by a rule such as:
rules.push_back("%d+(.%d+)(e([- +])%d+)=3");
This rule recognizes, among other things, a number with an exponent. When such a token is identified, the code "3" is returned, which marks the token as a number. The BNF grammar can then refer to this code to generate the corresponding node:
# Numbers as parsed by the tokenizer: "3" is the code yielded by the tokenizer.
^3 anumber := .
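The rule shown earlier ends with "=3", the code returned for the token. The sketch below shows one possible way to separate the pattern from its code. It is an assumption made for illustration, not the way x_rules actually parses its rules: the function name splitRule and the choice to split at the last '=' are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: splits "pattern=code" at the LAST '=' so that any
// '=' inside the pattern itself is left untouched.
// Returns {pattern, code}; code is empty when there is no "=code" suffix.
inline std::pair<std::string, std::string> splitRule(const std::string& rule) {
    assert(!rule.empty());  // an empty rule has neither pattern nor code
    const std::size_t pos = rule.rfind('=');
    if (pos == std::string::npos || pos + 1 == rule.size())
        return {rule, ""};
    return {rule.substr(0, pos), rule.substr(pos + 1)};
}
```

With this helper, the rule above would split into the pattern "%d+(.%d+)(e([- +])%d+)" and the code "3".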
By default, these rules directly recognize strings between quotes and separators such as ";" and ":".
# punctuation as parsed by the tokenizer
^0 punct := .
# String definitions as parsed by the tokenizer
^1 astringdouble := .
^2 astringsimple := .
^5 afullstring := .
# Numbers as parsed by the tokenizer
^3 anumber := .
#token as parsed by the tokenizer
^4 word := .
^4 typename := .
^4 astring := .
#regular expressions. In some cases, the code is a simple character.
^9 atreg := .
^a astreg := .
^b apreg := .
^c aspreg := .
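The listing above can be read as a lookup from a tokenizer code to the grammar entries that accept it. The sketch below records that association in a table. The function names tokenEntries and entriesFor are hypothetical, and the real parser is generated from the BNF file rather than driven by such a table.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Codes and entry names copied from the grammar listing above.
// A code such as '4' is shared by several entries, hence the multimap.
inline const std::multimap<char, std::string>& tokenEntries() {
    static const std::multimap<char, std::string> table = {
        {'0', "punct"},       {'1', "astringdouble"}, {'2', "astringsimple"},
        {'3', "anumber"},     {'4', "word"},          {'4', "typename"},
        {'4', "astring"},     {'5', "afullstring"},   {'9', "atreg"},
        {'a', "astreg"},      {'b', "apreg"},         {'c', "aspreg"}};
    return table;
}

// Returns every grammar entry associated with a tokenizer code.
inline std::vector<std::string> entriesFor(char code) {
    std::vector<std::string> out;
    const auto range = tokenEntries().equal_range(code);
    for (auto it = range.first; it != range.second; ++it)
        out.push_back(it->second);
    return out;
}
```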
Once the syntactic tree is correctly constructed, it is passed to the compiler. The compiler is implemented as the class TamguCode, which contains everything needed to traverse the syntactic tree and build the internal representations.
The TamguGlobal class contains a special dictionary, parseFunctions, that associates a token name with a TamguCode compiling method. The method RecordCompileFunctions records all these methods in this dictionary.
The main method used when the syntactic tree is compiled is Traverse.
This method takes two parameters: the current node in the syntactic tree and the object being built.
It examines the current node and checks whether its token is associated with a compiling method. Not every token is associated with a method: when none is found, Traverse examines the child nodes instead.
When a compiling method is chosen, it enriches the current object with the necessary information by analyzing the current node and its sub-nodes.
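The dispatch described above can be sketched as a dictionary of member-function pointers combined with a recursive descent. This is a simplified illustration under stated assumptions, not the actual TamguCode implementation: Node, Built, SketchCompiler, compileNumber and compileWord are hypothetical names. The dictionary is also kept inside the compiler class for brevity, whereas the text places parseFunctions in TamguGlobal.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Node {                        // simplified stand-in for x_node
    std::string token;
    std::string value;
    std::vector<Node> children;
};

struct Built {                       // stand-in for "the object being built"
    std::vector<std::string> log;
};

class SketchCompiler {
public:
    using Method = void (SketchCompiler::*)(const Node&, Built&);

    SketchCompiler() {
        // Plays the role of RecordCompileFunctions: token name -> method.
        parseFunctions["anumber"] = &SketchCompiler::compileNumber;
        parseFunctions["word"] = &SketchCompiler::compileWord;
    }

    void Traverse(const Node& node, Built& parent) {
        const auto it = parseFunctions.find(node.token);
        if (it != parseFunctions.end()) {
            assert(it->second != nullptr);
            (this->*(it->second))(node, parent);  // a method handles this node
            return;
        }
        // No method for this token: look at the child nodes instead.
        for (const Node& child : node.children) Traverse(child, parent);
    }

private:
    void compileNumber(const Node& n, Built& b) { b.log.push_back("number:" + n.value); }
    void compileWord(const Node& n, Built& b) { b.log.push_back("word:" + n.value); }

    std::map<std::string, Method> parseFunctions;
};
```

In this sketch a node whose token has no registered method acts as a transparent container: only its descendants contribute to the object being built.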
The Tamgu interpreter contains almost no global variables. The one exception is globalTamgu, of type TamguGlobal.
The TamguGlobal class centralizes the creation of objects so that it can also handle their destruction. Tamgu does not include a garbage collector as such, but it keeps track of the objects that the interpreter may need during execution. For example, all instructions are stored in the tracked atomic vector.
This class also keeps track of threads and manages their creation or deletion. TamguGlobal also manages the variable stack.
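The idea of centralizing object creation so that destruction can be handled in one place can be illustrated as follows. This is only a sketch under simplifying assumptions: SketchGlobal, Tracked and Instruction are hypothetical names, and a plain std::vector is used where the text mentions an atomic vector, so the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Tracked {                     // hypothetical base of every tracked object
    virtual ~Tracked() = default;
};

// Example tracked object: bumps a counter when destroyed, so that the
// cleanup performed by the registry can be observed.
struct Instruction : Tracked {
    explicit Instruction(int& destroyedCounter) : destroyed(destroyedCounter) {}
    ~Instruction() override { ++destroyed; }
    int& destroyed;
};

class SketchGlobal {
public:
    SketchGlobal() = default;
    SketchGlobal(const SketchGlobal&) = delete;
    SketchGlobal& operator=(const SketchGlobal&) = delete;
    ~SketchGlobal() { clear(); }

    // Takes ownership of obj and returns its index in the registry.
    std::size_t track(Tracked* obj) {
        assert(obj != nullptr);
        tracked.push_back(obj);
        return tracked.size() - 1;
    }

    std::size_t size() const { return tracked.size(); }

    // Destroys every object that was registered.
    void clear() {
        for (Tracked* obj : tracked) delete obj;
        tracked.clear();
    }

private:
    std::vector<Tracked*> tracked;   // not atomic in this sketch
};
```

Because every object goes through track, the registry is the single place that knows what must be released, which is the role the text attributes to TamguGlobal.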
Each data structure is an independent class derived from TamguReference, defined in its own file. Each object exposes at least the following methods:
- The following two methods are used to register the methods associated with an object. In particular, InitializationModule is called at startup to register all the basic objects and their methods. AddMethod associates a method name on the Tamgu side with a method of the corresponding class.
static void AddMethod(TamguGlobal* g, string name, objectMethod func, unsigned long arity, string infos);
static bool InitializationModule(TamguGlobal* g, string version);
- The following two methods are used to return or modify the value of an object
Tamgu* Eval(Tamgu* context, Tamgu* value, short idthread);
Tamgu* Put(Tamgu* context, Tamgu* value, short idthread);
- The following methods are used to return atomic values:
long Integer();
string String();
etc.
- The following methods are used to compare values with each other
Tamgu* less(Tamgu*);
Tamgu* lessequal(Tamgu*);
Tamgu* more(Tamgu*);
Tamgu* moreequal(Tamgu*);
Tamgu* same(Tamgu*);
Tamgu* different(Tamgu*);
- The following methods are used to perform operations with other values
Tamgu* minus(Tamgu*);
Tamgu* plus(Tamgu*);
Tamgu* multiply(Tamgu*);
Tamgu* divide(Tamgu*);
etc.
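The interface listed above can be illustrated with a much reduced sketch. It is not the actual Tamgu class hierarchy: Value, IntValue, MethodSucc, call and the "succ" method name are hypothetical, the signatures of AddMethod and InitializationModule are simplified, less returns a bool rather than a Tamgu*, and results are returned as std::unique_ptr to keep the example free of manual memory management.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

class Value {                        // hypothetical stand-in for the base class
public:
    virtual ~Value() = default;
    virtual long Integer() const = 0;          // atomic value as an integer
    virtual std::string String() const = 0;    // atomic value as a string
    virtual std::unique_ptr<Value> plus(const Value& other) const = 0;
    virtual bool less(const Value& other) const = 0;
};

class IntValue : public Value {
public:
    using Method = std::unique_ptr<Value> (IntValue::*)() const;

    explicit IntValue(long v) : value(v) {}

    long Integer() const override { return value; }
    std::string String() const override { return std::to_string(value); }
    std::unique_ptr<Value> plus(const Value& o) const override {
        return std::make_unique<IntValue>(value + o.Integer());
    }
    bool less(const Value& o) const override { return value < o.Integer(); }

    // A method exposed to scripts under the (hypothetical) name "succ".
    std::unique_ptr<Value> MethodSucc() const {
        return std::make_unique<IntValue>(value + 1);
    }

    // Associates a script-side name with a C++ method of this class.
    static void AddMethod(const std::string& name, Method func) {
        methods()[name] = func;
    }

    // Would be called once at startup to register the class's methods.
    static void InitializationModule() { AddMethod("succ", &IntValue::MethodSucc); }

    // Looks up a registered method by name and invokes it on this object.
    std::unique_ptr<Value> call(const std::string& name) const {
        const auto it = methods().find(name);
        assert(it != methods().end());  // the name must have been registered
        return (this->*(it->second))();
    }

private:
    static std::map<std::string, Method>& methods() {
        static std::map<std::string, Method> table;
        return table;
    }
    long value;
};
```

The per-class method table is what lets the interpreter resolve a method name found in a script to C++ code without a hard-coded switch.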
The template directory allows you to build your own objects and libraries. Just give the name of your object to the script and it automatically generates a directory in which it places the source and include files as well as the Makefiles. All you have to do is fill in these templates with your own code.