-
Notifications
You must be signed in to change notification settings - Fork 0
/
designP1.txt
118 lines (99 loc) · 6.94 KB
/
designP1.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
Interpreter design:
Language:
I chose to use C++11 for this interpreter. Unfortunately, c++11 isn't supported by stdlinux, so
c++0x it is.
Main function:
The main function reads the input from std::cin up to a $, and returns it as a string. the string is then
tokenized and returned as a vector of vectors of tokens, and then each individual
vector of tokens corresponding to a separate lisp expression is converted to internal representation
as a single S-expression, and printed. Both tokenization and conversion can generate errors, so
the tokenization step and conversion/print step in the main contain error handling logic which
allows them to print the error and continue onto the next s-expression.
Classes:
The interpreter has two structs, token and SExp.
Token:
Token is an aggregate struct that contains two strings, tokenType, and tokenText.
tokenType is the type of token, which can be one of the following:
"whitespace", "lParen", "rParen", "int", "symbolId", "dot"
Notice, that "$", and "$$", are valid symbols. These are used by the tokenizer to
separate s-expressions, and are not needed for evaluation so I got rid of them.
tokenText is the text that generated that token. for instance, "1" will have a token
type of "int", and a tokenText of "1". This is so the token can know and write it's
generating string. Only symbolid and int actually use this.
SExp:
This class is very similar to the version presented in class. It has an int type,
int val, string name, and SExp* left and right. Type {1,2,3} denotes the type of the
expression, 1 for int, 2 for symbolId, and 3 for SExp. Then the appropriate variables
are used to hold the contents of that s-expression, be it an integer, string, or
two SExp*'s representing the car and cdr of the s-expression.
SExp also has an unordered_map which holds SExp*'s uniquely identified by symbolId names.
This allows us to hold known name and their SExp*s for use in the conversion function.
Reading input:
The interpreter first waits for input using std::cin. It reads input line by line separated by
newline characters until it sees a "$", at which point, it stops reading input returns the input string.
Tokenizer:
Once input is read and returned as a string, Each expression is then searched for identifiers of
tokens. The tokenizer uses a vector to record the string of tokens. The function iterates over the
string lisp expression and finds tokens as below:
'(', ')', '.':
If any of these are found, the appropriate token is added to the back of the tokenizedStr
vector with generic tokenText.
ints, strings, whitespace:
If a character is found to be a digit, a token of type int is created. The tokenizer
continues to iterate along the string until it sees a non-digit. It sets the tokenText
to the value of the entire int, from the first char of the int to the iterator's end
position, then moves on to the next token.
SymbolIds work similarly. If a character is found to be a letter, a procedure much like int
initiates. The only difference is that (as mentioned in class) symbolIds can contain alpha-
numeric characters, but must start with a letter. The iterator iterates along the string
until it finds a character that's not alphanumeric, then sets the tokenText to the entire
token's value.
Whitespace uses the same procedure, and uses the cctype function isspace to capture tabs,
linefeeds, spaces, and more in one function call.
The tokenizer is run on each string from the read input routine's vector, and pushes each token
vector onto a vector of tokenVectors called tokenizedExprs, which is fed to the routine
which converts the token vectors to the internal representation
Convert to internal representation:
When the conversion function is called, a new SExp is created. The function checks to see
whether the token is an int, symbolId, or SExpression. For an int or symbolId, a simple
SExp of tha type is created and returned. for symbolIds, the symbolname is checked against
all existing symbolNames. If the symbol name exists, the extant version is returned instead of
creating a new one. If it doesn't exist, the function creates a new one and returns that, adding
it also to the internal hash table in SExp which holds the symbolId's corresponding SExps.
If the token vector is an s-expression, an s-expression of type 3 is created. the conversion function
sends the token vector to findCarAndCdr, which returns the car and cdr of the s-expression, and each
of those are added to the new SExp's corresponding variables. If the cdr of the s-expression starts
with a whitespace character (which would have been trimmed off of the front/back of the s-exp before
evaluation in other circumstances), then we know that the s-expression is a list. under normal
circumstances, the SExp*'s right and left are just set by a recursive call to the conversion function,
but if it's a list, then the car is set by a recursive call, and the cdr is set by a different
function specifically for lists.
convert to List:
Convert to list takes a vector of tokens and converts it to list notation. it first checks to see
if the vector of strings starts with "atom, whitespace" as its first two characters. If so, the atom
is added to the right side of the s-expression returned by this function, and the right side is
created by a recursive call with the recent atom removed from the vector.
If the vector starts with an l-paren, then it is an s-expression. The function pulls the entire
s-expression into it's own vector, and evaluates it using the normal conversion function. If it's
a list, the conversion function will send it back to the list conversion function lower down the call
stack. If not, it will be evaluated normally. This allows for a mix of list and normal notation to be
used throughout the lisp expression.
Errors:
If anything ourside of the ordiary occurs in any of these functions, including tokenization,
an error can be thrown. Errors are handled by the main function. If an error occurs durring
tokenization, the string being tokenized is ignored, and the next string is tokenized.
If an error occurs durring conversion, the vector of tokens is ignored, and the next vector
is converted. Either way, an error is printed.
Print:
The print function is run after an SExp is created and returned by the conversion function. An
SExp* is read in, and it's type is checked. If it is of type 1 or 2, it's name is printed. If it's
of type 3, an lParen is prined, then e->left is sent recursively to the print function. After that
a . is printed, then e->right is sent recursively to the print function, and finally an rParen is
printed.
Utility functions:
Trim:
A function that accepts a vector of tokens and removes leading and trailing whitespace
isAtom:
A function that takes a token and returns whether or not it is atomic
print, for tokens:
Prints a vector of tokens, useful for debugging.