forked from ksundberg/CS5300
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCPSL.tex
390 lines (332 loc) · 15.9 KB
/
CPSL.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
\documentclass{book}
\title{Compiler Project Source Language}
\author{Kenneth Sundberg}
\date{\today}
\usepackage{hyperref}
\usepackage{syntax}
\begin{document}
\frontmatter
\maketitle
\cleardoublepage
\thispagestyle{empty}
\vspace*{\stretch{1}}
\begin{flushright}
\itshape
Dedicated to Dr. Steven J. Allan
\end{flushright}
\vspace{\stretch{3}}
\cleardoublepage
\chapter{Abstract}
CPSL (Compiler Project Source Language) is the programming language implemented in the Complier Construction course at Utah State University (CS 5300).
CPSL is a strongly typed, static scoped, block structured, high level language in the spirit of Pascal.
\chapter{Acknowledgement}
With permission, this work is largely derived from the CPSL language created for CS 5300 by Dr. Allan. The mistakes are mine.
\tableofcontents
\listoftables
\mainmatter
\chapter{Definitions}
The character set of CPSL is ASCII.
In the following \textit{letter} denotes any upper- or lower-case letter, and \textit{digit} denotes any of the ten decimal digits 0 through 9.
The syntax of CPSL is expressed in a EBNF-like format as follows:
\begin{grammar}
<Nonterminal> $\rightarrow$ <Nonterminal> "token" <token with state>
\alt <Optional>? <Repeated>*
\alt <empty>
\end{grammar}
Within this format non-terminal symbols will be formatted in mixed case and enclosed in angle brackets. Terminals without state, such as keywords, will be formatted in bold. Terminals with state, such as identifiers, will be formatted in lower case and enclosed in angle brackets. Symbol groups will be indicated with parenthesis. Optional symbols or symbol groups will be indicated with a question mark. Alternatives for a symbol or symbol group will be indicated with a vertical bar. The terminal \textit{empty} in a production indicates an empty production.
\chapter{Lexical Structure}
\section{Introduction}
The following is a detailed description of the lexical structure of CPSL.
Any lexeme not described herein is an error and should result in an appropriate diagnostic message.
\section{Keywords}
Keywords are strings with predefined meaning.
They can not be redefined by the user.
These keywords are either all capitals or all lower-case,
mixed-case variations are not keywords.
For example, BEGIN and begin are keywords while Begin is not.
\begin{table}[h!]
\begin{center}
\begin{tabular}{llllll}
array & begin & chr & const & do & downto \\
else & elseif & end & for & forward & function \\
if & of & ord & pred & procedure & read \\
record & repeat & return & stop & succ & then \\
to & type & until & var & while & write \\
\end{tabular}
\end{center}
\caption{Keywords of CPSL}
\end{table}
\section{Identifiers}
Identifiers in CPSL consist of a letter, followed by a sequence of zero or more letters, digits, or underscores.
Upper- and lower-case letters are considered \textbf{distinct}.
There is \textbf{no limit} on the length of identifiers in CPSL.
\section{Operators and Delimiters}
CPSL has a set of symbols used as operators and delimiters.
These characters are not a valid part of any keyword or identifier.
\begin{table}[h!]
\begin{center}
\begin{tabular}{cccccccc}
$+$ & $-$ & $*$ & $/$ & \& & $|$ & \textasciitilde & = \\
$<>$ & $<$ & $<=$ & $>$ & $>=$ & . & , & : \\
; & ( & ) & [ & ] & := & \% & \\
\end{tabular}
\end{center}
\caption{Operators and Delimiters of CPSL}
\end{table}
\section{Constants}
\subsection{Integer Constants}
Integer Constants in CPSL are of three forms, each denoting an integer in a different base.
\begin{itemize}
\item A sequence of digits beginning with a 0 is interpreted as an octal number
\item A 0x followed by a sequence of digits is interpreted as a hexadecimal number
\item Any other sequence of digits is interpreted as a decimal value
\end{itemize}
\subsection{Character Constants}
A character constant represents a \textit{single} character and is enclosed in a pair of single quotes. A character constant may not be blank, the lexeme consisting of a pair of single quotes is an error.
\subsection{String Constants}
A string constant represents a multi-character sequence and is enclosed in a pair of double quotes. String constants may not contain double quotes.
A string constant may be empty.
\subsection{Representing Characters in Character and String Constants}
The newline character (ASCII 10) may not appear between the single or double quotes of a character or string constant.
Any printable character (ASCII 32 to 126 inclusive) can be represented
as itself with the exception of \textbackslash.
Also such a constant can contain a \textbackslash followed by any printable character. Such a \textbackslash - escaped sequence is interpreted as the character after the \textbackslash with the following exceptions:
\begin{description}
\item[\textbackslash n] line feed
\item[\textbackslash r] carriage return
\item[\textbackslash b] backspace
\item[\textbackslash t] tab
\item[\textbackslash f] form feed
\end{description}
\subsection{Comments}
A comment in CPSL begins with a \$ and continues to the end of the line.
\subsection{Blanks, Tabs, Spaces, and New Lines}
Blanks, tabs, spaces, and new lines (white space) delimit other tokens but are otherwise ignored. This does not hold inside character and string constants where such characters are interpreted as themselves.
\chapter{Syntactic Structure}
The overall structure of a CPSL program is as follows:
\begin{grammar}
<Program> $\rightarrow$ <ConstantDecl>? <TypeDecl>? <VarDecl>? \\
( <ProcedureDecl> | <FunctionDecl> )* <Block> "."
\end{grammar}
\section{Declarations}
\subsection{Constant Declarations}
\begin{grammar}
<ConstantDecl> $\rightarrow$ "const" (<ident> "=" <Expression> ";")+
\end{grammar}
\subsection{Procedure and Function Declarations}
Procedure and Function Declarations have the following structure:
\begin{grammar}
<ProcedureDecl> $\rightarrow$
"procedure" <ident> "(" <FormalParameters> ")" ";" forward ";"
\alt "procedure" <ident> "(" <FormalParameters> ")" ";" "<Body>" ";"
<FunctionDecl> $\rightarrow$
"function" <ident> "(" <FormalParameters> ")" ":" <Type> ";" forward ";"
\alt "function" <ident> "(" <FormalParameters> ")" ":" <Type> ";" "<Body>" ";"
<FormalParameters> $\rightarrow$ <empty>
\alt "var"? <IdentList> ":" <Type> (";" "var"? <IdentList> ":" <Type>)*
<Body> $\rightarrow$ <ConstantDecl>? <TypeDecl>? <VarDecl>? <Block>
<Block> $\rightarrow$ "begin" <StatementSequence> "end"
\end{grammar}
Notice that procedure and function definitions \textit{cannot} be nested.
In other words, we have only a global environment and a local environment.
Procedures and functions can only be defined in the global environment.
In the local environment we can only define constants, types, and variables.
Notice also the provision for accommodating \textbf{forward} references to procedures and functions.
Notice also that both procedures and functions requires parentheses
in the definition even if there are no parameters.
This is also true for their invocations.
\subsection{Type Declarations}
In CPSL there are four predefined types (see figure \ref{predefinedident}), and two type constructors.
The type constructors are array and record.
\begin{grammar}
<TypeDecl> $\rightarrow$ "type" (<ident> "=" <Type> ";")+
<Type> $\rightarrow$ <SimpleType> \alt <RecordType> \alt <ArrayType>
<SimpleType> $\rightarrow$ <ident>
<RecordType> $\rightarrow$ "record" (<IdentList> ":" <Type> ";")* "end"
<ArrayType> $\rightarrow$ "array" "[" <Expression> ":" <Expression> "]" "of" <Type>
<IdentList> $\rightarrow$ <ident> ("," <ident>)*
\end{grammar}
\subsection{Variable Declarations}
\begin{grammar}
<VarDecl> $\rightarrow$ "var" (<IdentList> ":" <Type> ";")+
\end{grammar}
\section{CPSL Statements}
Statements in CPSL have the following syntax:
\begin{grammar}
<StatementSequence> $\rightarrow$ <Statement> (";" <Statement>)*
<Statement> $\rightarrow$ <Assignment>
\alt <IfStatement>
\alt <WhileStatement>
\alt <RepeatStatement>
\alt <ForStatement>
\alt <StopStatement>
\alt <ReturnStatement>
\alt <ReadStatement>
\alt <WriteStatement>
\alt <ProcedureCall>
\alt <NullStatement>
<Assignment> $\rightarrow$ <LValue> ":=" <Expression>
<IfStatement> $\rightarrow$ "if" <Expression> "then" <StatementSequence> ("elseif" <Expression> "then" <StatementSequence>)* ("else" <StatementSequence>)? "end"
<WhileStatement> $\rightarrow$ "while" <Expression> "do" <StatementSequence> "end"
<RepeatStatement> $\rightarrow$ "repeat" <StatementSequence> "until" <Expression>
<ForStatement> $\rightarrow$ "for" <ident> ":=" <Expression> ("to"|"downto") <Expression> "do" <StatementSequence> "end"
<StopStatement> $\rightarrow$ "stop"
<ReturnStatement> $\rightarrow$ "return" <Expression>?
<ReadStatement> $\rightarrow$ "read" "(" <LValue> ("," <LValue>)* ")"
<WriteStatement> $\rightarrow$ "write" "(" <Expression> ("," <Expression>)* ")"
<ProcedureCall> $\rightarrow$ <ident> "(" (<Expression> ("," <Expression>)*)? ")"
<NullStatement> $\rightarrow$ <empty>
\end{grammar}
\section{Expressions}
\begin{grammar}
<Expression> $\rightarrow$ <Expression> "|" <Expression>
\alt <Expression> "\& " <Expression>
\alt <Expression> "=" <Expression>
\alt <Expression> "<>" <Expression>
\alt <Expression> "<=" <Expression>
\alt <Expression> ">=" <Expression>
\alt <Expression> "<" <Expression>
\alt <Expression> ">" <Expression>
\alt <Expression> "+" <Expression>
\alt <Expression> "-" <Expression>
\alt <Expression> "*" <Expression>
\alt <Expression> "/" <Expression>
\alt <Expression> "\%" <Expression>
\alt "~" <Expression>
\alt "-" <Expression>
\alt "(" <Expression> ")"
\alt <ident> "(" (<Expression> ("," <Expression>)*)? ")"
\alt "chr" "(" <Expression> ")"
\alt "ord" "(" <Expression> ")"
\alt "pred" "(" <Expression> ")"
\alt "succ" "(" <Expression> ")"
\alt <LValue>
<LValue> $\rightarrow$ <ident> (( "." <ident>)|( "[" <Expression> "]"))*
\end{grammar}
To resolve ambiguities in this grammar the following suffices:
\begin{itemize}
\item Arithmetic and Boolean binary operators are left-associative
\item Relational operators are non-associative
\item Unary minus and Boolean not are right-associative
\item Operators have the following precedence (decreasing order)
\begin{itemize}
\item Unary $-$ (negation)
\item * / \%
\item $+$ $-$
\item $=$ $<>$ $<$ $<=$ $>$ $>=$
\item \textasciitilde
\item \&
\item $|$
\end{itemize}
\end{itemize}
\chapter{Semantic Structure}
\section{Constant Expressions}
Some of the grammar rules involving expressions are constrained to only use constant expressions.
Constant expressions are expressions whose values can be determined at compile time.
The declaration of constants and the bounds of an array type must be constant expressions.
\begin{grammar}
<ConstantDecl> $\rightarrow$ "const" (<ident> "=" <ConstExpression> ";")+
<ArrayType> $\rightarrow$ "array" "[" <ConstExpression> ":" <ConstExpression> "]" "of" <Type>
\end{grammar}
Constant expressions are a subset of expressions and have much of the same syntactic structure.
Note that the property of constant-ness is context sensitive and can not be determined from the grammar of CPSL alone.
\begin{grammar}
<ConstExpression> $\rightarrow$ <ConstExpression> "|" <ConstExpression>
\alt <ConstExpression> "\& " <ConstExpression>
\alt <ConstExpression> "=" <ConstExpression>
\alt <ConstExpression> "<>" <ConstExpression>
\alt <ConstExpression> "<=" <ConstExpression>
\alt <ConstExpression> ">=" <ConstExpression>
\alt <ConstExpression> "<" <ConstExpression>
\alt <ConstExpression> ">" <ConstExpression>
\alt <ConstExpression> "+" <ConstExpression>
\alt <ConstExpression> "-" <ConstExpression>
\alt <ConstExpression> "*" <ConstExpression>
\alt <ConstExpression> "/" <ConstExpression>
\alt <ConstExpression> "\%" <ConstExpression>
\alt "~" <ConstExpression>
\alt "-" <ConstExpression>
\alt "(" <ConstExpression> ")"
\alt <intconst>
\alt <charconst>
\alt <strconst>
\alt <ident>
\end{grammar}
\subsection{Expression Types}
The binary expression operators are not defined for strings and user defined types.
The result of a relational operator is always a boolean expression.
The logical operators and, or, and not are only defined for boolean expressions.
The arithemetic operators are only defined for integers.
Mixed type expressions are also not defined.
\section{Intrinsic Functions}
\subsection{chr}
This intrinsic changes the type of an expression from integer to character.
It is not defined for other types.
No code needs to be emitted.
\subsection{ord}
This intrinsic changes the type of an expression from character to integer.
It is not defined for other types.
No code needs to be emitted.
\subsection{pred}
This intrinsic decrements the value of an expression.
It is not defined for strings or user defined types.
For boolean expressions the predecessor of true is false, and the predecessor of false is true.
\subsection{succ}
This intrinsic increments the value of an expression.
It is not defined for strings or user defined types.
For boolean expressions the successor of true is false, and the successor of false is true.
\section{Predefined Identifiers}
CPSL has a small set of predefined identifiers.
Unlike keywords, the meaning of these identifiers may be altered by the user.
\begin{table}[h!]
\begin{center}
\begin{tabular}{ll|r}
Identifier & & Meaning \\
\hline
integer & INTEGER & Basic Integer type \\
char & CHAR & Basic Character type \\
boolean & BOOLEAN & Basic Boolean type \\
string & STRING & Basic String type \\
true & TRUE & Boolean constant \\
false & FALSE & Boolean constant \\
\end{tabular}
\end{center}
\caption{Predefined Identifiers in CPSL}
\label{predefinedident}
\end{table}
\section{Simple Statements}
\subsection{Stop}
The stop statement terminates execution of a CPSL program.
\subsection{Read}
The read statement calls the appropriate system calls to fill the given L-Values from user input. Only L-Values of integer or character type can be so filled.
\subsection{Write}
The write statement calls the appropriate system calls to output given expressions to the console.
This statement is not defined for user defined types.
Boolean values are written as if they were integer values.
\section{Control Statements}
\subsection{If}
The if statement evaluates a given expression and then branches either to the then statement sequence or to the next else if.
The else if is evaluated in the same fashion as an if.
If no if or else if expression is true then control passes to the else statement.
\subsection{While}
The while statement evaluates a given expression and if the expression is true processes the statement list.
After processing the statement list the expression is again evaluated.
\subsection{Repeat}
Processes a statement list then evaluates and expression. If the expression is true then the statement list and evaluation occur again.
\subsection{For}
Introduces a new variable which takes on each value in a range of values.
Either in an monotonically increasing or decreasing fashion depending on whether the keyword to or downto was used. For each value of the new variable a list of statements is executed.
\section{Function and Procedure Calls}
\subsection{Parameters}
Parameters in CPSL are always passed by value.
No function or procedure can have side effects, except for output or through global variables.
\subsection{Return}
The return statement passes control back to a function or procedure caller. If it is a function the expression returned becomes the expression of the function call in the callers scope.
\section{User Defined Types}
\subsection{Arrays}
Arrays are a contiguously allocated, homogeneously typed set of variables.
Though it is a logic error to reference an out of bounds array element, CPSL makes no guarantees about run-time or compile-time checking of this error.
Be aware that CPSL arrays are not zero based, rather they are given a low to high inclusive range.
\subsection{Records}
Records are a contiguously allocated set of variables that may be of heterogeneous types. Each variable in the set, called a field, has both a type and name associated with it. Field names exist in their own namespace and do not conflict with names in any other scope, or defined by any other record type.
\end{document}