forked from SMLFamily/The-Definition-of-Standard-ML-Revised
-
Notifications
You must be signed in to change notification settings - Fork 0
/
whatisnew.tex
843 lines (779 loc) · 34.8 KB
/
whatisnew.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
%From [email protected] Tue Apr 9 18:01:55 1996
%Return-Path: [email protected]
%To: [email protected]
%Subject: Comments on Appendix F: What is new?
%Date: Tue, 09 Apr 1996 17:01:42 +0100
%From: Robin Milner <[email protected]>
%
%
% My additions/replacements start with <<<<<
% and end with >>>>>. Comment lines begin %%%%% .
% I think its good. -- Robin
%\documentstyle[11pt,fullpage]{article}
%\input{mac}
%\begin{document}
\section{Appendix: What is New?}
\label{whatisnew-app}
This appendix gives an overview of how the present Definition
differs from the 1990 Definition of Standard ML\cite{mth90}.
For the purpose of this appendix, we write SML '90 for the
language defined by the 1990 Definition and SML '97 for the
present language.
For each major change, we give its rationale and an overview
of its practical implications. Also, the index
% Disable index printing
%(page \pageref{index-sec} ff.)
may be used for locating changes.
\note{\thenewpreface}{This appendix is new!}
\subsection{Type Abbreviations in Signatures}
\label{typabbr.sec}
%\subsubsection{Rationale}
There are cases of type sharing which cannot be expressed
in SML '90 signatures although they arise in structures. For example,
there is no SML '90 signature which precisely describes the relationship
between {\tt s} and {\tt t} in
\begin{verbatim}
structure a =
struct
datatype s = C
type t = s * s
end
\end{verbatim}
In SML '97, one can write {\sl type abbreviations} in signatures, e.g.,
\begin{verbatim}
signature A =
sig
type s
type t = s * s
end
\end{verbatim}
The need for type abbreviations in signatures was clear when SML '90 was
defined. However, type abbreviations were not included since, in the
presence of both structure sharing and type abbreviations, principal
signatures do not exist\cite{mt91} -- and the SML '90 Definition
depended strongly upon the notion of principal signature.
Subsequently, Harper's and Lillibridge's work on translucent sums\cite{hl94}
and Leroy's work on modules\cite{leroy94} showed that, in the absence of
structure sharing and certain other features of the SML '90 signatures,
type abbreviations in signatures are possible.
Type abbreviations in signatures were implemented by
David MacQueen in {\em SML/NJ} 0.93 and by Xavier Leroy in
{\em Caml Special Light\/}\cite{Caml-Special-Light}.
%\subsubsection{Informal description}
In SML '97, structure sharing has been removed (see Section~\ref{nosharing.sec} below).
Type abbreviations are not included directly, but they arise as a derived form,
as follows. First, a new form of signature expression is allowed:
$$\wheretypesigexp$$
Here $\longtycon$ has to be specified by $\sigexp$. The type expression
$\ty$ may refer to type constructors which are
present in the basis in which the whole signature expression is elaborated, but not
to type constructors specified in $\sigexp$.
The effect of the {\tt where type} is, roughly speaking, to instantiate $\longtycon$
to $\ty$. For example, the following sequence of declarations is legal:
\begin{verbatim}
signature SIG1 = sig type t; val x: t end;
signature SIG2 = SIG1 where type t = int*int;
structure S1: SIG1 = struct type t = real; val x = 1.0 end;
structure S2: SIG2 = struct type t = int*int; val x = (5, 7) end;
\end{verbatim}
%The {\tt own} construct can be used for imposing
%additional sharing on a previously declared signature:
%\begin{verbatim}
% signature SIG = sig type t type u val x: t*u end;
% signature SIG1 = SIG where type u = own t*t;
%\end{verbatim}
Next, a type abbreviation is a derived form. For example,
\boxml{type u = t*t} is equivalent to
\boxml{include sig type u end where type u = t*t}~.
In SML '97 it is allowed to \hbox{\tt include} an
arbitrary signature expression, not just a signature identifier.
\subsection{Opaque Signature Matching}
%\subsubsection{Rationale}
In imposing a signature on a structure, one often wants
the types of the resulting structure to be ``abstract'' in order to hide
their implementation. (Signature matching in SML '90
hides components, but does not hide type sharing.)
MacQueen originally suggested and implemented an {\tt abstraction} declaration
for this purpose\cite{mq84}. In the Commentary\cite{mt91} it was
pointed out that the issue is the semantics of matching.
SML '97 provides two
kinds of matching, as new forms of structure
expression:
$$\strexp\boxml{ : }\sigexp$$
$$\strexp\boxml{ :> }\sigexp$$
The first (\boxml{:}) is the SML '90 signature matching;
the second (\boxml{:>}) is {\sl opaque matching}.
Opaque matching can be applied to the result structure of a functor;
thus it is more general than MacQueen's {\tt abstraction} declaration.
In CAML Special Light, all signature matching is opaque.
%\subsubsection{Informal description}
With opaque matching, types in the resulting
structure will be abstract, to precisely the degree expressed
in $\sigexp$. Thus
\begin{verbatim}
signature Sig =
sig
type t = int
val x: t
type u
val y: u
end;
structure S1:> Sig =
struct type t = int
val x = 3
type u = real
val y = 3.0
end
val r = S1.x + 1
\end{verbatim}
is legal, but a subsequent declaration \boxml{val s = S1.y + 1.5}
will fail to elaborate. Similarly, consider the functor declaration:
\begin{verbatim}
functor Dict(type t; val leq: t*t->int):>
sig type u = t*t
type 'a dict
end =
struct
type u = t*t
type 'a dict = (t * 'a) list
end
\end{verbatim}
When applied, {\tt Dict} will propagate
the identity of the type
{\tt t} from argument to result, but it will produce a fresh
{\tt dict} type upon each application.
Types which are specified as ``abstract'' in a opaque functor
result signature give rise to generation of fresh type names
upon each application of the functor, even if the functor body
is a constant structure. For example, after the elaboration of
\begin{verbatim}
structure A = struct type t = int end
functor f():> sig type t end = A
structure B = f()
and C = f();
\end{verbatim}
the two types {\tt B.t} and {\tt C.t} are different.
\subsection{Sharing}
\label{nosharing.sec}
%\subsection{Rationale}
Structure sharing is a
key idea in MacQueen's original Modules design\cite{mq84}.
The theoretical aspects of structure sharing have been the
subject of considerable research
attention\cite{hmt87,tofte88,aponte93,tofte94,MacTo94}.
However, judging from experience, structure sharing
is not often used in its full generality, namely to ensure identity of
values. Furthermore, experience from teaching suggests that
the structure sharing concept is somewhat hard to grasp. Finally, the semantic
accounts of structure sharing that have been proposed are rather
complicated.
%\subsection{Informal Description of change}
The static semantics of SML '97
has no notion of structure sharing.
However, SML '97 does provide a weaker form of structure sharing
constraints, in which structure sharing is
regarded as a derived form, equivalent to a collection of type sharing
constraints.
\subsubsection{Type Sharing}
In SML '90, a type sharing constraint
\boxml{sharing type $\longtycon_1$ = $\cdots$ = $\longtycon_n$}
was an admissible form of specification. In SML '97 such a constraint
does not stand by itself as a specification, but may be used to qualify
a specification. Thus there is a new form of specification, which we
shall call a {\sl qualified} specification:
\[
\newsharingspec
\]
Here the long type constructors have to be specified by $\spec$.
The type constructors may have been specified by {\tt type}, {\tt eqtype}
or {\tt datatype} specifications, or indirectly through signature
identifiers and {\tt include}. In order for the specification to be
legal, all the type constructors must denote flexible type names. More precisely,
let $B$ be the basis in which the qualified specification
is elaborated.
Let us say that a type name $\t$ is {\sl rigid (in $B$)} if $\t\in\of{T}{B}$
and that $\t$ is {\sl flexible (in $B$)} otherwise.
For example {\tt int} is rigid in the initial basis and every datatype
declaration introduces additional rigid type names into the basis. For
the qualified specification
to elaborate in basis $B$,
it is required that each $\longtycon_i$ denotes a type name
which is flexible in $B$. In particular, no $\longtycon_i$
may denote a type function
which is not also
a type name (e.g., a $\longtycon$
must not denote $\Lambda().\boxml{s}*\boxml{s}$).
For example, the two signature expressions
\begin{verbatim}
sig sig
type s type s
type t datatype t = C
sharing type s = t sharing type s = t
end end
\end{verbatim}
are both legal. By contrast, the signature expressions
\begin{verbatim}
sig sig
type s type s = int
type t = s*s datatype t = C
sharing type s = t sharing type s = t
end end
\end{verbatim}
are both illegal.
\bigskip
\subsubsection{The equality attribute of specified types}
If $\newsharingspec$ elaborates successfully, then all $n$ type constructors
will thereafter denote the same type name.
This type name will admit equality, if $\spec$ associates an equality
type name with one of the type constructors.
Thus
\begin{verbatim}
eqtype t
type u
sharing type t = u
\end{verbatim}
is legal and both {\tt t} and {\tt u} are equality types after the sharing
qualification.
The mechanism for inferring equality attributes for datatype specifications
is the same as for inferring equality attributes for datatype declarations.
Thus the specification
\begin{verbatim}
datatype answer = YES | NO
datatype 'a option = Some of 'a | None
\end{verbatim}
specifies two equality types. Every specification
of the form $\datatypespec$
introduces one type name for each type constructor described
by $\datdesc$. The equality attribute of such a type name is
determined at the point where the specification occurs.
Thus, in
\begin{verbatim}
type s
datatype t = C of s
\end{verbatim}
the type name associated with {\tt t}
will not admit equality, even if {\tt s} later is
instantiated to an equality type. Type names associated with datatype
specifications can be instantiated to other type names
by subsequent {\tt type sharing} or {\tt where type}
qualifications. In this case, no effort is made to
ban type environments that do not respect equality. For example,
\begin{verbatim}
sig
eqtype s
datatype t = C of int -> int
sharing type s = t
end
\end{verbatim}
is legal in SML '97, even though it cannot be matched by any real structure.
\subsubsection{Structure Sharing}
\label{strsharing.sec}
For convenience, structure sharing constraints are provided,
but only as a shorthand for type sharing constraints.
There is a derived form of specification
\[
\boxml{$\spec$ sharing $\longstrid_1$ = $\cdots$ = $\longstrid_k$}\qquad(k\geq 2)
\]
Here $\spec$ must specify $\longstrid_1,\ldots,\longstrid_k$.
The equivalent form consists of $\spec$ qualified by all
the type sharing constraints
\[\boxml{sharing type $\longstrid_i.\longtycon$ =
$\longstrid_j.\longtycon$}\]
($1\leq i< j\leq k$) such that both
$\longstrid_i.\longtycon$ and $\longstrid_j.\longtycon$ are specified
by $\spec$.
In SML '90, structure sharing constraints are transitive, but
in SML '97 they are not. For example,
\begin{verbatim}
structure A: sig type t end
structure B: sig end
structure C: sig type t end
sharing A=B=C
\end{verbatim}
induces type sharing on {\tt t}, whereas
\begin{verbatim}
structure A: sig type t end
structure B: sig end
structure C: sig type t end
sharing A=B sharing B=C
\end{verbatim}
induces no type sharing. Thus a structure sharing constraint
in some cases induces less sharing in SML '97 than in SML '90.
Next, SML '97 does not allow structure
sharing equations which refer to ``external'' structures.
For example, the
program
\begin{verbatim}
structure A= struct end;
signature SIG = sig structure B : sig end
sharing A = B
end;
\end{verbatim}
is not legal in SML '97, because the sharing constraint now only
qualifies the specification \boxml{structure B: sig end}, which does
not specify $\boxml{A}$. Thus not all legal SML '90 signatures are
legal in SML '97.
The removal of structure sharing has a dramatic simplifying
effect on the semantics. Most importantly,
the elaboration rules can be made monogenic (i.e., ``deterministic''),
up to renaming of new type names.
The need for the notion of principal signature (and even
equality-principal signature) disappears. The notions of structure name, structure consistency
and well-formed signature are no longer required. The notion of cover
can be deleted. Only one kind of realisation, namely type realisation,
remains. The notion of type-explication has been
removed, since it can be proved that signatures automatically are type-explicit
in the revised language.
\subsection{Value Polymorphism}
%\subsection{Rationale}
The imperative types of SML '90 were somewhat subtle,
and they propagated into signatures
in an unpleasant way.
Experiments on existing code suggest
that the power of imperative types is rarely used fully and that
{\sl value polymorphism}, which can in fact be seen as a restriction of
the imperative type discipline,
usually suffices\cite{wright95}.
%\subsection{Informal Description}
With value polymorphism, there is only one kind of type variable. The definition of
non-expansive expressions (see~\ref{nonexpnew.sec} below) is relaxed to admit more expressions.
In a declaration
$$\boxml{val $x$ = $\exp$}$$
the variable $x$ will only be given a non-trivial polymorphic type scheme
(i.e., a type scheme which is not also a type) if $\exp$ is non-expansive.
This applies even if there is no application of {\tt ref} in the entire
program.
Example: in the declaration {\tt val x = [] @ []}, x can be assigned
type $\atyvar\;\LIST$, but not the type scheme
$\forall \atyvar. \atyvar\;\LIST$ (since
{\tt [] @ []} is an expansive expression). Consequently, {\tt (1::x, true::x)}
will not elaborate in the scope of the declaration. Also, if the
declaration appears at top level, the compiler may refuse elaboration
due to a top-level free type variable (see~\ref{prinenv}).
Thus the top-level phrase {\tt [] @ []} may fail, since it abbreviates
{\tt val it = [] @ []}. But of course it will not fail if a monotype
is explicitly ascribed, e.g. {\tt [] @ []:int list}.
On the other hand, in {\tt fun f() = [] @ []} (or {\tt val f = fn () => [] @ []}),
{\tt f} can be assigned type scheme $\forall\atyvar.\UNIT\to\atyvar\;\LIST$ so that,
for example, $\boxml{(1::f(),true::f())}$ elaborates.
This transformation ($\eta$-conversion) often
gives the desired polymorphism. But beware that $\eta$-conversion
can change the meaning of the program, if $exp$ does not terminate
or has side-effects.
\subsection{Identifier Status}
%
%\subsubsection{Rationale}
The 1990 Definition treated identifier status informally (in Section~2.4);
a fuller definition was
given in the Commentary\cite[Appendix B]{mt91}. However, some
problems with the handling of exception constructors remained\cite[Sect.~10.3]{kahrs93}.
%\subsubsection{Description of change}
In the present document, we have collapsed
the three identifier classes $\Var$, $\Exn$ and $\Con$ into a single
class, $\VId$, of {\sl value identifiers}.
The semantic objects $\VE$ previously called variable environments are
replaced by {\sl value environments}.
%The variable environment $\VE$
%has changed name to {\sl value environment}.
A value environment maps
value identifiers to pairs of the form $(o,\is)$, where $o$ is some semantic
object and $\is$ is an {\sl identifier status} ($\is\in\{\isv,\isc,\ise\}$)
indicating whether the identifier should be regarded as a value variable
($\isv$), a value constructor ($\isc$) or an exception constructor ($\ise$).
These changes have been carried out both in the static and in the dynamic
semantics, for both Core and Modules. Thus the assignment of identifier
status is incorporated formally in the present Definition.
The definition of enrichment has been modified to allow an
identifier that has been specified as a value to be matched by a
value constructor or an exception constructor.
However, a
specification of a value or exception constructor must be matched by
a value or exception constructor, respectively.
Thus, the status descriptor says more than just what the lexical
status of the identifier is --- it is a statement about the
value in the corresponding dynamic environment: if the
status of $\id$ in the static environment is $\isc$, then the value
in a matching dynamic environment must be a value
constructor. Similarly, if the status of $\id$ in the static environment
is $\ise$, then the value
in a matching dynamic environment must be an exception name.
If the status of $\id$ is just $\isv$, however, the corresponding
value in the dynamic environment can be any kind of value (of the appropriate
type), including
a value constructor and an exception name.
The exception environment ($\EE$) has been deleted from the semantics,
since it is no longer required for the definition of enrichment.
Also, the constructor environment $\CE$ in the static semantics has
been replaced by a value environment in which every identifier
has status $\isc$.
The new handling of identifier status admits some {\tt val rec} declarations
that were illegal in SML~'90 (see the comment to Rule~\ref{recvalbind-rule}).
%
\subsection{Replication of Datatypes}
SML~'97 allows {\sl datatype replication}, i.e. declarations
and specifications of the form
$$\datatyperepldec$$
When elaborated, this binds type constructor $\tycon$ to the
entire type structure (value constructors included)
to which $\longtycon$ is bound in the context. Datatype replication
does not generate a new datatype: the original and the
replicated datatype share.
Here is an example of a use of the new construct:
%\pagebreak
\begin{verbatim}
signature MYBOOL =
sig
type bool
val xor: bool * bool -> bool
end;
structure MyBool: MYBOOL =
struct
datatype bool = datatype bool (* from the initial basis *)
fun xor(true, false) = true
| xor(false, true) = true
| xor _ = false
end;
val x = MyBool.xor(true, false);
\end{verbatim}
Here {\tt MyBool.xor(true, false)} evaluates to {\tt true}. Note the
use of transparent signature matching; had opaque matching been used
instead, the declaration of {\tt x} would not have elaborated.
A datatype replication implicitly introduces the value
constructors of $\longtycon$ into the current scope.
This is significant for signature matching. For example, the
following program is legal: %\pagebreak
\begin{verbatim}
datatype t0 = C;
structure A : sig type t val C: t end =
struct
datatype t = datatype t0
end;
\end{verbatim}
Note that {\tt C} is specified as a value in the signature;
the datatype replication copies the value environment of {\tt t0}
into the structure and that is why the structure contains the
required {\tt C} value.
To make it possible for datatype replication to copy value environments
associated with type constructors,
the dynamic semantics has been modified so that environments now contain
a $\TE$ component (see Figure~\ref{comp-dyn-obj}, page~\pageref{comp-dyn-obj}).
Further, in the dynamic semantics of modules, the $\downarrow$ operation,
which is used for cutting down structures when they are matched against
signatures, has been extended to cover the $\TE$ component (see page~\pageref{downarrowdef}). In the above example, the value environment assigned to
{\tt A.t} will be empty, signifying that the type has no value constructors.
Had the signature instead been
$$\boxml{sig datatype t val C: t end}$$
then the signature matching would have assigned {\tt A.t} a value environment
with domain $\{\boxml{C}\}$, indicating that {\tt A.t} has value constructor
{\tt C}.
When the datatype replication is used as a specification, $\longtycon$
can refer to a datatype which has been introduced either by
declaration or by specification. Here is an example of the former:
%\pagebreak
\begin{verbatim}
datatype t = C | D;
signature SIG =
sig
datatype t = datatype t (* replication is not recursive! *)
val f: t -> t
end
\end{verbatim}
%Thus the signature declaration below is illegal
%\begin{verbatim}
% datatype t = C | D;
% signature SIG =
% sig
% datatype t = datatype t
% val C: t
% end
%\end{verbatim}
%for {\tt C} is specified twice as a value identifier
%(once by {\tt datatype t = datatype t} and once by {\tt val C: t}).
\subsection{Local Datatypes}
%\subsubsection{Rationale}
This change is concerned with
expressions of the form
\boxml{let $\dec$ in $\exp$ end}
in which $\dec$ contains a {\tt datatype} declaration.
Let us refer to such a datatype declaration as a {\sl local} datatype
declaration.
There are two reasons why changes to the handling of local datatype
declarations
are necessary.
The first is that the rule given for elaboration of
{\tt let}-expressions in the 1990 Definition
is unsound\cite{kahrs93}; the problem has to do with the ability to export
type names of locally declared datatypes out of scope.
The second is that the static semantics relies on the following invariant
about all contexts, $C$, which arise in elaboration from the
initial basis:
\[
\TyNamesFcn \C\subseteq\of{T}{\C}
\]
This invariant is used, for example, in the rule for elaborating
datatype declarations, where type names are picked ``fresh'' with
respect to $\of{T}{C}$. As pointed out by Kahrs, the second premise
of rule 16 in the 1990 Definition violates the above invariant.
%\subsubsection{Discussion of Change}
To solve the first problem,
the rule for elaborating {\tt let}-expressions (rule \ref{let-rule} in the present
document) has been
provided with a side-condition which prevents the type of $\exp$ from
containing type names generated
by $\dec$. For example,
\begin{verbatim}
let datatype t = C in C end
\end{verbatim}
was legal SML '90 but is not legal SML '97.
To solve the second problem, a side-condition has been added in the rule
for matches and the rule for {\tt val rec} (rules~\ref{mrule-rule}
and \ref{recvalbind-rule} of the present document).
As a consequence, again fewer programs elaborate.
For example, the expression
\begin{verbatim}
fn x => let datatype t = C
val _ = if true then x else C
in 5
end
\end{verbatim}
is not legal SML '97, although it was legal SML '90.
%We remark that another possible change to the same effect would have been to change
%$\C+\VE$ in rules 16 and 27 to $\C\oplus\VE$; however, this change
%would (wrongly) suggest that the elaboration of a pattern may generate
%new datatypes.
%
% The above changes have consequences for the preservation of elaboration
% under substitution. For example, letting $\C_0$ be the initial
% static context $\of{C}{B_0}$ one will be able to infer
% $$\C_0\ts \boxml{let in fn x=> x end}: \boxml{'a}\to\boxml{'a}$$
% but not
% $$\C_0\ts \boxml{let in fn x => x end}: \boxml{t}\to\boxml{t}$$
% for all type names $\boxml{t}$, since $\boxml{t}$ need not
% be a member of $\of{\T}{\C_0}$.
%\subsubsection{Remarks about implementation}
%One possibility is to use the following
%scheme.
%Every type variable has an attribute, the {\it {\tt let}-level}, which
%is a reference to $\boxml{nat}^\infty$, where $\boxml{nat}^\infty$
%denotes the set of natural numbers extended with infinity.
%Moreover, every type name has a (non-updatable) {\tt let}-level.
%Elaboration takes
%place in a {\sl current}
%{\tt let}-level. The current {\tt let}-level of an occurrence
%of a phrase $\phrase$ in a program
%$\program$ is the number of unmatched {\tt let}
%that occur between the beginning of $\program $ and the beginning of $\phrase$.
%In instantiating type schemes, fresh type variables are given the current
%{\tt let}-level. When choosing a fresh type name for a datatype, the
%type name is given the current {\tt let}-level of the datatype declaration.
%During unification, in the case where one wants to make a
%substitution instance of a type variable
%$\alpha$, which has level $n$, say, to a type $\tau$, it is checked that
%$\tau$ does not contain
%a type name which has a higher {\tt let}-level than $n$ and
%those type variables in $\tau$ that have {\tt let}-levels greater than $n$
%have their {\tt let}-levels lowered to $n$.
%
%
\subsection{Principal Environments}
\label{prinenv}
In SML '90, the elaboration rule which allows any $\dec$ to
appear as a $\strdec$ is
$$
\frac{ \of{\C}{\B}\ts\dec\ra\E
\quad\E\ {\rm principal\ for\ \dec\ in\ } (\of{\C}{\B})
%version 2: \quad\imptyvars\E=\emptyset
}
{ \B\ts\dec\ra\E }
$$
The side-condition forces the type scheme in $E$ to be as general
as possible. However, this side-condition would be undesirably
restrictive in SML '97, since the new definition of the $\cl{}{}$
operation admits less polymorphism than the one used in SML '90.
For example, neither
\begin{verbatim}
val f = (fn x => x)(fn x => x)
structure A = struct end
val y = f 7
\end{verbatim}
(where the presence of the $\STRUCTURE$ declaration forces each $\VAL$
declaration to be parsed as a $\strdec$),
nor
\begin{verbatim}
structure A: sig val f: int -> int end =
struct
val f = (fn x => x)(fn x => x)
end
\end{verbatim}
would be legal in SML '97, if the side-condition were enforced.
(A type-checker may at first infer the type $\boxml{'a}\to\boxml{'a}$ from
the declaration of {\tt f}, but since {\tt (fn x => x)(fn x => x)}
is expansive, the generalisation to $\forall\boxml{'a}.\boxml{'a}\to\boxml{'a}$
is not allowed.) By dropping the side-condition, it becomes possible
to have the textual context of a structure-level declaration constrain
free type variables to monotypes. Thus both the above examples can
be elaborated.
Rather than lifting the notion of principal environments to the
modules level, we have chosen to drop the requirement of principality.
Since the notion of principal environments is no longer used in the
rules, even the definition of principal environments has been removed.
In practice, however, type checkers still have to infer types that
are as general as possible, since implementations should not reject
programs for which successful elaboration is possible.
In order to avoid reporting free type variables to users,
rules~\ref{strdectopdec-rule} and \ref{fundectopdec-rule}
require that the environment
to which a $\topdec$ elaborates must not contain free type
variables. It is possible to satisfy this side-condition
by replacing such type variables by arbitrary
monotypes; however, implementers may instead choose to
refuse elaboration in such situations.
\subsection{Consistency and Admissibility}
The primary purpose of consistency in SML '90 was to allow
a very simple elaboration rule for structure sharing. A secondary
purpose was to ban any signature which, because it specifies a datatype in
inconsistent ways (e.g. with different constructors), can never
be matched.
With the removal of structure sharing,
the primary purpose of consistency has gone away. In our experience,
the secondary purpose has turned out not to be very significant
in practice. Textual copying of datatype specifications in different signatures
is best avoided, since changes in the datatype will have to be
done several places. In practice, it is better to specify a datatype in
one signature and then access it elsewhere using structure specifications
or {\tt include}. In SML '90 one could specify sharing between a
datatype specification and an external (i.e., declared) datatype, and
a consistency check was useful in this case.
But in SML '97
this form of sharing is not allowed, so there remains no strong
reason for preserving consistency; therefore it has been dropped.
In SML '90, admissibility was imposed partly to ensure the existence
of principal signatures (which are no longer needed) and partly
to ban certain unmatchable signatures. In SML '90, admissibility was
the conjunction of well-formedness, cycle-freedom and consistency.
Cycle-freedom is no longer relevant, since there is no structure sharing.
We have already discussed consistency. Well-formedness of signatures
is no longer relevant, but the notion of well-formed type structures
is still relevant. It turns out that well-formedness only needs to be
checked in one place (in rule~\ref{wheretype-rule}). Otherwise,
well-formedness is preserved by the rules (in a sense which can be
made precise). Thus one can avoid a global well-formedness requirement
and dispense with admissibility. This we have done.
\subsection{Special Constants}
The class of special constants has been extended with {\tt word}
and {\tt char} constants
and with hexadecimal notation. Also, there are
additional escape sequences in strings and support for UNICODE
characters. See Section~\ref{cr:speccon}.
%
\subsection{Comments}
A clarification concerning unmatched comment brackets was presented
in the Commentary; subsequently, Stefan Kahrs discovered a problem with
demanding that an unmatched \boxml{*)} be reported by the compiler.
In SML '97, we therefore simply demand that an unmatched
\boxml{(*} must be reported by the compiler.
\subsection{Infixed Operators}
The rules for associativity of infix operators at the same level
of precedence have been modified, to avoid confusion between right- and left-associative
operators with the same binding precedence (see Section~\ref{infop.sec}).
%As a consequence,
%\boxml{x :: l1 @ l2} now parses as \boxml{(x :: l1) @ l2}.
%The remarks in Section 3.3 about a possible liberalisation of
%infix directives have been removed.
\subsection{Non-expansive Expressions}
\label{nonexpnew.sec}
The class of non-expansive expressions (Section~\ref{expansive-sec}) has been extended,
to compensate for the loss of polymorphism which value polymorphism
entails.
\subsection{Rebinding of built-in identifiers}
In SML '97, no $\datbind$, $\valbind$ or $\exnbind$ may bind $\TRUE$, $\FALSE$,
$\boxml{nil}$, $\boxml{::}$ or $\boxml{ref}$ and no
$\datbind$ or $\exnbind$ may bind {\tt it} (Section~\ref{synres.sec}).
Similarly, no $\datdesc$, $\valdesc$ or $\exndesc$
may describe
$\TRUE$, $\FALSE$,
$\boxml{nil}$, $\boxml{::}$ or $\boxml{ref}$ and no
$\datdesc$ or $\exndesc$ may describe {\tt it}
(Section~\ref{synresmod.sec}).
These changes are made in order to fix the meaning of derived forms
and to avoid ambiguity in the handling of {\tt ref} in
the dynamic semantics of the Core.
\subsection{Grammar for Modules}
There are several new derived forms for modules, see Appendix A
(Figures 18 and 19).
%A note has been inserted in Figure 6, in order to resolve semantically
%significant ambiguities concerning {\tt local} and sequential declarations.
The grammar for $\topdec$ has been modified, so that there is no
longer any need to put semicolons at the end of signature and functor
declarations.
Empty and sequential signature and functor declarations have been removed, as they no
longer serve any purpose.
SML '97 has neither functor signature expressions nor
functor specifications, since they could not occur in programs
and did not gain wide acceptance.
%Thus Section 5.15 and parts of Section 5.9 (Signature Instantiation)
%and 5.12 (Signature Matching) have been removed.
%The syntactic restrictions (Section 3.5)
%have been revised, to reflect other changes.
\subsection{Closure Restrictions}
Section 3.6 of the 1990 Definition has been deleted.
%, since the closure restrictions
%are not necessary and implementors never exploited them.
\subsection{Specifications}
{\tt open} and {\tt local} specifications have been criticised on
the grounds of programming method\-ology\cite{appel93}. Also, they are no longer
needed for defining the derived forms for functors and they conflict
with a desire to have all signatures be type-explicit.
SML '97 therefore admits neither {\tt open} nor {\tt local} in
specifications. Moreover, sequential specifications must not
specify the same identifier twice.
As a consequence, the definition of type-explication
has been removed: type-explication is automatically
preserved by elaboration (if one starts in the initial basis) so
there is no need to impose type-explicitness explicitly.
\subsection{Scope of Explicit Type Variables}
A binding construct for explicit type variables has been
introduced at {\tt val} and {\tt fun}
(see Figure~\ref{dec-gram}).
For example, one can declare the polymorphic
identity function by
$$\boxml{fun 'a id(x:'a) = x}$$
There is no requirement that all explicit type variables be bound by
this binding construct. For those that are not, the scope rules of
the 1990 Definition apply. The explicit binding construct
has no impact on the dynamic semantics. In particular,
there are no explicit type abstractions or applications in
the dynamic semantics.
\subsection{The Initial Basis}
To achieve a clean interface to the new Standard ML Basis Library\cite{mllib96},
the initial basis (Appendices C and D) has been cut down to a bare
minimum. The present Definition only provides what is necessary in order
to define the derived forms
and special constants of type {\tt int}, {\tt real}, {\tt word},
{\tt char} and {\tt string}. The following identifiers are no longer
defined in the initial basis:
\boxml{<>}, \verb+^+, \verb+!+, \boxml{@}, \boxml{Abs},
\boxml{arctan}, \boxml{chr}, \boxml{Chr}, \boxml{close\_in},
\boxml{close\_out}, \boxml{cos}, \boxml{Diff}, \boxml{Div},
\boxml{end\_of\_stream}, \boxml{exp}, \boxml{Exp},
\boxml{explode},
\boxml{floor}, \boxml{Floor},
\boxml{implode}, \boxml{input}, \boxml{instream}, \boxml{Interrupt}, \boxml{Io},
\boxml{ln}, \boxml{Ln}, \boxml{lookahead},
\boxml{map}, \boxml{Mod},
\boxml{Neg}, \boxml{not},
\boxml{real} (the coercion function), \boxml{rev},
\boxml{sin}, \boxml{size}, \boxml{sqrt}, \boxml{Sqrt}, \boxml{std\_in}, \boxml{std\_out}, \boxml{Sum},
\boxml{output}, \boxml{outstream},
\boxml{Prod}, \boxml{Quot}.
%Poor basis; all is gone
The corresponding basic values have also been deleted.
\subsection{Overloading}
The Standard ML Basis Library\cite{mllib96}
rests on an overloading scheme for special constants and
pre-defined identifiers. We have adopted this scheme
(see Appendix~\ref{overload.sec}).
\subsection{Reals}
{\tt real} is no longer an equality type and real constants are no longer allowed
in patterns. The Basis Library provides IEEE equality operations on reals.
%\end{document}