<!doctype html>
<meta charset="utf8">
<link rel="stylesheet" href="./spec.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/8.4/styles/github.min.css">
<script src="./spec.js"></script>
<pre class="metadata">
title: RegExp.escape
stage: 3
contributors:
- Jordan Harband
- Kevin Gibbons
</pre>
<emu-clause id="sec-text-processing" number="22">
<h1>Text Processing</h1>
<emu-clause id="sec-regexp-regular-expression-objects" number="2">
<h1>RegExp (Regular Expression) Objects</h1>
<emu-clause id="sec-properties-of-the-regexp-constructor" number="5">
<h1>Properties of the RegExp Constructor</h1>
<ins>
<emu-clause id="sec-regexp.escape" number="1">
<h1>RegExp.escape ( _S_ )</h1>
<p>This function returns a copy of _S_ in which characters that are potentially special in a regular expression |Pattern| have been replaced by equivalent escape sequences.</p>
<p>It performs the following steps when called:</p>
<emu-alg>
1. If _S_ is not a String, throw a *TypeError* exception.
1. Let _escaped_ be the empty String.
1. Let _cpList_ be StringToCodePoints(_S_).
1. For each code point _c_ of _cpList_, do
  1. If _escaped_ is the empty String and _c_ is matched by either |DecimalDigit| or |AsciiLetter|, then
    1. NOTE: Escaping a leading digit ensures that output corresponds with pattern text which may be used after a `\0` character escape or a |DecimalEscape| such as `\1` and still match _S_ rather than be interpreted as an extension of the preceding escape sequence. Escaping a leading ASCII letter does the same for the context after `\c`.
    1. Let _numericValue_ be the numeric value of _c_.
    1. Let _hex_ be Number::toString(𝔽(_numericValue_), 16).
    1. Assert: The length of _hex_ is 2.
    1. Set _escaped_ to the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and _hex_.
  1. Else,
    1. Set _escaped_ to the string-concatenation of _escaped_ and EncodeForRegExpEscape(_c_).
1. Return _escaped_.
</emu-alg>
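<emu-note>
<p>The leading-character rule can be illustrated with a minimal, non-normative JavaScript sketch. The helper name `escapeLeadingCodePoint` is hypothetical, covers only the first branch of the loop above, and assumes its argument is a single-character string. The demonstration relies on the legacy octal escape behaviour of patterns without the `u` or `v` flag.</p>

```javascript
// Hypothetical helper (not part of this specification): applies only the
// leading-character rule, i.e. an ASCII digit or letter becomes \xHH.
function escapeLeadingCodePoint(c) {
  if (/^[0-9A-Za-z]$/.test(c)) {
    // Code points 0x30..0x7A always yield exactly two lowercase hex digits.
    const hex = c.codePointAt(0).toString(16);
    return "\\x" + hex;
  }
  return c;
}

// Pattern text placed directly after a \0 escape.
const unescaped = new RegExp("^\\0" + "1" + "$");
const escaped = new RegExp("^\\0" + escapeLeadingCodePoint("1") + "$");

console.log(escapeLeadingCodePoint("1")); // \x31
console.log(unescaped.test("\u00001"));   // false: \01 is read as one legacy octal escape
console.log(escaped.test("\u00001"));     // true: NUL followed by the digit 1
```

<p>Without the escape, the digit is absorbed into the preceding escape sequence, so the pattern matches U+0001 instead of U+0000 followed by "1".</p>
</emu-note>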
<emu-note>
<p>Despite having similar names, EscapeRegExpPattern and `RegExp.escape` do not perform similar actions. The former escapes a pattern for representation as a string, while this function escapes a string for representation inside a pattern.</p>
</emu-note>
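<emu-note>
<p>The two directions can be contrasted with a short, non-normative JavaScript sketch. It assumes the `source` accessor is used to observe EscapeRegExpPattern, and it feature-detects `RegExp.escape`; the hard-coded fallback string is an assumption written out by hand from the steps in this clause for hosts that do not yet provide the function.</p>

```javascript
// Pattern to string: the source accessor escapes "/" so the result can be
// placed between slashes in a regular expression literal.
const source = new RegExp("a/b").source;
console.log(source); // a\/b

// String to pattern: RegExp.escape, or a hand-written fallback (assumption)
// when the host does not implement it.
const escapedText =
  typeof RegExp.escape === "function" ? RegExp.escape("a.b") : "\\x61\\.b";

const exact = new RegExp("^" + escapedText + "$");
console.log(exact.test("a.b")); // true
console.log(exact.test("axb")); // false: the "." is matched literally
```
</emu-note>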
<emu-clause id="sec-encodeforregexpescape" type="abstract operation">
<h1>
EncodeForRegExpEscape (
_c_: a code point,
): a String
</h1>
<dl class="header">
<dt>description</dt>
<dd>It returns a string representing a |Pattern| for matching _c_. If _c_ is white space, a line terminator, a surrogate code point, or an ASCII punctuator, the returned value is an escape sequence. Otherwise, the returned value is a string representation of _c_ itself.</dd>
</dl>
<emu-alg>
1. If _c_ is matched by |SyntaxCharacter| or _c_ is U+002F (SOLIDUS), then
  1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and UTF16EncodeCodePoint(_c_).
1. Else if _c_ is the code point listed in some cell of the “Code Point” column of <emu-xref href="#table-controlescape-code-point-values"></emu-xref>, then
  1. Return the string-concatenation of 0x005C (REVERSE SOLIDUS) and the string in the “ControlEscape” column of the row whose “Code Point” column contains _c_.
1. Let _otherPunctuators_ be the string-concatenation of *",-=<>#&!%:;@~'`"* and the code unit 0x0022 (QUOTATION MARK).
1. Let _toEscape_ be StringToCodePoints(_otherPunctuators_).
1. If _toEscape_ contains _c_, _c_ is matched by either |WhiteSpace| or |LineTerminator|, or _c_ has the same numeric value as a leading surrogate or trailing surrogate, then
  1. Let _cNum_ be the numeric value of _c_.
  1. If _cNum_ ≤ 0xFF, then
    1. Let _hex_ be Number::toString(𝔽(_cNum_), 16).
    1. Return the string-concatenation of the code unit 0x005C (REVERSE SOLIDUS), *"x"*, and StringPad(_hex_, 2, *"0"*, ~start~).
  1. Let _escaped_ be the empty String.
  1. Let _codeUnits_ be UTF16EncodeCodePoint(_c_).
  1. For each code unit _cu_ of _codeUnits_, do
    1. Set _escaped_ to the string-concatenation of _escaped_ and UnicodeEscape(_cu_).
  1. Return _escaped_.
1. Return UTF16EncodeCodePoint(_c_).
</emu-alg>
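<emu-note>
<p>The steps above can be sketched in JavaScript as follows. This is a non-normative illustration, not a reference implementation: the function and constant names are hypothetical, the code point is assumed to be passed as a number, and `\s` in a `u`-mode pattern is used as a stand-in for the |WhiteSpace| and |LineTerminator| productions.</p>

```javascript
// Hypothetical names throughout; this mirrors the abstract operation step by step.
const SYNTAX_CHARACTERS = "^$\\.*+?()[]{}|";
const CONTROL_ESCAPES = new Map([
  [0x09, "t"], [0x0a, "n"], [0x0b, "v"], [0x0c, "f"], [0x0d, "r"],
]);
const OTHER_PUNCTUATORS = ",-=<>#&!%:;@~'`\"";

function encodeForRegExpEscapeSketch(c) {
  const ch = String.fromCodePoint(c);
  // SyntaxCharacter or SOLIDUS: prefix with a backslash.
  if (SYNTAX_CHARACTERS.includes(ch) || ch === "/") {
    return "\\" + ch;
  }
  // Code points that have a ControlEscape letter (\t \n \v \f \r).
  if (CONTROL_ESCAPES.has(c)) {
    return "\\" + CONTROL_ESCAPES.get(c);
  }
  const isSurrogate = c >= 0xd800 && c <= 0xdfff;
  if (OTHER_PUNCTUATORS.includes(ch) || /^\s$/u.test(ch) || isSurrogate) {
    if (c <= 0xff) {
      // Two-digit hex escape, left-padded with "0".
      return "\\x" + c.toString(16).padStart(2, "0");
    }
    // One \uXXXX escape per UTF-16 code unit.
    let escaped = "";
    for (let i = 0; i < ch.length; i++) {
      escaped += "\\u" + ch.charCodeAt(i).toString(16).padStart(4, "0");
    }
    return escaped;
  }
  // Everything else is returned unchanged.
  return ch;
}

console.log(encodeForRegExpEscapeSketch(0x2e));   // \.
console.log(encodeForRegExpEscapeSketch(0x0a));   // \n
console.log(encodeForRegExpEscapeSketch(0x20));   // \x20
console.log(encodeForRegExpEscapeSketch(0x2028)); // \u2028
console.log(encodeForRegExpEscapeSketch(0x61));   // a
```
</emu-note>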
</emu-clause>
</emu-clause>
</ins>
</emu-clause>
</emu-clause>
</emu-clause>