@@ -102,18 +102,86 @@ the processor does not support DTD validation.</error></para>
document is an XPath data model document consisting of a single text node.)</para>

<para><error code="D0060">It is a <glossterm>dynamic error</glossterm> if the
- <option>content-type</option> specifies an encoding, which is not supported
- by the processor.</error></para>
+ <option>content-type</option> specifies a charset (sometimes called the
+ character encoding) that is not supported by the processor.</error></para>

<para><impl>Text parameters are <glossterm>implementation-defined</glossterm>.
</impl></para>

+ <section xml:id="text-bom">
+ <title>Byte order marks</title>
+
+ <para>UTF-8 and UTF-16 inputs can begin with a byte order mark. The byte order
+ mark is not considered part of the text and is not included in the text
+ document.</para>
+
+ <para>In order to identify the byte order mark, it is first necessary to
+ identify the charset that is being
+ used. The charset of an external resource is determined as follows:</para>
+
+ <orderedlist>
+ <listitem>
+ <para>external charset information is used if available (for example, if the
+ resource is loaded with HTTP or HTTPS and the server provided a charset), otherwise
+ </para>
+ </listitem>
+ <listitem>
+ <para>the charset from the content type if specified, otherwise
+ </para>
+ </listitem>
+ <listitem>
+ <para>the processor may use implementation-defined heuristics to determine the
+ likely charset, otherwise
+ </para>
+ </listitem>
+ <listitem>
+ <para>UTF-8 is assumed.</para>
+ </listitem>
+ </orderedlist>
+
+ <para>Processors <rfc2119>must</rfc2119> support UTF-8 and UTF-16. <impl>Support
+ for other charsets is <glossterm>implementation-defined</glossterm>.</impl></para>
+
+ <para>If the encoding is UTF-8, UTF-16, UTF-16LE, or UTF-16BE, a byte order mark
+ may be present. (For any other encoding, there is no byte order mark and all of the text
+ is returned.)</para>
+
+ <itemizedlist>
+ <listitem>
+ <para>If the encoding is UTF-8 and the document begins with the bytes
+ EF BB BF, those bytes are the byte order mark. They are discarded.</para>
+ </listitem>
+ <listitem>
+ <para>If the encoding is UTF-16 and the document begins with the bytes
+ FE FF or FF FE, those bytes are the byte order mark. They are discarded.</para>
+ </listitem>
+ <listitem>
+ <para>If the encoding is UTF-16LE and the document begins with the bytes
+ FF FE, those bytes are the byte order mark. They are discarded.</para>
+ </listitem>
+ <listitem>
+ <para>If the encoding is UTF-16BE and the document begins with the bytes
+ FE FF, those bytes are the byte order mark. They are discarded.</para>
+ </listitem>
+ <listitem>
+ <para>If the encoding isn’t specified, but the file begins with a byte
+ order mark (FE FF or FF FE), treat the charset as UTF-16 and
+ discard the byte order mark.</para>
+ </listitem>
+ <listitem>
+ <para>Otherwise, there is no byte order mark and nothing is discarded.
+ </para>
+ </listitem>
+ </itemizedlist>
+
+ </section>
</section>

<section xml:id="c.load.json">
<title>Loading JSON data</title>

- <para>For a JSON media type, the content is loaded and parsed as JSON.</para>
+ <para>For a JSON media type, the content is loaded and parsed as JSON
+ <biblioref linkend="rfc8259"/>.</para>

<para>The parameters specified for the <code>fn:parse-json</code> function
in <biblioref linkend="xpath31-functions"/>
@@ -133,6 +201,15 @@ map contains an entry whose key is defined in the specification of
<code>fn:parse-json</code> and whose value is not valid for that key, or if it contains
an entry with the key fallback when the parameter <option>escape</option> with
<literal>true()</literal> is also present.</error></para>
+
+ <section xml:id="json-bom">
+ <title>Byte order mark</title>
+
+ <para>JSON data transmitted with the UTF-8 encoding may begin with a byte order
+ mark. If it does, the byte order mark is discarded before parsing the
+ input.</para>
+ </section>
+
</section>

<section xml:id="c.load.html">
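The byte order mark rules added in the "Byte order marks" section of this change can be sketched in Python as follows. This is only an illustrative reading of the prose, not the processor's implementation: the function name `strip_bom`, the `_BOMS` table, and the lower-casing of charset names are assumptions made for the example. The same function covers the JSON case when called with `"UTF-8"`.

```python
import codecs
from typing import Optional, Tuple

# BOM byte sequences per charset, as listed in the section text.
# (Hypothetical table; keys are lower-cased charset names.)
_BOMS = {
    "utf-8": (codecs.BOM_UTF8,),                           # EF BB BF
    "utf-16": (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE),  # FE FF or FF FE
    "utf-16le": (codecs.BOM_UTF16_LE,),                    # FF FE
    "utf-16be": (codecs.BOM_UTF16_BE,),                    # FE FF
}


def strip_bom(data: bytes, charset: Optional[str]) -> Tuple[bytes, Optional[str]]:
    """Return (data without a leading BOM, effective charset).

    `charset` is the charset already determined for the resource,
    or None if none was specified.
    """
    if charset is None:
        # Encoding not specified: a UTF-16 BOM selects UTF-16.
        # A real decoder would also need to remember which byte
        # order the BOM indicated before discarding it.
        for bom in _BOMS["utf-16"]:
            if data.startswith(bom):
                return data[len(bom):], "utf-16"
        return data, None

    # Any charset outside the table has no BOM; all bytes are kept.
    for bom in _BOMS.get(charset.strip().lower(), ()):
        if data.startswith(bom):
            return data[len(bom):], charset
    return data, charset
```

For example, `strip_bom(b"\xef\xbb\xbfhi", "UTF-8")` drops the three leading bytes, while the same bytes labelled `iso-8859-1` are returned unchanged because no byte order mark is defined for that charset.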