<pre class=metadata>
Group: WHATWG
H1: Encoding
Shortname: encoding
Text Macro: TWITTER encodings
Text Macro: LATESTRD 2022-06
Abstract: The Encoding Standard defines encodings and their JavaScript API.
Translation: ja https://triple-underscore.github.io/Encoding-ja.html
Markup Shorthands: css off
Translate IDs: dictdef-textdecoderoptions textdecoderoptions,dictdef-textdecodeoptions textdecodeoptions,index section-index
</pre>

<link rel=stylesheet href=visualization-colors.css>


<h2 id=preface>Preface</h2>

<p>The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the
universal coded character set. Therefore for new protocols and formats, as well as
existing formats deployed in new contexts, this specification requires (and defines) the
UTF-8 encoding.

<p>The other (legacy) encodings have been defined to some extent in the past. However,
user agents have not always implemented them in the same way, have not always used the
same labels, and often differ in dealing with undefined and former proprietary areas of
encodings. This specification addresses those gaps so that new user agents do not have to
reverse engineer encoding implementations and existing user agents can converge.

<p>In particular, this specification defines all those encodings, their algorithms to go
from bytes to scalar values and back, and their canonical names and identifying labels.
This specification also defines an API to expose part of the encoding algorithms to
JavaScript.

<p>User agents have also significantly deviated from the labels listed in the
<a href=https://www.iana.org/assignments/character-sets/character-sets.xhtml>IANA Character Sets registry</a>.
To stop spreading legacy encodings further, this specification is exhaustive about the
aforementioned details and therefore has no need for the registry. In particular, this
specification does not provide a mechanism for extending any aspect of encodings.


<h2 id=security-background>Security background</h2>

<p>There is a set of encoding security issues when the producer and consumer do not agree on the
encoding in use, or on the way a given encoding is to be implemented. For instance, an attack was
reported in 2011 where a <a>Shift_JIS</a> lead byte 0x82 was used to “mask” a 0x22 trail byte in a
JSON resource of which an attacker could control some field. The producer did not see the problem
even though this is an illegal byte combination. The consumer decoded it as a single U+FFFD and
therefore changed the overall interpretation as U+0022 is an important delimiter. Decoders of
encodings that use multiple bytes for scalar values now require that in case of an illegal byte
combination, a scalar value in the range U+0000 to U+007F, inclusive, cannot be “masked”. For the
aforementioned sequence the output would be U+FFFD U+0022. (As an unfortunate exception to this, the
<a>gb18030 decoder</a> will “mask” up to one such byte at <a>end-of-queue</a>.)
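<p class=example>The "no masking" requirement can be sketched with a toy two-byte decoder. This is
an illustration only, not the <a>Shift_JIS</a> decoder defined by this specification: the single
lead byte, the one-entry table, and the name <code>decodeToy</code> are assumptions made for the
sketch.

```javascript
// Toy two-byte decoder: NOT the real Shift_JIS decoder. The lead byte 0x82
// and the single table entry are illustrative assumptions.
const TOY_TABLE = new Map();
TOY_TABLE.set(0x82a0, 0x3042); // lead 0x82 followed by trail 0xA0

function decodeToy(bytes) {
  const output = [];
  let lead = 0x00;
  for (let i = 0; i < bytes.length; i++) {
    const byte = bytes[i];
    if (lead !== 0x00) {
      const codePoint = TOY_TABLE.get((lead << 8) | byte);
      lead = 0x00;
      if (codePoint !== undefined) {
        output.push(codePoint);
        continue;
      }
      // Illegal combination: emit U+FFFD for the lead byte.
      output.push(0xfffd);
      // An ASCII byte must not be "masked": step back so that it is decoded
      // again on its own in the next iteration.
      if (byte <= 0x7f) i--;
      continue;
    }
    if (byte <= 0x7f) output.push(byte);
    else if (byte === 0x82) lead = byte;
    else output.push(0xfffd);
  }
  // A lead byte left pending at the end of the input is also an error.
  if (lead !== 0x00) output.push(0xfffd);
  return String.fromCodePoint(...output);
}
```

<p class=note>With this sketch, the byte sequence 0x82 0x22 decodes to U+FFFD U+0022, so the
delimiter survives.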

<p>This is a larger issue for encodings that map anything that is an <a>ASCII byte</a> to something
that is not an <a>ASCII code point</a>, when there is no lead byte present. These are
“ASCII-incompatible” encodings and other than <a>ISO-2022-JP</a> and <a>UTF-16BE/LE</a>, which are
unfortunately required due to deployed content, they are not supported. (Investigation is
<a href=https://github.com/whatwg/encoding/issues/8 lt="Add more labels to the replacement encoding">ongoing</a>
into whether more labels of other such encodings can be mapped to the <a>replacement</a> encoding, rather
than the unknown encoding fallback.) An example attack is injecting carefully crafted content into a
resource and then encouraging the user to override the encoding, resulting in, e.g., script
execution.

<p>Encoders used by URLs found in HTML and HTML's form feature can also result in slight information
loss when an encoding is used that cannot represent all scalar values. E.g., when a resource uses
the <a>windows-1252</a> encoding a server will not be able to distinguish between an end user
entering “💩” and “&amp;#128169;” into a form.

<p>The problems outlined here go away when exclusively using UTF-8, which is one of the many reasons
UTF-8 is now the mandatory encoding for all things.

<p class=note>See also the <a href=#browser-ui>Browser UI</a> chapter.


<h2 id=terminology>Terminology</h2>

<p>This specification depends on the Infra Standard. [[!INFRA]]

<p>Hexadecimal numbers are prefixed with "0x".

<p>In equations, all numbers are integers, addition is represented by "+", subtraction by "&minus;",
multiplication by "×", integer division by "/" (returns the quotient), modulo by "%" (returns the
remainder of an integer division), logical left shifts by "&lt;&lt;", logical right shifts by ">>",
bitwise AND by "&amp;", and bitwise OR by "|".

<p>For logical right shifts, operands must have at least twenty-one bits of precision.
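<p class=example>As a rough guide, the notation above maps onto JavaScript as follows. The helper
names <code>quotient</code> and <code>remainder</code> are made up for this sketch, and
non-negative operands are assumed.

```javascript
// Integer division ("/") and modulo ("%") for non-negative integers.
const quotient = (a, b) => Math.floor(a / b);
const remainder = (a, b) => a % b;

// ">>>" is JavaScript's logical right shift. It operates on 32-bit values,
// which is more than the twenty-one bits needed for any code point
// (the largest is 0x10FFFF).
const codePoint = 0x1f4a9;
const shifted = codePoint >>> 6; // 0x7D2
const lowSixBits = codePoint & 0x3f; // 0x29, bitwise AND
const combined = (0x12 << 6) | 0x29; // 0x4A9, left shift and bitwise OR
```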

<hr>

<p>An <dfn id=concept-stream export>I/O queue</dfn> is a type of <a for=/>list</a> with
<a for=list>items</a> of a particular type (i.e., <a>bytes</a> or <a>scalar values</a>).
<dfn id="end-of-stream" export>End-of-queue</dfn> is a special <a for=list>item</a> that can be
present in <a for=/>I/O queues</a> of any type and it signifies that there are no more
<a for=list>items</a> in the queue.

<div class=note>
 <p>There are two ways to use an <a for=/>I/O queue</a>: in immediate mode, to represent I/O data
 stored in memory, and in streaming mode, to represent data coming in from the network. Immediate
 queues have <a>end-of-queue</a> as their last item, whereas streaming queues need not have it, and
 so their <a for="I/O queue">read</a> operation might block.

 <p>It is expected that streaming <a for=/>I/O queues</a> will be created empty, and that new
 <a for=list>items</a> will be <a for="I/O queue">pushed</a> to them as data comes in from the
 network. When the underlying network stream closes, an <a>end-of-queue</a> item is to be
 <a for="I/O queue">pushed</a> into the queue.

 <p>Since reading from a streaming <a for=/>I/O queue</a> might block, streaming
 <a for=/>I/O queues</a> are not to be used from an <a for=/>event loop</a>. They are to be used
 <a>in parallel</a> instead.
</div>

<p>To <dfn id=concept-stream-read for="I/O queue" export>read</dfn> an <a for=list>item</a> from an
<a for=/>I/O queue</a> <var>ioQueue</var>, run these steps:

<ol>
 <li><p>If <var>ioQueue</var> is <a for=list>empty</a>, then wait until its <a for=list>size</a> is
 at least 1.

 <li><p>If <var>ioQueue</var>[0] is <a>end-of-queue</a>, then return <a>end-of-queue</a>.

 <li><p><a for=list>Remove</a> <var>ioQueue</var>[0] and return it.
</ol>

<p>To <a for="I/O queue">read</a> a number <var>number</var> of <a for=list>items</a> from
<var>ioQueue</var>, run these steps:

<ol>
 <li><p>Let <var>readItems</var> be an empty list.

 <li>
  <p>Perform the following step <var>number</var> times:

  <ol>
   <li><p><a for=list>Append</a> to <var>readItems</var> the result of
   <a for="I/O queue">reading</a> an item from <var>ioQueue</var>.
  </ol>
 </li>

 <li><p><a for=list>Remove</a> <a>end-of-queue</a> from <var>readItems</var>.

 <li><p>Return <var>readItems</var>.
</ol>
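<p class=example>A minimal sketch of both read operations, assuming an immediate-mode queue
modeled as a plain array that ends with a sentinel. The names <code>END_OF_QUEUE</code>,
<code>readItem</code>, and <code>readItems</code> are hypothetical. Because the queue is never
empty in this model, the waiting step is omitted.

```javascript
// Hypothetical sentinel standing in for the end-of-queue item.
const END_OF_QUEUE = Symbol("end-of-queue");

// Read a single item. End-of-queue is returned but never removed.
function readItem(ioQueue) {
  if (ioQueue[0] === END_OF_QUEUE) return END_OF_QUEUE;
  return ioQueue.shift();
}

// Read up to `number` items; end-of-queue is stripped from the result.
function readItems(ioQueue, number) {
  const items = [];
  for (let i = 0; i < number; i++) items.push(readItem(ioQueue));
  return items.filter((item) => item !== END_OF_QUEUE);
}
```

<p class=note>Reading more items than are available yields a shorter list rather than an error,
since every end-of-queue read is removed from the result.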

<p>To <dfn for="I/O queue" export>peek</dfn> a number <var>number</var> of <a for=list>items</a>
from an <a for=/>I/O queue</a> <var>ioQueue</var>, run these steps:

<ol>
 <li><p>Wait until either <var>ioQueue</var>'s <a for=list>size</a> is equal to or greater than
 <var>number</var>, or <var>ioQueue</var> <a for=list>contains</a> <a>end-of-queue</a>, whichever
 comes first.

 <li><p>Let <var>prefix</var> be an empty list.

 <li>
  <p><a for=list>For each</a> <var>n</var> in <a>the range</a> 0 to <var>number</var> &minus; 1, inclusive:

  <ol>
   <li><p>If <var>ioQueue</var>[<var>n</var>] is <a>end-of-queue</a>, <a>break</a>.

   <li><p>Otherwise, <a for=list>append</a> <var>ioQueue</var>[<var>n</var>] to <var>prefix</var>.
  </ol>
 </li>

 <li><p>Return <var>prefix</var>.
</ol>
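<p class=example>Peeking can be sketched as follows, again assuming an immediate-mode queue
modeled as an array that ends with a hypothetical <code>END_OF_QUEUE</code> sentinel, so no
waiting is needed. The sketch treats index 0 as the first item, as the read operation does.

```javascript
// Hypothetical sentinel standing in for the end-of-queue item.
const END_OF_QUEUE = Symbol("end-of-queue");

// Return up to `number` leading items without removing them.
function peek(ioQueue, number) {
  const prefix = [];
  for (let n = 0; n < number; n++) {
    if (ioQueue[n] === END_OF_QUEUE) break;
    prefix.push(ioQueue[n]);
  }
  return prefix;
}
```

<p class=note>A typical use is looking at the first few bytes of a queue to detect a byte order
mark before deciding how to decode the rest.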

<p>To <dfn id=concept-stream-push for="I/O queue" export>push</dfn> an <a for=list>item</a>
<var>item</var> to an <a for=/>I/O queue</a> <var>ioQueue</var>, run these steps:

<ol>
 <li>
  <p>If the last <a for=list>item</a> in <var>ioQueue</var> is <a>end-of-queue</a>, then:

  <ol>
   <li><p>If <var>item</var> is <a>end-of-queue</a>, do nothing.

   <li><p>Otherwise, <a for=list>insert</a> <var>item</var> before the last <a for=list>item</a> in
   <var>ioQueue</var>.
  </ol>
 </li>

 <li><p>Otherwise, <a for=list>append</a> <var>item</var> to <var>ioQueue</var>.
</ol>

<p>To <a for="I/O queue">push</a> a sequence of items to an <a for=/>I/O queue</a>
<var>ioQueue</var> is to push each item in the sequence to <var>ioQueue</var>, in the given order.
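<p class=example>A minimal sketch of the push operations on an array-backed queue. The names
<code>END_OF_QUEUE</code>, <code>push</code>, and <code>pushAll</code> are hypothetical.

```javascript
// Hypothetical sentinel standing in for the end-of-queue item.
const END_OF_QUEUE = Symbol("end-of-queue");

// Push one item, keeping end-of-queue (if present) as the last item.
function push(ioQueue, item) {
  const last = ioQueue[ioQueue.length - 1];
  if (last === END_OF_QUEUE) {
    if (item === END_OF_QUEUE) return; // already closed: do nothing
    ioQueue.splice(ioQueue.length - 1, 0, item); // insert before the last item
  } else {
    ioQueue.push(item);
  }
}

// Push a sequence of items in the given order.
function pushAll(ioQueue, items) {
  for (const item of items) push(ioQueue, item);
}
```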

<p>To <dfn id=concept-stream-prepend for="I/O queue">prepend</dfn> an <a for=list>item</a> other
than <a>end-of-queue</a> to an <a for=/>I/O queue</a>, perform the normal <a for=/>list</a>
<a for=list>prepend</a> operation. To prepend a sequence of items not containing
<a>end-of-queue</a>, insert those items, in the given order, before the first item in the queue.

<p class=example id=example-tokens>Prepending the sequence of scalar value items
<code>&amp;#128169;</code> to an I/O queue of scalar values "<code> hello world</code>" results in
an I/O queue "<code>&amp;#128169; hello world</code>". The next item to be read would be
<code>&amp;</code>. <!-- 💩 -->
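<p class=example>The same scenario as a sketch, with the queue modeled as an array of scalar
values ending in a hypothetical <code>END_OF_QUEUE</code> sentinel. The helper names are
assumptions.

```javascript
// Hypothetical sentinel standing in for the end-of-queue item.
const END_OF_QUEUE = Symbol("end-of-queue");

// Prepend a sequence of items (none of them end-of-queue), preserving order.
function prepend(ioQueue, items) {
  ioQueue.unshift(...items);
}

// Convert a string to its scalar values (assumes no lone surrogates).
const toScalarValues = (text) => Array.from(text, (c) => c.codePointAt(0));

const ioQueue = toScalarValues(" hello world");
ioQueue.push(END_OF_QUEUE);

// The nine scalar values of the character reference from the example above:
// 0x26 (ampersand), 0x23 (#), the digits 1 2 8 1 6 9, and 0x3B (;).
prepend(ioQueue, [0x26, 0x23, 0x31, 0x32, 0x38, 0x31, 0x36, 0x39, 0x3b]);

const next = ioQueue[0]; // 0x26, the ampersand
```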

<p>To <dfn for="from I/O queue">convert</dfn> an <a for=/>I/O queue</a> <var>ioQueue</var> into a
<a for=/>list</a>, <a>string</a>, or <a>byte sequence</a>, return the result of
<a for="I/O queue">reading</a> an indefinite number of <a for=list>items</a> from
<var>ioQueue</var>.

<p>To <dfn for="to I/O queue">convert</dfn> a <a for=/>list</a>, <a>string</a>, or
<a>byte sequence</a> <var>input</var> into an <a for=/>I/O queue</a>, run these steps:

<ol>
 <li><p>Assert: if <var>input</var> is a <a for=/>list</a>, then it does not <a for=list>contain</a>
 <a>end-of-queue</a>.

 <li><p>Return an <a for=/>I/O queue</a> containing the <a for=list>items</a> in <var>input</var>,
 in order, followed by <a>end-of-queue</a>.
</ol>

<p class=XXX>The Infra standard is expected to define some infrastructure around type conversions.
See <a href="https://github.com/whatwg/infra/issues/319">whatwg/infra issue #319</a>. [[INFRA]]

<p class=note><a for=/>I/O queues</a> are defined as <a for=/>lists</a>, not
<a spec=infra>queues</a>, because they feature a <a for="I/O queue">prepend</a> operation. However,
this prepend operation is an internal detail of the algorithms in this specification, and is not to
be used by other standards. Implementations are free to find alternative ways to implement such
algorithms, as detailed in [[#implementation-considerations]].


<h2 id=encodings>Encodings</h2>

<p>An <dfn export>encoding</dfn> defines a mapping from a <a>scalar value</a> sequence to
a <a>byte</a> sequence (and vice versa). Each <a for=/>encoding</a> has a
<dfn id=name export for=encoding>name</dfn>, and one or more
<dfn id=label export for=encoding lt=label>labels</dfn>.

<p class="note no-backref">This specification defines three <a for=/>encodings</a> with the same
names as <i>encoding schemes</i> defined in the Unicode Standard: <a>UTF-8</a>, <a>UTF-16LE</a>, and
<a>UTF-16BE</a>. The <a for=/>encodings</a> differ from the <i>encoding schemes</i> by byte order
mark (also known as BOM) handling not being part of the <a for=/>encodings</a> themselves and
instead being part of wrapper algorithms in this specification, whereas byte order mark handling is
part of the definition of the <i>encoding schemes</i> in the Unicode Standard. <a>UTF-8</a> used
together with the <a>UTF-8 decode</a> algorithm matches the <i>encoding scheme</i> of the same name.
This specification does not provide wrapper algorithms that would combine with <a>UTF-16LE</a> and
<a>UTF-16BE</a> to match the similarly-named <i>encoding schemes</i>. [[UNICODE]]


<h3 id=encoders-and-decoders>Encoders and decoders</h3>

<p>Each <a for=/>encoding</a> has an associated <dfn>decoder</dfn> and most of them have an
associated <dfn>encoder</dfn>. Instances of <a for=/>decoders</a> and <a for=/>encoders</a> have a
<dfn>handler</dfn> algorithm and might also have state. A <a>handler</a> algorithm takes an input
<a for=/>I/O queue</a> and an <a for=list>item</a>, and returns
<dfn>finished</dfn>, one or more <a for=list>items</a>, <dfn>error</dfn>
optionally with a <a>code point</a>, or <dfn>continue</dfn>.

<p class="note no-backref">The <a>replacement</a> and <a>UTF-16BE/LE</a> <a for=/>encodings</a> have
no <a for=/>encoder</a>.

<p>An <dfn>error mode</dfn> as used below is "<code>replacement</code>" or "<code>fatal</code>" for
a <a for=/>decoder</a> and "<code>fatal</code>" or "<code>html</code>" for an <a for=/>encoder</a>.

<p class=note>An XML processor would set <a for=/>error mode</a> to "<code>fatal</code>".
[[XML]]

<p class=note>"<code>html</code>" exists as <a for=/>error mode</a> due to HTML forms requiring a
non-terminating legacy <a for=/>encoder</a>. The "<code>html</code>" <a for=/>error mode</a> causes
a sequence to be emitted that cannot be distinguished from legitimate input and can therefore lead
to silent data loss. Developers are strongly encouraged to use the <a>UTF-8</a>
<a for=/>encoding</a> to prevent this from happening. [[HTML]]

<hr>

<p>To <dfn lt="process a queue|processing a queue" id=concept-encoding-run>process a queue</dfn>
given an <a for=/>encoding</a>'s <a for=/>decoder</a> or <a for=/>encoder</a> instance
<var>encoderDecoder</var>, <a for=/>I/O queue</a> <var>input</var>, <a for=/>I/O queue</a>
<var>output</var>, and <a for=/>error mode</a> <var>mode</var>:

<ol>
 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>result</var> be the result of <a>processing an item</a> with the result of
   <a>reading</a> from <var>input</var>, <var>encoderDecoder</var>, <var>input</var>,
   <var>output</var>, and <var>mode</var>.

   <li><p>If <var>result</var> is not <a>continue</a>, then return <var>result</var>.
  </ol>
</ol>

<p>To <dfn lt="process an item|processing an item" id=concept-encoding-process>process an item</dfn>
given an <a for=list>item</a> <var>item</var>, <a for=/>encoding</a>'s <a for=/>encoder</a> or
<a for=/>decoder</a> instance <var>encoderDecoder</var>, <a for=/>I/O queue</a> <var>input</var>,
<a for=/>I/O queue</a> <var>output</var>, and <a for=/>error mode</a> <var>mode</var>:

<ol>
 <li><p>Assert: if <var>encoderDecoder</var> is an <a for=/>encoder</a> instance, <var>mode</var> is
 not "<code>replacement</code>".

 <li><p>Assert: if <var>encoderDecoder</var> is a <a for=/>decoder</a> instance, <var>mode</var> is
 not "<code>html</code>".

 <li><p>Assert: if <var>encoderDecoder</var> is an <a for=/>encoder</a> instance, <var>item</var> is
 not a <a>surrogate</a>.

 <li><p>Let <var>result</var> be the result of running <var>encoderDecoder</var>'s <a>handler</a> on
 <var>input</var> and <var>item</var>.

 <li>
  <p>If <var>result</var> is <a>finished</a>:

  <ol>
   <li><p><a>Push</a> <a>end-of-queue</a> to <var>output</var>.

   <li><p>Return <var>result</var>.
  </ol>
 </li>

 <li>
  <p>Otherwise, if <var>result</var> is one or more <a for=list>items</a>:

  <ol>
   <li><p>Assert: if <var>encoderDecoder</var> is a <a for=/>decoder</a> instance, <var>result</var>
   does not contain any <a>surrogates</a>.

   <li><p><a>Push</a> <var>result</var> to <var>output</var>.
  </ol>

 <li>
  <p>Otherwise, if <var>result</var> is an <a>error</a>, switch on <var>mode</var> and run the
  associated steps:

  <dl class=switch>
   <dt>"<code>replacement</code>"
   <dd><a>Push</a> U+FFFD (�) to <var>output</var>.

   <dt>"<code>html</code>"
   <dd><a>Push</a> 0x26 (&amp;), 0x23 (#), followed by the shortest sequence of 0x30 (0) to
   0x39 (9), inclusive, representing <var>result</var>'s <a>code point</a>'s
   <a for="code point">value</a> in base ten, followed by 0x3B (;) to <var>output</var>.

   <dt>"<code>fatal</code>"
   <dd>Return <var>result</var>.
  </dl>

 <li><p>Return <a>continue</a>.
</ol>
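<p class=example>The two algorithms above can be sketched as follows. The handler here is a toy
ASCII-only encoder invented for illustration; it is not one of the encoders defined by this
specification. All names are hypothetical, queues are plain arrays in immediate mode, and pushing
to <var>output</var> is simplified to an array push.

```javascript
const END_OF_QUEUE = Symbol("end-of-queue");
const FINISHED = Symbol("finished");
const CONTINUE = Symbol("continue");

// Toy encoder handler (an assumption for this sketch): ASCII code points map
// to themselves; anything else is an error carrying the code point.
function asciiOnlyHandler(input, item) {
  if (item === END_OF_QUEUE) return FINISHED;
  if (item <= 0x7f) return [item];
  return { error: true, codePoint: item };
}

function processItem(item, handler, input, output, mode) {
  const result = handler(input, item);
  if (result === FINISHED) {
    output.push(END_OF_QUEUE);
    return result;
  }
  if (Array.isArray(result)) {
    output.push(...result); // one or more items
  } else if (result !== CONTINUE) {
    // result is an error
    if (mode === "fatal") return result;
    if (mode === "replacement") output.push(0xfffd);
    if (mode === "html") {
      // Ampersand, "#", the shortest base-ten digits of the code point, ";".
      const digits = Array.from(String(result.codePoint), (d) => d.charCodeAt(0));
      output.push(0x26, 0x23, ...digits, 0x3b);
    }
  }
  return CONTINUE;
}

function processQueue(handler, input, output, mode) {
  while (true) {
    // Read one item; end-of-queue is returned without being removed.
    const item = input[0] === END_OF_QUEUE ? END_OF_QUEUE : input.shift();
    const result = processItem(item, handler, input, output, mode);
    if (result !== CONTINUE) return result;
  }
}

// Convenience wrapper for trying the sketch on a string.
function encodeToy(text, mode) {
  const input = Array.from(text, (c) => c.codePointAt(0));
  input.push(END_OF_QUEUE);
  const output = [];
  const result = processQueue(asciiOnlyHandler, input, output, mode);
  return { result, bytes: output.filter((item) => item !== END_OF_QUEUE) };
}
```

<p class=note>In this sketch, calling <code>encodeToy</code> with error mode
"<code>html</code>" produces identical bytes for the input “💩” and for the input
“&amp;#128169;”, which is the silent data loss described in the note on the
"<code>html</code>" <a for=/>error mode</a>.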


<h3 id=names-and-labels>Names and labels</h3>

<p>The table below lists all <a for=/>encodings</a>
and their <a>labels</a> user agents must support.
User agents must not support any other <a for=/>encodings</a>
or <a>labels</a>.

<p class=note>For each encoding, <a lt="ASCII lowercase">ASCII-lowercasing</a> its
<a for=encoding>name</a> yields one of its <a for=encoding>labels</a>.

<p>Authors must use the <a>UTF-8</a> <a for=/>encoding</a> and must use the
<a>ASCII case-insensitive</a> "<code>utf-8</code>" <a>label</a> to
identify it.

<p>New protocols and formats, as well as existing formats deployed in new contexts, must
use the <a>UTF-8</a> <a for=/>encoding</a> exclusively. If these protocols and
formats need to expose the <a for=/>encoding</a>'s <a>name</a> or
<a>label</a>, they must expose it as "<code>utf-8</code>".
<!-- “UTF-8 or death” — Emil A Eklund -->

<p>To
<dfn export lt="get an encoding|getting an encoding" id=concept-encoding-get>get an encoding</dfn>
from a string <var>label</var>, run these steps:

<ol>
 <li><p>Remove any leading and trailing <a>ASCII whitespace</a> from
 <var>label</var>.

 <li><p>If <var>label</var> is an <a>ASCII case-insensitive</a> match for any of the <a>labels</a>
 listed in the table below, then return the corresponding <a for=/>encoding</a>; otherwise return
 failure.
</ol>
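<p class=example>A minimal sketch of these steps. The <code>LABELS</code> map holds only a small
excerpt of the table below; a real implementation would include every row. The function name and
the use of <code>null</code> for failure are assumptions.

```javascript
// Excerpt of the label table (label to encoding name), for illustration only.
const LABELS = new Map(Object.entries({
  "unicode-1-1-utf-8": "UTF-8",
  "utf-8": "UTF-8",
  "utf8": "UTF-8",
  "ascii": "windows-1252",
  "iso-8859-1": "windows-1252",
  "latin1": "windows-1252",
  "shift_jis": "Shift_JIS",
  "sjis": "Shift_JIS",
  "utf-16": "UTF-16LE",
  "iso-2022-kr": "replacement",
}));

function getAnEncoding(label) {
  // ASCII whitespace is U+0009, U+000A, U+000C, U+000D, and U+0020 only;
  // String.prototype.trim() would strip other whitespace as well.
  const trimmed = label.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
  // ASCII-lowercase only, so non-ASCII letters are left untouched.
  const lowered = trimmed.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return LABELS.get(lowered) ?? null; // null stands in for failure
}
```

<p class=note>Note that several labels that name other encodings elsewhere, such as
"<code>latin1</code>" and "<code>ascii</code>", resolve to <a>windows-1252</a>.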

<p class="note no-backref">This is a more basic and restrictive algorithm for mapping <a>labels</a>
to <a for=/>encodings</a> than
<a href=https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching>section 1.4 of Unicode Technical Standard #22</a>
prescribes, as that is necessary to be compatible with deployed content.

<table>
 <thead>
  <tr>
   <th><a>Name</a>
   <th><a>Labels</a>
 <tbody>
  <tr><th colspan=2><a href=#the-encoding>The Encoding</a>
  <tr>
   <td rowspan=6><a>UTF-8</a>
   <td>"<code>unicode-1-1-utf-8</code>"
  <tr><td>"<code>unicode11utf8</code>"
  <tr><td>"<code>unicode20utf8</code>"
  <tr><td>"<code>utf-8</code>"
  <tr><td>"<code>utf8</code>"
  <tr><td>"<code>x-unicode20utf8</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-single-byte-encodings>Legacy single-byte encodings</a>
  <tr>
   <td rowspan=4><a>IBM866</a>
   <td>"<code>866</code>"
  <tr><td>"<code>cp866</code>"
  <tr><td>"<code>csibm866</code>"
  <tr><td>"<code>ibm866</code>"
  <tr>
   <td rowspan=9><a>ISO-8859-2</a>
   <td>"<code>csisolatin2</code>"
  <tr><td>"<code>iso-8859-2</code>"
  <tr><td>"<code>iso-ir-101</code>"
  <tr><td>"<code>iso8859-2</code>"
  <tr><td>"<code>iso88592</code>"
  <tr><td>"<code>iso_8859-2</code>"
  <tr><td>"<code>iso_8859-2:1987</code>"
  <tr><td>"<code>l2</code>"
  <tr><td>"<code>latin2</code>"
  <tr>
   <td rowspan=9><a>ISO-8859-3</a>
   <td>"<code>csisolatin3</code>"
  <tr><td>"<code>iso-8859-3</code>"
  <tr><td>"<code>iso-ir-109</code>"
  <tr><td>"<code>iso8859-3</code>"
  <tr><td>"<code>iso88593</code>"
  <tr><td>"<code>iso_8859-3</code>"
  <tr><td>"<code>iso_8859-3:1988</code>"
  <tr><td>"<code>l3</code>"
  <tr><td>"<code>latin3</code>"
  <tr>
   <td rowspan=9><a>ISO-8859-4</a>
   <td>"<code>csisolatin4</code>"
  <tr><td>"<code>iso-8859-4</code>"
  <tr><td>"<code>iso-ir-110</code>"
  <tr><td>"<code>iso8859-4</code>"
  <tr><td>"<code>iso88594</code>"
  <tr><td>"<code>iso_8859-4</code>"
  <tr><td>"<code>iso_8859-4:1988</code>"
  <tr><td>"<code>l4</code>"
  <tr><td>"<code>latin4</code>"
  <tr>
   <td rowspan=8><a>ISO-8859-5</a>
   <td>"<code>csisolatincyrillic</code>"
  <tr><td>"<code>cyrillic</code>"
  <tr><td>"<code>iso-8859-5</code>"
  <tr><td>"<code>iso-ir-144</code>"
  <tr><td>"<code>iso8859-5</code>"
  <tr><td>"<code>iso88595</code>"
  <tr><td>"<code>iso_8859-5</code>"
  <tr><td>"<code>iso_8859-5:1988</code>"
  <tr>
   <td rowspan=14><a>ISO-8859-6</a>
   <td>"<code>arabic</code>"
  <tr><td>"<code>asmo-708</code>"
  <tr><td>"<code>csiso88596e</code>"
  <tr><td>"<code>csiso88596i</code>"
  <tr><td>"<code>csisolatinarabic</code>"
  <tr><td>"<code>ecma-114</code>"
  <tr><td>"<code>iso-8859-6</code>"
  <tr><td>"<code>iso-8859-6-e</code>"
  <tr><td>"<code>iso-8859-6-i</code>"
  <tr><td>"<code>iso-ir-127</code>"
  <tr><td>"<code>iso8859-6</code>"
  <tr><td>"<code>iso88596</code>"
  <tr><td>"<code>iso_8859-6</code>"
  <tr><td>"<code>iso_8859-6:1987</code>"
  <tr>
   <td rowspan=12><a>ISO-8859-7</a>
   <td>"<code>csisolatingreek</code>"
  <tr><td>"<code>ecma-118</code>"
  <tr><td>"<code>elot_928</code>"
  <tr><td>"<code>greek</code>"
  <tr><td>"<code>greek8</code>"
  <tr><td>"<code>iso-8859-7</code>"
  <tr><td>"<code>iso-ir-126</code>"
  <tr><td>"<code>iso8859-7</code>"
  <tr><td>"<code>iso88597</code>"
  <tr><td>"<code>iso_8859-7</code>"
  <tr><td>"<code>iso_8859-7:1987</code>"
  <tr><td>"<code>sun_eu_greek</code>"
  <tr>
   <td rowspan=11><a>ISO-8859-8</a>
   <td>"<code>csiso88598e</code>"
  <tr><td>"<code>csisolatinhebrew</code>"
  <tr><td>"<code>hebrew</code>"
  <tr><td>"<code>iso-8859-8</code>"
  <tr><td>"<code>iso-8859-8-e</code>"
  <tr><td>"<code>iso-ir-138</code>"
  <tr><td>"<code>iso8859-8</code>"
  <tr><td>"<code>iso88598</code>"
  <tr><td>"<code>iso_8859-8</code>"
  <tr><td>"<code>iso_8859-8:1988</code>"
  <tr><td>"<code>visual</code>"
  <tr>
   <td rowspan=3><a>ISO-8859-8-I</a>
   <td>"<code>csiso88598i</code>"
  <tr><td>"<code>iso-8859-8-i</code>"
  <tr><td>"<code>logical</code>"
  <tr>
   <td rowspan=7><a>ISO-8859-10</a>
   <td>"<code>csisolatin6</code>"
  <tr><td>"<code>iso-8859-10</code>"
  <tr><td>"<code>iso-ir-157</code>"
  <tr><td>"<code>iso8859-10</code>"
  <tr><td>"<code>iso885910</code>"
  <tr><td>"<code>l6</code>"
  <tr><td>"<code>latin6</code>"
  <tr>
   <td rowspan=3><a>ISO-8859-13</a>
   <td>"<code>iso-8859-13</code>"
  <tr><td>"<code>iso8859-13</code>"
  <tr><td>"<code>iso885913</code>"
  <tr>
   <td rowspan=3><a>ISO-8859-14</a>
   <td>"<code>iso-8859-14</code>"
  <tr><td>"<code>iso8859-14</code>"
  <tr><td>"<code>iso885914</code>"
  <tr>
   <td rowspan=6><a>ISO-8859-15</a>
   <td>"<code>csisolatin9</code>"
  <tr><td>"<code>iso-8859-15</code>"
  <tr><td>"<code>iso8859-15</code>"
  <tr><td>"<code>iso885915</code>"
  <tr><td>"<code>iso_8859-15</code>"
  <tr><td>"<code>l9</code>"
  <tr>
   <td><a>ISO-8859-16</a>
   <td>"<code>iso-8859-16</code>"
  <tr>
   <td rowspan=5><a>KOI8-R</a>
   <td>"<code>cskoi8r</code>"
  <tr><td>"<code>koi</code>"
  <tr><td>"<code>koi8</code>"
  <tr><td>"<code>koi8-r</code>"
  <tr><td>"<code>koi8_r</code>"
  <tr>
   <td rowspan=2><a>KOI8-U</a>
   <td>"<code>koi8-ru</code>"
  <tr><td>"<code>koi8-u</code>"
  <tr>
   <td rowspan=4><a>macintosh</a>
   <td>"<code>csmacintosh</code>"
  <tr><td>"<code>mac</code>"
  <tr><td>"<code>macintosh</code>"
  <tr><td>"<code>x-mac-roman</code>"
  <tr>
   <td rowspan=6><a>windows-874</a>
   <td>"<code>dos-874</code>"
  <tr><td>"<code>iso-8859-11</code>"
  <tr><td>"<code>iso8859-11</code>"
  <tr><td>"<code>iso885911</code>"
  <tr><td>"<code>tis-620</code>"
  <tr><td>"<code>windows-874</code>"
  <tr>
   <td rowspan=3><a>windows-1250</a>
   <td>"<code>cp1250</code>"
  <tr><td>"<code>windows-1250</code>"
  <tr><td>"<code>x-cp1250</code>"
  <tr>
   <td rowspan=3><a>windows-1251</a>
   <td>"<code>cp1251</code>"
  <tr><td>"<code>windows-1251</code>"
  <tr><td>"<code>x-cp1251</code>"
  <tr>
   <td rowspan=17><a>windows-1252</a>
   <td>"<code>ansi_x3.4-1968</code>"
  <tr><td>"<code>ascii</code>"
  <tr><td>"<code>cp1252</code>"
  <tr><td>"<code>cp819</code>"
  <tr><td>"<code>csisolatin1</code>"
  <tr><td>"<code>ibm819</code>"
  <tr><td>"<code>iso-8859-1</code>"
  <tr><td>"<code>iso-ir-100</code>"
  <tr><td>"<code>iso8859-1</code>"
  <tr><td>"<code>iso88591</code>"
  <tr><td>"<code>iso_8859-1</code>"
  <tr><td>"<code>iso_8859-1:1987</code>"
  <tr><td>"<code>l1</code>"
  <tr><td>"<code>latin1</code>"
  <tr><td>"<code>us-ascii</code>"
  <tr><td>"<code>windows-1252</code>"
  <tr><td>"<code>x-cp1252</code>"
  <tr>
   <td rowspan=3><a>windows-1253</a>
   <td>"<code>cp1253</code>"
  <tr><td>"<code>windows-1253</code>"
  <tr><td>"<code>x-cp1253</code>"
  <tr>
   <td rowspan=12><a>windows-1254</a>
   <td>"<code>cp1254</code>"
  <tr><td>"<code>csisolatin5</code>"
  <tr><td>"<code>iso-8859-9</code>"
  <tr><td>"<code>iso-ir-148</code>"
  <tr><td>"<code>iso8859-9</code>"
  <tr><td>"<code>iso88599</code>"
  <tr><td>"<code>iso_8859-9</code>"
  <tr><td>"<code>iso_8859-9:1989</code>"
  <tr><td>"<code>l5</code>"
  <tr><td>"<code>latin5</code>"
  <tr><td>"<code>windows-1254</code>"
  <tr><td>"<code>x-cp1254</code>"
  <tr>
   <td rowspan=3><a>windows-1255</a>
   <td>"<code>cp1255</code>"
  <tr><td>"<code>windows-1255</code>"
  <tr><td>"<code>x-cp1255</code>"
  <tr>
   <td rowspan=3><a>windows-1256</a>
   <td>"<code>cp1256</code>"
  <tr><td>"<code>windows-1256</code>"
  <tr><td>"<code>x-cp1256</code>"
  <tr>
   <td rowspan=3><a>windows-1257</a>
   <td>"<code>cp1257</code>"
  <tr><td>"<code>windows-1257</code>"
  <tr><td>"<code>x-cp1257</code>"
  <tr>
   <td rowspan=3><a>windows-1258</a>
   <td>"<code>cp1258</code>"
  <tr><td>"<code>windows-1258</code>"
  <tr><td>"<code>x-cp1258</code>"
  <tr>
   <td rowspan=2><a>x-mac-cyrillic</a>
   <td>"<code>x-mac-cyrillic</code>"
  <tr><td>"<code>x-mac-ukrainian</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-chinese-(simplified)-encodings>Legacy multi-byte Chinese (simplified) encodings</a>
  <tr>
   <td rowspan=9><a>GBK</a>
   <td>"<code>chinese</code>"
  <tr><td>"<code>csgb2312</code>"
  <tr><td>"<code>csiso58gb231280</code>"
  <tr><td>"<code>gb2312</code>"
  <tr><td>"<code>gb_2312</code>"
  <tr><td>"<code>gb_2312-80</code>"
  <tr><td>"<code>gbk</code>"
  <tr><td>"<code>iso-ir-58</code>"
  <tr><td>"<code>x-gbk</code>"
  <tr>
   <td><a>gb18030</a>
   <td>"<code>gb18030</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-chinese-(traditional)-encodings>Legacy multi-byte Chinese (traditional) encodings</a>
  <tr>
   <td rowspan=5><a>Big5</a>
   <td>"<code>big5</code>"
  <tr><td>"<code>big5-hkscs</code>"
  <tr><td>"<code>cn-big5</code>"
  <tr><td>"<code>csbig5</code>"
  <tr><td>"<code>x-x-big5</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-japanese-encodings>Legacy multi-byte Japanese encodings</a>
  <tr>
   <td rowspan=3><a>EUC-JP</a>
   <td>"<code>cseucpkdfmtjapanese</code>"
  <tr><td>"<code>euc-jp</code>"
  <tr><td>"<code>x-euc-jp</code>"
  <tr>
   <td rowspan=2><a>ISO-2022-JP</a>
   <td>"<code>csiso2022jp</code>"
  <tr><td>"<code>iso-2022-jp</code>"
  <tr>
   <td rowspan=8><a>Shift_JIS</a>
   <td>"<code>csshiftjis</code>"
  <tr><td>"<code>ms932</code>"
  <tr><td>"<code>ms_kanji</code>"
  <tr><td>"<code>shift-jis</code>"
  <tr><td>"<code>shift_jis</code>"
  <tr><td>"<code>sjis</code>"
  <tr><td>"<code>windows-31j</code>"
  <tr><td>"<code>x-sjis</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-multi-byte-korean-encodings>Legacy multi-byte Korean encodings</a>
  <tr>
   <td rowspan=10><a>EUC-KR</a>
   <td>"<code>cseuckr</code>"
  <tr><td>"<code>csksc56011987</code>"
  <tr><td>"<code>euc-kr</code>"
  <tr><td>"<code>iso-ir-149</code>"
  <tr><td>"<code>korean</code>"
  <tr><td>"<code>ks_c_5601-1987</code>"
  <tr><td>"<code>ks_c_5601-1989</code>"
  <tr><td>"<code>ksc5601</code>"
  <tr><td>"<code>ksc_5601</code>"
  <tr><td>"<code>windows-949</code>"
 <tbody>
  <tr><th colspan=2><a href=#legacy-miscellaneous-encodings>Legacy miscellaneous encodings</a>
  <tr>
   <td rowspan=6><a>replacement</a>
   <td>"<code>csiso2022kr</code>"
  <tr><td>"<code>hz-gb-2312</code>"
  <tr><td>"<code>iso-2022-cn</code>"
  <tr><td>"<code>iso-2022-cn-ext</code>"
  <tr><td>"<code>iso-2022-kr</code>"
  <tr><td>"<code>replacement</code>"
  <tr>
   <td rowspan=2><a>UTF-16BE</a>
   <td>"<code>unicodefffe</code>"
  <tr><td>"<code>utf-16be</code>"
  <tr>
   <td rowspan=7><a>UTF-16LE</a>
   <td>"<code>csunicode</code>"
  <tr><td>"<code>iso-10646-ucs-2</code>"
  <tr><td>"<code>ucs-2</code>"
  <tr><td>"<code>unicode</code>"
  <tr><td>"<code>unicodefeff</code>"
  <tr><td>"<code>utf-16</code>"
  <tr><td>"<code>utf-16le</code>"
  <tr>
   <td><a>x-user-defined</a>
   <td>"<code>x-user-defined</code>"
</table>

<p class=note>All <a for=/>encodings</a> and their
<a>labels</a> are also available as a non-normative
<a href=encodings.json>encodings.json</a> resource.

<p class=note id=supported-encodings>The set of supported <a for=/>encodings</a> is primarily based
on the intersection of the sets supported by major browser engines when the development of this
standard started, while removing encodings that were rarely used legitimately but that could be used
in attacks. The inclusion of some encodings is questionable in the light of anecdotal evidence of
the level of use by existing Web content. That is, while they have been broadly supported by
browsers, it is unclear if they are broadly used by Web content. However, an effort has not been
made to eagerly remove <a>single-byte encodings</a> that were broadly supported by browsers or are
part of the ISO 8859 series. In particular, the necessity of the inclusion of <a>IBM866</a>,
<a>macintosh</a>, <a>x-mac-cyrillic</a>, <a>ISO-8859-3</a>, <a>ISO-8859-10</a>, <a>ISO-8859-14</a>,
and <a>ISO-8859-16</a> is doubtful for the purpose of supporting existing content, but there are no
plans to remove these.</p>


<h3 id=output-encodings>Output encodings</h3>

<p>To <dfn export>get an output encoding</dfn> from an <a for=/>encoding</a>
<var>encoding</var>, run these steps:

<ol>
 <li><p>If <var>encoding</var> is <a>replacement</a> or <a>UTF-16BE/LE</a>, then return
 <a>UTF-8</a>.

 <li><p>Return <var>encoding</var>.
</ol>

<p class=note>The <a>get an output encoding</a> algorithm is useful for URL parsing and HTML
form submission, which both need exactly this.
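<p class=example>As a sketch, with encodings identified by their names as strings (an assumption
of this example, as is the function name):

```javascript
// Encodings without an encoder fall back to UTF-8 for output.
function getAnOutputEncoding(encoding) {
  if (["replacement", "UTF-16BE", "UTF-16LE"].includes(encoding)) return "UTF-8";
  return encoding;
}
```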


<h2 id=indexes>Indexes</h2>

<p>Most legacy <a for=/>encodings</a> make use of an <dfn id=index>index</dfn>. An
<a>index</a> is an ordered list of entries, each entry consisting of a pointer and a
corresponding code point. Within an <a>index</a> pointers are unique and code points can be
duplicated.

<p class="note no-backref">An efficient implementation likely has two
<a lt=index>indexes</a> per <a for=/>encoding</a>: one optimized for its
<a for=/>decoder</a> and one for its <a for=/>encoder</a>.

<p>To find the pointers and their corresponding code points in an <a>index</a>,
let <var>lines</var> be the result of splitting the resource's contents on U+000A.
Then remove each item in <var>lines</var> that is the empty string or starts with U+0023.
Then the pointers and their corresponding code points are found by splitting each item in <var>lines</var> on U+0009.
The first subitem is the pointer (as a decimal number) and the second is the corresponding code point (as a hexadecimal number).
Other subitems are not relevant.
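<p class=example>The parsing steps above can be sketched as follows. The function name
<code>parseIndex</code> and the shape of the returned entries are assumptions; fetching the
resource is left out.

```javascript
// Parse the text of an index resource into { pointer, codePoint } entries.
function parseIndex(contents) {
  const entries = [];
  for (const line of contents.split("\n")) {
    // Skip empty lines and comment lines starting with U+0023 (#).
    if (line === "" || line.startsWith("#")) continue;
    const subitems = line.split("\t");
    entries.push({
      pointer: parseInt(subitems[0], 10), // decimal
      codePoint: parseInt(subitems[1], 16), // hexadecimal, "0x" prefix allowed
    });
  }
  return entries;
}
```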

<p class="note no-backref">To signify changes an <a>index</a> includes an
<i>Identifier</i> and a <i>Date</i>. If an <i>Identifier</i> has
changed, so has the <a>index</a>.

<p>The <dfn>index code point</dfn> for <var>pointer</var> in
<var>index</var> is the code point corresponding to
<var>pointer</var> in <var>index</var>, or null if
<var>pointer</var> is not in <var>index</var>.

<p>The <dfn>index pointer</dfn> for <var>code point</var> in
<var>index</var> is the <em>first</em> pointer corresponding to
<var>code point</var> in <var>index</var>, or null if
<var>code point</var> is not in <var>index</var>.
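<p class=example>A minimal sketch of both lookups, assuming the index is available as an ordered
array of <code>{ pointer, codePoint }</code> entries. The function and property names are
hypothetical. Two maps are built, one for each direction.

```javascript
function buildLookups(entries) {
  const byPointer = new Map();
  const byCodePoint = new Map();
  for (const { pointer, codePoint } of entries) {
    byPointer.set(pointer, codePoint);
    // Code points can be duplicated; keep only the first pointer.
    if (!byCodePoint.has(codePoint)) byCodePoint.set(codePoint, pointer);
  }
  return {
    indexCodePoint: (pointer) => byPointer.get(pointer) ?? null,
    indexPointer: (codePoint) => byCodePoint.get(codePoint) ?? null,
  };
}
```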

<div class=note id=visualization>
 <p>There is a non-normative visualization for each <a>index</a> other than
 <a>index gb18030 ranges</a> and <a>index ISO-2022-JP katakana</a>. <a>index jis0208</a> also has an
 alternative <a>Shift_JIS</a> visualization. Additionally, there is a visualization of the Basic
 Multilingual Plane coverage of each index other than <a>index gb18030 ranges</a> and
 <a>index ISO-2022-JP katakana</a>.

 <p>The legend for the visualizations is:

 <ul class=visualizationlegend>
  <li class=unmapped>Unmapped
  <li class=mid>Two bytes in UTF-8
  <li class="mid contiguous">Two bytes in UTF-8, code point follows immediately the code point of
  previous pointer
  <li class=upper>Three bytes in UTF-8 (non-PUA)
  <li class="upper contiguous">Three bytes in UTF-8 (non-PUA), code point follows immediately the
  code point of previous pointer
  <li class=pua>Private Use
  <li class="pua contiguous">Private Use, code point follows immediately the code point of previous
  pointer
  <li class=astral>Four bytes in UTF-8
  <li class="astral contiguous">Four bytes in UTF-8, code point follows immediately the code point
  of previous pointer
  <li class=duplicate>Duplicate code point already mapped at an earlier index
  <li class=compatibility>CJK Compatibility Ideograph
  <li class=ext>CJK Unified Ideographs Extension A
 </ul>
</div>

<p>These are the <a lt=index>indexes</a> defined by this
specification, excluding <a>index single-byte</a>, which have their own table:

<table>
 <tbody><tr><th colspan=4><a>Index</a><th>Notes
 <tr>
  <td><dfn export>index Big5</dfn>
  <td><a href=index-big5.txt>index-big5.txt</a>
  <td><a href=big5.html>index Big5 visualization</a>
  <td><a href=big5-bmp.html>index Big5 BMP coverage</a>
  <td>This matches the Big5 standard in combination with the
  Hong Kong Supplementary Character Set and other common extensions.
 <tr>
  <td><dfn export>index EUC-KR</dfn>
  <td><a href=index-euc-kr.txt>index-euc-kr.txt</a>
  <td><a href=euc-kr.html>index EUC-KR visualization</a>
  <td><a href=euc-kr-bmp.html>index EUC-KR BMP coverage</a>
  <td>This matches the KS X 1001 standard and the Unified Hangul Code, more commonly known together
  as Windows Codepage 949. It covers the Hangul Syllables block of Unicode in its entirety. The
  Hangul block whose top left corner in the visualization is at pointer 9026 is in the Unicode
  order. Taken separately, the rest of the Hangul syllables in this index are in the Unicode order,
  too.
 <tr>
  <td><dfn export>index gb18030</dfn>
  <td><a href=index-gb18030.txt>index-gb18030.txt</a>
  <td><a href=gb18030.html>index gb18030 visualization</a>
  <td><a href=gb18030-bmp.html>index gb18030 BMP coverage</a>
  <td>This matches the GB18030-2005 standard for code points encoded as two bytes, except for
  0xA3 0xA0 which maps to U+3000 to be compatible with deployed content. This index covers the
  CJK Unified Ideographs block of Unicode in its entirety. Entries from that block that are above or
  to the left of (the first) U+3000 in the visualization are in the Unicode order.
  <!-- https://bugzilla.mozilla.org/show_bug.cgi?id=131837
       https://bugs.webkit.org/show_bug.cgi?id=17014
       https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396
       https://github.com/whatwg/encoding/issues/17 -->
 <tr>
  <td><dfn export>index gb18030 ranges</dfn>
  <td colspan=3><a href=index-gb18030-ranges.txt>index-gb18030-ranges.txt</a>
  <td>This <a>index</a> works differently from all others. Listing all code points would result
  in over a million items whereas they can be represented neatly in 207 ranges combined with trivial
  limit checks. It therefore only superficially matches the GB18030-2005 standard for code points
  encoded as four bytes. See also <a>index gb18030 ranges code point</a> and
  <a>index gb18030 ranges pointer</a> below.
 <tr>
  <td><dfn export>index jis0208</dfn>
  <td><a href=index-jis0208.txt>index-jis0208.txt</a>
  <td><a href=jis0208.html>index jis0208 visualization</a>, <a href=shift_jis.html>Shift_JIS visualization</a>
  <td><a href=jis0208-bmp.html>index jis0208 BMP coverage</a>
  <td>This is the JIS X 0208 standard including formerly proprietary
  extensions from IBM and NEC.
  <!-- NEC = Nippon Electric Company -->
 <tr>
  <td><dfn export>index jis0212</dfn>
  <td><a href=index-jis0212.txt>index-jis0212.txt</a>
  <td><a href=jis0212.html>index jis0212 visualization</a>
  <td><a href=jis0212-bmp.html>index jis0212 BMP coverage</a>
  <td>This is the JIS X 0212 standard. It is only used by the <a>EUC-JP decoder</a>
  due to lack of widespread support elsewhere.
  <!--
   No JIS X 0212 EUC-JP encoder support:
     https://bugzilla.mozilla.org/show_bug.cgi?id=600715
     https://code.google.com/p/chromium/issues/detail?id=78847

   No JIS X 0212 ISO-2022-JP support:
     https://www.w3.org/Bugs/Public/show_bug.cgi?id=26885
  -->
 <tr>
  <td><dfn export>index ISO-2022-JP katakana</dfn>
  <td colspan=3><a href=index-iso-2022-jp-katakana.txt>index-iso-2022-jp-katakana.txt</a>
  <td>This maps halfwidth to fullwidth katakana as per Unicode Normalization Form KC, except that
  U+FF9E and U+FF9F map to U+309B and U+309C rather than U+3099 and U+309A. It is only used by the
  <a>ISO-2022-JP encoder</a>. [[UNICODE]]
</table>

<p>The <dfn>index gb18030 ranges code point</dfn> for <var>pointer</var> is
the return value of these steps:

<ol>
 <li><p>If <var>pointer</var> is greater than 39419 and less than
 189000, or <var>pointer</var> is greater than 1237575, return null.

 <li><p>If <var>pointer</var> is 7457, return code point U+E7C7.
 <!-- 7457 is 0x81 0x35 0xF4 0x37 -->

 <li><p>Let <var>offset</var> be the last pointer in <a>index gb18030 ranges</a> that is less than
 or equal to <var>pointer</var> and let <var>code point offset</var> be its corresponding code
 point.

 <li><p>Return a code point whose value is
 <var>code point offset</var> + <var>pointer</var> &minus; <var>offset</var>.
</ol>

<p>The <dfn>index gb18030 ranges pointer</dfn> for <var>code point</var> is
the return value of these steps:

<ol>
 <li><p>If <var>code point</var> is U+E7C7, return pointer 7457.

 <li><p>Let <var>offset</var> be the last code point in <a>index gb18030 ranges</a> that is less
 than or equal to <var>code point</var> and let <var>pointer offset</var> be its corresponding
 pointer.

 <li><p>Return a pointer whose value is
 <var>pointer offset</var> + <var>code point</var> &minus; <var>offset</var>.
</ol>
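
<div class=example>
 <p>The two range algorithms above can be sketched as follows. This is a non-normative sketch under
 stated assumptions: the function names are hypothetical, ranges are passed in as
 <code>[pointer, codePoint]</code> pairs sorted in ascending order, and <code>rangesExcerpt</code>
 is a heavily abbreviated, illustrative table, so results for pointers between its entries will not
 match the real <a>index gb18030 ranges</a>.

```javascript
// Illustrative excerpt only; the real index has 207 entries.
const rangesExcerpt = [[0, 0x0080], [36, 0x00A5], [38, 0x00A9], [189000, 0x10000]];

function rangesCodePoint(pointer, ranges) {
  // Limit checks: the gap between the BMP ranges and the astral range, and the upper bound.
  if ((pointer > 39419 && 189000 > pointer) || pointer > 1237575) return null;
  if (pointer === 7457) return 0xE7C7;
  // Find the last range whose starting pointer is less than or equal to pointer.
  let offset = 0;
  let codePointOffset = 0;
  for (const [p, c] of ranges) {
    if (p > pointer) break;
    offset = p;
    codePointOffset = c;
  }
  return codePointOffset + pointer - offset;
}

function rangesPointer(codePoint, ranges) {
  if (codePoint === 0xE7C7) return 7457;
  // Find the last range whose starting code point is less than or equal to codePoint.
  let offset = 0;
  let pointerOffset = 0;
  for (const [p, c] of ranges) {
    if (c > codePoint) break;
    offset = c;
    pointerOffset = p;
  }
  return pointerOffset + codePoint - offset;
}
```

 <p>With this excerpt, pointer 189000 maps to U+10000 and pointer 1237575 maps to U+10FFFF, which
 shows how a single range entry covers every code point above the Basic Multilingual Plane.
</div>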

<p>The <dfn>index Shift_JIS pointer</dfn> for <var>code point</var> is the return value of these
steps:

<ol>
 <li>
  <p>Let <var>index</var> be <a>index jis0208</a> excluding all entries whose pointer is in
  the range 8272 to 8835, inclusive.
  <!-- selected NEC duplicates from IBM extensions later in the index; need to use IBM
       extensions when going back to bytes -->

  <p class=note>The <a>index jis0208</a> contains duplicate code points so the exclusion of
  these entries causes later code points to be used.

 <li><p>Return the <a>index pointer</a> for <var>code point</var> in
 <var>index</var>.
</ol>

<p>The <dfn>index Big5 pointer</dfn> for <var>code point</var> is the return value of
these steps:

<ol>
 <li>
  <p>Let <var>index</var> be <a>index Big5</a> excluding all entries whose pointer is less
  than (0xA1 - 0x81) × 157.

  <p class=note>Avoid returning Hong Kong Supplementary Character Set extensions literally.

 <li>
  <p>If <var>code point</var> is U+2550, U+255E, U+2561, U+256A, U+5341, or U+5345,
  return the <em>last</em> pointer corresponding to <var>code point</var> in
  <var>index</var>.
  <!-- https://www.w3.org/Bugs/Public/show_bug.cgi?id=27878 -->

  <p class=note>There are other duplicate code points, but for those the <em>first</em> pointer is
  to be used.

 <li><p>Return the <a>index pointer</a> for <var>code point</var> in
 <var>index</var>.
</ol>
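
<div class=example>
 <p>A non-normative sketch of the <a>index Shift_JIS pointer</a> and <a>index Big5 pointer</a>
 lookups above, assuming an index is an array of <code>[pointer, codePoint]</code> pairs sorted by
 ascending pointer. The function names are hypothetical and the toy data uses made-up pointers; it
 only illustrates the exclusion and first/last selection rules.

```javascript
function indexShiftJISPointer(codePoint, index) {
  // Skip entries whose pointer is in the range 8272 to 8835, inclusive, then take the first match.
  const match = index.find(([p, c]) => (8272 > p || p > 8835) && c === codePoint);
  return match === undefined ? null : match[0];
}

// Code points for which the last matching pointer is used instead of the first.
const BIG5_USE_LAST = new Set([0x2550, 0x255E, 0x2561, 0x256A, 0x5341, 0x5345]);

function indexBig5Pointer(codePoint, index) {
  const threshold = (0xA1 - 0x81) * 157; // 5024
  // Exclude entries below the threshold, then collect every remaining match.
  const matches = index.filter(([p, c]) => p >= threshold && c === codePoint);
  if (matches.length === 0) return null;
  const chosen = BIG5_USE_LAST.has(codePoint) ? matches[matches.length - 1] : matches[0];
  return chosen[0];
}

// Made-up entries, not taken from the real indexes.
const toyJis = [[8300, 0x2252], [11000, 0x2252]];
const toyBig5 = [[100, 0x5341], [200, 0x3000], [6000, 0x5341], [6500, 0x4E00],
                 [7000, 0x5341], [7500, 0x4E00]];
```

 <p>In the toy Big5 data, U+5341 resolves to the last remaining pointer (7000), U+4E00 to the first
 (6500), and U+3000 to null because its only entry lies below the threshold.
</div>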

<hr>

<p class="note no-backref">All <a lt=index>indexes</a> are also available as a non-normative
<a href=indexes.json>indexes.json</a> resource. (<a>Index gb18030 ranges</a> has a slightly
different format here, to be able to represent ranges.)


<h2 id=specification-hooks>Hooks for standards</h2>

<div class=note>
 <p>The algorithms defined below (<a>UTF-8 decode</a>, <a>UTF-8 decode without BOM</a>,
 <a>UTF-8 decode without BOM or fail</a>, and <a>UTF-8 encode</a>) are intended for usage by other
 standards.

 <p>For decoding, <a>UTF-8 decode</a> is to be used by new formats. For identifiers or byte
 sequences within a format or protocol, use <a>UTF-8 decode without BOM</a> or
 <a>UTF-8 decode without BOM or fail</a>.

 <p>For encoding, <a>UTF-8 encode</a> is to be used.

 <p>Standards are to ensure that the input I/O queues they pass to <a>UTF-8 encode</a> (as well as
 the legacy <a>encode</a>) are effectively I/O queues of scalar values, i.e., they contain no
 <a>surrogates</a>.

 <p>These hooks (as well as <a>decode</a> and <a>encode</a>) will block until the input I/O queue
 has been consumed in its entirety. In order to use the output tokens as they are pushed into the
 stream, callers are to invoke the hooks with an empty output I/O queue and read from it
 <a>in parallel</a>. Note that some care is needed when using
 <a>UTF-8 decode without BOM or fail</a>, as any error found during decoding will prevent the
 <a>end-of-queue</a> item from ever being pushed into the output I/O queue.
</div>

<p>To <dfn export>UTF-8 decode</dfn> an I/O queue of bytes <var>ioQueue</var> given an optional I/O
queue of scalar values <var>output</var> (default « »), run these steps:

<ol>
 <li><p>Let <var>buffer</var> be the result of <a for="I/O queue">peeking</a> three bytes from
 <var>ioQueue</var>, converted to a byte sequence.

 <li><p>If <var>buffer</var> is 0xEF 0xBB 0xBF, then <a for="I/O queue">read</a> three bytes from
 <var>ioQueue</var>. (Do nothing with those bytes.)

 <li><p><a>Process a queue</a> with an instance of <a>UTF-8</a>'s <a for=/>decoder</a>,
 <var>ioQueue</var>, <var>output</var>, and "<code>replacement</code>".

 <li><p>Return <var>output</var>.
</ol>

<p>To <dfn export>UTF-8 decode without BOM</dfn> an I/O queue of bytes <var>ioQueue</var> given an
optional I/O queue of scalar values <var>output</var> (default « »), run these steps:

<ol>
 <li><p><a>Process a queue</a> with an instance of <a>UTF-8</a>'s <a for=/>decoder</a>,
 <var>ioQueue</var>, <var>output</var>, and "<code>replacement</code>".

 <li><p>Return <var>output</var>.
</ol>

<p>To <dfn export>UTF-8 decode without BOM or fail</dfn> an I/O queue of bytes <var>ioQueue</var>
given an optional I/O queue of scalar values <var>output</var> (default « »), run these steps:
<!-- Needed by https://tools.ietf.org/html/rfc6455#section-8.1 and
     https://webassembly.github.io/spec/js-api/#dom-module-customsections-moduleobject-sectionname
     -->

<ol>
 <li><p>Let <var>potentialError</var> be the result of <a>processing a queue</a> with an instance of
 <a>UTF-8</a>'s <a for=/>decoder</a>, <var>ioQueue</var>, <var>output</var>, and
 "<code>fatal</code>".

 <li><p>If <var>potentialError</var> is an <a>error</a>, then return failure.

 <li><p>Return <var>output</var>.
</ol>
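
<div class=example>
 <p>The observable behavior of the three decoding hooks above can be approximated with
 {{TextDecoder}}. This is a non-normative sketch: the function names are hypothetical, input is
 assumed to be a complete {{Uint8Array}} rather than an I/O queue, and <code>null</code> stands in
 for failure.

```javascript
// Approximates "UTF-8 decode without BOM": errors become U+FFFD, a leading U+FEFF is kept.
function utf8DecodeWithoutBOM(bytes) {
  return new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytes);
}

// Approximates "UTF-8 decode": peek three bytes and skip them if they are 0xEF 0xBB 0xBF.
function utf8Decode(bytes) {
  const hasBOM =
    bytes.length >= 3 && bytes[0] === 0xEF && bytes[1] === 0xBB && bytes[2] === 0xBF;
  return utf8DecodeWithoutBOM(hasBOM ? bytes.subarray(3) : bytes);
}

// Approximates "UTF-8 decode without BOM or fail": any decoding error yields failure.
function utf8DecodeWithoutBOMOrFail(bytes) {
  try {
    return new TextDecoder("utf-8", { ignoreBOM: true, fatal: true }).decode(bytes);
  } catch (e) {
    if (e instanceof TypeError) return null; // stands in for "failure"
    throw e;
  }
}

const withBOM = new Uint8Array([0xEF, 0xBB, 0xBF, 0x41]);
```

 <p>Unlike the hooks, this sketch does not stream: it processes the whole input in one call.
</div>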

<hr>

<p>To <dfn export>UTF-8 encode</dfn> an I/O queue of scalar values <var>ioQueue</var> given an
optional I/O queue of bytes <var>output</var> (default « »), return the result of
<a lt=encode for=/>encoding</a> <var>ioQueue</var> with encoding <a>UTF-8</a> and <var>output</var>.


<h3 id=legacy-hooks>Legacy hooks for standards</h3>

<div class=note>
 <p>Standards are strongly discouraged from using <a>decode</a>, <a>BOM sniff</a>, and
 <a for=/>encode</a>, except as needed for compatibility. Standards needing these legacy hooks will
 most likely also need to use <a>get an encoding</a> (to turn a <a>label</a> into an
 <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an <a for=/>encoding</a> into
 another <a for=/>encoding</a> that is suitable to pass into <a>encode</a>).

 <p>For the extremely niche case of URL percent-encoding, custom encoder error handling is needed.
 The <a>get an encoder</a> and <a>encode or fail</a> algorithms are to be used for that. Other
 algorithms are not to be used directly.
</div>

<p>To <dfn export>decode</dfn> an I/O queue of bytes <var>ioQueue</var> given a fallback encoding
<var>encoding</var> and an optional I/O queue of scalar values <var>output</var> (default « »), run
these steps:

<ol>
 <li><p>Let <var>BOMEncoding</var> be the result of <a>BOM sniffing</a> <var>ioQueue</var>.

 <li>
  <p>If <var>BOMEncoding</var> is non-null:

  <ol>
   <li><p>Set <var>encoding</var> to <var>BOMEncoding</var>.

   <li><p><a>Read</a> three bytes from <var>ioQueue</var>, if <var>BOMEncoding</var> is
   <a>UTF-8</a>; otherwise <a>read</a> two bytes. (Do nothing with those bytes.)
  </ol>

  <p class=note>For compatibility with deployed content, the byte order mark is more authoritative
  than anything else. In a context where HTTP is used this is in violation of the semantics of the
  `<code>Content-Type</code>` header.

 <li><p><a>Process a queue</a> with an instance of <var>encoding</var>'s <a for=/>decoder</a>,
 <var>ioQueue</var>, <var>output</var>, and "<code>replacement</code>".

 <li><p>Return <var>output</var>.
</ol>

<p>To <dfn export>BOM sniff</dfn> an I/O queue of bytes <var>ioQueue</var>, run these steps:

<ol>
 <li><p>Let <var>BOM</var> be the result of <a for="I/O queue">peeking</a> three bytes from
 <var>ioQueue</var>, converted to a byte sequence.

 <li>
  <p>For each of the rows in the table below, starting with the first one and going down, if
  <var>BOM</var> <a for="byte sequence">starts with</a> the bytes given in the first column, then
  return the <a for=/>encoding</a> given in the cell in the second column of that row. If no row
  matches, return null.

  <table>
   <tbody><tr><th>Byte order mark<th>Encoding
   <tr><td>0xEF 0xBB 0xBF<td><a>UTF-8</a>
   <tr><td>0xFE 0xFF<td><a>UTF-16BE</a>
   <tr><td>0xFF 0xFE<td><a>UTF-16LE</a>
  </table>
</ol>

<p class=note>This hook is a workaround for the fact that <a>decode</a> has no way to communicate
back to the caller that it has found a byte order mark and is therefore not using the provided
encoding. The hook is to be invoked before <a>decode</a>, and it will return an encoding
corresponding to the byte order mark found, or null otherwise.
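
<div class=example>
 <p>A non-normative sketch of <a>BOM sniff</a> and of how <a>decode</a> uses it. The function names
 are hypothetical, input is assumed to be a complete {{Uint8Array}}, encodings are represented by
 their names as strings, and {{TextDecoder}} stands in for the selected encoding's decoder.

```javascript
// Returns "UTF-8", "UTF-16BE", "UTF-16LE", or null, checking the table rows in order.
function bomSniff(bytes) {
  const startsWith = (...bom) => bom.every((b, i) => bytes[i] === b);
  if (startsWith(0xEF, 0xBB, 0xBF)) return "UTF-8";
  if (startsWith(0xFE, 0xFF)) return "UTF-16BE";
  if (startsWith(0xFF, 0xFE)) return "UTF-16LE";
  return null;
}

// The byte order mark, when present, overrides the fallback encoding and is then skipped.
function legacyDecode(bytes, fallbackLabel) {
  const bomEncoding = bomSniff(bytes);
  const label = bomEncoding ?? fallbackLabel;
  const skip = bomEncoding === null ? 0 : bomEncoding === "UTF-8" ? 3 : 2;
  // ignoreBOM: true because the byte order mark was already handled above.
  return new TextDecoder(label, { ignoreBOM: true }).decode(bytes.subarray(skip));
}

const utf16leInput = new Uint8Array([0xFF, 0xFE, 0x41, 0x00]);
```

 <p>Calling <code>bomSniff()</code> before <code>legacyDecode()</code> lets a caller learn which
 encoding will actually be used, which is the purpose the note above describes.
</div>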

<hr>

<p>To <dfn export>encode</dfn> an I/O queue of scalar values <var>ioQueue</var> given an encoding
<var>encoding</var> and an optional I/O queue of bytes <var>output</var> (default « »), run these
steps:

<ol>
 <li><p>Let <var>encoder</var> be the result of <a>getting an encoder</a> from <var>encoding</var>.

 <li><p><a>Process a queue</a> with <var>encoder</var>, <var>ioQueue</var>, <var>output</var>, and
 "<code>html</code>".

 <li><p>Return <var>output</var>.
</ol>

<p class="note no-backref">This is a legacy hook for HTML forms. Layering <a>UTF-8 encode</a> on top
is safe as it never triggers <a>errors</a>. [[HTML]]

<hr>

<p>To <dfn export lt="get an encoder|getting an encoder">get an encoder</dfn> from an
<a for=/>encoding</a> <var>encoding</var>:

<ol>
 <li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>.

 <li><p>Return an instance of <var>encoding</var>'s <a for=/>encoder</a>.
</ol>

<p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an
<a for=/>encoder</a> instance <var>encoder</var> and an I/O queue of bytes <var>output</var>, run
these steps:

<ol>
 <li><p>Let <var>potentialError</var> be the result of <a>processing a queue</a> with
 <var>encoder</var>, <var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>".

 <li><p><a for="I/O queue">Push</a> <a>end-of-queue</a> to <var>output</var>.

 <li><p>If <var>potentialError</var> is an <a>error</a>, then return <var>potentialError</var>'s
 <a>code point</a>'s <a for="code point">value</a>.

 <li><p>Return null.
</ol>

<div class=note id=pit-of-iso-2022-jp>
 <p>This is a legacy hook for URL percent-encoding. The caller will have to keep an
 <a for=/>encoder</a> instance alive as the <a>ISO-2022-JP encoder</a> can be in two different
 states when returning an <a>error</a>. That also means that if the caller emits bytes to encode the
 error in some way, these have to be in the range 0x00 to 0x7F, inclusive, excluding 0x0E, 0x0F,
 0x1B, 0x5C, and 0x7E. [[URL]]

 <p>In particular, if upon returning an <a>error</a> the <a>ISO-2022-JP encoder</a> is in the
 <a lt="ISO-2022-JP encoder Roman">Roman</a> state, the caller cannot output 0x5C (\) as it will not
 decode as U+005C (\). For this reason, applications using <a>encode or fail</a> for unintended
 purposes ought to take care to prevent the use of the <a>ISO-2022-JP encoder</a> in combination
 with replacement schemes, such as those of JavaScript and CSS, that use U+005C (\) as part of the
 replacement syntax (e.g., <code>\u2603</code>) or make sure to pass the replacement syntax through
 the encoder (in contrast to URL percent-encoding).

 <p>The return value is either the number representing the <a>code point</a> that could not be
 encoded or null, if there was no <a>error</a>. When it returns non-null the caller will have to
 invoke it again, supplying the same <a for=/>encoder</a> instance and a new output I/O queue.
</div>


<h2 id=api>API</h2>

<p>This section uses terminology from Web IDL. Browser user agents must support this API. JavaScript
implementations should support this API. Other user agents or programming languages are encouraged
to use an API suitable to their needs, which might not be this one. [[!WEBIDL]]

<div class=example id=example-textencoder>
 <p>The following example uses the {{TextEncoder}} object to encode an array of strings into an
 {{ArrayBuffer}}. The resulting buffer contains the number of strings (as a big-endian 32-bit
 unsigned integer), followed by the byte length of the first string (in the same integer format),
 the <a>UTF-8</a> encoded string data, the byte length of the second string, its string data, and
 so on.
 <pre><code class=lang-javascript>
function encodeArrayOfStrings(strings) {
  var encoder, encoded, len, bytes, view, offset, i;

  encoder = new TextEncoder();
  encoded = [];

  // First pass: encode each string and add up the bytes needed, starting with the 32-bit
  // string count and adding a 32-bit length prefix per string.
  len = Uint32Array.BYTES_PER_ELEMENT;
  for (i = 0; i &lt; strings.length; i++) {
    len += Uint32Array.BYTES_PER_ELEMENT;
    encoded[i] = encoder.encode(strings[i]);
    len += encoded[i].byteLength;
  }

  bytes = new Uint8Array(len);
  view = new DataView(bytes.buffer);
  offset = 0;

  // Second pass: write the count, then each length prefix followed by its data.
  // setUint32() writes big-endian unless told otherwise.
  view.setUint32(offset, strings.length);
  offset += Uint32Array.BYTES_PER_ELEMENT;
  for (i = 0; i &lt; encoded.length; i++) {
    len = encoded[i].byteLength;
    view.setUint32(offset, len);
    offset += Uint32Array.BYTES_PER_ELEMENT;
    bytes.set(encoded[i], offset);
    offset += len;
  }
  return bytes.buffer;
}</code></pre>

 <p>The following example decodes an {{ArrayBuffer}} containing data encoded in the
 format produced by the previous example, or an equivalent algorithm for encodings other than
 <a>UTF-8</a>, back into an array of strings.

 <pre><code class=lang-javascript>
function decodeArrayOfStrings(buffer, encoding) {
  var decoder, view, offset, num_strings, strings, len, i;

  decoder = new TextDecoder(encoding);
  view = new DataView(buffer);
  offset = 0;
  strings = [];

  // Read the string count, then each length prefix followed by that many bytes of data.
  num_strings = view.getUint32(offset);
  offset += Uint32Array.BYTES_PER_ELEMENT;
  for (i = 0; i &lt; num_strings; i++) {
    len = view.getUint32(offset);
    offset += Uint32Array.BYTES_PER_ELEMENT;
    // decode() accepts a view, so the slice does not need to be copied here.
    strings[i] = decoder.decode(
      new DataView(view.buffer, offset, len));
    offset += len;
  }
  return strings;
}</code></pre>
</div>


<h3 id=interface-mixin-textdecodercommon>Interface mixin {{TextDecoderCommon}}</h3>

<pre class=idl>
interface mixin TextDecoderCommon {
  readonly attribute DOMString encoding;
  readonly attribute boolean fatal;
  readonly attribute boolean ignoreBOM;
};
</pre>

<p>The {{TextDecoderCommon}} interface mixin defines common getters that are shared between
{{TextDecoder}} and {{TextDecoderStream}} objects. These objects have an associated:

<dl>
 <dt><dfn id=textdecoder-encoding for=TextDecoderCommon>encoding</dfn>
 <dd>An <a for=/>encoding</a>.

 <dt><dfn for=TextDecoderCommon oldids=textdecoder-decoder,textdecoderstream-decoder>decoder</dfn>
 <dd>A <a for=/>decoder</a> instance.

 <dt><dfn for=TextDecoderCommon oldids=textdecoder-stream,textdecoderstream-stream,textdecodercommon-stream>I/O queue</dfn>
 <dd>An <a for=/>I/O queue</a> of bytes.

 <dt><dfn id=textdecoder-ignore-bom-flag for=TextDecoderCommon>ignore BOM</dfn>
 <dd>A boolean, initially false.

 <dt><dfn id=textdecoder-bom-seen-flag for=TextDecoderCommon>BOM seen</dfn>
 <dd>A boolean, initially false.

 <dt><dfn id=textdecoder-error-mode for=TextDecoderCommon>error mode</dfn>
 <dd>An <a for=/>error mode</a>, initially "<code>replacement</code>".
</dl>

<p>The <dfn id=concept-td-serialize>serialize I/O queue</dfn> algorithm, given a
{{TextDecoderCommon}} <var>decoder</var> and an <a for=/>I/O queue</a> of scalar values
<var>ioQueue</var>, runs these steps:

<ol>
 <li><p>Let <var>output</var> be the empty string.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of <a>reading</a> from <var>ioQueue</var>.

   <li><p>If <var>item</var> is <a>end-of-queue</a>, then return <var>output</var>.

   <li>
    <p>If <var>decoder</var>'s <a for=TextDecoderCommon>encoding</a> is <a>UTF-8</a> or
    <a>UTF-16BE/LE</a>, and <var>decoder</var>'s <a for=TextDecoderCommon>ignore BOM</a> and
    <a for=TextDecoderCommon>BOM seen</a> are false, then:

    <ol>
     <li><p>Set <var>decoder</var>'s <a for=TextDecoderCommon>BOM seen</a> to true.

     <li><p>If <var>item</var> is U+FEFF, then <a for=iteration>continue</a>.
    </ol>

   <li><p>Append <var>item</var> to <var>output</var>.
  </ol>
</ol>

<p class=note>This algorithm is intentionally different with respect to BOM handling from
the <a for=/>decode</a> algorithm used by the rest of the platform to give API users more
control.
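
<div class=example>
 <p>The effect of <a for=TextDecoderCommon>ignore BOM</a> in the algorithm above can be observed
 through the API. A short, non-normative illustration (the variable names are arbitrary):

```javascript
const bytesWithBOM = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]); // BOM, "h", "i"

// By default a leading U+FEFF is dropped from the output.
const stripped = new TextDecoder("utf-8").decode(bytesWithBOM);

// With ignoreBOM set to true the U+FEFF is kept.
const kept = new TextDecoder("utf-8", { ignoreBOM: true }).decode(bytesWithBOM);

// Only the first item is ever dropped; a U+FEFF later in the input is ordinary data.
const inner = new TextDecoder("utf-8").decode(
  new Uint8Array([0x68, 0xEF, 0xBB, 0xBF, 0x69]));
```

 <p>Here <code>stripped</code> is "<code>hi</code>", <code>kept</code> starts with U+FEFF, and
 <code>inner</code> retains the U+FEFF between the two letters.
</div>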

<hr>

<p>The <dfn attribute id=dom-textdecoder-encoding for=TextDecoderCommon><code>encoding</code></dfn>
getter steps are to return <a>this</a>'s <a for=TextDecoderCommon>encoding</a>'s
<a for=encoding>name</a>, <a>ASCII lowercased</a>.

<p>The <dfn attribute id=dom-textdecoder-fatal for=TextDecoderCommon><code>fatal</code></dfn> getter
steps are to return true if <a>this</a>'s <a for=TextDecoderCommon>error mode</a> is
"<code>fatal</code>", otherwise false.

<p>The
<dfn attribute id=dom-textdecoder-ignorebom for=TextDecoderCommon><code>ignoreBOM</code></dfn>
getter steps are to return <a>this</a>'s <a for=TextDecoderCommon>ignore BOM</a>.


<h3 id=interface-textdecoder>Interface {{TextDecoder}}</h3>

<pre class=idl>
dictionary TextDecoderOptions {
  boolean fatal = false;
  boolean ignoreBOM = false;
};

dictionary TextDecodeOptions {
  boolean stream = false;
};

[Exposed=*]
interface TextDecoder {
  constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options = {});

  USVString decode(optional [AllowShared] BufferSource input, optional TextDecodeOptions options = {});
};
TextDecoder includes TextDecoderCommon;
</pre>

<p>A {{TextDecoder}} object has an associated
<dfn for=TextDecoder id=textdecoder-do-not-flush-flag>do not flush</dfn>, which is a boolean,
initially false.

<dl class=domintro>
 <dt><code><var>decoder</var> = new <a constructor for=TextDecoder lt=TextDecoder()>TextDecoder([<var>label</var> = "utf-8" [, <var>options</var>]])</a></code>
 <dd>
  <p>Returns a new {{TextDecoder}} object.
  <p>If <var>label</var> is either not a <a>label</a> or is a
  <a>label</a> for <a>replacement</a>,
  <a>throws</a> a
  {{RangeError}}.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>encoding</a></code>
 <dd><p>Returns <a for=TextDecoderCommon>encoding</a>'s <a>name</a>, lowercased.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>fatal</a></code>
 <dd><p>Returns true if <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>", otherwise
 false.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>ignoreBOM</a></code>
 <dd><p>Returns the value of <a for=TextDecoderCommon>ignore BOM</a>.

 <dt><code><var>decoder</var> . <a method for=TextDecoder lt=decode()>decode([<var>input</var> [, <var>options</var>]])</a></code>
 <dd>
  <p>Returns the result of running <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>.
  The method can be invoked zero or more times with <var>options</var>'s <code>stream</code> set to
  true, and then once without <var>options</var>'s <code>stream</code> (or set to false), to process
  a fragmented input. If the invocation without <var>options</var>'s <code>stream</code> (or set to
  false) has no <var>input</var>, it's clearest to omit both arguments.

  <pre class=example id=example-end-of-stream><code class=lang-javascript>
var string = "", decoder = new TextDecoder(encoding), buffer;
while(buffer = next_chunk()) {
  string += decoder.decode(buffer, {stream:true});
}
string += decoder.decode(); // end-of-queue</code></pre>

  <p>If the <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>" and
  <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>,
  <a>throws</a> a {{TypeError}}.
</dl>

<p>The
<dfn constructor for=TextDecoder lt="TextDecoder(label, options)" id=dom-textdecoder><code>new TextDecoder(<var>label</var>, <var>options</var>)</code></dfn>
constructor steps are:

<ol>
 <li><p>Let <var>encoding</var> be the result of <a>getting an encoding</a> from <var>label</var>.

 <li><p>If <var>encoding</var> is failure or <a>replacement</a>, then <a>throw</a> a {{RangeError}}.

 <li><p>Set <a>this</a>'s <a for=TextDecoderCommon>encoding</a> to <var>encoding</var>.

 <li><p>If <var>options</var>["{{TextDecoderOptions/fatal}}"] is true, then set <a>this</a>'s
 <a for=TextDecoderCommon>error mode</a> to "<code>fatal</code>".

 <li><p>Set <a>this</a>'s <a for=TextDecoderCommon>ignore BOM</a> to
 <var>options</var>["{{TextDecoderOptions/ignoreBOM}}"].
</ol>
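
<div class=example>
 <p>A brief, non-normative illustration of the constructor steps above. The helper
 <code>tryLabel</code> is hypothetical; it reports the resulting encoding name, or notes that a
 {{RangeError}} was thrown.

```javascript
// Returns the decoder's encoding name, or the string "RangeError" if construction throws one.
function tryLabel(label) {
  try {
    return new TextDecoder(label).encoding;
  } catch (e) {
    if (e instanceof RangeError) return "RangeError";
    throw e;
  }
}

const fromAlias = tryLabel("UTF8");        // labels are matched case-insensitively
const fromBogus = tryLabel("not-a-label"); // not a label, so construction throws

// Options map onto the error mode and ignore BOM members.
const strict = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
const lenient = new TextDecoder();
```

 <p>The <code>fatal</code> and <code>ignoreBOM</code> getters reflect the options passed at
 construction time and both default to false.
</div>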

<p>The <dfn method for=TextDecoder><code>decode(<var>input</var>, <var>options</var>)</code></dfn>
method steps are:

<ol>
 <li><p>If <a>this</a>'s <a for=TextDecoder>do not flush</a> is false, then set <a>this</a>'s
 <a for=TextDecoderCommon>decoder</a> to a new instance of <a>this</a>'s
 <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>, <a>this</a>'s
 <a for=TextDecoderCommon>I/O queue</a> to the <a for=/>I/O queue</a> of bytes
 « <a>end-of-queue</a> », and <a>this</a>'s <a for=TextDecoderCommon>BOM seen</a> to false.

 <li><p>Set <a>this</a>'s <a for=TextDecoder>do not flush</a> to
 <var>options</var>["{{TextDecodeOptions/stream}}"].

 <li>
  <p>If <var>input</var> is given, then <a>push</a> a
  <a lt="get a copy of the buffer source">copy of</a> <var>input</var> to <a>this</a>'s
  <a for=TextDecoderCommon>I/O queue</a>.

  <p class=note>Implementations are strongly encouraged to use an implementation strategy that
  avoids this copy. When doing so they will have to make sure that changes to <var>input</var> do
  not affect future calls to <a method><code>decode()</code></a>.

  <p class=warning id=sharedarraybuffer-warning>The memory exposed by <code>SharedArrayBuffer</code>
  objects does not adhere to data race freedom properties required by the memory model of
  programming languages typically used for implementations. When implementing, take care to use the
  appropriate facilities when accessing memory exposed by <code>SharedArrayBuffer</code> objects.

 <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of scalar values
 « <a>end-of-queue</a> ».

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of <a>reading</a> from <a>this</a>'s
   <a for=TextDecoderCommon>I/O queue</a>.

   <li>
    <p>If <var>item</var> is <a>end-of-queue</a> and <a>this</a>'s
    <a for=TextDecoder>do not flush</a> is true, then return the result of running
    <a>serialize I/O queue</a> with <a>this</a> and <var>output</var>.

    <p class=note>The way streaming works is to not handle <a>end-of-queue</a> here when
    <a>this</a>'s <a for=TextDecoder>do not flush</a> is true and to not set it to false. That way
    in a subsequent invocation <a>this</a>'s <a for=TextDecoderCommon>decoder</a> is not set anew in
    the first step of the algorithm and its state is preserved.

   <li>
    <p>Otherwise:

    <ol>
     <li><p>Let <var>result</var> be the result of <a>processing an item</a> with <var>item</var>,
     <a>this</a>'s <a for=TextDecoderCommon>decoder</a>, <a>this</a>'s
     <a for=TextDecoderCommon>I/O queue</a>, <var>output</var>, and <a>this</a>'s
     <a for=TextDecoderCommon>error mode</a>.

     <li><p>If <var>result</var> is <a>finished</a>, then return the result of running
     <a>serialize I/O queue</a> with <a>this</a> and <var>output</var>.

     <li><p>Otherwise, if <var>result</var> is <a>error</a>, <a>throw</a> a {{TypeError}}.
    </ol>
  </ol>
</ol>
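
<div class=example>
 <p>The following non-normative illustration shows two consequences of the steps above: a byte
 sequence split across calls is held back while <code>stream</code> is true, and an error in
 "<code>fatal</code>" mode surfaces as a {{TypeError}}. Variable names are arbitrary.

```javascript
const euro = new Uint8Array([0xE2, 0x82, 0xAC]); // U+20AC in UTF-8

// Streaming: the first two bytes produce no output until the sequence is complete.
const streaming = new TextDecoder();
const part1 = streaming.decode(euro.subarray(0, 2), { stream: true });
const part2 = streaming.decode(euro.subarray(2), { stream: true });
const tail = streaming.decode(); // flush; nothing is pending here

// Replacement mode turns an invalid byte into U+FFFD.
const replaced = new TextDecoder().decode(new Uint8Array([0xFF]));

// Fatal mode throws instead.
let threwTypeError = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array([0xFF]));
} catch (e) {
  threwTypeError = e instanceof TypeError;
}
```

 <p>Because the decoder instance is only replaced when <a for=TextDecoder>do not flush</a> is
 false, the partial sequence from the first call is still available to the second.
</div>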

<h3 id=interface-mixin-textencodercommon>Interface mixin {{TextEncoderCommon}}</h3>

<pre class=idl>
interface mixin TextEncoderCommon {
  readonly attribute DOMString encoding;
};
</pre>

<p>The {{TextEncoderCommon}} interface mixin defines common getters that are shared between
{{TextEncoder}} and {{TextEncoderStream}} objects.

<p>The <dfn attribute id=dom-textencoder-encoding for=TextEncoderCommon><code>encoding</code></dfn>
getter steps are to return "<code>utf-8</code>".


<h3 id=interface-textencoder>Interface {{TextEncoder}}</h3>

<pre class=idl>
dictionary TextEncoderEncodeIntoResult {
  unsigned long long read;
  unsigned long long written;
};

[Exposed=*]
interface TextEncoder {
  constructor();

  [NewObject] Uint8Array encode(optional USVString input = "");
  TextEncoderEncodeIntoResult encodeInto(USVString source, [AllowShared] Uint8Array destination);
};
TextEncoder includes TextEncoderCommon;
</pre>

<p class="note no-backref">A {{TextEncoder}} object offers no <var>label</var> argument as it only
supports <a>UTF-8</a>. It also offers no <code>stream</code> option as no <a for=/>encoder</a>
requires buffering of scalar values.

<hr>

<dl class=domintro>
 <dt><code><var>encoder</var> = new <a constructor for=TextEncoder>TextEncoder()</a></code>
 <dd><p>Returns a new {{TextEncoder}} object.

 <dt><code><var>encoder</var> . <a attribute for=TextEncoderCommon>encoding</a></code>
 <dd><p>Returns "<code>utf-8</code>".

 <dt><code><var>encoder</var> . <a method for=TextEncoder lt=encode()>encode([<var>input</var> = ""])</a></code>
 <dd><p>Returns the result of running <a>UTF-8</a>'s <a for=/>encoder</a>.

 <dt><code><var>encoder</var> . <a method for=TextEncoder lt="encodeInto(source, destination)">encodeInto(<var>source</var>, <var>destination</var>)</a></code>
 <dd><p>Runs the <a>UTF-8 encoder</a> on <var>source</var>, stores the result of that operation into
 <var>destination</var>, and returns the progress made as an object wherein
 {{TextEncoderEncodeIntoResult/read}} is the number of converted <a>code units</a> of
 <var>source</var> and {{TextEncoderEncodeIntoResult/written}} is the number of bytes modified in
 <var>destination</var>.
</dl>
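
<div class=example>
 <p>A short, non-normative illustration of how <code>read</code> and <code>written</code> behave
 when the destination is too small and when the source contains a code point above U+FFFF. Variable
 names are arbitrary.

```javascript
const encoder = new TextEncoder();

// "a" takes one byte; U+20AC needs three, but only two remain, so encoding stops after "a".
const small = new Uint8Array(3);
const partial = encoder.encodeInto("a\u20AC", small);

// U+1F600 is two code units in the source string and four bytes in UTF-8.
const big = new Uint8Array(8);
const astral = encoder.encodeInto("\u{1F600}", big);
```

 <p>A caller that needs the whole string encoded can continue from
 <code>source.slice(partial.read)</code> with a fresh or larger destination, since
 <code>read</code> counts code units of the source.
</div>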

<p>The
<dfn constructor for=TextEncoder lt=TextEncoder() id=dom-textencoder><code>new TextEncoder()</code></dfn>
constructor steps are to do nothing.

<p>The <dfn method for=TextEncoder><code>encode(<var>input</var>)</code></dfn> method steps are:

<ol>
 <li><p><a for="to I/O queue">Convert</a> <var>input</var> to an <a for=/>I/O queue</a> of scalar
 values.

 <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of bytes « <a>end-of-queue</a> ».

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of
   <a>reading</a> from <var>input</var>.

   <li><p>Let <var>result</var> be the result of <a>processing an item</a> with <var>item</var>, an
   instance of the <a>UTF-8 encoder</a>, <var>input</var>, <var>output</var>, and
   "<code>fatal</code>".

   <li>
    <p>Assert: <var>result</var> is not an <a>error</a>.

    <p class=note>The <a>UTF-8 encoder</a> cannot return <a>error</a>.

   <li><p>If <var>result</var> is <a>finished</a>, then <a for="from I/O queue">convert</a>
   <var>output</var> into a byte sequence and return a {{Uint8Array}} object wrapping an
   {{ArrayBuffer}} containing <var>output</var>.
   <!-- XXX https://www.w3.org/Bugs/Public/show_bug.cgi?id=26966 -->
  </ol>
</ol>
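
<div class=example id=example-textencoder-encode>
 <p>As a minimal sketch of what the steps above produce, the following assumes an environment that
 exposes {{TextEncoder}} as a global. The variable names are illustrative only. Because
 <var>input</var> is converted to scalar values first, a lone surrogate is encoded as U+FFFD rather
 than causing an error:

```javascript
const encoder = new TextEncoder();

// "a" (U+0061) is one byte; U+20AC is three bytes in UTF-8.
const bytes = encoder.encode("a\u20AC");

// A lone surrogate is replaced with U+FFFD before the UTF-8 encoder runs,
// so the result is the UTF-8 form of U+FFFD: 0xEF 0xBF 0xBD.
const replaced = encoder.encode("\uD800");

console.log(Array.from(bytes), Array.from(replaced));
```
</div>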

<p>The
<dfn method for=TextEncoder><code>encodeInto(<var>source</var>, <var>destination</var>)</code></dfn>
method steps are:

<ol>
 <li><p>Let <var>read</var> be 0.

 <li><p>Let <var>written</var> be 0.

 <li><p>Let <var>encoder</var> be an instance of the <a>UTF-8 encoder</a>.

 <li>
  <p>Let <var>unused</var> be the <a for=/>I/O queue</a> of scalar values « <a>end-of-queue</a> ».

  <p class=note>The <a>handler</a> algorithm invoked below requires this argument, but it is not
  used by the <a>UTF-8 encoder</a>.

 <li><p><a for="to I/O queue">Convert</a> <var>source</var> to an <a for=/>I/O queue</a> of scalar
 values.

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of <a>reading</a> from <var>source</var>.

   <li><p>Let <var>result</var> be the result of running <var>encoder</var>'s <a>handler</a> on
   <var>unused</var> and <var>item</var>.

   <li><p>If <var>result</var> is <a>finished</a>, then <a for=iteration>break</a>.

   <li>
    <p>Otherwise:

    <ol>
     <li>
      <p>If <var>destination</var>'s <a for="BufferSource">byte length</a> &minus;
      <var>written</var> is greater than or equal to the number of bytes in <var>result</var>, then:

      <ol>
       <li><p>If <var>item</var> is greater than U+FFFF, then increment <var>read</var> by 2.

       <li><p>Otherwise, increment <var>read</var> by 1.

       <li>
        <p><a for="ArrayBufferView">Write</a> the bytes in <var>result</var> into
        <var>destination</var>, with <a for="ArrayBufferView/write"><i>startingOffset</i></a> set to
        <var>written</var>.

        <p class=warning>See the
        <a href=#sharedarraybuffer-warning>warning for <code>SharedArrayBuffer</code> objects</a>
        above.

       <li><p>Increment <var>written</var> by the number of bytes in <var>result</var>.
      </ol>

     <li><p>Otherwise, <a for=iteration>break</a>.
    </ol>
  </ol>

 <li><p>Return «[ "{{TextEncoderEncodeIntoResult/read}}" → <var>read</var>,
 "{{TextEncoderEncodeIntoResult/written}}" → <var>written</var> ]».
</ol>

<div class=example id=example-textencoder-encodeinto>
 <p>The <a method for=TextEncoder lt="encodeInto(source, destination)">encodeInto()</a> method can
 be used to encode a string into an existing {{ArrayBuffer}} object. Various details below are left
 as an exercise for the reader, but this demonstrates an approach one could take to use this method:

 <pre><code class=lang-javascript>
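// malloc(), realloc(), free(), and cachedEncoder are assumed to be defined elsewhere.
function convertString(buffer, input, callback) {
  let bufferSize = 256,
      bufferStart = malloc(buffer, bufferSize),
      writeOffset = 0,
      readOffset = 0;
  while (true) {
    // Encode the not-yet-converted tail of input into the free part of the allocation.
    const view = new Uint8Array(buffer, bufferStart + writeOffset, bufferSize - writeOffset),
          {read, written} = cachedEncoder.encodeInto(input.substring(readOffset), view);
    readOffset += read;
    writeOffset += written;
    // Done once every code unit of input has been converted.
    if (readOffset === input.length) {
      callback(bufferStart, writeOffset);
      free(buffer, bufferStart);
      return;
    }
    // Otherwise the allocation was too small: grow it and resume where encoding stopped.
    bufferSize *= 2;
    bufferStart = realloc(buffer, bufferStart, bufferSize);
  }
}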
</code></pre>
</div>
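
<div class=example id=example-textencoder-encodeinto-progress>
 <p>The following sketch illustrates how {{TextEncoderEncodeIntoResult/read}} and
 {{TextEncoderEncodeIntoResult/written}} are accounted for. It assumes an environment that exposes
 {{TextEncoder}} as a global, and the variable names are illustrative only. A scalar value is
 written only if all of its bytes fit, and a scalar value above U+FFFF counts as two code units
 read:

```javascript
const encoder = new TextEncoder();

// Three bytes of space: "a" fits, but U+20AC needs three bytes and only
// two remain, so encoding stops there and "b" is never reached.
const destination = new Uint8Array(3);
const partial = encoder.encodeInto("a\u20ACb", destination);

// U+1F600 is a surrogate pair in the string (two code units) and four
// bytes in UTF-8.
const astral = encoder.encodeInto("\u{1F600}", new Uint8Array(4));

console.log(partial, astral);
```
</div>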


<h3 id=interface-textdecoderstream>Interface {{TextDecoderStream}}</h3>

<pre class=idl>
[Exposed=*]
interface TextDecoderStream {
  constructor(optional DOMString label = "utf-8", optional TextDecoderOptions options = {});
};
TextDecoderStream includes TextDecoderCommon;
TextDecoderStream includes GenericTransformStream;
</pre>

<dl class=domintro>
 <dt><code><var>decoder</var> = new
 <a constructor for=TextDecoderStream lt=TextDecoderStream()>TextDecoderStream([<var>label</var> =
 "utf-8" [, <var>options</var>]])</a></code>
 <dd>
  <p>Returns a new {{TextDecoderStream}} object.
  <p>If <var>label</var> is either not a <a>label</a> or is a <a>label</a> for <a>replacement</a>,
  <a>throws</a> a {{RangeError}}.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>encoding</a></code>
 <dd><p>Returns <a for=TextDecoderCommon>encoding</a>'s <a>name</a>, lowercased.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>fatal</a></code>
 <dd><p>Returns true if <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>", and
 false otherwise.

 <dt><code><var>decoder</var> . <a attribute for=TextDecoderCommon>ignoreBOM</a></code>
 <dd><p>Returns the value of <a for=TextDecoderCommon>ignore BOM</a>.

 <dt><code><var>decoder</var> . <a attribute for=GenericTransformStream>readable</a></code>
 <dd>
  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are strings resulting from running
  <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> on the chunks written to
  {{GenericTransformStream/writable}}.

 <dt><code><var>decoder</var> . <a attribute for=GenericTransformStream>writable</a></code>
 <dd>
  <p>Returns a <a>writable stream</a> which accepts
  <code>[<a extended-attribute>AllowShared</a>] <a typedef>BufferSource</a></code> chunks and runs
  them through <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> before making them
  available to {{GenericTransformStream/readable}}.

  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a
  {{ReadableStream}} source.

  <pre class=example id=example-textdecoderstream-writable><code class=lang-javascript>
var decoder = new TextDecoderStream(encoding);
byteReadable
  .pipeThrough(decoder)
  .pipeTo(textWritable);</code></pre>

  <p>If the <a for=TextDecoderCommon>error mode</a> is "<code>fatal</code>" and
  <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a> returns <a>error</a>, both
  {{GenericTransformStream/readable}} and {{GenericTransformStream/writable}} will be errored with a
  {{TypeError}}.
</dl>

<p>The
<dfn constructor for=TextDecoderStream lt="TextDecoderStream(label, options)" id=dom-textdecoderstream><code>new TextDecoderStream(<var>label</var>, <var>options</var>)</code></dfn>
constructor steps are:

<ol>
 <li><p>Let <var>encoding</var> be the result of <a>getting an encoding</a> from <var>label</var>.

 <li><p>If <var>encoding</var> is failure or <a>replacement</a>, then <a>throw</a> a {{RangeError}}.

 <li><p>Set <a>this</a>'s <a for=TextDecoderCommon>encoding</a> to <var>encoding</var>.

 <li><p>If <var>options</var>["{{TextDecoderOptions/fatal}}"] is true, then set <a>this</a>'s
 <a for=TextDecoderCommon>error mode</a> to "<code>fatal</code>".

 <li><p>Set <a>this</a>'s <a for=TextDecoderCommon>ignore BOM</a> to
 <var>options</var>["{{TextDecoderOptions/ignoreBOM}}"].

 <li><p>Set <a>this</a>'s <a for=TextDecoderCommon>decoder</a> to a new instance of <a>this</a>'s
 <a for=TextDecoderCommon>encoding</a>'s <a for=/>decoder</a>, and set <a>this</a>'s
 <a for=TextDecoderCommon>I/O queue</a> to a new <a for=/>I/O queue</a>.

 <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
 and runs the <a>decode and enqueue a chunk</a> algorithm with <a>this</a> and <var>chunk</var>.

 <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the
 <a>flush and enqueue</a> algorithm with <a>this</a>.

 <li><p>Let <var>transformStream</var> be a [=new=] {{TransformStream}}.

 <li><p>[=TransformStream/Set up=] <var>transformStream</var> with
 <a for="TransformStream/set up"><var ignore>transformAlgorithm</var></a> set to
 <var>transformAlgorithm</var> and
 <a for="TransformStream/set up"><var ignore>flushAlgorithm</var></a> set to
 <var>flushAlgorithm</var>.

 <li><p>Set <a>this</a>'s <a for=GenericTransformStream>transform</a> to <var>transformStream</var>.
</ol>

<p>The <dfn>decode and enqueue a chunk</dfn> algorithm, given a {{TextDecoderStream}} object
<var>decoder</var> and a <var>chunk</var>, runs these steps:

<ol>
 <li><p>Let <var>bufferSource</var> be the result of
 <a lt="converted to an IDL value">converting</a> <var>chunk</var> to an
 <code>[<a extended-attribute>AllowShared</a>] <a typedef>BufferSource</a></code>.

 <li>
  <p><a>Push</a> a <a lt="get a copy of the buffer source">copy of</a> <var>bufferSource</var> to
  <var>decoder</var>'s <a for=TextDecoderCommon>I/O queue</a>.

  <p class=warning>See the
  <a href=#sharedarraybuffer-warning>warning for <code>SharedArrayBuffer</code> objects</a> above.

 <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of scalar values
 « <a>end-of-queue</a> ».

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of <a>reading</a> from <var>decoder</var>'s
   <a for=TextDecoderCommon>I/O queue</a>.

   <li>
    <p>If <var>item</var> is <a>end-of-queue</a>, then:

    <ol>
     <li><p>Let <var>outputChunk</var> be the result of running <a>serialize I/O queue</a> with
     <var>decoder</var> and <var>output</var>.

     <li><p>If <var>outputChunk</var> is non-empty, then <a for=TransformStream>enqueue</a>
     <var>outputChunk</var> in <var>decoder</var>'s <a for=GenericTransformStream>transform</a>.

     <li><p>Return.
    </ol>

   <li><p>Let <var>result</var> be the result of <a>processing an item</a> with <var>item</var>,
   <var>decoder</var>'s <a for=TextDecoderCommon>decoder</a>, <var>decoder</var>'s
   <a for=TextDecoderCommon>I/O queue</a>, <var>output</var>, and <var>decoder</var>'s
   <a for=TextDecoderCommon>error mode</a>.

   <li><p>If <var>result</var> is <a>error</a>, then <a>throw</a> a {{TypeError}}.
  </ol>
</ol>

<p>The <dfn>flush and enqueue</dfn> algorithm, which handles the end of data from the input
{{ReadableStream}} object, given a {{TextDecoderStream}} object <var>decoder</var>, runs these
steps:

<ol>
 <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of scalar values
 « <a>end-of-queue</a> ».

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of <a>reading</a> from <var>decoder</var>'s
   <a for=TextDecoderCommon>I/O queue</a>.

   <li><p>Let <var>result</var> be the result of <a>processing an item</a> with <var>item</var>,
   <var>decoder</var>'s <a for=TextDecoderCommon>decoder</a>, <var>decoder</var>'s
   <a for=TextDecoderCommon>I/O queue</a>, <var>output</var>, and <var>decoder</var>'s
   <a for=TextDecoderCommon>error mode</a>.

   <li>
    <p>If <var>result</var> is <a>finished</a>, then:

    <ol>
     <li><p>Let <var>outputChunk</var> be the result of running <a>serialize I/O queue</a> with
     <var>decoder</var> and <var>output</var>.

     <li><p>If <var>outputChunk</var> is non-empty, then <a for=TransformStream>enqueue</a>
     <var>outputChunk</var> in <var>decoder</var>'s <a for=GenericTransformStream>transform</a>.

     <li><p>Return.
    </ol>
   </li>

   <li><p>Otherwise, if <var>result</var> is <a>error</a>, <a>throw</a> a {{TypeError}}.
  </ol>
 </li>
</ol>
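
<div class=example id=example-textdecoderstream-chunks>
 <p>The per-chunk behavior of the two algorithms above can be approximated synchronously. The
 sketch below uses {{TextDecoder}} with its <code>stream</code> option as a stand-in for
 {{TextDecoderStream}} (an assumption made for illustration, since the stream itself is
 asynchronous). A byte sequence split across chunks is completed by the later chunk, and bytes
 still pending at the end of the data become U+FFFD when the error mode is not
 "<code>fatal</code>":

```javascript
// Stand-in for "decode and enqueue a chunk": state is kept between calls.
const decoder = new TextDecoder("utf-8");
const first = decoder.decode(new Uint8Array([0xE2, 0x82]), { stream: true });
const second = decoder.decode(new Uint8Array([0xAC]), { stream: true });
// first is empty (nothing would be enqueued); second completes U+20AC.

// Stand-in for "flush and enqueue": calling decode() without the stream
// option signals the end of the data.
const truncated = new TextDecoder("utf-8");
truncated.decode(new Uint8Array([0xE2, 0x82]), { stream: true });
const flushed = truncated.decode();

console.log(JSON.stringify([first, second, flushed]));
```
</div>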


<h3 id=interface-textencoderstream>Interface {{TextEncoderStream}}</h3>

<pre class=idl>
[Exposed=*]
interface TextEncoderStream {
  constructor();
};
TextEncoderStream includes TextEncoderCommon;
TextEncoderStream includes GenericTransformStream;
</pre>

<p>A {{TextEncoderStream}} object has an associated:

<dl>
 <dt><dfn for=TextEncoderStream>encoder</dfn>
 <dd>An <a for=/>encoder</a> instance.

 <dt><dfn for=TextEncoderStream>pending high surrogate</dfn>
 <dd>Null or a <a for=/>surrogate</a>, initially null.
</dl>

<p class="note no-backref">A {{TextEncoderStream}} object offers no <var>label</var> argument as it
only supports <a>UTF-8</a>.

<dl class=domintro>
 <dt><code><var>encoder</var> = new <a constructor for=TextEncoderStream>TextEncoderStream()</a></code>
 <dd><p>Returns a new {{TextEncoderStream}} object.

 <dt><code><var>encoder</var> . <a attribute for=TextEncoderCommon>encoding</a></code>
 <dd><p>Returns "<code>utf-8</code>".

 <dt><code><var>encoder</var> . <a attribute for=GenericTransformStream>readable</a></code>
 <dd>
  <p>Returns a <a>readable stream</a> whose <a>chunks</a> are {{Uint8Array}}s resulting from running
  <a>UTF-8</a>'s <a for=/>encoder</a> on the chunks written to {{GenericTransformStream/writable}}.

 <dt><code><var>encoder</var> . <a attribute for=GenericTransformStream>writable</a></code>
 <dd>
  <p>Returns a <a>writable stream</a> which accepts string chunks and runs them through
  <a>UTF-8</a>'s <a for=/>encoder</a> before making them available to
  {{GenericTransformStream/readable}}.

  <p>Typically this will be used via the {{ReadableStream/pipeThrough()}} method on a
  {{ReadableStream}} source.

  <pre class=example id=example-textencoderstream-writable><code class=lang-javascript>
textReadable
  .pipeThrough(new TextEncoderStream())
  .pipeTo(byteWritable);</code></pre>
</dl>

<p>The
<dfn constructor for=TextEncoderStream lt=TextEncoderStream() id=dom-textencoderstream><code>new TextEncoderStream()</code></dfn>
constructor steps are:

<ol>
 <li><p>Set <a>this</a>'s <a for=TextEncoderStream>encoder</a> to an instance of the
 <a>UTF-8 encoder</a>.

 <li><p>Let <var>transformAlgorithm</var> be an algorithm which takes a <var>chunk</var> argument
 and runs the <a>encode and enqueue a chunk</a> algorithm with <a>this</a> and <var>chunk</var>.

 <li><p>Let <var>flushAlgorithm</var> be an algorithm which takes no arguments and runs the
 <a>encode and flush</a> algorithm with <a>this</a>.

 <li><p>Let <var>transformStream</var> be a [=new=] {{TransformStream}}.

 <li><p>[=TransformStream/Set up=] <var>transformStream</var> with
 <a for="TransformStream/set up"><var ignore>transformAlgorithm</var></a> set to
 <var>transformAlgorithm</var> and
 <a for="TransformStream/set up"><var ignore>flushAlgorithm</var></a> set to
 <var>flushAlgorithm</var>.

 <li><p>Set <a>this</a>'s <a for=GenericTransformStream>transform</a> to <var>transformStream</var>.
</ol>

<hr>

<p>The <dfn>encode and enqueue a chunk</dfn> algorithm, given a {{TextEncoderStream}} object
<var>encoder</var> and <var>chunk</var>, runs these steps:

<ol>
 <li><p>Let <var>input</var> be the result of <a lt="converted to an IDL value">converting</a>
 <var>chunk</var> to a {{DOMString}}.

 <li><p><a for="to I/O queue">Convert</a> <var>input</var> to an <a for=/>I/O queue</a> of
 <a>code units</a>.

 <p class=note>{{DOMString}}, as well as an <a for=/>I/O queue</a> of code units rather than scalar
 values, are used here so that a surrogate pair that is split between chunks can be reassembled into
 the appropriate scalar value. The behavior is otherwise identical to {{USVString}}. In particular,
 lone surrogates will be replaced with U+FFFD.

 <li><p>Let <var>output</var> be the <a for=/>I/O queue</a> of bytes « <a>end-of-queue</a> ».

 <li>
  <p>While true:

  <ol>
   <li><p>Let <var>item</var> be the result of <a>reading</a> from <var>input</var>.

   <li>
    <p>If <var>item</var> is <a>end-of-queue</a>, then:

    <ol>
     <li><p><a for="from I/O queue">Convert</a> <var>output</var> into a byte sequence.

     <li>
      <p>If <var>output</var> is non-empty, then:

      <ol>
       <li><p>Let <var>chunk</var> be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing
       <var>output</var>.

       <li><p><a for=TransformStream>Enqueue</a> <var>chunk</var> into <var>encoder</var>'s
       <a for=GenericTransformStream>transform</a>.
      </ol>

     <li><p>Return.
    </ol>

   <li><p>Let <var>result</var> be the result of executing the <a>convert code unit to scalar
   value</a> algorithm with <var>encoder</var>, <var>item</var>, and <var>input</var>.

   <li><p>If <var>result</var> is not <a>continue</a>, then <a>process an item</a> with
   <var>result</var>, <var>encoder</var>'s <a for=TextEncoderStream>encoder</a>, <var>input</var>,
   <var>output</var>, and "<code>fatal</code>".
  </ol>
</ol>

<p>The <dfn>convert code unit to scalar value</dfn> algorithm, given a {{TextEncoderStream}} object
<var>encoder</var>, a <a>code unit</a> <var>item</var>, and an <a for=/>I/O queue</a> of code units
<var>input</var>, runs these steps:

<ol>
 <li>
  <p>If <var>encoder</var>'s <a>pending high surrogate</a> is non-null, then:

  <ol>
   <li><p>Let <var>high surrogate</var> be <var>encoder</var>'s <a>pending high surrogate</a>.

   <li><p>Set <var>encoder</var>'s <a>pending high surrogate</a> to null.

   <li><p>If <var>item</var> is in the range U+DC00 to U+DFFF, inclusive, then return a scalar value
   whose value is 0x10000 + ((<var>high surrogate</var> &minus; 0xD800) &lt;&lt; 10) +
   (<var>item</var> &minus; 0xDC00).

   <li><p><a>Prepend</a> <var>item</var> to <var>input</var>.

   <li><p>Return U+FFFD.
  </ol>

 <li><p>If <var>item</var> is in the range U+D800 to U+DBFF, inclusive, then set
 <var>encoder</var>'s <a>pending high surrogate</a> to <var>item</var> and return <a>continue</a>.

 <li><p>If <var>item</var> is in the range U+DC00 to U+DFFF, inclusive, then return U+FFFD.

 <li><p>Return <var>item</var>.
</ol>

<p class=note>This is equivalent to the "<a for=string>convert</a> a <a for=/>string</a> into a
<a for=/>scalar value string</a>" algorithm from the Infra Standard, but allows for surrogate pairs
that are split between strings. [[!INFRA]]

<p>The <dfn>encode and flush</dfn> algorithm, given a {{TextEncoderStream}} object
<var>encoder</var>, runs these steps:

<ol>
 <li>
  <p>If <var>encoder</var>'s <a>pending high surrogate</a> is non-null, then:

  <ol>
   <li>
    <p>Let <var>chunk</var> be a {{Uint8Array}} object wrapping an {{ArrayBuffer}} containing
    0xEF 0xBF 0xBD.

    <p class=note>This is U+FFFD (�) in <a>UTF-8</a> bytes.

   <li><p><a for=TransformStream>Enqueue</a> <var>chunk</var> into <var>encoder</var>'s
   <a for=GenericTransformStream>transform</a>.
  </ol>
</ol>
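
<div class=example id=example-textencoderstream-surrogates>
 <p>The surrogate handling in the three algorithms above can be modeled synchronously. In the
 sketch below, <code>ChunkedUtf8EncoderModel</code> and its methods are hypothetical names invented
 for illustration and are not part of the API. {{TextEncoder}} is assumed to be available as a
 global and is used only to produce the UTF-8 bytes:

```javascript
// Hypothetical model; not part of the Encoding Standard API.
class ChunkedUtf8EncoderModel {
  constructor() {
    this.pendingHighSurrogate = null;
    this.utf8 = new TextEncoder();
  }

  // Models "encode and enqueue a chunk": returns the bytes for one chunk.
  encodeChunk(chunk) {
    let scalars = "";
    for (let i = 0; i < chunk.length; i++) {
      const unit = chunk.charCodeAt(i);
      if (this.pendingHighSurrogate !== null) {
        const high = this.pendingHighSurrogate;
        this.pendingHighSurrogate = null;
        if (unit >= 0xDC00 && unit <= 0xDFFF) {
          // Reassemble the pair, possibly across a chunk boundary.
          const value = 0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00);
          scalars += String.fromCodePoint(value);
          continue;
        }
        // Unpaired high surrogate. The current unit is examined again
        // below, which models prepending it to the input.
        scalars += "\uFFFD";
      }
      if (unit >= 0xD800 && unit <= 0xDBFF) {
        this.pendingHighSurrogate = unit;
      } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
        scalars += "\uFFFD";
      } else {
        scalars += String.fromCharCode(unit);
      }
    }
    return this.utf8.encode(scalars);
  }

  // Models "encode and flush": a still-pending high surrogate is emitted
  // as U+FFFD in UTF-8 bytes.
  flush() {
    if (this.pendingHighSurrogate !== null) {
      return new Uint8Array([0xEF, 0xBF, 0xBD]);
    }
    return new Uint8Array(0);
  }
}

const model = new ChunkedUtf8EncoderModel();
const firstHalf = model.encodeChunk("\uD83D");  // high surrogate only
const secondHalf = model.encodeChunk("\uDE00"); // completes U+1F600
console.log(firstHalf.length, Array.from(secondHalf));
```
</div>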


<h2 id=the-encoding>The encoding</h2>

<h3 id=utf-8 dfn export>UTF-8</h3>

<h4 id=utf-8-decoder dfn export>UTF-8 decoder</h4>

<p class="note no-backref">A byte order mark has priority over a <a>label</a> as it has been found
to be more accurate in deployed content. Therefore it is not part of the <a>UTF-8 decoder</a>
algorithm but rather the <a>decode</a> and <a>UTF-8 decode</a> algorithms.

<p><a>UTF-8</a>'s <a for=/>decoder</a> has an associated
<dfn>UTF-8 code point</dfn>, <dfn>UTF-8 bytes seen</dfn>, and
<dfn>UTF-8 bytes needed</dfn> (all initially 0), a <dfn>UTF-8 lower boundary</dfn>
(initially 0x80), and a <dfn>UTF-8 upper boundary</dfn> (initially 0xBF).

<p><a>UTF-8</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>UTF-8 bytes needed</a> is not 0, set
 <a>UTF-8 bytes needed</a> to 0 and return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li>
  <p>If <a>UTF-8 bytes needed</a> is 0, based on <var>byte</var>:

  <dl class=switch>
   <dt>0x00 to 0x7F
   <dd><p>Return a code point whose value is <var>byte</var>.

   <dt>0xC2 to 0xDF
   <dd>
    <ol>
     <li><p>Set <a>UTF-8 bytes needed</a> to 1.

     <li>
      <p>Set <a>UTF-8 code point</a> to <var>byte</var> &amp; 0x1F.

      <p class=note>The five least significant bits of <var>byte</var>.
    </ol>

   <dt>0xE0 to 0xEF
   <dd>
    <ol>
     <li><p>If <var>byte</var> is 0xE0, set
     <a>UTF-8 lower boundary</a> to 0xA0.

     <li><p>If <var>byte</var> is 0xED, set
     <a>UTF-8 upper boundary</a> to 0x9F.

     <li><p>Set <a>UTF-8 bytes needed</a> to 2.

     <li>
      <p>Set <a>UTF-8 code point</a> to <var>byte</var> &amp; 0xF.

      <p class=note>The four least significant bits of <var>byte</var>.
    </ol>

   <dt>0xF0 to 0xF4
   <dd>
    <ol>
     <li><p>If <var>byte</var> is 0xF0, set
     <a>UTF-8 lower boundary</a> to 0x90.

     <li><p>If <var>byte</var> is 0xF4, set
     <a>UTF-8 upper boundary</a> to 0x8F.

     <li><p>Set <a>UTF-8 bytes needed</a> to 3.

     <li>
      <p>Set <a>UTF-8 code point</a> to <var>byte</var> &amp; 0x7.

      <p class=note>The three least significant bits of <var>byte</var>.
    </ol>

   <dt>Otherwise
   <dd><p>Return <a>error</a>.
  </dl>

  <p>Return <a>continue</a>.

 <li>
  <p>If <var>byte</var> is not in the range <a>UTF-8 lower boundary</a> to
  <a>UTF-8 upper boundary</a>, inclusive, then:

  <ol>
   <li><p>Set <a>UTF-8 code point</a>,
   <a>UTF-8 bytes needed</a>, and <a>UTF-8 bytes seen</a> to 0,
   set <a>UTF-8 lower boundary</a> to 0x80, and set
   <a>UTF-8 upper boundary</a> to 0xBF.

   <li><p><a>Prepend</a> <var>byte</var> to <var>ioQueue</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>Set <a>UTF-8 lower boundary</a> to 0x80 and
 <a>UTF-8 upper boundary</a> to 0xBF.

 <li>
  <p>Set <a>UTF-8 code point</a> to (<a>UTF-8 code point</a> &lt;&lt; 6) |
  (<var>byte</var> &amp; 0x3F).

  <p class="note no-backref">Shift the existing bits of <a>UTF-8 code point</a> left by six
  places and set the newly-vacated six least significant bits to the six least significant bits of
  <var>byte</var>.

 <li><p>Increase <a>UTF-8 bytes seen</a> by one.

 <li><p>If <a>UTF-8 bytes seen</a> is not equal to
 <a>UTF-8 bytes needed</a>, return <a>continue</a>.

 <li><p>Let <var>code point</var> be <a>UTF-8 code point</a>.

 <li><p>Set <a>UTF-8 code point</a>,
 <a>UTF-8 bytes needed</a>, and <a>UTF-8 bytes seen</a> to 0.

 <li><p>Return a code point whose value is <var>code point</var>.
</ol>

<p class=note>The constraints in the <a>UTF-8 decoder</a> above match
“Best Practices for Using U+FFFD” from the Unicode standard. No other
behavior is permitted per the Encoding Standard (other algorithms that
achieve the same result are fine, even encouraged).
[[!UNICODE]]
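
<div class=example id=example-utf-8-decoder-sketch>
 <p>The <a>UTF-8 decoder</a> steps can be sketched as a single loop. The function name
 <code>utf8DecodeSketch</code> is hypothetical, and the sketch assumes the caller wants every
 <a>error</a> replaced with U+FFFD. It ignores byte order mark handling, which is not part of the
 decoder itself:

```javascript
// Hypothetical helper; errors are replaced with U+FFFD.
function utf8DecodeSketch(bytes) {
  let codePoint = 0, bytesSeen = 0, bytesNeeded = 0;
  let lower = 0x80, upper = 0xBF;
  let output = "";
  let i = 0;
  while (i < bytes.length) {
    const byte = bytes[i];
    if (bytesNeeded === 0) {
      if (byte <= 0x7F) {
        output += String.fromCodePoint(byte);
      } else if (byte >= 0xC2 && byte <= 0xDF) {
        bytesNeeded = 1;
        codePoint = byte & 0x1F;
      } else if (byte >= 0xE0 && byte <= 0xEF) {
        if (byte === 0xE0) lower = 0xA0; // rejects overlong forms
        if (byte === 0xED) upper = 0x9F; // rejects surrogates
        bytesNeeded = 2;
        codePoint = byte & 0xF;
      } else if (byte >= 0xF0 && byte <= 0xF4) {
        if (byte === 0xF0) lower = 0x90; // rejects overlong forms
        if (byte === 0xF4) upper = 0x8F; // rejects values above U+10FFFF
        bytesNeeded = 3;
        codePoint = byte & 0x7;
      } else {
        output += "\uFFFD";
      }
      i++;
      continue;
    }
    if (byte < lower || byte > upper) {
      codePoint = bytesNeeded = bytesSeen = 0;
      lower = 0x80;
      upper = 0xBF;
      output += "\uFFFD";
      // i is not advanced: the byte is processed again as a lead byte,
      // which models prepending it to the I/O queue.
      continue;
    }
    lower = 0x80;
    upper = 0xBF;
    codePoint = (codePoint << 6) | (byte & 0x3F);
    bytesSeen++;
    i++;
    if (bytesSeen === bytesNeeded) {
      output += String.fromCodePoint(codePoint);
      codePoint = bytesNeeded = bytesSeen = 0;
    }
  }
  // End of input in the middle of a sequence is an error.
  if (bytesNeeded !== 0) output += "\uFFFD";
  return output;
}

console.log(utf8DecodeSketch([0xE2, 0x82, 0xAC]));
```

 <p>For input without a leading byte order mark, the result is expected to match what
 <code>new TextDecoder().decode()</code> returns for the same bytes.
</div>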


<h4 id=utf-8-encoder dfn export>UTF-8 encoder</h4>

<p><a>UTF-8</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li>
  <p>Set <var>count</var> and <var>offset</var> based on the
  range <var>code point</var> is in:

  <dl class=switch>
   <dt>U+0080 to U+07FF, inclusive
   <dd>1 and 0xC0
   <dt>U+0800 to U+FFFF, inclusive
   <dd>2 and 0xE0
   <dt>U+10000 to U+10FFFF, inclusive
   <dd>3 and 0xF0
  </dl>

 <li><p>Let <var>bytes</var> be a byte sequence whose first byte is
 (<var>code point</var> >> (6 × <var>count</var>)) + <var>offset</var>.

 <li>
  <p>While <var>count</var> is greater than 0:

  <ol>
   <li><p>Set <var>temp</var> to
   <var>code point</var> >> (6 × (<var>count</var> &minus; 1)).

   <li><p>Append to <var>bytes</var> 0x80 | (<var>temp</var> &amp; 0x3F).

   <li><p>Decrease <var>count</var> by one.
  </ol>

 <li><p>Return bytes <var>bytes</var>, in order.
</ol>

<p class=note>This algorithm has identical results to the one described in the Unicode standard. It
is included here for completeness. [[!UNICODE]]
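
<div class=example id=example-utf-8-encoder-sketch>
 <p>The <a>UTF-8 encoder</a> steps translate almost directly into code. In this sketch the function
 name <code>utf8EncodeCodePoint</code> is hypothetical, and the argument is assumed to be a scalar
 value given as a number:

```javascript
// Hypothetical helper; codePoint is assumed to be a scalar value.
function utf8EncodeCodePoint(codePoint) {
  if (codePoint <= 0x7F) return [codePoint];

  // count is the number of continuation bytes; offset marks the lead byte.
  let count, offset;
  if (codePoint <= 0x7FF) {
    count = 1; offset = 0xC0;
  } else if (codePoint <= 0xFFFF) {
    count = 2; offset = 0xE0;
  } else {
    count = 3; offset = 0xF0;
  }

  const bytes = [(codePoint >> (6 * count)) + offset];
  while (count > 0) {
    const temp = codePoint >> (6 * (count - 1));
    bytes.push(0x80 | (temp & 0x3F));
    count--;
  }
  return bytes;
}

// U+20AC encodes as 0xE2 0x82 0xAC.
console.log(utf8EncodeCodePoint(0x20AC));
```
</div>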


<h2 id=legacy-single-byte-encodings>Legacy single-byte encodings</h2>

<p>An <a for=/>encoding</a> where each byte is either a single code point or
nothing is a <dfn>single-byte encoding</dfn>.
<a>Single-byte encodings</a> share the
<a for=/>decoder</a> and <a for=/>encoder</a>. <dfn>Index single-byte</dfn>,
as referenced by the <a>single-byte decoder</a> and
<a>single-byte encoder</a>, is defined by the following table and
depends on the <a>single-byte encoding</a> in use. All but two
<a>single-byte encodings</a> have a
unique <a>index</a>.

<table>
 <tr><td><dfn export>IBM866</dfn><td><a href=index-ibm866.txt>index-ibm866.txt</a><td><a href=ibm866.html>index IBM866 visualization</a><td><a href=ibm866-bmp.html>index IBM866 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-2</dfn><td><a href=index-iso-8859-2.txt>index-iso-8859-2.txt</a><td><a href=iso-8859-2.html>index ISO-8859-2 visualization</a><td><a href=iso-8859-2-bmp.html>index ISO-8859-2 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-3</dfn><td><a href=index-iso-8859-3.txt>index-iso-8859-3.txt</a><td><a href=iso-8859-3.html>index ISO-8859-3 visualization</a><td><a href=iso-8859-3-bmp.html>index ISO-8859-3 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-4</dfn><td><a href=index-iso-8859-4.txt>index-iso-8859-4.txt</a><td><a href=iso-8859-4.html>index ISO-8859-4 visualization</a><td><a href=iso-8859-4-bmp.html>index ISO-8859-4 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-5</dfn><td><a href=index-iso-8859-5.txt>index-iso-8859-5.txt</a><td><a href=iso-8859-5.html>index ISO-8859-5 visualization</a><td><a href=iso-8859-5-bmp.html>index ISO-8859-5 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-6</dfn><td><a href=index-iso-8859-6.txt>index-iso-8859-6.txt</a><td><a href=iso-8859-6.html>index ISO-8859-6 visualization</a><td><a href=iso-8859-6-bmp.html>index ISO-8859-6 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-7</dfn><td><a href=index-iso-8859-7.txt>index-iso-8859-7.txt</a><td><a href=iso-8859-7.html>index ISO-8859-7 visualization</a><td><a href=iso-8859-7-bmp.html>index ISO-8859-7 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-8</dfn><td rowspan=2><a href=index-iso-8859-8.txt>index-iso-8859-8.txt</a><td rowspan=2><a href=iso-8859-8.html>index ISO-8859-8 visualization</a><td rowspan=2><a href=iso-8859-8-bmp.html>index ISO-8859-8 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-8-I</dfn>
 <tr><td><dfn export>ISO-8859-10</dfn><td><a href=index-iso-8859-10.txt>index-iso-8859-10.txt</a><td><a href=iso-8859-10.html>index ISO-8859-10 visualization</a><td><a href=iso-8859-10-bmp.html>index ISO-8859-10 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-13</dfn><td><a href=index-iso-8859-13.txt>index-iso-8859-13.txt</a><td><a href=iso-8859-13.html>index ISO-8859-13 visualization</a><td><a href=iso-8859-13-bmp.html>index ISO-8859-13 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-14</dfn><td><a href=index-iso-8859-14.txt>index-iso-8859-14.txt</a><td><a href=iso-8859-14.html>index ISO-8859-14 visualization</a><td><a href=iso-8859-14-bmp.html>index ISO-8859-14 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-15</dfn><td><a href=index-iso-8859-15.txt>index-iso-8859-15.txt</a><td><a href=iso-8859-15.html>index ISO-8859-15 visualization</a><td><a href=iso-8859-15-bmp.html>index ISO-8859-15 BMP coverage</a>
 <tr><td><dfn export>ISO-8859-16</dfn><td><a href=index-iso-8859-16.txt>index-iso-8859-16.txt</a><td><a href=iso-8859-16.html>index ISO-8859-16 visualization</a><td><a href=iso-8859-16-bmp.html>index ISO-8859-16 BMP coverage</a>
 <tr><td><dfn export>KOI8-R</dfn><td><a href=index-koi8-r.txt>index-koi8-r.txt</a><td><a href=koi8-r.html>index KOI8-R visualization</a><td><a href=koi8-r-bmp.html>index KOI8-R BMP coverage</a>
 <tr><td><dfn export>KOI8-U</dfn><td><a href=index-koi8-u.txt>index-koi8-u.txt</a><td><a href=koi8-u.html>index KOI8-U visualization</a><td><a href=koi8-u-bmp.html>index KOI8-U BMP coverage</a>
 <tr><td><dfn export>macintosh</dfn><td><a href=index-macintosh.txt>index-macintosh.txt</a><td><a href=macintosh.html>index macintosh visualization</a><td><a href=macintosh-bmp.html>index macintosh BMP coverage</a>
 <tr><td><dfn export>windows-874</dfn><td><a href=index-windows-874.txt>index-windows-874.txt</a><td><a href=windows-874.html>index windows-874 visualization</a><td><a href=windows-874-bmp.html>index windows-874 BMP coverage</a>
 <tr><td><dfn export>windows-1250</dfn><td><a href=index-windows-1250.txt>index-windows-1250.txt</a><td><a href=windows-1250.html>index windows-1250 visualization</a><td><a href=windows-1250-bmp.html>index windows-1250 BMP coverage</a>
 <tr><td><dfn export>windows-1251</dfn><td><a href=index-windows-1251.txt>index-windows-1251.txt</a><td><a href=windows-1251.html>index windows-1251 visualization</a><td><a href=windows-1251-bmp.html>index windows-1251 BMP coverage</a>
 <tr><td><dfn export>windows-1252</dfn><td><a href=index-windows-1252.txt>index-windows-1252.txt</a><td><a href=windows-1252.html>index windows-1252 visualization</a><td><a href=windows-1252-bmp.html>index windows-1252 BMP coverage</a>
 <tr><td><dfn export>windows-1253</dfn><td><a href=index-windows-1253.txt>index-windows-1253.txt</a><td><a href=windows-1253.html>index windows-1253 visualization</a><td><a href=windows-1253-bmp.html>index windows-1253 BMP coverage</a>
 <tr><td><dfn export>windows-1254</dfn><td><a href=index-windows-1254.txt>index-windows-1254.txt</a><td><a href=windows-1254.html>index windows-1254 visualization</a><td><a href=windows-1254-bmp.html>index windows-1254 BMP coverage</a>
 <tr><td><dfn export>windows-1255</dfn><td><a href=index-windows-1255.txt>index-windows-1255.txt</a><td><a href=windows-1255.html>index windows-1255 visualization</a><td><a href=windows-1255-bmp.html>index windows-1255 BMP coverage</a>
 <tr><td><dfn export>windows-1256</dfn><td><a href=index-windows-1256.txt>index-windows-1256.txt</a><td><a href=windows-1256.html>index windows-1256 visualization</a><td><a href=windows-1256-bmp.html>index windows-1256 BMP coverage</a>
 <tr><td><dfn export>windows-1257</dfn><td><a href=index-windows-1257.txt>index-windows-1257.txt</a><td><a href=windows-1257.html>index windows-1257 visualization</a><td><a href=windows-1257-bmp.html>index windows-1257 BMP coverage</a>
 <tr><td><dfn export>windows-1258</dfn><td><a href=index-windows-1258.txt>index-windows-1258.txt</a><td><a href=windows-1258.html>index windows-1258 visualization</a><td><a href=windows-1258-bmp.html>index windows-1258 BMP coverage</a>
 <tr><td><dfn export>x-mac-cyrillic</dfn><td><a href=index-x-mac-cyrillic.txt>index-x-mac-cyrillic.txt</a><td><a href=x-mac-cyrillic.html>index x-mac-cyrillic visualization</a><td><a href=x-mac-cyrillic-bmp.html>index x-mac-cyrillic BMP coverage</a>
 </table>

<p class=note><a>ISO-8859-8</a> and <a>ISO-8859-8-I</a> are
distinct <a for=/>encoding</a> <a for=encoding>names</a>, because
<a>ISO-8859-8</a> influences the layout direction. Although
historically this might also have been the case for <a>ISO-8859-6</a> and
"ISO-8859-6-I", that is no longer true.
<!-- https://www.w3.org/Bugs/Public/show_bug.cgi?id=19505 -->

<h3 id=single-byte-decoder dfn export>single-byte decoder</h3>

<p><a>Single-byte encodings</a>'
<a for=/>decoder</a>'s <a>handler</a>, given <var>ioQueue</var> and
<var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return a code point whose value
 is <var>byte</var>.

 <li><p>Let <var>code point</var> be the <a>index code point</a>
 for <var>byte</var> &minus; 0x80 in <a>index single-byte</a>.

 <li><p>If <var>code point</var> is null, return <a>error</a>.

 <li><p>Return a code point whose value is <var>code point</var>.
</ol>

<h3 id=single-byte-encoder export dfn>single-byte encoder</h3>

<p><a>Single-byte encodings</a>'
<a for=/>encoder</a>'s <a>handler</a>, given <var>ioQueue</var> and
<var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>Let <var>pointer</var> be the <a>index pointer</a> for
 <var>code point</var> in <a>index single-byte</a>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Return a byte whose value is <var>pointer</var> + 0x80.
</ol>
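
<div class=example id=example-single-byte-sketch>
 <p>The <a>single-byte decoder</a> and <a>single-byte encoder</a> are both simple table lookups
 offset by 0x80. The sketch below uses <code>toyIndex</code>, a hypothetical two-entry excerpt
 standing in for an <a>index single-byte</a> (a real index has up to 128 entries), and treats any
 pointer missing from it as null. The function names are likewise hypothetical, and null models
 returning <a>error</a>:

```javascript
// Hypothetical excerpt: pointer 0x00 maps to U+20AC, pointer 0x69 to U+00E9.
const toyIndex = new Map([[0x00, 0x20AC], [0x69, 0x00E9]]);

// Returns a code point, or null to model "error".
function singleByteDecode(byte, index) {
  if (byte <= 0x7F) return byte;
  const codePoint = index.get(byte - 0x80);
  return codePoint === undefined ? null : codePoint;
}

// Returns a byte, or null to model "error with code point".
function singleByteEncode(codePoint, index) {
  if (codePoint <= 0x7F) return codePoint;
  for (const [pointer, value] of index) {
    if (value === codePoint) return pointer + 0x80;
  }
  return null;
}

console.log(singleByteDecode(0x80, toyIndex), singleByteEncode(0x00E9, toyIndex));
```

 <p>A production implementation would more likely use a reverse lookup table for encoding rather
 than scanning the index.
</div>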


<h2 id=legacy-multi-byte-chinese-(simplified)-encodings>Legacy multi-byte Chinese (simplified) encodings</h2>

<h3 id=gbk dfn export>GBK</h3>

<h4 id=gbk-decoder dfn export>GBK decoder</h4>

<p><a>GBK</a>'s <a for=/>decoder</a> is <a>gb18030</a>'s <a for=/>decoder</a>.


<h4 id=gbk-encoder dfn export>GBK encoder</h4>

<p><a>GBK</a>'s <a for=/>encoder</a> is <a>gb18030</a>'s <a for=/>encoder</a>
with its <a>is GBK</a> set to true.

<p class="note no-backref">Not fully aliasing <a>GBK</a> with <a>gb18030</a>
is a conservative move to decrease the chances of breaking legacy servers and other
consumers of content generated with <a>GBK</a>'s <a for=/>encoder</a>.


<h3 id=gb18030 dfn export>gb18030</h3>

<h4 id=gb18030-decoder dfn export>gb18030 decoder</h4>

<p><a>gb18030</a>'s <a for=/>decoder</a> has an associated <dfn>gb18030 first</dfn>,
<dfn>gb18030 second</dfn>, and <dfn>gb18030 third</dfn> (all initially 0x00).

<p><a>gb18030</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>gb18030 first</a>, <a>gb18030 second</a>, and <a>gb18030 third</a>
 are 0x00, return <a>finished</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a>, and
 <a>gb18030 first</a>, <a>gb18030 second</a>, or <a>gb18030 third</a>
 is not 0x00, set <a>gb18030 first</a>, <a>gb18030 second</a>, and
 <a>gb18030 third</a> to 0x00, and return <a>error</a>.

 <li>
  <p>If <a>gb18030 third</a> is not 0x00, then:

  <ol>
   <li>
    <p>If <var>byte</var> is not in the range 0x30 to 0x39, inclusive, then:

    <ol>
     <li><p><a>Prepend</a> <a>gb18030 second</a>, <a>gb18030 third</a>, and <var>byte</var> to
     <var>ioQueue</var>.

     <li><p>Set <a>gb18030 first</a>, <a>gb18030 second</a>, and <a>gb18030 third</a> to 0x00.

     <li><p>Return <a>error</a>.
    </ol>

   <li><p>Let <var>code point</var> be the <a>index gb18030 ranges code point</a> for
   ((<a>gb18030 first</a> &minus; 0x81) × (10 × 126 × 10)) +
   ((<a>gb18030 second</a> &minus; 0x30) × (10 × 126)) +
   ((<a>gb18030 third</a> &minus; 0x81) × 10) + <var>byte</var> &minus; 0x30.

   <li><p>Set <a>gb18030 first</a>, <a>gb18030 second</a>, and <a>gb18030 third</a> to 0x00.

   <li><p>If <var>code point</var> is null, return <a>error</a>.

   <li><p>Return a code point whose value is <var>code point</var>.
  </ol>

 <li>
  <p>If <a>gb18030 second</a> is not 0x00, then:

  <ol>
   <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
   <a>gb18030 third</a> to <var>byte</var> and return <a>continue</a>.

   <li><p><a>Prepend</a> <a>gb18030 second</a>
   followed by <var>byte</var> to <var>ioQueue</var>, set
   <a>gb18030 first</a> and <a>gb18030 second</a> to 0x00, and return
   <a>error</a>.
  </ol>

 <li>
  <p>If <a>gb18030 first</a> is not 0x00, then:

  <ol>
   <li><p>If <var>byte</var> is in the range 0x30 to 0x39, inclusive, set
   <a>gb18030 second</a> to <var>byte</var> and return <a>continue</a>.

   <li><p>Let <var>lead</var> be <a>gb18030 first</a>, let
   <var>pointer</var> be null, and set <a>gb18030 first</a> to 0x00.

   <li><p>Let <var>offset</var> be 0x40 if <var>byte</var> is less than 0x7F, otherwise 0x41.

   <li><p>If <var>byte</var> is in the range 0x40 to 0x7E, inclusive, or
   0x80 to 0xFE, inclusive, set <var>pointer</var> to
   (<var>lead</var> &minus; 0x81) × 190 + (<var>byte</var> &minus; <var>offset</var>).

   <li><p>Let <var>code point</var> be null if <var>pointer</var> is null, otherwise the
   <a>index code point</a> for <var>pointer</var> in <a>index gb18030</a>.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>ioQueue</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is 0x80, return code point U+20AC.

 <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
 <a>gb18030 first</a> to <var>byte</var> and return <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
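
<p class=note>The two pointer computations in the steps above can be sketched, non-normatively, as
the JavaScript functions below. The function names are hypothetical, and the index lookups, the
decoder's state, and error handling are left out.

```javascript
// Pointer into index gb18030 for a two-byte sequence, or null when the
// second byte is outside 0x40..0x7E and 0x80..0xFE.
function gb18030TwoBytePointer(lead, byte) {
  const offset = byte < 0x7F ? 0x40 : 0x41;
  const inRange = (byte >= 0x40 && byte <= 0x7E) || (byte >= 0x80 && byte <= 0xFE);
  return inRange ? (lead - 0x81) * 190 + (byte - offset) : null;
}

// Pointer handed to the "index gb18030 ranges code point" lookup for a
// four-byte sequence. Callers are assumed to have validated the byte ranges.
function gb18030FourBytePointer(first, second, third, fourth) {
  return (first - 0x81) * (10 * 126 * 10) +
         (second - 0x30) * (10 * 126) +
         (third - 0x81) * 10 +
         (fourth - 0x30);
}
```

<p class=note>For instance, the bytes 0xA3 0xA0 give pointer 6555, and the bytes 0x81 0x30 0x81 0x30
give pointer 0.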


<h4 id=gb18030-encoder dfn export>gb18030 encoder</h4>

<p><a>gb18030</a>'s <a for=/>encoder</a> has an associated <dfn id=gbk-flag>is GBK</dfn>
(initially false).

<p><a>gb18030</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li>
  <p>If <var>code point</var> is U+E5E5, return <a>error</a> with <var>code point</var>.

  <p class=note><a>Index gb18030</a> maps 0xA3 0xA0 to U+3000 rather than U+E5E5 for
  compatibility with deployed content. Therefore U+E5E5 cannot roundtrip.

 <li><p>If <a>is GBK</a> is true and <var>code point</var> is
 U+20AC, return byte 0x80.

 <li><p>Let <var>pointer</var> be the <a>index pointer</a> for
 <var>code point</var> in <a>index gb18030</a>.

 <li>
  <p>If <var>pointer</var> is non-null, then:

  <ol>
   <li><p>Let <var>lead</var> be <var>pointer</var> / 190 + 0x81.

   <li><p>Let <var>trail</var> be <var>pointer</var> % 190.

   <li><p>Let <var>offset</var> be 0x40 if <var>trail</var> is less than 0x3F,<!--0x7F-0x40-->
   otherwise 0x41.

   <li><p>Return two bytes whose values are <var>lead</var> and
   <var>trail</var> + <var>offset</var>.
  </ol>

 <li><p>If <a>is GBK</a> is true, return <a>error</a> with
 <var>code point</var>.

 <li><p>Set <var>pointer</var> to the
 <a>index gb18030 ranges pointer</a> for <var>code point</var>.

 <li><p>Let <var>byte1</var> be <var>pointer</var> / (10 × 126 × 10).

 <li><p>Set <var>pointer</var> to <var>pointer</var> % (10 × 126 × 10).

 <li><p>Let <var>byte2</var> be <var>pointer</var> / (10 × 126).

 <li><p>Set <var>pointer</var> to <var>pointer</var> % (10 × 126).

 <li><p>Let <var>byte3</var> be <var>pointer</var> / 10.

 <li><p>Let <var>byte4</var> be <var>pointer</var> % 10.

 <li><p>Return four bytes whose values are <var>byte1</var> + 0x81,
 <var>byte2</var> + 0x30, <var>byte3</var> + 0x81,
 <var>byte4</var> + 0x30.
</ol>
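
<p class=note>As a non-normative sketch, the byte arithmetic of the encoder steps above could be
written in JavaScript as shown below, reading "/" as integer division. The function names are
hypothetical, and the index lookups as well as the <code>is GBK</code> checks are omitted.

```javascript
// Two bytes for a non-null pointer into index gb18030.
function gb18030TwoBytes(pointer) {
  const lead = Math.floor(pointer / 190) + 0x81;
  const trail = pointer % 190;
  const offset = trail < 0x3F ? 0x40 : 0x41;  // skip 0x7F in the trail range
  return [lead, trail + offset];
}

// Four bytes for a pointer obtained from the ranges lookup.
function gb18030FourBytes(pointer) {
  const byte1 = Math.floor(pointer / (10 * 126 * 10));
  pointer %= 10 * 126 * 10;
  const byte2 = Math.floor(pointer / (10 * 126));
  pointer %= 10 * 126;
  const byte3 = Math.floor(pointer / 10);
  const byte4 = pointer % 10;
  return [byte1 + 0x81, byte2 + 0x30, byte3 + 0x81, byte4 + 0x30];
}
```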


<h2 id=legacy-multi-byte-chinese-(traditional)-encodings>Legacy multi-byte Chinese (traditional) encodings</h2>

<!--
 Lead:  0x81 to 0xFE
 Trail: 0x40 to 0x7E or 0xA1 to 0xFE
-->


<h3 id=big5 dfn export>Big5</h3>

<h4 id=big5-decoder dfn export>Big5 decoder</h4>

<p><a>Big5</a>'s <a for=/>decoder</a> has an associated
<dfn>Big5 lead</dfn> (initially 0x00).

<p><a>Big5</a>'s <a for=/>decoder</a>'s <a>handler</a>, given <var>ioQueue</var>
and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and <a>Big5 lead</a>
 is not 0x00, set <a>Big5 lead</a> to 0x00 and return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a> and <a>Big5 lead</a>
 is 0x00, return <a>finished</a>.

 <li>
  <p>If <a>Big5 lead</a> is not 0x00, let <var>lead</var> be
  <a>Big5 lead</a>, let <var>pointer</var> be null, set
  <a>Big5 lead</a> to 0x00, and then:

  <ol>
   <li><p>Let <var>offset</var> be 0x40 if <var>byte</var> is less than 0x7F, otherwise 0x62.
   <!-- 0x62 = 0xA1 - (0x7E - 0x40 + 1), i.e. 0xA1 - 0x3F -->

   <li><p>If <var>byte</var> is in the range 0x40 to 0x7E, inclusive, or
   0xA1 to 0xFE, inclusive, set <var>pointer</var> to
   (<var>lead</var> &minus; 0x81) × 157 + (<var>byte</var> &minus; <var>offset</var>).

   <li>
    <p>If there is a row in the table below whose first column is
    <var>pointer</var>, return the <em>two</em> code points listed in
    its second column (the third column is irrelevant):

    <table>
     <tbody><tr><th>Pointer<th>Code points<th>Notes<!-- https://www.unicode.org/Public/UNIDATA/NamedSequences.txt -->
     <tr><td>1133<!-- 0x88 0x62 --><td>U+00CA U+0304<td>Ê̄ (LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND MACRON)
     <tr><td>1135<!-- 0x88 0x64 --><td>U+00CA U+030C<td>Ê̌ (LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND CARON)
     <tr><td>1164<!-- 0x88 0xA3 --><td>U+00EA U+0304<td>ê̄ (LATIN SMALL LETTER E WITH CIRCUMFLEX AND MACRON)
     <tr><td>1166<!-- 0x88 0xA5 --><td>U+00EA U+030C<td>ê̌ (LATIN SMALL LETTER E WITH CIRCUMFLEX AND CARON)
    </table>
    <!-- we do this to avoid PUA -->

    <p class=note>Since <a lt=index>indexes</a> are limited to
    single code points this table is used for these pointers.

   <li><p>Let <var>code point</var> be null if <var>pointer</var> is null, otherwise the
   <a>index code point</a> for <var>pointer</var> in <a>index Big5</a>.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>ioQueue</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
 <a>Big5 lead</a> to <var>byte</var> and return <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
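
<p class=note>The pointer computation and the two-code-point table from the steps above can be
sketched, non-normatively, in JavaScript. <code>big5Pointer</code> and <code>big5Pairs</code> are
hypothetical names, and the lookup in the Big5 index for all other pointers is omitted.

```javascript
// Pointers that decode to two code points rather than a single index entry.
const big5Pairs = new Map([
  [1133, [0x00CA, 0x0304]],
  [1135, [0x00CA, 0x030C]],
  [1164, [0x00EA, 0x0304]],
  [1166, [0x00EA, 0x030C]],
]);

// Pointer for a lead/trail pair, or null when the trail byte is outside
// 0x40..0x7E and 0xA1..0xFE.
function big5Pointer(lead, byte) {
  const offset = byte < 0x7F ? 0x40 : 0x62;
  const inRange = (byte >= 0x40 && byte <= 0x7E) || (byte >= 0xA1 && byte <= 0xFE);
  return inRange ? (lead - 0x81) * 157 + (byte - offset) : null;
}
```

<p class=note>For instance, the bytes 0x88 0x62 give pointer 1133, which the table maps to
U+00CA U+0304.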


<h4 id=big5-encoder dfn export>Big5 encoder</h4>

<p><a>Big5</a>'s <a for=/>encoder</a>'s <a>handler</a>, given <var>ioQueue</var>
and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>Let <var>pointer</var> be the <a>index Big5 pointer</a> for
 <var>code point</var>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 157 + 0x81.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 157.

 <li><p>Let <var>offset</var> be 0x40 if <var>trail</var> is less than 0x3F,<!--0x7F-0x40-->
 otherwise 0x62.<!--0xA1-0x3F-->

 <li><p>Return two bytes whose values are <var>lead</var> and
 <var>trail</var> + <var>offset</var>.
</ol>
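
<p class=note>As a non-normative sketch, the last four steps above amount to the following
JavaScript, reading "/" as integer division. <code>big5Bytes</code> is a hypothetical name and the
pointer is assumed to be non-null.

```javascript
// Two bytes for a non-null Big5 pointer.
function big5Bytes(pointer) {
  const lead = Math.floor(pointer / 157) + 0x81;
  const trail = pointer % 157;
  const offset = trail < 0x3F ? 0x40 : 0x62;  // jump from 0x7E to 0xA1
  return [lead, trail + offset];
}
```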


<h2 id=legacy-multi-byte-japanese-encodings>Legacy multi-byte Japanese encodings</h2>

<h3 id=euc-jp dfn export>EUC-JP</h3>
<!-- https://www.iana.org/assignments/charset-reg/CP51932 -->

<h4 id=euc-jp-decoder dfn export>EUC-JP decoder</h4>

<p><a>EUC-JP</a>'s <a for=/>decoder</a> has an associated
<dfn id=euc-jp-jis0212-flag>EUC-JP jis0212</dfn> (initially false) and
<dfn>EUC-JP lead</dfn> (initially 0x00).

<p><a>EUC-JP</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>EUC-JP lead</a> is not 0x00, set <a>EUC-JP lead</a> to 0x00, and return
 <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>EUC-JP lead</a> is 0x00, return <a>finished</a>.

 <li><p>If <a>EUC-JP lead</a> is 0x8E and <var>byte</var> is
 in the range 0xA1 to 0xDF, inclusive, set <a>EUC-JP lead</a> to 0x00 and return
 a code point whose value is 0xFF61 &minus; 0xA1 + <var>byte</var>.
 <!-- Katakana; subtraction is done first to avoid upsetting compilers -->

 <li><p>If <a>EUC-JP lead</a> is 0x8F and <var>byte</var> is in the range
 0xA1 to 0xFE, inclusive, set <a>EUC-JP jis0212</a> to true, set
 <a>EUC-JP lead</a> to <var>byte</var>, and return <a>continue</a>.

 <li>
  <p>If <a>EUC-JP lead</a> is not 0x00, let <var>lead</var> be <a>EUC-JP lead</a>, set
  <a>EUC-JP lead</a> to 0x00, and then:

  <ol>
   <li><p>Let <var>code point</var> be null.

   <li><p>If <var>lead</var> and <var>byte</var> are both in the range 0xA1 to 0xFE, inclusive, then
   set <var>code point</var> to the <a>index code point</a> for
   (<var>lead</var> &minus; 0xA1) × 94 + <var>byte</var> &minus; 0xA1
   in <a>index jis0208</a> if <a>EUC-JP jis0212</a> is false and in
   <a>index jis0212</a> otherwise.

   <li><p>Set <a>EUC-JP jis0212</a> to false.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>ioQueue</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is 0x8E, 0x8F, or in the range 0xA1 to
 0xFE, inclusive, set <a>EUC-JP lead</a> to <var>byte</var> and return
 <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
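
<p class=note>The two arithmetic mappings in the steps above can be sketched, non-normatively, in
JavaScript. The function names are hypothetical, and the decoder's state and the index lookups are
omitted.

```javascript
// Code point for the byte following lead 0x8E (half-width katakana), or null.
function eucJpKatakana(byte) {
  return byte >= 0xA1 && byte <= 0xDF ? 0xFF61 - 0xA1 + byte : null;
}

// Pointer into index jis0208 (or index jis0212 after an 0x8F prefix), or null.
function eucJpPointer(lead, byte) {
  const inRange = (b) => b >= 0xA1 && b <= 0xFE;
  return inRange(lead) && inRange(byte) ? (lead - 0xA1) * 94 + byte - 0xA1 : null;
}
```

<p class=note>For instance, the bytes 0xA4 0xA2 give pointer 283.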


<h4 id=euc-jp-encoder dfn export>EUC-JP encoder</h4>

<p><a>EUC-JP</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>If <var>code point</var> is U+00A5, return byte 0x5C.

 <li><p>If <var>code point</var> is U+203E, return byte 0x7E.

 <li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, return
 two bytes whose values are 0x8E and <var>code point</var> &minus; 0xFF61 + 0xA1.

 <li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

 <li>
  <p>Let <var>pointer</var> be the <a>index pointer</a> for <var>code point</var> in
  <a>index jis0208</a>.

  <p class=note>If <var>pointer</var> is non-null, it is less than 8836 due to the nature of
  <a>index jis0208</a> and the <a>index pointer</a> operation.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 94 + 0xA1.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 94 + 0xA1.

 <li><p>Return two bytes whose values are <var>lead</var> and
 <var>trail</var>.
</ol>
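
<p class=note>As a non-normative sketch, the byte arithmetic in the steps above could be written in
JavaScript as follows, reading "/" as integer division. The function names are hypothetical and
the lookup in the jis0208 index is omitted.

```javascript
// Two bytes for a code point in the range U+FF61 to U+FF9F.
function eucJpKatakanaBytes(codePoint) {
  return [0x8E, codePoint - 0xFF61 + 0xA1];
}

// Two bytes for a non-null pointer into index jis0208 (always below 8836).
function eucJpBytes(pointer) {
  return [Math.floor(pointer / 94) + 0xA1, (pointer % 94) + 0xA1];
}
```

<p class=note>Because the pointer is less than 8836, which is 94 × 94, both resulting bytes stay
within 0xA1 to 0xFE.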


<h3 id=iso-2022-jp dfn export>ISO-2022-JP</h3>
<!--
 https://tools.ietf.org/html/rfc1468
 https://tools.ietf.org/html/rfc2237 (ISO-2022-JP-1; not used)
 "ESC ) I" is from ISO-2022-JP-3 reportedly
-->

<h4 id=iso-2022-jp-decoder dfn export>ISO-2022-JP decoder</h4>

<p><a>ISO-2022-JP</a>'s <a for=/>decoder</a> has an associated
<dfn>ISO-2022-JP decoder state</dfn> (initially
<a lt="ISO-2022-JP decoder ASCII">ASCII</a>),
<dfn>ISO-2022-JP decoder output state</dfn> (initially
<a lt="ISO-2022-JP decoder ASCII">ASCII</a>),
<dfn>ISO-2022-JP lead</dfn> (initially 0x00), and
<dfn id=iso-2022-jp-output-flag>ISO-2022-JP output</dfn> (initially false).

<p><a>ISO-2022-JP</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps, switching on
<a>ISO-2022-JP decoder state</a>:

<dl class=switch>
 <dt><dfn lt="ISO-2022-JP decoder ASCII">ASCII</dfn>
 <dd>
  <p>Based on <var>byte</var>:

  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x00 to 0x7F, excluding 0x0E, 0x0F, and 0x1B
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
   value is <var>byte</var>.

   <dt><a>end-of-queue</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder Roman">Roman</dfn>
 <dd>
  <p>Based on <var>byte</var>:

  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x5C
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+00A5.

   <dt>0x7E
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return code point U+203E.

   <dt>0x00 to 0x7F, excluding 0x0E, 0x0F, 0x1B, 0x5C, and 0x7E
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
   value is <var>byte</var>.

   <dt><a>end-of-queue</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder katakana">katakana</dfn>
 <dd>
  <p>Based on <var>byte</var>:
  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x21 to 0x5F
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return a code point whose
   value is 0xFF61 &minus; 0x21 + <var>byte</var>.
   <!-- Katakana; subtraction is done first to avoid upsetting compilers -->

   <dt><a>end-of-queue</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder lead byte">Lead byte</dfn>
 <dd>
  <p>Based on <var>byte</var>:
  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>continue</a>.

   <dt>0x21 to 0x7E
   <dd><p>Set <a>ISO-2022-JP output</a> to false,
   <a>ISO-2022-JP lead</a> to <var>byte</var>,
   <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder trail byte">trail byte</a>, and return
   <a>continue</a>.

   <dt><a>end-of-queue</a>
   <dd><p>Return <a>finished</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP output</a> to false and return <a>error</a>.
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder trail byte">Trail byte</dfn>
 <dd>
  <p>Based on <var>byte</var>:
  <dl class=switch>
   <dt>0x1B
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape start">escape start</a> and return
   <a>error</a>.
   <!-- ISO-2022-JP decoder output state is still lead byte -->

   <dt>0x21 to 0x7E
   <dd>
    <ol>
     <li><p>Set the <a>ISO-2022-JP decoder state</a> to
     <a lt="ISO-2022-JP decoder lead byte">lead byte</a>.

     <li><p>Let <var>pointer</var> be
     (<a>ISO-2022-JP lead</a> &minus; 0x21) × 94 + <var>byte</var> &minus; 0x21.

     <li><p>Let <var>code point</var> be the <a>index code point</a> for
     <var>pointer</var> in <a>index jis0208</a>.

     <li><p>If <var>code point</var> is null, return <a>error</a>.

     <li><p>Return a code point whose value is <var>code point</var>.
    </ol>

   <dt><a>end-of-queue</a>
   <dd><p>Set the <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder lead byte">lead byte</a> and return <a>error</a>.

   <dt>Otherwise
   <dd><p>Set <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder lead byte">lead byte</a> and return
   <a>error</a>.
   <!-- ISO-2022-JP decoder output state is still lead byte -->
  </dl>

 <dt><dfn lt="ISO-2022-JP decoder escape start">Escape start</dfn>
 <dd>
  <ol>
   <li><p>If <var>byte</var> is either <!--$-->0x24 or <!--(-->0x28, set
   <a>ISO-2022-JP lead</a> to <var>byte</var>,
   <a>ISO-2022-JP decoder state</a> to
   <a lt="ISO-2022-JP decoder escape">escape</a>, and return
   <a>continue</a>.

   <li><p>If <var>byte</var> is not <a>end-of-queue</a>, then <a>prepend</a>
   <var>byte</var> to <var>ioQueue</var>.

   <li><p>Set <a>ISO-2022-JP output</a> to false,
   <a>ISO-2022-JP decoder state</a> to
   <a>ISO-2022-JP decoder output state</a>, and return <a>error</a>.
  </ol>

 <dt><dfn lt="ISO-2022-JP decoder escape">Escape</dfn>
 <dd>
  <ol>
   <li><p>Let <var>lead</var> be <a>ISO-2022-JP lead</a> and set
   <a>ISO-2022-JP lead</a> to 0x00.

   <li><p>Let <var>state</var> be null.

   <li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x42<!--B-->, set
   <var>state</var> to <a lt="ISO-2022-JP decoder ASCII">ASCII</a>.

   <li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x4A<!--J-->, set
   <var>state</var> to <a lt="ISO-2022-JP decoder Roman">Roman</a>.

   <li><p>If <var>lead</var> is 0x28 and <var>byte</var> is 0x49<!--I-->, set
   <var>state</var> to <a lt="ISO-2022-JP decoder katakana">katakana</a>.

   <li><p>If <var>lead</var> is 0x24 and <var>byte</var> is either
   0x40<!--@--> or 0x42<!--B-->, set <var>state</var> to
   <a lt="ISO-2022-JP decoder lead byte">lead byte</a>.

   <li>
    <p>If <var>state</var> is non-null, then:

    <ol>
     <li><p>Set <a>ISO-2022-JP decoder state</a> and
     <a>ISO-2022-JP decoder output state</a> to <var>state</var>.

     <li><p>Let <var>output</var> be the value of <a>ISO-2022-JP output</a>.

     <li><p>Set <a>ISO-2022-JP output</a> to true.

     <li><p>Return <a>continue</a> if <var>output</var> is false; otherwise return
     <a>error</a>.
    </ol>

   <li><p>If <var>byte</var> is <a>end-of-queue</a>, then <a>prepend</a>
   <var>lead</var> to <var>ioQueue</var>. Otherwise, <a>prepend</a>
   <var>lead</var> and <var>byte</var> to <var>ioQueue</var>.

   <li><p>Set <a>ISO-2022-JP output</a> to false,
   <a>ISO-2022-JP decoder state</a> to <a>ISO-2022-JP decoder output state</a>,
   and return <a>error</a>.
  </ol>
</dl>
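
<p class=note>The escape sequence recognition and the trail byte pointer computation above can be
sketched, non-normatively, in JavaScript. The function names are hypothetical, representing states
as strings is an assumption made for this sketch, and the output flag, prepending, and error
handling are omitted.

```javascript
// State selected by the two bytes following 0x1B, or null when the
// sequence is not recognized.
function iso2022JpEscapeState(lead, byte) {
  if (lead === 0x28 && byte === 0x42) return "ASCII";     // ESC ( B
  if (lead === 0x28 && byte === 0x4A) return "Roman";     // ESC ( J
  if (lead === 0x28 && byte === 0x49) return "katakana";  // ESC ( I
  if (lead === 0x24 && (byte === 0x40 || byte === 0x42)) {
    return "lead byte";                                   // ESC $ @ or ESC $ B
  }
  return null;
}

// Pointer into index jis0208 for a lead/trail pair, both in 0x21..0x7E.
function iso2022JpPointer(lead, byte) {
  return (lead - 0x21) * 94 + byte - 0x21;
}
```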


<h4 id=iso-2022-jp-encoder dfn export>ISO-2022-JP encoder</h4>

<div class="note no-backref">
 <p>The <a>ISO-2022-JP encoder</a> is the only <a for=/>encoder</a> for which the concatenation of
 multiple outputs can result in an <a>error</a> when run through the corresponding
 <a for=/>decoder</a>.

 <p class=example id=example-iso-2022-jp-encoder-oddity>Encoding U+00A5 gives 0x1B 0x28 0x4A 0x5C
 0x1B 0x28 0x42. Doing that twice, concatenating the results, and then decoding yields U+00A5 U+FFFD
 U+00A5.
</div>

<p><a>ISO-2022-JP</a>'s <a for=/>encoder</a> has an associated
<dfn>ISO-2022-JP encoder state</dfn> which is <dfn lt="ISO-2022-JP encoder ASCII">ASCII</dfn>,
<dfn lt="ISO-2022-JP encoder Roman">Roman</dfn>, or
<dfn lt="ISO-2022-JP encoder jis0208">jis0208</dfn> (initially
<a lt="ISO-2022-JP encoder ASCII">ASCII</a>).

<p><a>ISO-2022-JP</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a> and
 <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, set
 <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three bytes
 0x1B 0x28 0x42.

 <li><p>If <var>code point</var> is <a>end-of-queue</a> and
 <a>ISO-2022-JP encoder state</a> is
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, return <a>finished</a>.

 <li>
  <p>If <a>ISO-2022-JP encoder state</a> is
  <a lt="ISO-2022-JP encoder ASCII">ASCII</a> or
  <a lt="ISO-2022-JP encoder Roman">Roman</a>, and <var>code point</var> is U+000E, U+000F,
  or U+001B, return <a>error</a> with U+FFFD.

  <p class=note>This returns U+FFFD rather than <var>code point</var> to prevent attacks.
  <!-- https://github.com/whatwg/encoding/issues/15 -->

 <li><p>If <a>ISO-2022-JP encoder state</a> is
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a> and <var>code point</var> is an
 <a>ASCII code point</a>, return a byte whose value is <var>code point</var>.

 <li>
  <p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder Roman">Roman</a> and
  <var>code point</var> is an <a>ASCII code point</a>, excluding U+005C and U+007E, or is U+00A5 or
  U+203E, then:

  <ol>
   <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return a byte
   whose value is <var>code point</var>.

   <li><p>If <var>code point</var> is U+00A5, return byte 0x5C.

   <li><p>If <var>code point</var> is U+203E, return byte 0x7E.
  </ol>

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, and
 <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>,
 <a>prepend</a> <var>code point</var> to
 <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three bytes
 0x1B 0x28 0x42.

 <li><p>If <var>code point</var> is either U+00A5 or U+203E, and
 <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder Roman">Roman</a>,
 <a>prepend</a> <var>code point</var> to
 <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder Roman">Roman</a>, and return three bytes
 0x1B 0x28 0x4A.

 <li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

 <li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, set it to the
 <a>index code point</a> for <var>code point</var> &minus; 0xFF61 in
 <a>index ISO-2022-JP katakana</a>.

 <li>
  <p>Let <var>pointer</var> be the <a>index pointer</a> for <var>code point</var> in
  <a>index jis0208</a>.

  <p class=note>If <var>pointer</var> is non-null, it is less than 8836 due to the nature of
  <a>index jis0208</a> and the <a>index pointer</a> operation.

 <li>
  <p>If <var>pointer</var> is null, then:

  <ol>
   <li><p>If <a>ISO-2022-JP encoder state</a> is <a lt="ISO-2022-JP encoder jis0208">jis0208</a>,
   then <a>prepend</a> <var>code point</var> to <var>ioQueue</var>, set
   <a>ISO-2022-JP encoder state</a> to <a lt="ISO-2022-JP encoder ASCII">ASCII</a>, and return three
   bytes 0x1B 0x28 0x42.

   <li><p>Return <a>error</a> with <var>code point</var>.
  </ol>

 <li><p>If <a>ISO-2022-JP encoder state</a> is not
 <a lt="ISO-2022-JP encoder jis0208">jis0208</a>,
 <a>prepend</a> <var>code point</var> to
 <var>ioQueue</var>, set <a>ISO-2022-JP encoder state</a> to
 <a lt="ISO-2022-JP encoder jis0208">jis0208</a>, and return three bytes
 0x1B 0x24 0x42.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 94 + 0x21.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 94 + 0x21.

 <li><p>Return two bytes whose values are <var>lead</var> and
 <var>trail</var>.
</ol>
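
<p class=note>The state switching between ASCII and Roman in the steps above can be sketched,
non-normatively, in JavaScript. <code>encodeAsciiRoman</code> is a hypothetical name. The sketch
is deliberately limited to ASCII code points, U+00A5, and U+203E, it omits the jis0208 state, and
it throws where the handler would return an error.

```javascript
// Encodes a string limited to ASCII code points, U+00A5, and U+203E.
function encodeAsciiRoman(input) {
  const bytes = [];
  let state = "ASCII";
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    if (cp === 0x0E || cp === 0x0F || cp === 0x1B) {
      throw new RangeError("error with U+FFFD");
    }
    const isYenOrOverline = cp === 0xA5 || cp === 0x203E;
    if (cp > 0x7F && !isYenOrOverline) {
      throw new RangeError("code point is outside the scope of this sketch");
    }
    // Roman can represent every ASCII code point except U+005C and U+007E.
    const staysRoman = state === "Roman" && cp !== 0x5C && cp !== 0x7E;
    const wanted = isYenOrOverline || staysRoman ? "Roman" : "ASCII";
    if (wanted !== state) {
      bytes.push(0x1B, 0x28, wanted === "Roman" ? 0x4A : 0x42);
      state = wanted;
    }
    bytes.push(cp === 0xA5 ? 0x5C : cp === 0x203E ? 0x7E : cp);
  }
  // At end-of-queue the encoder returns to ASCII.
  if (state !== "ASCII") bytes.push(0x1B, 0x28, 0x42);
  return bytes;
}
```

<p class=note>Calling <code>encodeAsciiRoman("\u00A5")</code> yields the bytes 0x1B 0x28 0x4A 0x5C
0x1B 0x28 0x42.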


<h3 id=shift_jis dfn export>Shift_JIS</h3>

<h4 id=shift_jis-decoder dfn export>Shift_JIS decoder</h4>

<p><a>Shift_JIS</a>'s <a for=/>decoder</a> has an associated
<dfn>Shift_JIS lead</dfn> (initially 0x00).

<p><a>Shift_JIS</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>Shift_JIS lead</a> is not 0x00, set <a>Shift_JIS lead</a> to 0x00 and
 return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>Shift_JIS lead</a> is 0x00, return <a>finished</a>.

 <li>
  <p>If <a>Shift_JIS lead</a> is not 0x00, let <var>lead</var> be <a>Shift_JIS lead</a>, let
  <var>pointer</var> be null, set <a>Shift_JIS lead</a> to 0x00, and then:

  <ol>
   <li><p>Let <var>offset</var> be 0x40 if <var>byte</var> is less than 0x7F, otherwise 0x41.

   <li><p>Let <var>lead offset</var> be 0x81 if <var>lead</var> is less than 0xA0, otherwise 0xC1.

   <li><p>If <var>byte</var> is in the range 0x40 to 0x7E, inclusive, or
   0x80 to 0xFC, inclusive, set <var>pointer</var> to
   (<var>lead</var> &minus; <var>lead offset</var>) × 188 + <var>byte</var> &minus; <var>offset</var>.

   <li>
    <p>If <var>pointer</var> is in the range 8836 to 10715, inclusive, return a code point whose
    value is 0xE000 &minus; 8836 + <var>pointer</var>.
    <!-- subtraction is done first to avoid upsetting compilers -->

    <p class=note>This is interoperable legacy from Windows known as EUDC.
    <!-- PUA -->

   <li><p>Let <var>code point</var> be null if <var>pointer</var> is null, otherwise the
   <a>index code point</a> for <var>pointer</var> in <a>index jis0208</a>.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>ioQueue</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a> or 0x80, return a code point
 whose value is <var>byte</var>.
 <!-- Opera has 0x7E -->

 <li><p>If <var>byte</var> is in the range 0xA1 to 0xDF, inclusive, return
 a code point whose value is 0xFF61 &minus; 0xA1 + <var>byte</var>.
 <!-- Katakana; subtraction is done first to avoid upsetting compilers -->

 <li><p>If <var>byte</var> is in the range 0x81 to 0x9F, inclusive, or 0xE0 to 0xFC,
 inclusive, set <a>Shift_JIS lead</a> to <var>byte</var> and return
 <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>
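
<p class=note>The pointer computation and the EUDC mapping from the steps above can be sketched,
non-normatively, in JavaScript. The function names are hypothetical, and the lookup in the jis0208
index and the decoder's state are omitted.

```javascript
// Pointer for a lead/trail pair, or null when the trail byte is outside
// 0x40..0x7E and 0x80..0xFC.
function shiftJisPointer(lead, byte) {
  const offset = byte < 0x7F ? 0x40 : 0x41;
  const leadOffset = lead < 0xA0 ? 0x81 : 0xC1;
  const inRange = (byte >= 0x40 && byte <= 0x7E) || (byte >= 0x80 && byte <= 0xFC);
  return inRange ? (lead - leadOffset) * 188 + byte - offset : null;
}

// Pointers 8836..10715 map directly to private use code points; others give null.
function shiftJisEudc(pointer) {
  return pointer >= 8836 && pointer <= 10715 ? 0xE000 - 8836 + pointer : null;
}
```

<p class=note>For instance, the bytes 0xF0 0x40 give pointer 8836, which maps to U+E000.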


<h4 id=shift_jis-encoder dfn export>Shift_JIS encoder</h4>

<p><a>Shift_JIS</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a> or U+0080, return
 a byte whose value is <var>code point</var>.

 <li><p>If <var>code point</var> is U+00A5, return byte 0x5C.

 <li><p>If <var>code point</var> is U+203E, return byte 0x7E.

 <li><p>If <var>code point</var> is in the range U+FF61 to U+FF9F, inclusive, return
 a byte whose value is <var>code point</var> &minus; 0xFF61 + 0xA1.

 <li><p>If <var>code point</var> is U+2212, set it to U+FF0D.

 <li><p>Let <var>pointer</var> be the <a>index Shift_JIS pointer</a> for
 <var>code point</var>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 188.

 <li><p>Let <var>lead offset</var> be 0x81 if <var>lead</var> is less than 0x1F, otherwise 0xC1.
 <!-- 0xA0-0x81 -->

 <li><p>Let <var>trail</var> be <var>pointer</var> % 188.

 <li><p>Let <var>offset</var> be 0x40 if <var>trail</var> is less than 0x3F, otherwise 0x41.

 <li><p>Return two bytes whose values are
 <var>lead</var> + <var>lead offset</var> and
 <var>trail</var> + <var>offset</var>.
</ol>
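
<p class=note>As a non-normative sketch, the last five steps above amount to the following
JavaScript, reading "/" as integer division. <code>shiftJisBytes</code> is a hypothetical name and
the pointer is assumed to be non-null.

```javascript
// Two bytes for a non-null Shift_JIS pointer.
function shiftJisBytes(pointer) {
  const lead = Math.floor(pointer / 188);
  const leadOffset = lead < 0x1F ? 0x81 : 0xC1;  // jump from 0x9F to 0xE0
  const trail = pointer % 188;
  const offset = trail < 0x3F ? 0x40 : 0x41;     // skip 0x7F
  return [lead + leadOffset, trail + offset];
}
```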


<h2 id=legacy-multi-byte-korean-encodings>Legacy multi-byte Korean encodings</h2>

<h3 id=euc-kr dfn export>EUC-KR</h3>

<h4 id=euc-kr-decoder dfn export>EUC-KR decoder</h4>

<p><a>EUC-KR</a>'s <a for=/>decoder</a> has an associated
<dfn>EUC-KR lead</dfn> (initially 0x00).

<p><a>EUC-KR</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>EUC-KR lead</a> is not 0x00, set <a>EUC-KR lead</a> to 0x00
 and return <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>EUC-KR lead</a> is 0x00, return <a>finished</a>.

 <li>
  <p>If <a>EUC-KR lead</a> is not 0x00, let <var>lead</var> be <a>EUC-KR lead</a>, let
  <var>pointer</var> be null, set <a>EUC-KR lead</a> to 0x00, and then:

  <ol>
   <li><p>If <var>byte</var> is in the range 0x41 to 0xFE, inclusive, set
   <var>pointer</var> to
   (<var>lead</var> &minus; 0x81) × 190 + (<var>byte</var> &minus; 0x41).

   <li><p>Let <var>code point</var> be null if <var>pointer</var> is null, otherwise the
   <a>index code point</a> for <var>pointer</var> in <a>index EUC-KR</a>.

   <li><p>If <var>code point</var> is non-null, return a code point whose value is
   <var>code point</var>.

   <li><p>If <var>byte</var> is an <a>ASCII byte</a>, <a>prepend</a> <var>byte</var> to
   <var>ioQueue</var>.

   <li><p>Return <a>error</a>.
  </ol>

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>If <var>byte</var> is in the range 0x81 to 0xFE, inclusive, set
 <a>EUC-KR lead</a> to <var>byte</var> and return <a>continue</a>.

 <li><p>Return <a>error</a>.
</ol>


<h4 id=euc-kr-encoder dfn export>EUC-KR encoder</h4>

<p><a>EUC-KR</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>Let <var>pointer</var> be the <a>index pointer</a> for
 <var>code point</var> in <a>index EUC-KR</a>.

 <li><p>If <var>pointer</var> is null, return <a>error</a> with
 <var>code point</var>.

 <li><p>Let <var>lead</var> be <var>pointer</var> / 190 + 0x81.

 <li><p>Let <var>trail</var> be <var>pointer</var> % 190 + 0x41.

 <li><p>Return two bytes whose values are <var>lead</var> and <var>trail</var>.
</ol>
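
<p class=note>The pointer arithmetic of the EUC-KR decoder and encoder can be sketched,
non-normatively, as a pair of JavaScript functions, reading "/" as integer division. The function
names are hypothetical and the lookups in the EUC-KR index are omitted.

```javascript
// Decoder side: pointer for a lead/trail pair, or null when the trail
// byte is outside 0x41..0xFE.
function eucKrPointer(lead, byte) {
  return byte >= 0x41 && byte <= 0xFE ? (lead - 0x81) * 190 + (byte - 0x41) : null;
}

// Encoder side: two bytes for a non-null pointer.
function eucKrBytes(pointer) {
  return [Math.floor(pointer / 190) + 0x81, (pointer % 190) + 0x41];
}
```

<p class=note>The two functions are inverses of each other for every lead byte in the range 0x81
to 0xFE and every trail byte in the range 0x41 to 0xFE.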


<h2 id=legacy-miscellaneous-encodings>Legacy miscellaneous encodings</h2>

<h3 id=replacement dfn export>replacement</h3>

<p class=note>The <a>replacement</a> <a for=/>encoding</a> exists to prevent certain
attacks that abuse a mismatch between <a for=/>encodings</a> supported on
the server and the client.


<h4 id=replacement-decoder dfn export>replacement decoder</h4>

<p><a>replacement</a>'s <a for=/>decoder</a> has an associated
<dfn id=replacement-error-returned-flag>replacement error returned</dfn> (initially false).

<p><a>replacement</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a>, return <a>finished</a>.

 <li><p>If <a>replacement error returned</a> is false, set
 <a>replacement error returned</a> to true and return <a>error</a>.

 <li><p>Return <a>finished</a>.
</ol>
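
<div class=example id=example-replacement-decoder-sketch>
 <p>The steps above mean that any non-empty input produces exactly one <a>error</a>, after which
 decoding stops. A minimal non-normative sketch follows. The function names, the use of strings for
 <a>error</a> and <a>finished</a>, the use of <code>null</code> for <a>end-of-queue</a>, and the
 driver loop that turns an <a>error</a> into U+FFFD are all assumptions made for illustration.

```javascript
// Sketch of the replacement decoder's handler. null models end-of-queue.
function createReplacementHandler() {
  let errorReturned = false; // models "replacement error returned"
  return function handler(byte) {
    if (byte === null) return "finished";
    if (!errorReturned) {
      errorReturned = true;
      return "error";
    }
    return "finished";
  };
}

// Hypothetical driver: feeds bytes to the handler, emits U+FFFD for an
// error, and stops as soon as the handler returns "finished".
function decodeWithReplacement(bytes) {
  const handler = createReplacementHandler();
  let output = "";
  for (const byte of [...bytes, null]) {
    const result = handler(byte);
    if (result === "error") output += "\uFFFD";
    if (result === "finished") break;
  }
  return output;
}
```

 <p>With this sketch, <code>decodeWithReplacement([0x41, 0x42, 0x43])</code> yields a single
 U+FFFD, and <code>decodeWithReplacement([])</code> yields the empty string.
</div>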


<h3 id=common-infrastructure-for-utf-16be-and-utf-16le>Common infrastructure for <a>UTF-16BE/LE</a></h3>

<p><dfn export>UTF-16BE/LE</dfn> is <a>UTF-16BE</a> or <a>UTF-16LE</a>.


<h4 id=shared-utf-16-decoder dfn export>shared UTF-16 decoder</h4>

<p class="note no-backref">A byte order mark has priority over a <a>label</a> as it
has been found to be more accurate in deployed content. Therefore byte order mark handling is
not part of the <a>shared UTF-16 decoder</a> algorithm but rather the <a>decode</a> algorithm.

<p><a>shared UTF-16 decoder</a> has an associated <dfn>UTF-16 lead byte</dfn> and
<dfn>UTF-16 lead surrogate</dfn> (both initially null), and
<dfn id=utf-16be-decoder-flag>is UTF-16BE decoder</dfn> (initially false).

<p><a>shared UTF-16 decoder</a>'s <a>handler</a>, given <var>ioQueue</var> and
<var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a> and either
 <a>UTF-16 lead byte</a> or <a>UTF-16 lead surrogate</a> is non-null, set
 <a>UTF-16 lead byte</a> and <a>UTF-16 lead surrogate</a> to null, and return
 <a>error</a>.

 <li><p>If <var>byte</var> is <a>end-of-queue</a> and
 <a>UTF-16 lead byte</a> and <a>UTF-16 lead surrogate</a> are null, return
 <a>finished</a>.

 <li><p>If <a>UTF-16 lead byte</a> is null, set <a>UTF-16 lead byte</a> to
 <var>byte</var> and return <a>continue</a>.

 <li>
  <p>Let <var>code unit</var> be the result of:

  <dl class=switch>
   <dt><a>is UTF-16BE decoder</a> is true
   <dd><p>(<a>UTF-16 lead byte</a> &lt;&lt; 8) + <var>byte</var>.
   <dt><a>is UTF-16BE decoder</a> is false
   <dd><p>(<var>byte</var> &lt;&lt; 8) + <a>UTF-16 lead byte</a>.
  </dl>

  <p>Then set <a>UTF-16 lead byte</a> to null.

 <li>
  <p>If <a>UTF-16 lead surrogate</a> is non-null, let <var>lead surrogate</var> be
  <a>UTF-16 lead surrogate</a>, set <a>UTF-16 lead surrogate</a> to null, and then:

  <ol>
   <li><p>If <var>code unit</var> is in the range U+DC00 to U+DFFF, inclusive,
   return a code point whose value is
   0x10000 + ((<var>lead surrogate</var> &minus; 0xD800) &lt;&lt; 10) + (<var>code unit</var> &minus; 0xDC00).

   <li><p>Let <var>byte1</var> be <var>code unit</var> &gt;&gt; 8.

   <li><p>Let <var>byte2</var> be <var>code unit</var> &amp; 0x00FF.

   <li><p>Let <var>bytes</var> be two bytes whose values are <var>byte1</var> and <var>byte2</var>,
   if <a>is UTF-16BE decoder</a> is true, and <var>byte2</var> and <var>byte1</var> otherwise.

   <li><p><a>Prepend</a> the <var>bytes</var> to <var>ioQueue</var> and return <a>error</a>.
   <!-- unpaired surrogates; IE/WebKit output them, Gecko/Opera U+FFFD them -->
  </ol>

 <li><p>If <var>code unit</var> is in the range U+D800 to U+DBFF, inclusive, set
 <a>UTF-16 lead surrogate</a> to <var>code unit</var> and return
 <a>continue</a>.

 <li><p>If <var>code unit</var> is in the range U+DC00 to U+DFFF, inclusive,
 return <a>error</a>.
 <!-- unpaired surrogates; IE/WebKit output them, Gecko/Opera U+FFFD them -->

 <li><p>Return code point <var>code unit</var>.
</ol>
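
<div class=example id=example-shared-utf-16-decoder-sketch>
 <p>The steps above can be sketched as follows. This is a non-normative illustration, not a
 definitive implementation: the function name <code>decodeUtf16</code> is hypothetical, the
 <var>ioQueue</var> is modeled as a plain array, <code>null</code> models <a>end-of-queue</a>, and
 every <a>error</a> is assumed to be replaced with U+FFFD.

```javascript
// Non-normative sketch of the shared UTF-16 decoder plus a simple driver.
function decodeUtf16(bytes, isBigEndian = false) {
  const queue = Array.from(bytes);
  let leadByte = null;
  let leadSurrogate = null;
  let output = "";

  function handler(byte) {
    if (byte === null) {
      if (leadByte !== null || leadSurrogate !== null) {
        leadByte = null;
        leadSurrogate = null;
        return "error";
      }
      return "finished";
    }
    if (leadByte === null) {
      leadByte = byte;
      return "continue";
    }
    const codeUnit = isBigEndian
      ? (leadByte << 8) + byte
      : (byte << 8) + leadByte;
    leadByte = null;

    if (leadSurrogate !== null) {
      const lead = leadSurrogate;
      leadSurrogate = null;
      if (codeUnit >= 0xdc00 && codeUnit <= 0xdfff) {
        return 0x10000 + ((lead - 0xd800) << 10) + (codeUnit - 0xdc00);
      }
      // Not a trail surrogate: put the code unit's bytes back in the order
      // they arrived and report the unpaired lead surrogate as an error.
      const byte1 = codeUnit >> 8;
      const byte2 = codeUnit & 0x00ff;
      queue.unshift(...(isBigEndian ? [byte1, byte2] : [byte2, byte1]));
      return "error";
    }
    if (codeUnit >= 0xd800 && codeUnit <= 0xdbff) {
      leadSurrogate = codeUnit;
      return "continue";
    }
    if (codeUnit >= 0xdc00 && codeUnit <= 0xdfff) return "error";
    return codeUnit;
  }

  // Driver: read until the handler reports "finished".
  while (true) {
    const byte = queue.length > 0 ? queue.shift() : null;
    const result = handler(byte);
    if (result === "finished") return output;
    if (result === "error") output += "\uFFFD";
    else if (result !== "continue") output += String.fromCodePoint(result);
  }
}
```

 <p>In this sketch, <code>decodeUtf16([0x3D, 0xD8, 0x00, 0xDE])</code> yields U+1F600, while
 <code>decodeUtf16([0x3D, 0xD8, 0x41, 0x00])</code> yields U+FFFD followed by U+0041, because the
 bytes of the code unit that was not a trail surrogate are prepended and decoded again.
</div>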


<h3 id=utf-16be dfn export>UTF-16BE</h3>

<h4 id=utf-16be-decoder dfn export>UTF-16BE decoder</h4>

<p><a>UTF-16BE</a>'s <a for=/>decoder</a> is <a>shared UTF-16 decoder</a> with
its <a>is UTF-16BE decoder</a> set to true.


<h3 id=utf-16le dfn export>UTF-16LE</h3>

<p class="note no-backref">"<code>utf-16</code>" is a <a>label</a> for <a>UTF-16LE</a> to deal with
deployed content.
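
<div class=example id=example-utf-16-label>
 <p>The effect of this label can be observed through the API. The following non-normative snippet
 assumes an environment that exposes <code>TextDecoder</code> with support for <a>UTF-16LE</a>,
 such as a web browser.

```javascript
// "utf-16" is a label, so it resolves to the UTF-16LE encoding.
const decoder = new TextDecoder("utf-16");
const resolvedName = decoder.encoding; // "utf-16le"

// 0x68 0x00 0x69 0x00 is "hi" in little-endian byte order.
const text = decoder.decode(new Uint8Array([0x68, 0x00, 0x69, 0x00]));
```
</div>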


<h4 id=utf-16le-decoder dfn export>UTF-16LE decoder</h4>

<p><a>UTF-16LE</a>'s <a for=/>decoder</a> is <a>shared UTF-16 decoder</a>.


<h3 id=x-user-defined dfn export>x-user-defined</h3>

<p class=note>While technically this is a <a>single-byte encoding</a>,
it is defined separately as it can be implemented algorithmically.

<!--
This encoding is silly, however, the web depends on it:

https://krijnhoetmer.nl/irc-logs/whatwg/20121003#l-461
https://krijnhoetmer.nl/irc-logs/whatwg/20121010#l-812

https://stackoverflow.com/questions/6986789/why-are-some-bytes-prefixed-with-0xf7-when-using-charset-x-user-defined-with-xm
-->

<h4 id=x-user-defined-decoder dfn export>x-user-defined decoder</h4>

<p><a>x-user-defined</a>'s <a for=/>decoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>byte</var>, runs these steps:

<ol>
 <li><p>If <var>byte</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>byte</var> is an <a>ASCII byte</a>, return
 a code point whose value is <var>byte</var>.

 <li><p>Return a code point whose value is 0xF780 + <var>byte</var> &minus; 0x80.
</ol>
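
<div class=example id=example-x-user-defined-decoder-sketch>
 <p>The mapping above can be sketched as follows. This is a non-normative illustration, and the
 function name <code>decodeXUserDefined</code> is hypothetical.

```javascript
// Sketch of the x-user-defined decoder applied to a whole byte sequence.
function decodeXUserDefined(bytes) {
  let output = "";
  for (const byte of bytes) {
    // ASCII bytes (0x00 to 0x7F) map to themselves; bytes 0x80 to 0xFF
    // map to U+F780 to U+F7FF.
    const codePoint = byte <= 0x7f ? byte : 0xf780 + byte - 0x80;
    output += String.fromCodePoint(codePoint);
  }
  return output;
}
```

 <p>For instance, <code>decodeXUserDefined([0x41, 0x80, 0xFF])</code> yields U+0041, U+F780, and
 U+F7FF. No byte results in <a>error</a>.
</div>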


<h4 id=x-user-defined-encoder dfn export>x-user-defined encoder</h4>

<p><a>x-user-defined</a>'s <a for=/>encoder</a>'s <a>handler</a>, given
<var>ioQueue</var> and <var>code point</var>, runs these steps:

<ol>
 <li><p>If <var>code point</var> is <a>end-of-queue</a>, return
 <a>finished</a>.

 <li><p>If <var>code point</var> is an <a>ASCII code point</a>, return
 a byte whose value is <var>code point</var>.

 <li><p>If <var>code point</var> is in the range U+F780 to U+F7FF, inclusive, return
 a byte whose value is <var>code point</var> &minus; 0xF780 + 0x80.

 <li><p>Return <a>error</a> with <var>code point</var>.
</ol>
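
<div class=example id=example-x-user-defined-encoder-sketch>
 <p>The steps above can be sketched as follows. This is a non-normative illustration: the function
 name <code>encodeXUserDefined</code> is hypothetical, and throwing a <code>RangeError</code> is
 only one assumed way to surface <a>error</a>.

```javascript
// Sketch of the x-user-defined encoder applied to a whole string.
function encodeXUserDefined(string) {
  const bytes = [];
  for (const character of string) { // string iteration is by code point
    const codePoint = character.codePointAt(0);
    if (codePoint <= 0x7f) {
      bytes.push(codePoint);
    } else if (codePoint >= 0xf780 && codePoint <= 0xf7ff) {
      bytes.push(codePoint - 0xf780 + 0x80);
    } else {
      // Assumption: model "error with code point" as an exception.
      throw new RangeError("Unencodable code point: " + codePoint);
    }
  }
  return Uint8Array.from(bytes);
}
```

 <p>In this sketch, U+0041, U+F780, and U+F7FF encode to the bytes 0x41, 0x80, and 0xFF, while any
 other non-ASCII code point, such as U+00E9, is rejected.
</div>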


<h2 id=browser-ui>Browser UI</h2>

<p>Browsers are encouraged to not enable overriding the encoding of a resource. If such a feature is
nonetheless present, browsers should not offer <a>UTF-16BE/LE</a> as an option, due to the
aforementioned security issues. Browsers should also disable this feature if the resource was
decoded using <a>UTF-16BE/LE</a>.


<h2 class=no-num id=implementation-considerations>Implementation considerations</h2>

<p>Instead of supporting <a for=/>I/O queues</a> with arbitrary <a for="I/O queue">prepend</a>, the
<a for=/>decoders</a> for <a for=/>encodings</a> in this standard could be implemented with:

<ol>
 <li><p>The ability to unread the current byte.

 <li>
  <p>A single-byte buffer for <a>gb18030</a> (an <a>ASCII byte</a>) and <a>ISO-2022-JP</a> (0x24 or
  0x28).

  <p class=example id=example-gb18030-implementation-strategy>For <a>gb18030</a>, when hitting a
  bogus byte while <a>gb18030 third</a> is not 0x00, <a>gb18030 second</a> could be moved into the
  single-byte buffer to be returned next, and <a>gb18030 third</a> would become the new
  <a>gb18030 first</a>, checked for not being 0x00 after the single-byte buffer has been returned
  and emptied. This is possible as the range for the first and third byte in <a>gb18030</a> is
  identical.
</ol>
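
<div class=example id=example-unread-current-byte>
 <p>The first item above, the ability to unread the current byte, could be as small as the
 following non-normative sketch. The class name <code>ByteSource</code> and its methods are
 hypothetical, and <code>null</code> models <a>end-of-queue</a>.

```javascript
// Hypothetical byte source that supports unreading only the current byte.
class ByteSource {
  constructor(bytes) {
    this.bytes = bytes;
    this.position = 0;
    this.canUnread = false;
  }

  // Returns the next byte, or null once the input is exhausted.
  read() {
    if (this.position >= this.bytes.length) {
      this.canUnread = false;
      return null;
    }
    this.canUnread = true;
    return this.bytes[this.position++];
  }

  // Steps back over the byte most recently returned by read().
  unread() {
    if (!this.canUnread) throw new Error("No current byte to unread");
    this.canUnread = false;
    this.position--;
  }
}
```

 <p>A decoder built on such a source can call <code>unread()</code> when the byte it just read has
 to be processed again, instead of prepending that byte to a general-purpose queue.
</div>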

<p>The <a>ISO-2022-JP encoder</a> needs <a>ISO-2022-JP encoder state</a> as additional state, but
other than that, none of the <a for=/>encoders</a> for <a for=/>encodings</a> in this standard
require additional state or buffers.


<h2 class=no-num id=acknowledgments>Acknowledgments</h2>

<p>Many people have helped make encodings more
interoperable over the years and thereby furthered the goals of this
standard. Likewise, many people have helped make this standard what it is
today.

<p>With that, many thanks to
Adam Rice,
Alan Chaney,
Alexander Shtuchkin,
Allen Wirfs-Brock,
Andreu Botella,
Aneesh Agrawal,
Arkadiusz Michalski,
Asmus Freytag,
Ben Noordhuis,
Bnaya Peretz,
Boris Zbarsky,
Bruno Haible,
Cameron McCormack,
Charles McCathieNeville,
Christopher Foo,
CodifierNL, <!-- Codifier on GitHub -->
David Carlisle,
Domenic Denicola,
Dominique Hazaël-Massieux,
Doug Ewell,
Erik van der Poel,
譚永鋒 (Frank Yung-Fong Tang),
Glenn Maynard,
Gordon P. Hemsley,
Henri Sivonen,
Ian Hickson,
J. King,
James Graham,
Jeffrey Yasskin,
John Tamplin,
Joshua Bell,
村井純 (Jun Murai),
신정식 (Jungshik Shin),
Jxck,
강 성훈 (Kang Seonghoon),<!-- space is intentional: https://www.w3.org/Bugs/Public/show_bug.cgi?id=27675#c2 -->
川幡太一 (Kawabata Taichi),
Ken Lunde,
Ken Whistler,
Kenneth Russell,
田村健人 (Kent Tamura),
Leif Halvard Silli,
Luke Wagner,
Maciej Hirsz,
Makoto Kato,
Mark Callow,
Mark Crispin,
Mark Davis,
Martin Dürst,
Masatoshi Kimura,
Mattias Buelens,
Ms2ger,
Nigel Megitt,
Nigel Tao,
Norbert Lindenberg,
Øistein E. Andersen,
Peter Krefting,
Philip Jägenstedt,
Philip Taylor,
Richard Ishida,
Robbert Broersma,
Robert Mustacchi,
Ryan Dahl,
Sam Sneddon,
Shawn Steele,
Simon Montagu,
Simon Pieters,
Simon Sapin,
Stephen Checkoway,
寺田健 (Takeshi Terada),
Vyacheslav Matva,
Wolf Lammen, and
成瀬ゆい (Yui Naruse)
for being awesome.

<p>This standard is written by
<a href=https://annevankesteren.nl/ lang=nl>Anne van Kesteren</a>
(<a href=https://www.mozilla.org/>Mozilla</a>,
<a href=mailto:annevk@annevk.nl>annevk@annevk.nl</a>). The <a href=#api>API</a> chapter
was initially written by Joshua Bell (<a href=https://www.google.com/>Google</a>).