Fix percent-encoding for ISO-2022-JP
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and perform error handling manually. This also requires the input to be the full string rather than individual code points, as otherwise callers of percent-encoding would need to be aware of the encoder state too. (As UTF-8 encoding cannot fail, this problem does not affect those endpoints.)

Builds on this Encoding PR: whatwg/encoding#238.

Tests: web-platform-tests/wpt#26158.

Fixes #557.
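
To illustrate why the input has to be the full string, here is a minimal Python sketch, not the spec algorithm itself. It assumes Python's built-in `iso2022_jp` and `shift_jis` codecs as stand-ins for the Encoding Standard's encoders (their mappings are not identical), and `needs_escape` is a hypothetical, simplified percent-encode set. Instead of driving an encoder instance directly, it encodes the longest encodable run in one call and emits the `%26%23…%3B` form for each code point the codec rejects.

```python
def needs_escape(byte: int) -> bool:
    # Hypothetical stand-in for a percent-encode set:
    # controls, space, DEL, and every non-ASCII byte.
    return byte <= 0x20 or byte >= 0x7F


def percent_encode_after_encoding(encoding, text, should_escape, space_as_plus=False):
    """Percent-encode `text` after encoding it as whole runs, not per code point."""
    output = []
    pos = 0
    while pos < len(text):
        remaining = text[pos:]
        try:
            # Encode everything that is left in one call so a stateful
            # codec can share escape sequences between code points.
            encoded = remaining.encode(encoding)
            failed = ""
            pos = len(text)
        except UnicodeEncodeError as exc:
            # Encode the run before the failure, then handle the
            # unencodable code point(s) manually.
            encoded = remaining[:exc.start].encode(encoding)
            failed = remaining[exc.start:exc.end]
            pos += exc.end
        for byte in encoded:
            if space_as_plus and byte == 0x20:
                output.append("+")
            elif should_escape(byte):
                output.append(f"%{byte:02X}")
            else:
                output.append(chr(byte))
        for code_point in failed:
            # Percent-encoded "&#", the code point in base ten, then ";".
            output.append(f"%26%23{ord(code_point)}%3B")
    return "".join(output)


# Whole-string input: one escape sequence in, one out.
whole = percent_encode_after_encoding("iso2022_jp", "あい", needs_escape)

# Per-code-point input: every code point pays for its own escape sequences.
per_code_point = "".join(
    percent_encode_after_encoding("iso2022_jp", c, needs_escape) for c in "あい"
)

print(whole)
print(per_code_point)
```

With per-code-point input, each code point is wrapped in its own pair of escape sequences, so the two results differ. That is the behavior difference the change to a string-based algorithm avoids.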
annevk committed Oct 28, 2020
1 parent 9637645 commit 22e8bd3
Showing 1 changed file with 42 additions and 53 deletions.
95 changes: 42 additions & 53 deletions url.bs
@@ -217,81 +217,70 @@ inclusive, and U+007E (~).
all code points, except the <a>ASCII alphanumeric</a>, U+002A (*), U+002D (-), U+002E (.), and
U+005F (_).

<p>To <dfn for="code point">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
<var>encoding</var>, <a for=/>code point</a> <var>codePoint</var>, and a
<var>percentEncodeSet</var>, run these steps:
<p>To <dfn for=string>percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and an
optional boolean <var>spaceAsPlus</var> (default false), run these steps:

<ol>
<li><p>Let <var>bytes</var> be the result of <a lt=encode>encoding</a> <var>codePoint</var> using
<var>encoding</var>.
<li><p>Let <var>encoder</var> be the result of <a>getting an encoder</a> from <var>encoding</var>.

<li>
<p>If <var>bytes</var> starts with 0x26 (&amp;) 0x23 (#) and ends with 0x3B (;), then:

<ol>
<li><p>Let <var>output</var> be <var>bytes</var>, <a>isomorphic decoded</a>.
<li><p>Let <var>inputQueue</var> be <var>input</var> converted to an <a for=/>I/O queue</a>.

<li><p>Replace the first two code points of <var>output</var> with "<code>%26%23</code>".

<li><p>Replace the last code point of <var>output</var> with "<code>%3B</code>".

<li><p>Return <var>output</var>.
</ol>
<li><p>Let <var>output</var> be the empty string.

<p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
<li>
<p>Let <var>potentialError</var> be 0.

<li><p>Let <var>output</var> be the empty string.</p></li>
<p class=note>This needs to be a non-null value to initiate the subsequent while loop.

<li>
<p>For each <var>byte</var> of <var>bytes</var>:
<p>While <var>potentialError</var> is non-null:

<ol>
<li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
is <var>byte</var>'s <a for=byte>value</a>.
<li><p>Let <var>encodeOutput</var> be an empty <a for=/>I/O queue</a>.

<li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.
<li><p>Set <var>potentialError</var> to the result of running <a>encode or fail</a> with
<var>inputQueue</var>, <var>encoder</var>, and <var>encodeOutput</var>.

<li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
<var>isomorph</var> to <var>output</var>.
<li>
<p>For each <var>byte</var> of <var>encodeOutput</var> converted to a byte sequence:

<li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
<var>output</var>.
</ol>
<ol>
<li><p>If <var>spaceAsPlus</var> is true and <var>byte</var> is 0x20 (SP), then append
U+002B (+) to <var>output</var>.

<li><p>Return <var>output</var>.
</ol>
<li><p>Let <var>isomorph</var> be a <a for=/>code point</a> whose <a for="code point">value</a>
is <var>byte</var>'s <a for=byte>value</a>.

<p>To <dfn for="string">percent-encode after encoding</dfn>, given an <a for=/>encoding</a>
<var>encoding</var>, <a for=/>string</a> <var>input</var>, a <var>percentEncodeSet</var>, and a
boolean <var>spaceAsPlus</var>, run these steps:
<li><p>Assert: <var>percentEncodeSet</var> includes all non-<a>ASCII code points</a>.

<ol>
<li><p>Let <var>output</var> be the empty string.</p></li>
<li><p>If <var>isomorph</var> is not in <var>percentEncodeSet</var>, then append
<var>isomorph</var> to <var>output</var>.

<li>
<p>For each <var>codePoint</var> of <var>input</var>:
<li><p>Otherwise, <a for=byte>percent-encode</a> <var>byte</var> and append the result to
<var>output</var>.
</ol>

<ol>
<li><p>If <var>spaceAsPlus</var> is true and <var>codePoint</var> is U+0020, then append
U+002B (+) to <var>output</var>.
<li>
<p>If <var>potentialError</var> is non-null, then append "<code>%26%23</code>", followed by the
shortest sequence of <a for=/>ASCII digits</a> representing <var>potentialError</var> in base
ten, followed by "<code>%3B</code>", to <var>output</var>.

<li><p>Otherwise, run <a for="code point">percent-encode after encoding</a> with
<var>encoding</var>, <var>codePoint</var>, and <var>percentEncodeSet</var>, and append the result
to <var>output</var>.
<p class="note no-backref">This can happen when <var>encoding</var> is not <a>UTF-8</a>.
</ol>

<li><p>Return <var>output</var>.
</ol>

<p>To <dfn for="code point" id=utf-8-percent-encode>UTF-8 percent-encode</dfn> a
<a for=/>code point</a> <var>codePoint</var> using a <var>percentEncodeSet</var>, return the result
of running <a for="code point">percent-encode after encoding</a> with <a for=/>UTF-8</a>,
<var>codePoint</var>, and <var>percentEncodeSet</var>.
of running <a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>,
<var>codePoint</var> as a <a for=/>string</a>, and <var>percentEncodeSet</var>.

<p>To <dfn export for=string>UTF-8 percent-encode</dfn> a <a for=/>string</a> <var>input</var> using
a <var>percentEncodeSet</var>, return the result of running
<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>,
<var>percentEncodeSet</var>, and false.
<a for=string>percent-encode after encoding</a> with <a for=/>UTF-8</a>, <var>input</var>, and
<var>percentEncodeSet</var>.

<hr>

@@ -319,20 +308,20 @@ a <var>percentEncodeSet</var>, return the result of running
<td>"<code>‽%25%2E</code>"
<td>0xE2 0x80 0xBD 0x25 0x2E
<tr>
<td rowspan=3><a for="code point">Percent-encode after encoding</a> with <a>Shift_JIS</a>,
<td rowspan=3><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>,
<var>input</var>, and the <a>userinfo percent-encode set</a>
<td>U+0020
<td>"<code> </code>"
<td>"<code>%20</code>"
<tr>
<td>U+2261 (≡)
<td>"<code>≡</code>"
<td>"<code>%81%DF</code>"
<tr>
<td>U+203D (‽)
<td>"<code>‽</code>"
<td>"<code>%26%238253%3B</code>"
<tr>
<td><a for="code point">Percent-encode after encoding</a> with <a>ISO-2022-JP</a>,
<var>input</var>, and the <a>userinfo percent-encode set</a>
<td>U+00A5 (¥)
<td><a for=string>Percent-encode after encoding</a> with <a>ISO-2022-JP</a>, <var>input</var>,
and the <a>userinfo percent-encode set</a>
<td>"<code>¥</code>"
<td>"<code>%1B(J\%1B(B</code>"
<tr>
<td><a for=string>Percent-encode after encoding</a> with <a>Shift_JIS</a>, <var>input</var>, the