-
Notifications
You must be signed in to change notification settings - Fork 77
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add get an encoder and encode or fail for URLs #238
Conversation
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Depends on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(posted by mistake, the actual review is below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these changes depend on #237 (comment).
encoding.bs
Outdated
<div class=note> | ||
<p>In addition to the <a>decode</a>, <a>BOM sniff</a>, and <a>encode</a> algorithms below, | ||
standards needing these legacy hooks will most likely also need to use <a>get an encoding</a> (to | ||
turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turn a <a>label</a> into an <a for=/>encoding</a>) and <a>get an output encoding</a> (to turn an | |
turn a <a>label</a> into an <a for=/>encoding</a> instance) and <a>get an output encoding</a> (to turn an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I filed #240 on this. I'd like to go for a more minimal approach and can post a PR after this lands if that's okay.
<ol> | ||
<li><p>Assert: <var>encoding</var> is not <a>replacement</a> or <a>UTF-16BE/LE</a>. | ||
|
||
<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<li><p>Return <var>encoding</var>'s <a for=/>encoder</a>. | |
<li><p>Return a new instance of <var>encoding</var>'s <a for=/>encoder</a>. |
There's a difference between an encoder (an "encoder class", so to speak) and an encoder instance, which has state. This hook should also be renamed to "get an encoder instance".
See also #237 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
</ol> | ||
|
||
<p>To <dfn export>encode or fail</dfn> an I/O queue of scalar values <var>ioQueue</var> given an | ||
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<a for=/>encoder</a> <var>encoder</var> and an I/O queue of bytes <var>output</var>, run these | |
<a for=/>encoder</a> instance <var>encoderInstance</var> and an I/O queue of bytes <var>output</var>, run these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
|
||
<ol> | ||
<li><p>Let <var>potentialError</var> be the result of <a>running</a> <var>encoder</var> with | ||
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>". | |
<var>ioQueue</var>, <var>output</var>, and "<code>fatal</code>". | |
<li><p><a for="I/O queue">Push</a> <a>end-of-queue</a> into <var>encoder</var>. |
Needed so the conversion to a byte sequence in whatwg/url#558 doesn't hang.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this, but adjusted the wording slightly and pushed into output instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, nice catch.
encoding.bs
Outdated
</ol> | ||
|
||
<div class=note> | ||
<p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> alive as | |
<p>This is a legacy hook for URLs. The caller will have to keep an <a for=/>encoder</a> instace alive as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
I thought of an alternative that seems slightly nicer, which is that we push the error itself into the output I/O queue. That still gives the caller freedom to deal with errors in whatever way they want, but preserves the existing contract of encode (other than having to pass a different error mode) and avoids having to hand out encoders. And if the remaining caller of encode really is text/plain (I need to double check that) it might well make sense to do away with "html" entirely. |
97ddfa3
to
beecec9
Compare
I seemed to remember that HTML also specified some encoding for multipart/form-data, and indeed, although it doesn't use the "encode" algorithm, what it describes is equivalent to encoding with replacement. I'm not very keen on the idea of returning an I/O queue of bytes or errors – I'd prefer encode to take an error-handling callback that can push into the output I/O queue, especially since multipart/form-data and text/plain could use the same callback which would be different from the one in the URL standard. But this isn't a blocker for me. |
Pushing into the output queue is no good (unless you get to push errors in, but at that point...). URL needs to process non-errors differently from errors. I suppose it could use the callback to do something else, but that seems worse than the current alternatives to me. |
Whoops, excuse my brain fart there. In that case, I'm still not too keen on it, but it seems to be the best way to solve this, so that's fine by me. |
I much prefer the concept of an encoder instance over the concept of error objects intermingling in a queue with bytes, because the former is closer to actual implementation concepts. |
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
Since the ISO-2022-JP encoder is stateful, percent-encoding needs to hold onto an instance of the encoder and manually perform error handling. This also requires the input to be the full string rather than individual code points as otherwise the callers of percent-encoding would need to be aware of this too. (As UTF-8 encoding cannot fail this problem does not affect those endpoints.) Builds on this Encoding PR: whatwg/encoding#238. Tests: web-platform-tests/wpt#26158 and web-platform-tests/wpt#26317. Fixes #557.
Please excuse the branch name. The first commit is #237 and I will rebase this once @andreubotella has looked at that.
I suspect some changes might be needed here around my IO queue usage. In particular, in this scenario we probably do want to push end-of-queue upon encountering an error, to make conversion to bytes easier for the caller.
Preview | Diff