-
Notifications
You must be signed in to change notification settings - Fork 3k
Specify escaping in the multipart/form-data encoding algorithm #6282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The multipart/form-data encoding algorithm, rather than specifying an escape algorithm for names and file names, used to provide a recommendation for escaping, that doesn't match what browsers do. This change specifies the escaping mechanism.
source
Outdated
<var>encoding</var> could be replaced by other characters before encoding).</p></li> | ||
<var>encoding</var>, converted to a byte sequence.</p></li> | ||
|
||
<li><p>Field names and file names for file fields must be escaped after encoding by replacing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make "encoding" a reference now and also pass that <var>encoding</var>
? It seems your tests account for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean after <span data-x="encode">encoding</span> with <var>encoding</var>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this seems like a good incremental improvement here. Will let @domenic make the final call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only editorial concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is much clearer!
See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247.
Don't forget to file the appropriate Firefox bug for the failing tests in web-platform-tests/wpt#27142, although perhaps you are holding off on doing that until other related spec changes also land. |
I was holding off until I opened #6287 and added some tests for that: https://bugzilla.mozilla.org/show_bug.cgi?id=1686765 |
…scapes not tentative, a=testonly Automatic update from web-platform-tests Make the tests for multipart/form-data escapes not tentative See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247. -- wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9 wpt-pr: 27142
…scapes not tentative, a=testonly Automatic update from web-platform-tests Make the tests for multipart/form-data escapes not tentative See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247. -- wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9 wpt-pr: 27142 UltraBlame original commit: 4283b11039b1ee9cc93eb5289dcdfeb9c6d06972
…scapes not tentative, a=testonly Automatic update from web-platform-tests Make the tests for multipart/form-data escapes not tentative See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247. -- wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9 wpt-pr: 27142
…scapes not tentative, a=testonly Automatic update from web-platform-tests Make the tests for multipart/form-data escapes not tentative See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247. -- wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9 wpt-pr: 27142 UltraBlame original commit: b70b8b891ecd9535c03560c6a06c4a1972bcd51e
<li><p>For field names and file names for file fields, the result of the encoding in the | ||
previous bullet point must be escaped by replacing any 0x0A (LF) bytes with the byte sequence | ||
`<code data-x="">%0A</code>`, 0x0D (CR) with `<code data-x="">%0D</code>` and 0x22 (") with | ||
`<code data-x="">%22</code>`. The user agent must not perform any other escapes.</p></li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please clarify how server-side software should distinguish filenames like %22.txt
vs ".txt
?
The line says "must not perform any other escapes", so if the filename was %22.txt
, then user agent must pass it as %22.txt
, while it should convert filename of ".txt
as %22.txt
.
That creates ambiguity, so the server-side won't be able to tell the original filename.
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, currently there is no way for server-side software to tell between those two filenames. There's a proposal to fix this in #7575, but it's not clear that making that change will not break currently existing servers.
Note that this PR made the specification align to what two out of three browser engines (Chrome and Safari) were already doing. And while Firefox's behavior back then did distinguish %22.txt
and ".txt
, it did conflate newlines and spaces, for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A related issue: it looks like there's no way to tell which encoding did browser use to encode the filename. Does it make sense to add an explicit field for it somewhere?
For instance, a per-Content-Disposition
charset or a global, per-form charset.
I'm trying to fix unicode filename upload in Apache JMeter, so I am interested in understanding the encodings better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the form contains an <input type="hidden" name="_charset_">
field, browsers will automatically populate it with the form submission encoding's name, see the "construct the entry list" algorithm.
But in the general case, the encoding name is not included in the form submission body, and I doubt a proposal to include it would gain much support from browser vendors, since UTF-8 is preferred for modern websites.
…scapes not tentative, a=testonly Automatic update from web-platform-tests Make the tests for multipart/form-data escapes not tentative See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247. -- wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9 wpt-pr: 27142
…scapes not tentative, a=testonly Automatic update from web-platform-tests Make the tests for multipart/form-data escapes not tentative See whatwg/html#6282. This change doesn't mark the tests dealing with building a multipart/form-data request body directly from a `FormData` object because some of the details are still being worked on in whatwg/html#6247. -- wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9 wpt-pr: 27142
The multipart/form-data encoding algorithm, rather than specifying an escape algorithm for names and file names, used to provide a recommendation for escaping, that doesn't match what browsers do. This change specifies the escaping mechanism.
This PR is a replacement for #3276, since that PR's changes don't reflect the behavior of current browsers, and it seems to confuse escaping unmappable characters at the character encoding level with escaping bytes as needed for the format.
(See WHATWG Working Mode: Changes for more details.)
/form-control-infrastructure.html ( diff )