Skip to content

Conversation

andreubotella
Copy link
Member

@andreubotella andreubotella commented Jan 12, 2021

The multipart/form-data encoding algorithm, rather than specifying an escape algorithm for names and file names, used to provide a recommendation for escaping, that doesn't match what browsers do. This change specifies the escaping mechanism.


This PR is a replacement for #3276, since that PR's changes don't reflect the behavior of current browsers, and it seems to confuse escaping unmappable characters at the character encoding level with escaping bytes as needed for the format.

(See WHATWG Working Mode: Changes for more details.)


/form-control-infrastructure.html ( diff )

The multipart/form-data encoding algorithm, rather than specifying an
escape algorithm for names and file names, used to provide a
recommendation for escaping, that doesn't match what browsers do. This
change specifies the escaping mechanism.
source Outdated
<var>encoding</var> could be replaced by other characters before encoding).</p></li>
<var>encoding</var>, converted to a byte sequence.</p></li>

<li><p>Field names and file names for file fields must be escaped after encoding by replacing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make "encoding" a reference now and also pass that <var>encoding</var>? It seems your tests account for that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean after <span data-x="encode">encoding</span> with <var>encoding</var>?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

Copy link
Member

@annevk annevk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this seems like a good incremental improvement here. Will let @domenic make the final call.

Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only editorial concerns.

@andreubotella andreubotella requested a review from domenic January 13, 2021 19:11
Copy link
Member

@domenic domenic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is much clearer!

@domenic domenic merged commit 290eaa4 into whatwg:master Jan 13, 2021
@andreubotella andreubotella deleted the multipart-escape branch January 13, 2021 23:23
domenic pushed a commit to web-platform-tests/wpt that referenced this pull request Jan 13, 2021
See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
@domenic
Copy link
Member

domenic commented Jan 13, 2021

Don't forget to file the appropriate Firefox bug for the failing tests in web-platform-tests/wpt#27142, although perhaps you are holding off on doing that until other related spec changes also land.

@andreubotella
Copy link
Member Author

Don't forget to file the appropriate Firefox bug for the failing tests in web-platform-tests/wpt#27142, although perhaps you are holding off on doing that until other related spec changes also land.

I was holding off until I opened #6287 and added some tests for that: https://bugzilla.mozilla.org/show_bug.cgi?id=1686765

moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jan 15, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Jan 16, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142

UltraBlame original commit: 4283b11039b1ee9cc93eb5289dcdfeb9c6d06972
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Jan 19, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142
gecko-dev-updater pushed a commit to marco-c/gecko-dev-wordified that referenced this pull request Jan 25, 2021
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142

UltraBlame original commit: b70b8b891ecd9535c03560c6a06c4a1972bcd51e
<li><p>For field names and file names for file fields, the result of the encoding in the
previous bullet point must be escaped by replacing any 0x0A (LF) bytes with the byte sequence
`<code data-x="">%0A</code>`, 0x0D (CR) with `<code data-x="">%0D</code>` and 0x22 (") with
`<code data-x="">%22</code>`. The user agent must not perform any other escapes.</p></li>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please clarify how server-side software should distinguish filenames like %22.txt vs ".txt?

The line says "must not perform any other escapes", so if the filename was %22.txt, then user agent must pass it as %22.txt, while it should convert filename of ".txt as %22.txt.
That creates ambiguity, so the server-side won't be able to tell the original filename.

Am I missing something?

Copy link
Member Author

@andreubotella andreubotella Jun 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, currently there is no way for server-side software to tell between those two filenames. There's a proposal to fix this in #7575, but it's not clear that making that change will not break currently existing servers.

Note that this PR made the specification align to what two out of three browser engines (Chrome and Safari) were already doing. And while Firefox's behavior back then did distinguish %22.txt and ".txt, it did conflate newlines and spaces, for example.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A related issue: it looks like there's no way to tell which encoding did browser use to encode the filename. Does it make sense to add an explicit field for it somewhere?
For instance, a per-Content-Disposition charset or a global, per-form charset.

I'm trying to fix unicode filename upload in Apache JMeter, so I am interested in understanding the encodings better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the form contains an <input type="hidden" name="_charset_"> field, browsers will automatically populate it with the form submission encoding's name, see the "construct the entry list" algorithm.

But in the general case, the encoding name is not included in the form submission body, and I doubt a proposal to include it would gain much support from browser vendors, since UTF-8 is preferred for modern websites.

@whatwg whatwg deleted a comment from Jessi-alt21 May 27, 2024
bhearsum pushed a commit to mozilla-releng/staging-firefox that referenced this pull request May 1, 2025
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142
bhearsum pushed a commit to mozilla-releng/staging-firefox that referenced this pull request May 1, 2025
…scapes not tentative, a=testonly

Automatic update from web-platform-tests
Make the tests for multipart/form-data escapes not tentative

See whatwg/html#6282.

This change doesn't mark the tests dealing with building a
multipart/form-data request body directly from a `FormData` object
because some of the details are still being worked on in
whatwg/html#6247.
--

wpt-commits: c7b6a59ca2ae1f71b1711884eefa135e20ca8bc9
wpt-pr: 27142
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

4 participants