Merged
10 changes: 6 additions & 4 deletions source
@@ -56546,10 +56546,12 @@ fur
<li><p>Field names, field values for non-file fields, and file names for file fields, in the
generated <code>multipart/form-data</code> resource must be set to the result of <span
data-x="encode">encoding</span> the corresponding entry's name or value with
<var>encoding</var>, converted to a byte sequence. In the case of file names, however, the
precise name may be approximated if necessary (e.g., newlines could be removed from file names,
quotes could be changed to "<code data-x="">%22</code>", and characters not expressible in
<var>encoding</var> could be replaced by other characters before encoding).</p></li>
<var>encoding</var>, converted to a byte sequence.</p></li>

<li><p>For field names and file names for file fields, the result of the encoding in the
Member

Should we make "encoding" a reference now and also pass it <var>encoding</var>? Your tests seem to account for that.

Member Author

You mean after <span data-x="encode">encoding</span> with <var>encoding</var>?

Member

Yeah.

previous bullet point must be escaped by replacing any 0x0A (LF) bytes with the byte sequence
`<code data-x="">%0A</code>`, 0x0D (CR) with `<code data-x="">%0D</code>` and 0x22 (") with
`<code data-x="">%22</code>`. The user agent must not perform any other escapes.</p></li>

Can you please clarify how server-side software should distinguish filenames like `%22.txt` and `".txt`?

The line says "must not perform any other escapes", so if the filename is `%22.txt`, the user agent must pass it through unchanged as `%22.txt`, while a filename of `".txt` must also be converted to `%22.txt`.
That creates an ambiguity: the server side cannot recover the original filename.

Am I missing something?

Member Author

@andreubotella, Jun 25, 2023

Indeed, there is currently no way for server-side software to tell those two filenames apart. There's a proposal to fix this in #7575, but it's not clear whether that change would break existing servers.

Note that this PR aligned the specification with what two of the three browser engines (Chrome and Safari) were already doing. And while Firefox's behavior at the time did distinguish `%22.txt` from `".txt`, it conflated newlines and spaces, for example.
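
To make the ambiguity concrete, here is a minimal Python sketch of the escaping rule quoted in the diff. It assumes the name has already been encoded to bytes; the helper name `escape_multipart_name` is hypothetical and only illustrates the three replacements, not any browser's actual code.

```python
def escape_multipart_name(encoded: bytes) -> bytes:
    """Escape LF, CR and the double quote in an already-encoded name.

    No other bytes are touched; in particular "%" is left as-is,
    which is what makes the result ambiguous.
    """
    return (
        encoded.replace(b"\n", b"%0A")
        .replace(b"\r", b"%0D")
        .replace(b'"', b"%22")
    )


# Two different original filenames...
literal = escape_multipart_name("%22.txt".encode("utf-8"))
quoted = escape_multipart_name('".txt'.encode("utf-8"))

# ...produce the same bytes on the wire.
print(literal == quoted)  # True
```

Because `%` itself is not escaped, a server that sees `%22.txt` cannot tell which of the two names the user actually picked.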

A related issue: there seems to be no way to tell which encoding the browser used to encode the filename. Would it make sense to add an explicit field for it somewhere?
For instance, a per-Content-Disposition charset or a global, per-form charset.

I'm trying to fix Unicode filename uploads in Apache JMeter, so I'd like to understand the encodings better.

Member Author

If the form contains an `<input type="hidden" name="_charset_">` field, browsers will automatically populate it with the name of the form submission encoding; see the "construct the entry list" algorithm.

But in the general case, the encoding name is not included in the form submission body, and I doubt a proposal to include it would gain much support from browser vendors, since UTF-8 is preferred for modern websites.
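
As a rough illustration of how a server might use the `_charset_` field, here is a minimal Python sketch. It assumes the multipart body has already been split into raw name/value byte pairs; the `decode_form_fields` helper and its input shape are hypothetical and not part of any library. It also assumes the encoding name sent by the browser is one that Python's `codecs` module recognises, which is not guaranteed for every label.

```python
import codecs


def decode_form_fields(
    raw_fields: dict[bytes, bytes], fallback: str = "utf-8"
) -> dict[str, str]:
    """Decode raw form fields using the encoding named in `_charset_`.

    Falls back to `fallback` when the field is absent or names an
    encoding Python does not know. Does not guard against labels that
    name non-text codecs.
    """
    label = raw_fields.get(b"_charset_", b"").decode("ascii", errors="ignore").strip()
    try:
        charset = codecs.lookup(label).name if label else fallback
    except (LookupError, ValueError):
        charset = fallback
    return {
        name.decode(charset, errors="replace"): value.decode(charset, errors="replace")
        for name, value in raw_fields.items()
    }


# Hypothetical submission from a page served as windows-1252.
fields = decode_form_fields(
    {b"_charset_": b"windows-1252", b"title": "café".encode("cp1252")}
)
print(fields["title"])
```

Without the hidden field the sketch simply assumes UTF-8, which matches the point above that the encoding is otherwise not carried in the submission body.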


<li><p>The parts of the generated <code>multipart/form-data</code> resource that correspond to
non-file fields must not have a `<code>Content-Type</code>` header specified.</p></li>