
Documenting de-facto handling of multipart/form-data form field file uploads #3040

@stanvass


I don't see this documented in any IETF, W3C, or WHATWG spec, yet bugs have been filed against browsers throughout the last 15 or so years whenever a browser chose to follow the letter of https://www.ietf.org/rfc/rfc1867.txt and ignore de-facto behaviors that contradict the spec. The pattern repeats: the browser is released, it breaks a lot of servers and server apps, and the browser is forced to reimplement or revert to the undocumented behavior.

Namely, this part of RFC 1867 contradicts real-world behavior:

File inputs may also identify the file name. The file name may be described using the 'filename' parameter of the "content-disposition" header. This is not required, but is strongly recommended in any case where the original filename is known. This is useful or necessary in many applications.

Although the very next sentence says it's necessary in many applications, "This is not required" greatly understates the issue.

In theory:

  • Any form field whose Content-Type (explicit, or the text/plain default when omitted) is something other than text/plain should be considered a "file upload".
  • The "filename" can be omitted if it's empty; an empty filename and a missing filename are the same thing.
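As a rough illustration, the two "in theory" rules above can be written as a pair of Python helpers. This is a sketch of one literal reading of RFC 1867, not any server's actual code; the function names are hypothetical, and the part's Content-Type and filename are assumed to have been parsed already (None when absent).

```python
from typing import Optional


def is_file_upload_in_theory(content_type: Optional[str]) -> bool:
    """Hypothetical helper: a literal reading of RFC 1867.

    A part with no Content-Type defaults to text/plain, so only a media type
    other than text/plain marks the part as a file upload.
    """
    media_type = (content_type or "text/plain").split(";", 1)[0].strip().lower()
    return media_type != "text/plain"


def normalize_filename_in_theory(filename: Optional[str]) -> Optional[str]:
    """Hypothetical helper: an empty filename means the same as a missing one."""
    return filename or None
```

Under this reading the filename carries no signal about whether a part is a file upload; the practical behavior described next inverts that.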

But in practice...

  • The mere presence of a Content-Type header in a form field makes servers treat the field as a "file upload".
  • Virtually all mainstream servers require the filename attribute. If it is absent, many servers incorrectly try to interpret the field as a text field; if it is present, the field is considered a file upload field.
  • Virtually all mainstream servers require the filename attribute to be non-empty for the file upload to be considered non-empty (some even ignore the part's content when the filename is empty). Conversely, a filename attribute that is present but empty (i.e. filename="") is treated by servers as a signal that this is a file upload field where the user hasn't selected a file.
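The filename-driven behavior in the last two bullets amounts to a three-way decision. A minimal Python sketch, assuming the part's filename parameter has already been extracted (None when absent, "" when present but empty); the enum and function names are hypothetical and not taken from any particular server, and this sketch models only the filename rule, not the Content-Type check:

```python
from enum import Enum
from typing import Optional


class PartKind(Enum):
    TEXT_FIELD = "text field"
    FILE_FIELD_NO_FILE = "file field, no file selected"
    FILE_UPLOAD = "file upload"


def classify_part_de_facto(filename: Optional[str]) -> PartKind:
    """Classify a multipart/form-data part the way many servers do in practice.

    Only the filename parameter is consulted; the Content-Type is ignored here.
    """
    if filename is None:
        # No filename attribute: treated as a plain text field, even if the
        # part carries a Content-Type such as image/png.
        return PartKind.TEXT_FIELD
    if filename == "":
        # filename="": a file field where the user didn't pick a file;
        # some servers discard the part's body in this case.
        return PartKind.FILE_FIELD_NO_FILE
    return PartKind.FILE_UPLOAD
```

Compared with the "in theory" reading, a part with Content-Type: image/png and no filename flips from file upload to text field here, which is the kind of mismatch that breaks servers when a browser omits the attribute.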

Here's a link to PHP's implementation of file uploads; PHP powers a good share of the web right now:
https://github.com/php/php-src/blob/c8aa6f3a9a3d2c114d0c5e0c9fdd0a465dbb54a5/main/rfc1867.c

Notice that PHP treats the filename attribute as the sole source of truth both for whether a field is a file upload field (the attribute is present) and for whether a file was actually provided (the attribute is a non-empty rather than an empty string). Many other servers do the same.
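Because this rule depends on telling an absent filename apart from filename="", the Content-Disposition parser has to preserve that distinction rather than collapse both cases to an empty string. A minimal Python sketch of that distinction, not a port of rfc1867.c; the helper names are hypothetical, and RFC 2231 style `filename*` parameters are deliberately not handled:

```python
import re
from typing import Optional, Tuple

# Matches "; key=value" where value is a quoted string (with backslash escapes)
# or an unquoted run of characters up to the next ";".
_PARAM_RE = re.compile(r';\s*([^\s=;]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^;]*)')


def disposition_param(header: str, name: str) -> Optional[str]:
    """Return a Content-Disposition parameter: None if absent, "" if empty."""
    for match in _PARAM_RE.finditer(header):
        key, raw = match.group(1), match.group(2).strip()
        if key.lower() == name.lower():
            if len(raw) >= 2 and raw[0] == raw[-1] == '"':
                # Strip the surrounding quotes and undo backslash escapes.
                return re.sub(r'\\(.)', r'\1', raw[1:-1])
            return raw
    return None


def file_flags(header: str) -> Tuple[bool, bool]:
    """(is_file_field, file_selected), decided by the filename parameter alone."""
    filename = disposition_param(header, "filename")
    return filename is not None, bool(filename)
```

For example, `form-data; name="upload"` yields `(False, False)`, `form-data; name="upload"; filename=""` yields `(True, False)`, and a non-empty filename yields `(True, True)`.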

I'm sorry I can't provide more concrete references: I researched this a few weeks ago, didn't keep the links, and now find it hard to track them down. It didn't occur to me to file an issue at the time, but I'm doing so now. I hope this triggers a discussion and more thorough research on the topic.

Note that this de-facto behavior has implications in all sorts of unexpected places. For example, one Chrome bug (if I remember right) involved Blob uploads via FormData not encoding a filename attribute, which caused servers not to treat them as file uploads. The solution was to assign a placeholder name to the Blob on upload, e.g. just a hash of its contents (say, CRC32). I remember another similar issue being resolved by assigning names like "Untitled.jpg" to uploads, so that the filename attribute is not left empty.
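The workaround described above can be sketched in Python: when no filename is known, the part still gets a non-empty filename attribute derived from a CRC32 of its contents. This is a hypothetical illustration of the idea, not Chrome's actual fix; the helper names, the `.bin` extension, and the `application/octet-stream` default are assumptions, and the field name and filename are assumed to contain no quotes or CR/LF characters (no escaping is done).

```python
import zlib
from typing import Optional


def placeholder_filename(data: bytes, extension: str = "bin") -> str:
    """Hypothetical helper: CRC32 of the contents as 8 hex digits plus an extension."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}.{extension}"


def encode_file_part(boundary: str, field_name: str, data: bytes,
                     filename: Optional[str] = None,
                     content_type: str = "application/octet-stream") -> bytes:
    """Encode one multipart/form-data part, never omitting or emptying filename."""
    if not filename:
        # Servers key "is this a file?" off the filename attribute, so an
        # unnamed blob gets a stable placeholder instead of no attribute at all.
        filename = placeholder_filename(data)
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    return head + data + b"\r\n"
```

Deriving the name from the contents (rather than a fixed "Untitled.jpg") keeps it deterministic for identical uploads; either choice satisfies the servers' non-empty filename check.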
