
Web reality: Newline normalization in form payloads and in FormData objects #6247

@andreubotella



I previously opened whatwg/url#562 for this, but it soon became clear that it's not something limited to urlencoded.

Newline normalization in forms differs between browsers, and also between browsers and the behavior mandated by the spec. This is most apparent with filenames as serialized in the urlencoded and text/plain enctypes: all browsers agree on normalizing their newlines, but the spec would leave them unnormalized.

For some context about the spec's behavior, newline normalization in forms happens in the "append an entry" algorithm, which normalizes names and string values, but doesn't touch the filename or data of File values. This algorithm is called when constructing the entry list, except in the case of form-associated custom elements, for which it's called for string or File submission values but not for FormData submission values.
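A minimal sketch of the normalization done by "append an entry", assuming a simplified model: `FileLike`, `Entry`, `normalizeNewlines` and `appendEntry` are hypothetical names for illustration, not DOM or spec APIs, and file data is kept as a string. It shows that names and string values have every lone CR or LF turned into CRLF, while a File value, including its filename, passes through untouched.

```typescript
// Hypothetical model types for illustration; these are NOT the DOM File/FormData interfaces.
interface FileLike {
  name: string; // the filename
  data: string; // file contents, simplified to a string
}
type EntryValue = string | FileLike;
interface Entry {
  name: string;
  value: EntryValue;
}

// Turns every CR not followed by LF, and every LF not preceded by CR, into CRLF.
// Matching "\r\n" first keeps existing CRLF pairs intact.
function normalizeNewlines(s: string): string {
  return s.replace(/\r\n|\r|\n/g, "\r\n");
}

// Sketch of "append an entry": the name is always normalized, a string value
// is normalized, and a File value (filename and data) is appended untouched.
function appendEntry(list: Entry[], name: string, value: EntryValue): void {
  const normalizedValue =
    typeof value === "string" ? normalizeNewlines(value) : value;
  list.push({ name: normalizeNewlines(name), value: normalizedValue });
}

// String/File-sourced entries, e.g. from a text input and a file input.
const entries: Entry[] = [];
appendEntry(entries, "a\nb", "x\ry");
appendEntry(entries, "upload", { name: "file\nname.txt", data: "1\n2" });
```

Here the first entry ends up as `"a\r\nb"` / `"x\r\ny"`, while the file's name stays `"file\nname.txt"`.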

Additionally, step 7 of the "constructing the entry list" algorithm fires a formdata event whose event object contains a FormData object whose entry list is the form's entry list. That object is live (i.e., modifying it from JS modifies the form's entry list), and those modifications are not normalized afterwards.
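The live-object behavior can be modeled the same way. In this sketch (hypothetical names again; `LiveFormData` only stands in for a FormData object and implements just `append`), controls are normalized on their way into the list, but whatever a formdata listener appends goes straight into the same array and is never revisited.

```typescript
// Hypothetical, string-only model of an entry in the entry list.
interface Entry {
  name: string;
  value: string;
}

function normalizeNewlines(s: string): string {
  return s.replace(/\r\n|\r|\n/g, "\r\n");
}

// Stand-in for the FormData exposed as FormDataEvent#formData: it wraps the
// form's own entry list rather than a copy, so appends are visible to the form.
class LiveFormData {
  private readonly entries: Entry[];
  constructor(entries: Entry[]) {
    this.entries = entries;
  }
  // Like FormData's append(), this does no newline normalization.
  append(name: string, value: string): void {
    this.entries.push({ name, value });
  }
}

// Rough sketch of "constructing the entry list": controls go through the
// normalizing path, then the formdata listener runs against the live list.
function constructEntryList(
  controls: Entry[],
  onFormData: (fd: LiveFormData) => void,
): Entry[] {
  const list: Entry[] = [];
  for (const { name, value } of controls) {
    list.push({ name: normalizeNewlines(name), value: normalizeNewlines(value) });
  }
  onFormData(new LiveFormData(list)); // writes land directly in `list`
  return list; // no second normalization pass afterwards
}

const result = constructEntryList(
  [{ name: "field", value: "a\nb" }],
  (fd) => fd.append("extra", "c\nd"),
);
```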

Since the urlencoded and text/plain enctypes only support string values, their serialization algorithms explicitly handle File values by replacing them with the filename, but they perform no further normalization.
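Under the spec as written, the two string-only enctypes reduce to roughly the following sketch. `toStringPairs`, `serializeTextPlain` and `serializeUrlencoded` are hypothetical helpers; the urlencoded one uses the real `URLSearchParams` as a stand-in for the URL Standard's serializer, since it percent-encodes without normalizing newlines. A filename containing a bare LF comes out with that LF intact, which is the case where all browsers instead normalize.

```typescript
// Hypothetical model types, as in a plain-data view of the entry list.
interface FileLike {
  name: string;
  data: string;
}
type EntryValue = string | FileLike;
interface Entry {
  name: string;
  value: EntryValue;
}

// Replace each File value with its filename. No newline normalization here.
function toStringPairs(list: Entry[]): [string, string][] {
  return list.map((e): [string, string] => [
    e.name,
    typeof e.value === "string" ? e.value : e.value.name,
  ]);
}

// text/plain: "name=value" followed by CRLF for every entry.
function serializeTextPlain(list: Entry[]): string {
  return toStringPairs(list)
    .map(([name, value]) => `${name}=${value}\r\n`)
    .join("");
}

// urlencoded: URLSearchParams percent-encodes but does not normalize newlines.
function serializeUrlencoded(list: Entry[]): string {
  return new URLSearchParams(toStringPairs(list)).toString();
}

const list: Entry[] = [{ name: "upload", value: { name: "a\nb.txt", data: "" } }];
const plain = serializeTextPlain(list); // "upload=a\nb.txt\r\n"
const encoded = serializeUrlencoded(list); // "upload=a%0Ab.txt"
```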


Test results and discussion:

For discussing these results, it's useful to distinguish between two types of "sources" for form entries, which I'll be calling "String/File-sourced" and "FormData-sourced" (these are ad-hoc names), distinguished by whether those entries were added through the spec's "append an entry" algorithm. String/File-sourced entries are entries which come from:

  • <input type="text">, <input type="hidden">, etc., including <input type="file">
  • Form-associated custom elements whose submission value is a string or a File object.

FormData-sourced entries come from creating or modifying a FormData object from JS code and then serializing it as a form payload. As far as I can tell, that is only possible in the following ways:

  • The FormData object is set as the submission value of a form-associated custom element.
  • The FormData object is the formData attribute of FormDataEvent and is modified from JS.
  • A Request or Response object is constructed with the FormData object as its body.

Note that not all browsers implement all of the above: form-associated custom elements are only implemented in Chrome, Safari doesn't support the formdata event, and Firefox, in multipart/form-data, replaces newlines in names and filenames with a space (see #3223 and #6282). However, for each group of cases below, every browser supports at least one of the possible ways to test it.

"Normalized" here means following the normalization in the "append an entry" algorithm. When used for serializations, it means all names and values are normalized, including string values that come from filenames in the entry list.

  • new FormData(formElement) and FormDataEvent#formData:
    • String/File-sourced entries:
      • Spec, Chrome: names and string values are normalized
      • Firefox, Safari: unchanged
    • FormData-sourced entries: unchanged
  • application/x-www-form-urlencoded and text/plain serializations:
    • Spec:
      • String/File-sourced entries: names and values which derive from strings are normalized
      • FormData-sourced entries: unchanged
    • Chrome on application/x-www-form-urlencoded, Firefox, Safari: normalized
    • Chrome on text/plain:
      • String/File-sourced entries: normalized
      • FormData-sourced entries: unchanged except for values which derive from an original filename, which are normalized. This is almost certainly a bug.
  • multipart/form-data serialization:
    • String/File-sourced entries:
      • Spec, Chrome, Safari: names and string values are normalized, filenames are unchanged
      • Firefox: string values are normalized (names and filenames N/A)
    • FormData-sourced entries:
      • Spec, Chrome: unchanged
      • Firefox: string values are normalized (names and filenames N/A)
      • Safari: names and string values are normalized, filenames are unchanged
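To bucket observations into the categories used above, a small helper can compare what a browser produced against the input it was given. This is a hypothetical testing aid, not part of WPT or any browser API. It assumes the input contains at least one lone CR or LF (otherwise "normalized" and "unchanged" coincide), and it reports anything else, such as newlines replaced with spaces, as "other".

```typescript
// Hypothetical testing helpers; not part of WPT or any browser API.
type Observed = "normalized" | "unchanged" | "other";

// True when every CR is immediately followed by LF and every LF is
// immediately preceded by CR.
function isNewlineNormalized(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    if (s[i] === "\r" && s[i + 1] !== "\n") return false;
    if (s[i] === "\n" && s[i - 1] !== "\r") return false;
  }
  return true;
}

// Buckets an observed name, value or filename against the input it came from.
function classify(input: string, observed: string): Observed {
  const normalized = input.replace(/\r\n|\r|\n/g, "\r\n");
  if (normalized === input) {
    // Without a lone CR or LF, "normalized" and "unchanged" can't be told apart.
    throw new Error("input must contain a lone CR or LF");
  }
  if (observed === normalized) return "normalized";
  if (observed === input) return "unchanged";
  return "other"; // e.g. newlines replaced with spaces
}
```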

It seems to me that we should change the spec to have urlencoded and text/plain normalize everything, and the rest can stay as it currently is in the spec (but with WPT tests, and with bugs filed against browsers).

Note that, for urlencoded at least, this normalization has to take place at some point before the serializer is invoked, since URLSearchParams objects interoperably don't normalize entries. This would also remove the need for the urlencoded and text/plain serializers to deal with files.
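The proposal could look roughly like this sketch: a hypothetical `prepareForStringEnctype` step runs before serialization, converting Files to their filenames and normalizing every name and value regardless of how the entry got into the list, so the serializer only ever sees strings. `URLSearchParams` again stands in for the URL Standard serializer; serializing the same raw pair through it directly shows that it leaves the bare LF alone, which is why the normalization has to happen earlier.

```typescript
// Hypothetical model types, not the DOM File/FormData interfaces.
interface FileLike {
  name: string;
  data: string;
}
type EntryValue = string | FileLike;
interface Entry {
  name: string;
  value: EntryValue;
}

function normalizeNewlines(s: string): string {
  return s.replace(/\r\n|\r|\n/g, "\r\n");
}

// Hypothetical pre-serialization step: Files become filenames, and every
// name and value is normalized, whatever the source of the entry.
function prepareForStringEnctype(list: Entry[]): [string, string][] {
  return list.map((e): [string, string] => {
    const value = typeof e.value === "string" ? e.value : e.value.name;
    return [normalizeNewlines(e.name), normalizeNewlines(value)];
  });
}

// The serializer itself now only handles strings.
function serializeUrlencoded(list: Entry[]): string {
  return new URLSearchParams(prepareForStringEnctype(list)).toString();
}

const list: Entry[] = [
  { name: "note", value: "x\ny" }, // e.g. appended via FormData, unnormalized
  { name: "upload", value: { name: "a\nb.txt", data: "" } },
];
const body = serializeUrlencoded(list); // "note=x%0D%0Ay&upload=a%0D%0Ab.txt"
const direct = new URLSearchParams([["note", "x\ny"]]).toString(); // "note=x%0Ay"
```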

@whatwg/forms any thoughts?
