Encourage UTF-8 for new formats and APIs #322

@domenic

A lot of the new formats and APIs we've been designing (and some not-so-new) assume UTF-8 unconditionally. These include:

  • JavaScript modules (including upcoming JSON and CSS modules)
  • Workers, and anything included in them via importScripts()
  • WebSockets
  • EventSource
  • fetch()'s text() convenience method; XMLHttpRequest's responseText convenience getter; and Blob's text() convenience method
  • Various not-yet-shipped or still-early JSON-based formats like import maps, origin policy, or speculation rules

We also made it non-conforming for HTML documents to use any other encoding, and the Encoding Standard tries to be clear that everything else is legacy.

It'd be good if this were captured in the design principles doc. https://w3ctag.github.io/design-principles/#new-data-formats is one place that captures several of the above examples. There might be room for some separate guidance on APIs (not just formats) to capture the text() and responseText examples: basically, any time an API is interpreting some unknown bytes as a string, it should just assume they're always UTF-8.
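To make the API guidance concrete, here is a minimal sketch of what "just assume UTF-8" could look like for a bytes-to-string convenience method. The helper name bytesToText and the sample byte arrays are hypothetical, made up for illustration; the only platform feature used is TextDecoder, which defaults to UTF-8 and replaces malformed sequences with U+FFFD.

```typescript
// Hypothetical helper (not from any spec): interpret unknown bytes as a string
// by always decoding as UTF-8, with no sniffing and no encoding parameter.
function bytesToText(bytes: Uint8Array): string {
  // TextDecoder with no arguments uses UTF-8 and is non-fatal,
  // so malformed input becomes U+FFFD instead of throwing.
  return new TextDecoder().decode(bytes);
}

// "hé" encoded as UTF-8 decodes as expected.
const utf8Bytes = new Uint8Array([0x68, 0xc3, 0xa9]);

// "hé" encoded as ISO-8859-1 is not valid UTF-8: the lone 0xE9 byte
// is replaced with U+FFFD rather than being guessed at.
const latin1Bytes = new Uint8Array([0x68, 0xe9]);

console.log(bytesToText(utf8Bytes));
console.log(bytesToText(latin1Bytes));
```

The point of the sketch is that legacy-encoded input degrades to replacement characters; the API never tries to detect or accept another encoding.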
