RFC: Add user preferences to HTTP header #416

Open
sffc opened this issue Mar 3, 2020 · 18 comments
Labels
c: locale Component: locale identifiers Proposal Larger change requiring a proposal s: in progress Status: the issue has an active proposal User Preferences Related to user preferences

Comments

@sffc
Contributor

sffc commented Mar 3, 2020

This probably isn't the *best* place to post this issue, but we can get the conversation started.

Hello,

I would like to discuss the following RFC to add user locale preferences to HTTP requests.

Motivating Example

Alice, an end user, reads en-US, but prefers using a 24-hour clock. She has set that preference in her operating system settings. Native apps are able to respect her preference. However, web apps cannot access OS user preferences, so on the Web Platform, Alice sees a 12-hour clock, the default for en-US. This RFC proposes a way to bubble OS user preferences into the Web Platform, both on the frontend and the backend via HTTP headers.

Background

User preferences, such as first day of week, collation system, and preferred calendar, are often system-wide settings (such as in Android, macOS, or Windows), but there is not currently a standard way to access those preferences in the Web Platform.

We've been discussing in #6 and #409 various mechanisms to get at user preferences from the client side. However, there are also situations when it would be good for the server to get those user preferences; for example, rendering might start on the server and finish on the client, or the server might want to perform special logic based on the user's preferences.

There is a desire to use the BCP47 mechanism for delivering user preferences, to handle these preferences in a consistent way across the industry, rather than inventing something novel in ECMAScript. UTS 35 defines a set of the most common user preferences, which could be extended when there is demand. As of April 2020, the list is:

  • Calendar system (-u-ca-)
  • Whether to use accounting currency format (-u-cf-)
  • Collation sort order (-u-co-)
  • Currency code (-u-cu-)
  • Emoji presentation style, text or color (-u-em-)
  • First day of week (-u-fw-)
  • Hour cycle, i.e., 12-hour or 24-hour clock (-u-hc-)
  • Segmentation (break) settings (-u-lb-, -u-lw-, -u-ss-)
  • Measurement system, i.e., metric or imperial (-u-ms-)
  • Numbering system (-u-nu-)
  • Region override and subdivision (-u-rg-, -u-sd-)
  • Time zone (-u-tz-)
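As a rough illustration of how a consumer might read these keywords out of a language tag, here is a minimal Python sketch. The function name `unicode_extension_keywords` is hypothetical, and the parsing is simplified: it assumes a well-formed tag, ignores attributes, and does no validation of keys or values.

```python
def unicode_extension_keywords(tag):
    """Return the -u- extension keywords of a BCP 47 tag as {key: value}."""
    keywords = {}
    in_u = False
    key = None
    # Skip the first subtag: it is always the language, never a singleton.
    for subtag in tag.lower().split("-")[1:]:
        if len(subtag) == 1:
            if subtag == "x":
                break  # private use: everything after it is opaque
            in_u = subtag == "u"  # any other singleton starts a new extension
            key = None
        elif in_u and len(subtag) == 2:
            key = subtag  # two-character subtags inside -u- are keys
            keywords[key] = []
        elif in_u and key is not None:
            keywords[key].append(subtag)  # values may span several subtags
    # A key without a value is treated as "true".
    return {k: "-".join(v) or "true" for k, v in keywords.items()}
```

For example, `unicode_extension_keywords("en-US-u-ca-gregory-fw-mon")` yields the calendar and first-day-of-week preferences as a dictionary, while a tag with no extension yields an empty one.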

Proposed Solution

We have the existing Accept-Language header that is sent to the server on every request. The specification allows BCP47 language tags, but in practice, no known browser puts extension subtags into the content of that header. This raises concerns (@litherum @FrankYFTang) that it may break the web to start adding subtags to locales in the existing header.

Mark Davis (@macchiati) suggested an alternate approach: add only a single language tag with the extension keywords, and put it at the end of the header. The language code mul could be used to signify that the subtags should be applied to all languages. For example:

Accept-Language: fr-CH,fr;q=0.9, en;q=0.8, mul-u-hc-h23

The semantics understood by the server would be that mul-u-hc-h23 is the user's option bag of preferences. If the user has no overrides in their locale, the option bag does not need to be sent.
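To make the intended server-side semantics concrete, here is a minimal Python sketch of one way a server could separate the option bag from the ordinary language ranges and then apply it to each of them. The function names are hypothetical, nothing here is specified by the RFC, and the sketch assumes the ordinary ranges carry no `-u-` extension of their own (which matches current browser behavior as described above).

```python
def split_accept_language(header):
    """Split an Accept-Language value into (language ranges, option bag)."""
    ranges = []
    option_bag = None
    for item in header.split(","):
        tag = item.split(";", 1)[0].strip()  # drop any ;q= weight
        if not tag:
            continue
        if tag.lower().startswith("mul-u-"):
            option_bag = tag[len("mul"):]  # keep only "-u-hc-h23"
        else:
            ranges.append(tag)
    return ranges, option_bag


def apply_option_bag(ranges, option_bag):
    """Append the option bag to every language range except the wildcard."""
    if option_bag is None:
        return list(ranges)
    # Assumption: no range already has a -u- extension, so appending is safe.
    return [tag if tag == "*" else tag + option_bag for tag in ranges]
```

With the example header above, the ranges come out as `fr-CH`, `fr`, and `en`, and applying the bag produces tags such as `fr-CH-u-hc-h23` that could be handed to a locale-aware formatter.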

This approach has the following advantages:

  • Fully compliant with existing specifications (for both the IETF header syntax and BCP47).
  • Does not require going through the long process of adding a new header to IETF.
  • In the common case, does not increase HTTP payload; only when the user has custom overrides would the option bag need to be sent.

Disadvantages:

  • Raises some concerns regarding fingerprinting (@litherum). However, it would in no way be required that the user agent sends this bag of user preferences, so when the user wishes to browse in incognito mode, the user agent may choose to not send this information.
  • Would increase base HTTP payload when custom settings are used. However, in the common case, users will continue using default settings from their language/script/region; it can be expected that the extra bytes will not be carried by every user.
  • Still some theoretical possibility that this breaks servers somewhere that assume all language tags in Accept-Language only ever have the three subtags. However, @markusicu pointed out that since we already do allow the script subtag in Accept-Language, which was not always the case, it is likely that servers that read this header are equipped with a proper BCP47-compatible parsing engine.

Also CC @aphillips as the chair of the W3C i18n WG.

Alternatives considered

We could propose a new header, such as Accept-Locale or Locale-Preferences, but this presents a much more uncertain path forward. The span of time to get these adopted would be much longer, even if all the cards fell the right way. We would like to consider a solution that works within the current framework before introducing something novel.

@litherum also pointed out the draft User-Agent Client Hints proposal in WHATWG. That proposal appears to be largely focused on solving the problem of user agent properties, like operating system, architecture, etc., but we could possibly work with the champions of that proposal (@mikewest, @yoavweiss) to see if it could be extended to also cover user preferences.

@sffc sffc added s: discuss Status: TG2 must discuss to move forward c: locale Component: locale identifiers labels Mar 3, 2020
@sffc
Contributor Author

sffc commented Apr 14, 2020

@aphillips @litherum @zbraniecki Does this RFC SGTY? Please give a "thumbs up".

I chatted with @skywalkerswartz, @sven-oly, and @macchiati, who are generally onboard with the direction of this RFC. The next step involves filing a CLDR issue to spec out how to interpret the Accept-Language header (basically a spec on top of IETF), and doing smoke testing to make sure this doesn't break the web in unexpected ways.

@mathiasbynens
Member

I'm in favor of this proposal.

I really like the idea of only exposing a single language tag with the extension keywords at the end of the header, for optimal backwards compatibility.

Some additional thoughts and questions:

However, @markusicu pointed out that since we already do allow the script subtag in Accept-Language, which was not always the case, it is likely that servers that read this header are equipped with a proper BCP47-compatible parsing engine.

Can we get some data on this for popular server software? If Apache, nginx, etc. already do have a proper BCP47-compatible parser, that would certainly make the path to adoption easier.

I'm interested in the fingerprinting concerns. Note that navigator.locale, navigator.locales, and new Intl.NumberFormat().resolvedOptions().locale already expose the same information. This proposal merely exposes the same information at the HTTP header level as well. Fingerprinting is still a valid concern, but there's some nuance in that this is not an entirely new vector.

What guarantees do we have that mul will never become an actual language tag and thus collide with this proposal?

@sffc
Contributor Author

sffc commented Apr 17, 2020

I'm interested in the fingerprinting concerns. Note that navigator.locale, navigator.locales, and new Intl.NumberFormat().resolvedOptions().locale already expose the same information. This proposal merely exposes the same information at the HTTP header level as well. Fingerprinting is still a valid concern, but there's some nuance in that this is not an entirely new vector.

AFAIK, navigator.locales is not yet available (#68). If this RFC gets adopted, the idea would be that navigator.locales becomes available and exposes the same information as Accept-Language with user preferences.

What guarantees do we have that mul will never become an actual language tag and thus collide with this proposal?

It is already reserved for this purpose: https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

@aphillips

@sffc This is an interesting proposal. I like the possibility but I'm cautious about using Accept-Language in this way. It adds complexity to parsing the header and introduces a somewhat nebulous bag of preferences. Browser support for managing what is in the header has never been particularly good and user control over the header is pretty spartan today. Combine that with the fact that the values being transmitted might not apply evenly to all of the customer's locale choices (or the server's available languages) and I am wary that it might result in odd user experiences.

I like Mark's (@macchiati) proposal of using mul, although I might suggest und as an alternative (it matches the root locale in CLDR, thus providing some insulation for content selection). I'd also suggest adding a q=0.0 weight to the value to help avoid it affecting existing language negotiation mechanisms.

I would also probably prohibit transmitting the value by itself (in an otherwise empty Accept-Language header) or one with only the special range *.

As a wacky idea, you might consider whether a similar addendum to the Content-Language value might be useful in telling the client what options were applied to the rendered page: Intl could then match server-generated stuff if the page author so desired.

@aphillips

@mathiasbynens

WRT:

What guarantees do we have that mul will never become an actual language tag and thus collide with this proposal?
It is already reserved for this purpose: https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes

Usually it's better to look at BCP47 directly instead of the underlying standards: subtags in BCP47's registry are stabilized and mul's meaning is fixed in place in the IANA registry (cf. here). It is valid in "actual language tags", but the use/meaning is permanently limited. Conflicts with real world usage of tags based on mul are likely limited.

@macchiati

We considered 'und', but that is used in various APIs to mean 'no match', whereas 'mul' really can't serve any purpose in circumstances where multiple languages can be listed, for the reasons given in https://tools.ietf.org/html/bcp47#section-4.1, item 5.

So that's why 'mul' seems like the much better choice to serve as the base, because it couldn't serve any other purpose.

The suggestion of q=0 is interesting as a best practice, if there are any q numbers otherwise in the list. I think any good implementation just ignores q's, but for bad implementations q=0 might be useful.

[Aside: In general, I'd recommend against using q. Because the semantics of q are so fuzzy, nobody knows what numbers to use, and because of that, no implementation knows what was the intent of those numbers and thus how to interpret them.
I can handle English, German, and French in that order, but should that be en;q=1, de;q=0.9, fr;q=0.8? Or en;q=0.5, de;q=0.3, fr;q=0.2? Nobody knows what the numbers mean, and thus no implementations treat them consistently.

Note https://tools.ietf.org/html/rfc7231#section-5.3.5, which also points to https://tools.ietf.org/html/rfc4647#section-2.3, which has "A simple list of ranges is considered to be in descending order of priority."

My best advice is that if an implementation gets any q's less than 1, to just reorder the list in stable descending order.]
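The reordering advice in the aside could be sketched in Python roughly as follows. The function name `order_by_q` is hypothetical, and treating a missing or unparseable q as 1.0 is an assumption of this sketch rather than something stated in the thread.

```python
def order_by_q(header):
    """Return the language ranges of an Accept-Language value, highest q first."""
    entries = []
    for item in header.split(","):
        parts = [part.strip() for part in item.split(";")]
        tag = parts[0]
        if not tag:
            continue
        q = 1.0  # assumption: a missing weight counts as 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 1.0  # assumption: an unparseable weight counts as 1.0
        entries.append((tag, q))
    # sorted() is stable even with reverse=True, so equal weights keep header order.
    return [tag for tag, _ in sorted(entries, key=lambda e: e[1], reverse=True)]
```

Under this approach the two weightings from the aside (`en;q=1, de;q=0.9, fr;q=0.8` and `en;q=0.5, de;q=0.3, fr;q=0.2`) produce the same ordered list, which is the point: only the order carries meaning.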

@aphillips

Implementations (meaning browsers) actually generate q values and the specification requires them. Since we're using an existing mechanism, we can't just rubbish them.

There is some predictability to the assignments, but, as you note, no real value to the numbers. AFAIK, no one pays attention to them except as a "parsing problem" for the header. I do not know of a browser that generates this header and doesn't put the numbers in descending order. However, since the browser is obligated to make them and since they appear in the real world (and since consumers may have been foolish and paid attention to the numbers??), why not mandate 0.0?

Regarding und/mul, I kind of agree, but note that (a) I really really don't want to see folks thinking mul means something and (b) don't want naive implementations to consume the tag and do something wacky. Thinking outside the box, maybe we should go all the way here. How about: x-locale-u-...? A private use tag that says the word "locale" right on it? It can't match anything real or do any harm or damage?

@macchiati

macchiati commented Apr 21, 2020 via email

@aphillips

As is often the case, we appear to mainly be in violent agreement?

That's true of HTTP (RFC7231). I also look at https://tools.ietf.org/html/rfc3282, which directly defines Accept-Language, where a careful reading of the ABNF makes the q weight optional also. The relevant text is:

If no Q
   values are given, the language-ranges are given in priority order,
   with the leftmost language-range being the most preferred language;
   this is an extension to the HTTP/1.1 rules, but matches current
   practice.

   If Q values are given, refer to HTTP/1.1 [RFC 2616] for the details
   on how to evaluate it.

My point is that every browser implements q (as a list of automatically generated descending values). I don't see why mandating a weight of 0.0 would be so awful? We could make it optional, e.g.:

If the user-agent includes or generates Q values, the value for the locale tag MUST be 0.0.

@macchiati

macchiati commented Apr 21, 2020 via email

@sffc sffc added s: in progress Status: the issue has an active proposal and removed s: discuss Status: TG2 must discuss to move forward labels Apr 23, 2020
@littledan
Member

In general, I'm really in favor of this work. It feels like it fills in some important missing platform capabilities needed to make certain kinds of "native-like" experiences. I like the idea of using an HTTP header to complement navigator.locales, to allow more computation on the server side and avoid unnecessary round-trips.

In addition to working in IETF, we might want to check whether this would have interactions in WHATWG fetch (cc @annevk). In either case, I like the idea of "incubating" early discussion here in ECMA-402, where we've brought together a lot of experts on these topics.

If we want this to be a fully general solution, we run into a number of issues (that people raised to me in the past when I was looking into navigator.locale):

  • OS preferences don't, in general, match the BCP 47 schema, or each other. They are sometimes much more expressive or less expressive, or just express different things, depending on the OS. (@jungshik raised this concern)
  • We don't have a data model to represent OS preferences in general. It's unclear whether it would make sense to extend BCP 47 further to express more kinds of preferences. There's a lot that OSes have that BCP47 doesn't. @zbraniecki has long talked about exposing Windows' user-provided pattern for datetimes, for example, but that wouldn't really fit in BCP 47. Do we want to limit navigator.locales to what can be serialized in BCP 47, or do we want it to include other things that cannot be serialized this way? Eventually, BCP 47 serialization for more and more things may reach its limits; there's a sort of tradeoff for how complex we want to make it. (@aphillips raised this concern)
  • To get the OS preferences, the browser or other JS implementation would have to reach into the operating system in particular ways. It's not clear whether we actually want browsers to do these kinds of things, with the self-imposed limitations that sandboxing gives. (@jungshik raised this concern)
  • I'm not sure how to evaluate/mitigate the fingerprinting issue; we should probably rope in some more privacy experts here before proceeding. I don't think permitting UAs to not send this information is enough of a mitigation; most people will use their default browser settings, and many browsers today care about mitigating fingerprinting. (@zbraniecki raised concerns about fingerprinting in the past, in addition to @litherum ) Note that, for fingerprinting, it might be somewhat "worse" for the preferences to be transmitted passively all the time, than if they are just available to JS (spreads the information to even more places).

To work through this, I want to suggest that we work towards exposing certain concrete settings, rather than expecting that we'll find a fully general mechanism. A strawperson: starting with just default calendar and hour cycle. Some possible advantages:

  • They have a BCP 47 serialization; no need to stretch the boundaries
  • These might be some of the most common/demanded settings to be followed (TO BE VERIFIED)
  • I think they're broadly present in OS settings (TO BE VERIFIED)
  • There are only a few options, so there aren't all that many new bits from a fingerprinting perspective (TO BE VERIFIED)
  • By enumerating the settings that we care about and focusing on getting cross-implementation support for those, we might have a better shot at compatibility across engines.

(I could see the case for adding some other things, such as first day of the week; my point is more that the set should be well-determined and something that we plan to implement broadly. And if we go down this road of sticking with the BCP 47 data model, we should be conscious of the limitations we're committing ourselves to.)
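One way to picture the "well-determined set" idea is an allowlist applied before anything is serialized. The Python sketch below is purely illustrative: the names `ALLOWED_KEYS`, `restrict_preferences`, and `to_extension` are hypothetical, and the preferences are assumed to already be available as a dictionary of BCP 47 keyword keys and values.

```python
# Strawperson allowlist from the comment above: calendar and hour cycle only.
ALLOWED_KEYS = ("ca", "hc")


def restrict_preferences(preferences, allowed=ALLOWED_KEYS):
    """Keep only the preferences whose keyword key is on the allowlist."""
    return {key: value for key, value in preferences.items() if key in allowed}


def to_extension(preferences):
    """Serialize preferences as a -u- extension, or "" if there are none."""
    if not preferences:
        return ""
    # Sort by key so that equal sets of preferences serialize identically.
    return "-u-" + "-".join(f"{k}-{v}" for k, v in sorted(preferences.items()))
```

For instance, a user with hour cycle, first day of week, and calendar overrides would expose only `-u-ca-gregory-hc-h23` under this allowlist; widening the set later means changing one tuple, which keeps the exposed surface explicit.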

CSS added just a single bit for "dark mode", and this has been widely implemented. So that experience gives me some hope that, if we scope this well, it can meet the bar.

@markusicu

My point is that every browser implements q (as a list of automatically generated descending values). I don't see why mandating a weight of 0.0 would be so awful?

I am against q=0.

RFC 2616 section 3.9 says that "If a parameter has a quality value of 0, then content with this parameter is `not acceptable' for the client."

Section 14.4 says "If no language-range in the field matches the tag, the language quality factor assigned is 0." and "If an Accept-Language header is present, then all languages which are assigned a quality factor greater than 0 are acceptable."

Section 19.6.3: "Quality Values of zero should indicate that "I don't want something" to allow clients to refuse a representation."

The parsers that I have seen (e.g., ICU4J class LocalePriorityList) take q=0 to mean to discard the language tag. This means that the callers need not check for q>0. This would hide a mul-u-some-thing;q=0 item completely from the output of such a parser, making it hard to read this tag from the Accept-Language header. Such parsers would require an API change to preserve q=0 items in order not to break existing callers.

I am not convinced that we need any qvalue here, but if there is a good argument for it, then it should not be 0. The smallest well-formed, non-zero value would be q=0.001 .
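The concern can be illustrated with a small Python sketch of a parser that discards q=0 items. This is not ICU4J's LocalePriorityList, only a hypothetical stand-in (`acceptable_tags`) that mimics the behavior described above; the `QVALUE` pattern follows the qvalue grammar in RFC 7231 section 5.3.1, and treating a malformed weight as 1.0 is an assumption of the sketch.

```python
import re

# qvalue per RFC 7231 section 5.3.1: "0" or "1" with at most three decimal digits.
QVALUE = re.compile(r"(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?)")


def acceptable_tags(header):
    """Return the tags a q=0-discarding parser would report, in header order."""
    kept = []
    for item in header.split(","):
        tag, _, params = item.partition(";")
        tag = tag.strip()
        q = 1.0  # assumption: missing or malformed weights count as 1.0
        name, _, value = params.strip().partition("=")
        if name.strip().lower() == "q" and QVALUE.fullmatch(value.strip()):
            q = float(value)
        if tag and q > 0:
            kept.append(tag)  # q=0 means "not acceptable", so such items vanish
    return kept
```

With `mul-u-hc-h23;q=0` the option bag disappears from the result, so callers never see it; with `q=0.001`, the smallest non-zero value the grammar allows, it survives.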

@aphillips

@markusicu That's a compelling argument. A few points, though.

  • If an existing caller passed a non-mul tag with q=0 the behavior (ignore/discard) would be the same either way.
  • If an existing parser drops the mul tag on the floor, that would suck for any implementation capable of consuming the mul value.
  • A new caller would have to distinguish mul from undesirable values weighted 0
  • With q=0 we reduce the risk of old consumers trying to do something with the unspeakable mul

If we don't use 0, I'd favor brevity and just position the tag on the end with no q value.

@annevk
Member

annevk commented Apr 28, 2020

Thanks @littledan, that's correct. Any kind of change to request (or response) headers needs Fetch Standard integration of some kind.

Where are navigator.locale and .locales defined?

@littledan
Member

@annevk They're not defined; they're proposed in whatwg/html#3046 . You can see I've closed the PR due to (my perception of) lack of interest.

@sffc sffc added Proposal Larger change requiring a proposal User Preferences Related to user preferences labels Jun 5, 2020
@yoavweiss

Apologies for the delayed response here...

  • Raises some concerns regarding fingerprinting (@litherum). However, it would in no way be required that the user agent sends this bag of user preferences, so when the user wishes to browse in incognito mode, the user agent may choose to not send this information.

I don't think that's sufficient, and browsers should not be sending that information passively to all servers.
At the same time, Client Hints (I-D, spec) can be a good way to ensure that information is available to servers that actually need it.

https://github.com/WICG/lang-client-hint is a proposal on a related front (trying to replace Accept-Language). I believe @davidben has thoughts in that area as well.

/cc @JensenPaul

@davidben

I sketched an idea for language negotiation in https://github.com/davidben/client-language-selection but haven't had the time to pursue it yet.

@sffc
Contributor Author

sffc commented Jun 23, 2020

Thanks for the info, @yoavweiss and @davidben!

We followed up with a phone meeting last week. Here are the notes: https://github.com/tc39/ecma402/blob/master/meetings/subcommittees/user-preferences/notes-2020-06-15.md


9 participants