Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cookie-octet reality check #2185

Closed
bagder opened this issue Jun 29, 2022 · 32 comments
Closed

cookie-octet reality check #2185

bagder opened this issue Jun 29, 2022 · 32 comments
Labels

Comments

@bagder
Copy link

bagder commented Jun 29, 2022

Hello,

In draft-10 section 4.1.1 we see:

cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                   / %x80-FF
                     ; octets excluding CTLs,
                     ; whitespace DQUOTE, comma, semicolon,
                     ; and backslash

This means that space, comma and double-quotes for example are invalid contents in cookie values and names. Why?

In RFC 6265 the same section says:

cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                   ; US-ASCII characters excluding CTLs,
                   ; whitespace DQUOTE, comma, semicolon,
                   ; and backslash

(the difference is %x80-FF which now is explicitly allowed)

Firefox does not ignore all those and claims "parity with Chrome" on this.

If we ignore such cookies, we break compatibility with two major browsers. If we don't, we don't follow the spec.

@reschke
Copy link
Contributor

reschke commented Jun 29, 2022

99a9d4a

(so 0x80-FF are gone again)

@bagder
Copy link
Author

bagder commented Jun 29, 2022

Ah thanks for pointing that out. I think they are slightly less problematic than the more common ASCII bytes though.

@sbingler
Copy link
Collaborator

Hi @bagder,

Section 4 defines how a well behaved server should format their cookies. Unfortunately not all servers are well behaved so in the interest of compatibility UAs are encouraged to implement a more permissive syntax as mentioned in Section 5.

This section specifies the Cookie and Set-Cookie header fields in
sufficient detail that a user agent implementing these requirements
precisely can interoperate with existing servers (even those that do
not conform to the well-behaved profile described in Section 4).

A user agent could enforce more restrictions than those specified
herein (e.g., restrictions specified by its cookie policy, described
in Section 7.2). However, such additional restrictions may reduce
the likelihood that a user agent will be able to interoperate with
existing servers.

Section-5.4 goes into more detail.

This means that space, comma and double-quotes for example are invalid contents in cookie values and names. Why?

I would imagine it's due to potential parsing errors and other ambiguities, but that's just an educated guess.
Set-Cookie: foo=bar, "Set-Cookie: foo2=bar2" seems pretty foot gunny to me.

@bagder
Copy link
Author

bagder commented Jun 29, 2022

What's the point in specifying a format that isn't followed by implementations? Isn't it much smarter to align that with what is actually in use in the world so that the spec can help implementers?

@sbingler
Copy link
Collaborator

What's the point in specifying a format that isn't followed by implementations?

By which implementations? The spec exists for UAs consuming cookies as well as the servers producing them.

Note that I wasn't around when the original decision to split the formats in this manner was made, so take this all with a grain of salt. But with my knowledge of the space I believe it's because different parts of the cookie ecosystem have different goals and need somewhat different guidance to reach those goals. Generally speaking though I assume that implementers want their work to be compatible with their counterparts' implementation:

  • Are you a web-dev/server-dev looking to be as widely compatible as possible across UAs and versions? You're better off producing the more restrictive syntax to minimize the possibility that your cookie string is misparsed by any UA.

  • Are you a UA developer looking to be as widely compatible as possible across websites/servers/origins? You're better off accepting a more relaxed syntax to minimize the number of cookies you can't parse from older/misconfigured servers.

Each side, server and UA, should ideally try to align on their own format within the spec but they can't always assume their counterpart will. E.x.: Maybe a user is running some ancient browser to visit a site or maybe someone on the latest Firefox version wants to visit a site that hasn't been updated since the 90s.

(And finally, if you don't care about being widely compatible you can do whatever you want.)

@bagder
Copy link
Author

bagder commented Jun 29, 2022

The split-description is incomprehensible and just makes readers of the spec to get it wrong and not read both descriptions (I know I'm not alone in repeatedly doing this mistake). There is just one Set-Cookie: field; it has one defined format. The spec should then explain what format it has. If that format is not followed, compliant receivers should ignore it (as the spec says).

Right now, the spec specifies a way how servers should send the cookie, but none of the popular browsers care about that format and happily accept cookies not following it, because you need to read the second format for the same header to see that a receiver can be much more liberal. So the single header field has... two formats?

I care deeply about compatibility. That's why I submitted this issue. I think the current way of specifying it makes us all a disservice.

@bagder
Copy link
Author

bagder commented Jun 29, 2022

By which implementations

The very reason existing cookie receivers don't use the same strict checks as is documented for the server side is of course because many servers don't comply. Otherwise there wouldn't be a need to write them differently.

@bagder
Copy link
Author

bagder commented Jul 1, 2022

It looks like #1759 is a fix for this.

@sbingler
Copy link
Collaborator

sbingler commented Jul 1, 2022

To make sure I understand:
Is your proposal to fix this issue to make the UA parser requirements more obvious (in some way) to readers of the spec? And therefore reduce the chance that a UA will implement the server syntax accidentally?

@bagder
Copy link
Author

bagder commented Jul 1, 2022

I think the way it currently is phrased easily misleads readers of the spec. I think it should be improved to reduce that risk. I don't think there is a server-syntax and a separate client-syntax of the same header field.

And basically, since the server-syntax clearly isn't obeyed in the wild, it should be fixed to better match reality.

@sbingler
Copy link
Collaborator

sbingler commented Jul 6, 2022

So then is your proposal to remove the syntax as described currently in section 4 and replace its usage with the syntax as described in Section 5.4? I.e.: To recommend that all implementers of cookies (UA and servers alike) use the section 5.4 syntax?

@bagder
Copy link
Author

bagder commented Jul 9, 2022

Yes. In spirit with how HTTP header fields are documented/specified in other HTTP related RFCs: yes. One header field, one syntax in one place.

@sbingler
Copy link
Collaborator

I agree that having a single syntax to refer to would be ideal, unfortunately the web has not been ideal with respect to cookies with many erroneous and incomplete implementations having been released on both sides along with simply outdated versions (likely with many still in use today).

Many of the designs and considerations that have gone into the spec have been to maximize compatibility, including backwards compatibility, between servers and UAs in the real world and that has required some concessions which includes the server/UA syntax differences. ("Lax-Allowing-Unsafe" enforcement is another example.)

The chief benefit of using a single syntax appears to be ease of implementation, mainly locating the correct grammar within the document, with a minor benefit (for the servers) of a slightly expanded character set. These seem appealing but I expect there will be subtle problems arising from allowing more characters:
While current versions of Chrome, Firefox, and likely Safari all implement the more lax syntax, there are other UAs that may not. These other UAs would begin to drop cookies (with the expanded character set) but such breakage would be hidden from the server operator and likely never fixed. This is obviously bad for compatibility. These effects are dispersed and difficult to quantify but are still important to consider.

Both of the benefits of a single syntax seem to me to be minor compared to the compatibility risks. I acknowledge that there are existing servers that do not follow the stricter syntax, but I feel we should not encourage even more to do so.

I understand that a UA inadvertently implementing the server syntax is frustrating, maybe there is something we can do to reduce the chances of that happening. Perhaps a more obvious note or some other strong indication that there are different requirements for different implementors.

@bagder
Copy link
Author

bagder commented Jul 12, 2022

While current versions of Chrome, Firefox, and likely Safari all implement the more lax syntax, there are other UAs that may not

That's exactly what brought me to file this issue to begin with. The current spec combined with the current state of widely used implementations cause cookie interop problems. I argue these problems already exist and I propose a way that could work to reduce them. I cannot agree with your risk assessment.

@bagder
Copy link
Author

bagder commented Jul 12, 2022

Clearly the servers of the world don't follow the stricter specified server-side syntax since if they did, the client parser could use and insist on the same syntax and they don't: since they can't. It means servers don't adhere to that syntax. Why then document that syntax?

@sbingler
Copy link
Collaborator

I argue these problems already exist and I propose a way that could work to reduce them.

It seems we agree that there are user agents (UAs) already that will not accept cookies created with the more lax syntax, so then I do not follow how encouraging more servers to send those types of cookies will reduce problems.

Clearly the servers of the world don't follow the stricter specified server-side syntax

It means servers don't adhere to that syntax. Why then document that syntax?

Some servers do not, many do. For example, all my current github.com cookies follow the stricter server syntax.

If a server does not care about being widely compatible then they are free to do whatever they want. The spec is for servers that would like to be widely compatible across UA vendors and versions and provides a syntax to accomplish that. This includes UAs that will reject cookies created with the more lax syntax.

the client parser could use and insist on the same syntax and they don't: since they can't

They could, if the UA was not interested in being widely compatible across servers and was willing to accept the breakage that that decision incurs.

Again, the spec is interested in compatibility and provides a syntax for UAs interested in being widely compatible with many servers. Since it’s possible some servers produce cookie strings that differ from the spec’s suggested server syntax, UAs are encouraged to accept those.

The current spec combined with the current state of widely used implementations cause cookie interop problems.

The current spec, when implemented correctly, will result in minimal interoperability problems for both servers and UAs. Compatibility for each side means something slightly different and so their requirements differ accordingly.

Whether or not it's easy to implement the spec correctly is a different question.

Cookies are messy and come with plenty of historical baggage. Forcing a single syntax will result in breakage and problems for one side or another.

@bagder
Copy link
Author

bagder commented Jul 13, 2022

Forcing a single syntax will result in breakage and problems for one side or another.

I strongly disagree. The client-side "lenient" syntax is already what dictates how cookies work on the web.

@sbingler
Copy link
Collaborator

Your proposal removes compatibility with existing UAs that reject cookies made with the lax syntax. What benefits do you see that outweigh this loss of compatibility?

@bagder
Copy link
Author

bagder commented Jul 13, 2022

Please be specific and tell me exactly what these UAs are that you speak of?

The existing problem with the way the current spec is written, is that existing servers send cookies that contain octets that are fine according to the client-syntax (but not according to the server-syntax). The client-syntax that by the way is used by all the major browsers for example.

If we want to write a cookie using HTTP client that interops and works with the same set of cookies that the browsers support, we need to mimic what they accept. That's interop.

You seem to speak of clients that explicitly decide to not interop with existing current widely deployed cookie clients and servers. Well, such clients already decided to not interop so they might just as well continue working the way they do. I don't see how us fixing the spec will hurt them. They already decided to run their own race.

@RyanTheOptimist
Copy link
Contributor

Assuming I'm following this discussion correctly, I think I agree with @bagder. The cookies in the wild conform to the format accepted by major browsers. It seems like the right thing to do is specify that format as the standard. If there are clients that today do not accept cookies that are currently in use in the web, then they are already not interoperable.

@sbingler
Copy link
Collaborator

I have no data indicating which/how many UAs would be adversely affected by this nor which/how many servers are already producing cookies that don’t match the stricter syntax.
I’m also not aware of anyone else that possesses this data. And that’s why I’d rather not steam forward with this change, we don’t know the scope of the impact.

While some servers certainly send spec non-compliant cookies I’m not aware of any indication that this happens a majority of the time nor from a majority of servers, is it really so common to be considered a de facto standard? Perhaps these are the non-interoperable entities that we shouldn’t be encouraging. I’ve checked a number of my open tabs and haven’t found any non-compliant cookies, it seems all of them conform to the stricter server syntax. Any data to this effect would be helpful (possibly browsers could collect it via metrics, it seems straightforward enough).

At the moment though, maybe this change will affect zero users, maybe it’ll affect millions, we don’t know; there are a lot of smaller UAs in the wild. So what is the benefit of this change that makes taking that risk worthwhile? Are servers deprived of something they need that this’ll remedy? Will this provide some security improvement? Something else? I’m unclear what the benefits are that make this appealing in the face of potentially breaking users.

The spec isn’t unfamiliar with painful changes that are for the better, samesite by default is a great example of this. But that change had clear benefits that made the pain it caused worthwhile. What is this proposal providing?

In addition to the possible UA breakage, the promotion of the lax syntax will explicitly allow foot gun behavior that UAs attempt to deal with. This includes the likes of Set-Cookie: =foo=bar, Set-Cookie: foo=bar, "Set-Cookie: foo2=bar2", Set-Cookie: baz, and Set-Cookie: Æ=╬. This feels ripe for future problems.

Servers also benefit from abiding by the stricter syntax. Doing so vastly increases the chances the cookies they receive back from different UAs are identical and what the server intended. The UA syntax covers a lot of corner cases and depending on the UA’s implementation it’s possible for subtle differences and bugs to work their way in which could result in Browse A’s cookie not matching browser B’s and/or not matching what the server wanted in the first place. This has been a problem as recently as the past year.

All of this is why I’m erring on the side of caution and saying that we should keep the syntaxes as is, for now at least.


You seem to speak of clients that explicitly decide to not interop with existing current widely deployed cookie clients and servers.

My statements were intended for UAs that attempted to implement the spec but are perhaps out of date, have a bugged implementation, or whatever else good faith difference which causes them to reject some or all cookies not created with the stricter server syntax.

@bagder
Copy link
Author

bagder commented Jul 17, 2022

I maintain that my proposal doesn't have any bad impact at all. It helps cookie clients interop better. I maintain the opposite: the current way of writing the spec is hard to read and makes cookie interop harder to accomplish than it needs to be.

Cookie clients can already and always opt to drop/discard any cookie at any time, so clients that chose to be more picky and to use stricter checks are fine to do so and if they do that today they already opted to not interoperate with several major cookie clients - and that is fine. But it also indicates that a modified specification won't break anything since such non-interop clients already decided that interop is not top prio and they already decide to not work with for examples the popular browsers.

I think a main purpose for this spec is to document how to interoperate. Help for example newcomers in the field how they should write their parsers.

In addition to the possible UA breakage, the promotion of the lax syntax will explicitly allow foot gun behavior that UAs attempt to deal with

That seems utterly strange to me. If you are concerned about those variations, then why document that clients can allow them and how come the major browsers allow them? Shouldn't then both servers and clients use the stricter character set? The stricter character set that isn't what is used by most implementers

I realize we're only going in circles here.

@mnot
Copy link
Member

mnot commented Jul 17, 2022

An alternative would be to have a single syntax, but to use obs-text along with some prose explaining the implications for senders and receivers.

@RyanTheOptimist
Copy link
Contributor

RyanTheOptimist commented Jul 18, 2022 via email

@mnot
Copy link
Member

mnot commented Jul 18, 2022

If we go that way, it'll impact #1399 / #1759.

@sbingler
Copy link
Collaborator

sbingler commented Jul 18, 2022

The reasons I’ve seen so far to support a single syntax have presupposed two important factors:

  1. the more lax syntax is a defacto standard for servers that sees wide usage
  2. Any UAs that would reject some/all of this syntax are knowingly and purposefully doing so.

I’m unaware of any data supporting either claim and do not understand why we should make those assumptions. If we are wrong then it's ultimately the users that will feel the pain.

While I support the effort for an easier to read spec, I don’t understand the desire for a single syntax as an end goal.
Other parser specs such as HTML and CSS encourage a more lenient interpretation of their respective syntaxes by UAs, so cookies are not alone in this. This behavior arises from an understanding that the real world is messy, historical baggage exists, people make mistakes, and that, as agents of the user, UAs are better off accepting something slightly wrong than breaking the content the user wants. Newly encouraging that somewhat wrong behavior as “spec compliant” is just asking for problems, especially if doing so isn’t unlocking some sort of benefit.

the current way of writing the spec is hard to read

This is something I’m very interested in remedying and I believe it can be done so in a way that has zero risk of breaking users. A more prominent warning in section 4.1.1 indicating that UAs have a different set of requirements for example. We can almost certainly make the spec less confusing to implementors without materially changing it.

Mnot’s suggestion seems interesting, particularly the obs- notation, and I think it’s an improvement to the current proposal. However, given that implementors have previously run into problems by not fully reading the relevant sections I’m not entirely confident that extra prose will be enough to discourage someone from blindly implementing the grammar, although maybe its proximity will help. I also wonder if the additional obs- notation will make the grammar unwieldy, making it harder for both servers and UA to implement correctly. In either case this option still runs the risk of breaking users so I suggest a different tactic such as above.

I think a main purpose for this spec is to document how to interoperate.

I agree it’s important, noting that we want this interoperability so that the system is useful to our users. It follows then that we shouldn’t do anything that risks harming our users’ experience unless we have a good reason to.

So without any benefits to changing the syntax we're much better off editing the spec for clarity.

@mnot
Copy link
Member

mnot commented Jul 19, 2022

Perhaps it'd be good to discuss this in the meeting...

@bagder
Copy link
Author

bagder commented Jul 19, 2022

2. I’m unaware of any data supporting either claim

I presume that the "big browsers" are pretty big cookie consumers in the world. I would guess they handle more cookies than most other user agents. They are most liberal in what they accept. If we want to interop with them, we need to do the same.

Why are browsers this relaxed on what octets they accept? Presumably because of how browsers always work: there are some servers somewhere in use that would make things break if they would go stricter. And they also need to interop with the other major browsers that have the same laxed parsers.

I don't think anyone, except perhaps one of the browser vendors, can produce data/numbers for cookie use. And we already know how they parse cookies.

@sbingler
Copy link
Collaborator

@mnot
That seems like a good idea, IETF 114 has only allocated 10 minutes for cookie related discussion so this'll probably need to be a breakout session. Unless you're referring to a different meeting?

@mnot
Copy link
Member

mnot commented Jul 21, 2022

Yes, it's going to be a tight meeting. You could either do a side meeting, or wait for the interim-in-planning.

@mnot
Copy link
Member

mnot commented Aug 17, 2022

As far as I can tell, this issue is a duplicate of #1399, in that it's about the difference in the server-side vs. client-side grammar, as well as how it's documented.

Closing in favour of that; @bagder if there's an aspect that doesn't cover, please say so.

@mnot mnot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 17, 2022
@bagder
Copy link
Author

bagder commented Aug 17, 2022

I think it's correct to close this as a dupe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

5 participants