Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC 6265bis: Set-Cookie parsing algorithm should enforce more of the syntax requirements #1399

Open
chlily1 opened this issue Feb 4, 2021 · 8 comments

Comments

@chlily1
Copy link
Contributor

chlily1 commented Feb 4, 2021

There is an ABNF syntax given for the Set-Cookie header in section 4.4.1, however many aspects of this are not enforced in the actual parsing algorithm in section 5.3.

The syntax given in 4.4.1 is only a "SHOULD", not a hard requirement for the server:

Servers SHOULD NOT send Set-Cookie headers that fail to conform to
   the following grammar:

There is an explicit caveat about this in section 5.3:

   NOTE: The algorithm below is more permissive than the grammar in
   Section 4.1.  For example, the algorithm strips leading and trailing
   whitespace from the cookie name and value (but maintains internal
   whitespace), whereas the grammar in Section 4.1 forbids whitespace in
   these positions.  User agents use this algorithm so as to
   interoperate with servers that do not follow the recommendations in
   Section 4.

The algorithm in 5.3 should enforce more of the requirements given in the recommended syntax, at the very least the requirements on validity of characters/octets (whitespace, control characters, etc.), and specify the intended behavior in these cases. (For context and examples of such discrepancies, see here and here.)

@recvfrom
Copy link
Contributor

recvfrom commented Aug 16, 2021

To call out some of the current differences between the ABNF section and the parsing algorithm (not an exhaustive list - just the ones I've happened to notice):

cookie-pair       = cookie-name BWS "=" BWS cookie-value
cookie-name       = 1*cookie-octet
cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )
  • cookie-name can be empty
  • cookie-pair doesn't have to include the '=' (Ex: cookie with empty name, non-empty cookie value)
  • Double quotes surrounding cookie-octet in cookie-value are not removed and will be a part of the cookie-value if present
cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                    / %x80-FF
                      ; octets excluding CTLs,
                      ; whitespace DQUOTE, comma, semicolon,
                      ; and backslash
  • The character set used by cookie-name is %x09 / %x20-3A / %x3C / %x3E-7E / %x80-FF (all octets except for non-whitespace CTLs, ;, and =)
  • The character set used by cookie-value is %x09 / %x20-3A / %x3C-7E / %x80-FF (all octets except for non-whitespace CTLs and ;)
  • Whitespace gets trimmed from the beginning and end
max-age-av        = "Max-Age" BWS "=" BWS non-zero-digit *DIGIT
                      ; In practice, both expires-av and max-age-av
                      ; are limited to dates representable by the
                      ; user agent.
non-zero-digit    = %x31-39
                      ; digits 1 through 9
  • The first digit can be 0 or a negative sign
domain-value      = <subdomain>
                      ; defined in [RFC1034], Section 3.5, as
                      ; enhanced by [RFC1123], Section 2.1
  • The RFCs for domains indicate that the domain name should not exceed 253 characters, and also indicates that labels (the pieces separated by dots in a domain) should not be more than 63 characters each. The parsing algorithm doesn't indicate that these restrictions must be enforced, but instead applies the 1024 octet limit for attribute values as an upper bound on the length of domain-value.
extension-av      = *av-octet
av-octet          = %x20-3A / %x3C-7E
                      ; any CHAR except CTLs or ";"
  • the av-octet character set is %x09 / %x20-3A / %x3C-7E / %x80-FF (all octets except for non-whitespace CTLs and ;)

@miketaylr miketaylr assigned recvfrom and unassigned chlily1 Aug 31, 2021
@recvfrom
Copy link
Contributor

I've observed multiple instances where the grammar from 4.4.1 is copied into discussions/code related to cookie parsing and used as if it was the syntax that all cookies MUST adhere to. To mitigate this confusion, I think we should provide a grammar for section 5.3 either in addition to or in place of the one in 4.4.1.

Does anyone have any thoughts about whether there's still value in having the separate, stricter grammar for Set-Cookie headers sent by servers?

@miketaylr
Copy link
Collaborator

Let's make sure anything changing here is well tested and matches reality.

recvfrom added a commit to recvfrom/http-extensions that referenced this issue Oct 19, 2021
recvfrom added a commit to recvfrom/http-extensions that referenced this issue Oct 19, 2021
recvfrom added a commit to recvfrom/http-extensions that referenced this issue Nov 1, 2021
@kailuowang
Copy link

kailuowang commented Mar 3, 2022

Just noticed another discrepancy in section 4.4.1:

 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       / %x80-FF
                         ; octets excluding CTLs,
                         ; whitespace DQUOTE, comma, semicolon,
                         ; and backslash

%x7F is also excluded but not mentioned in the comment: It's the question mark character ?

@annevk
Copy link

annevk commented Apr 29, 2022

I hadn't seen this issue before, but it's quite common to have strict requirements for producers and loose-but-still-strict requirements for consumers. E.g., CSS and HTML do something similar.

https://crbug.com/1174647 seems like a solid reason to make changes to the parser, but I think that's a separable concern from what this issue seems to raise as a larger issue, which I don't think is an issue at all.

@mnot
Copy link
Member

mnot commented Aug 17, 2022

Stepping back, I see two aspects to this issue:

a. Whether there should be differences in the producer and consumer grammars, and
b. If so, how to document them.

It seems like we've come to agreement here that there can be such differences. Not only do we see it in existing specs (e.g., Cookies so far, HTTP with productions like OWS, Structured Fields), it's inevitable - ABNF cannot capture all of the constraints that prose (or an algorithm) can express.

(b) is largely an editorial issue about how to capture the delta between the producer grammer and the parsing algorithm. As such, the editors should decide, taking into account input from the WG as they see fit.

Personally: I don't see how having two separate ABNF grammars would help -- it's just as likely to cause confusion, and there's a possilbity of introducing yet more errors. I think it'd be fine if the editors added a note to the top of 4.1.1 noting the relationship with 5.4, much as they already do in the other direction.

@mnot
Copy link
Member

mnot commented Sep 1, 2022

As per above, marking as editorial.

@mnot mnot added the editorial label Sep 1, 2022
@sbingler sbingler added the defer label Nov 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

8 participants
@miketaylr @mnot @kailuowang @annevk @sbingler @recvfrom @chlily1 and others