RFC 6265bis: Set-Cookie parsing algorithm should enforce more of the syntax requirements #1399

chlily1 · 2021-02-04T22:07:59Z

There is an ABNF syntax given for the Set-Cookie header in section 4.4.1, however many aspects of this are not enforced in the actual parsing algorithm in section 5.3.

The syntax given in 4.4.1 is only a "SHOULD", not a hard requirement for the server:

Servers SHOULD NOT send Set-Cookie headers that fail to conform to
   the following grammar:

There is an explicit caveat about this in section 5.3:

   NOTE: The algorithm below is more permissive than the grammar in
   Section 4.1.  For example, the algorithm strips leading and trailing
   whitespace from the cookie name and value (but maintains internal
   whitespace), whereas the grammar in Section 4.1 forbids whitespace in
   these positions.  User agents use this algorithm so as to
   interoperate with servers that do not follow the recommendations in
   Section 4.

The algorithm in 5.3 should enforce more of the requirements given in the recommended syntax, at the very least the requirements on validity of characters/octets (whitespace, control characters, etc.), and specify the intended behavior in these cases. (For context and examples of such discrepancies, see here and here.)

The text was updated successfully, but these errors were encountered:

…1420) Addresses #1399

recvfrom · 2021-08-16T18:01:25Z

To call out some of the current differences between the ABNF section and the parsing algorithm (not an exhaustive list - just the ones I've happened to notice):

cookie-pair       = cookie-name BWS "=" BWS cookie-value
cookie-name       = 1*cookie-octet
cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )

cookie-name can be empty
cookie-pair doesn't have to include the '=' (Ex: cookie with empty name, non-empty cookie value)
Double quotes surrounding cookie-octet in cookie-value are not removed and will be a part of the cookie-value if present

cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                    / %x80-FF
                      ; octets excluding CTLs,
                      ; whitespace DQUOTE, comma, semicolon,
                      ; and backslash

The character set used by cookie-name is %x09 / %x20-3A / %x3C / %x3E-7E / %x80-FF (all octets except for non-whitespace CTLs, ;, and =)
The character set used by cookie-value is %x09 / %x20-3A / %x3C-7E / %x80-FF (all octets except for non-whitespace CTLs and ;)
Whitespace gets trimmed from the beginning and end

max-age-av        = "Max-Age" BWS "=" BWS non-zero-digit *DIGIT
                      ; In practice, both expires-av and max-age-av
                      ; are limited to dates representable by the
                      ; user agent.
non-zero-digit    = %x31-39
                      ; digits 1 through 9

The first digit can be 0 or a negative sign

domain-value      = <subdomain>
                      ; defined in [RFC1034], Section 3.5, as
                      ; enhanced by [RFC1123], Section 2.1

The RFCs for domains indicate that the domain name should not exceed 253 characters, and also indicates that labels (the pieces separated by dots in a domain) should not be more than 63 characters each. The parsing algorithm doesn't indicate that these restrictions must be enforced, but instead applies the 1024 octet limit for attribute values as an upper bound on the length of domain-value.

extension-av      = *av-octet
av-octet          = %x20-3A / %x3C-7E
                      ; any CHAR except CTLs or ";"

the av-octet character set is %x09 / %x20-3A / %x3C-7E / %x80-FF (all octets except for non-whitespace CTLs and ;)

recvfrom · 2021-09-30T20:51:29Z

I've observed multiple instances where the grammar from 4.4.1 is copied into discussions/code related to cookie parsing and used as if it was the syntax that all cookies MUST adhere to. To mitigate this confusion, I think we should provide a grammar for section 5.3 either in addition to or in place of the one in 4.4.1.

Does anyone have any thoughts about whether there's still value in having the separate, stricter grammar for Set-Cookie headers sent by servers?

miketaylr · 2021-09-30T22:09:14Z

Let's make sure anything changing here is well tested and matches reality.

…ing algorithm

kailuowang · 2022-03-03T14:29:06Z

Just noticed another discrepancy in section 4.4.1:

 cookie-octet      = %x21 / %x23-2B / %x2D-3A / %x3C-5B / %x5D-7E
                       / %x80-FF
                         ; octets excluding CTLs,
                         ; whitespace DQUOTE, comma, semicolon,
                         ; and backslash

%x7F is also excluded but not mentioned in the comment: It's the question mark character ?

annevk · 2022-04-29T08:13:54Z

I hadn't seen this issue before, but it's quite common to have strict requirements for producers and loose-but-still-strict requirements for consumers. E.g., CSS and HTML do something similar.

https://crbug.com/1174647 seems like a solid reason to make changes to the parser, but I think that's a separable concern from what this issue seems to raise as a larger issue, which I don't think is an issue at all.

mnot · 2022-08-17T00:53:39Z

Stepping back, I see two aspects to this issue:

a. Whether there should be differences in the producer and consumer grammars, and
b. If so, how to document them.

It seems like we've come to agreement here that there can be such differences. Not only do we see it in existing specs (e.g., Cookies so far, HTTP with productions like OWS, Structured Fields), it's inevitable - ABNF cannot capture all of the constraints that prose (or an algorithm) can express.

(b) is largely an editorial issue about how to capture the delta between the producer grammer and the parsing algorithm. As such, the editors should decide, taking into account input from the WG as they see fit.

Personally: I don't see how having two separate ABNF grammars would help -- it's just as likely to cause confusion, and there's a possilbity of introducing yet more errors. I think it'd be fine if the editors added a note to the top of 4.1.1 noting the relationship with 5.4, much as they already do in the other direction.

mnot · 2022-09-01T00:24:16Z

As per above, marking as editorial.

chlily1 mentioned this issue Feb 4, 2021

rfc6265bis UA requirements and WPT mismatch #1136

Closed

19 tasks

chlily1 added the 6265bis label Feb 11, 2021

chlily1 self-assigned this Feb 23, 2021

chlily1 added a commit that referenced this issue Mar 30, 2021

RFC 6265bis: Specify handling of CTL characters in set-cookie-string (#…

f02b5ff

…1420) Addresses #1399

miketaylr assigned recvfrom and unassigned chlily1 Aug 31, 2021

recvfrom added a commit to recvfrom/http-extensions that referenced this issue Oct 19, 2021

Fix httpwg#1399 - RFC6265bis: Add a grammar corresponding to the pars…

11d8daf

…ing algorithm

recvfrom added a commit to recvfrom/http-extensions that referenced this issue Oct 19, 2021

Fix httpwg#1399 - Add a grammar corresponding to the parsing algorithm

54547c4

recvfrom added a commit to recvfrom/http-extensions that referenced this issue Nov 1, 2021

Fix httpwg#1399 - Add a grammar corresponding to the parsing algorithm

d6d071b

kailuowang mentioned this issue Mar 3, 2022

Further relax cookie parsing? http4s/http4s#6067

Closed

mnot mentioned this issue Jul 18, 2022

cookie-octet reality check #2185

Closed

mnot added the editorial label Sep 1, 2022

sbingler added the defer label Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC 6265bis: Set-Cookie parsing algorithm should enforce more of the syntax requirements #1399

RFC 6265bis: Set-Cookie parsing algorithm should enforce more of the syntax requirements #1399

chlily1 commented Feb 4, 2021

recvfrom commented Aug 16, 2021 •

edited

Loading

recvfrom commented Sep 30, 2021

miketaylr commented Sep 30, 2021

kailuowang commented Mar 3, 2022 •

edited

Loading

annevk commented Apr 29, 2022

mnot commented Aug 17, 2022

mnot commented Sep 1, 2022

RFC 6265bis: Set-Cookie parsing algorithm should enforce more of the syntax requirements #1399

RFC 6265bis: Set-Cookie parsing algorithm should enforce more of the syntax requirements #1399

Comments

chlily1 commented Feb 4, 2021

recvfrom commented Aug 16, 2021 • edited Loading

recvfrom commented Sep 30, 2021

miketaylr commented Sep 30, 2021

kailuowang commented Mar 3, 2022 • edited Loading

annevk commented Apr 29, 2022

mnot commented Aug 17, 2022

mnot commented Sep 1, 2022

recvfrom commented Aug 16, 2021 •

edited

Loading

kailuowang commented Mar 3, 2022 •

edited

Loading