
store header char check too strict? #47479

Closed
ronag opened this issue Apr 8, 2023 · 11 comments
Labels
http Issues or PRs related to the http subsystem.

Comments

@ronag
Member

ronag commented Apr 8, 2023

Not sure what is correct here, but a test in undici started to fail on Node 19 with:

Invalid character in header content ["content-disposition"]

With:

    res.writeHead(200, {
      'content-length': 2,
      'content-disposition': "attachment; filename='år.pdf'"
    })

I don't think å should cause problems?
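
For context, a minimal standalone sketch built around the snippet above (my own assembly, assuming Node 19; this is not the actual undici test):

    'use strict';
    const http = require('http');

    const server = http.createServer((req, res) => {
      try {
        res.writeHead(200, {
          'content-length': 2,
          'content-disposition': "attachment; filename='år.pdf'"
        });
        res.end('ok');
      } catch (err) {
        // On Node 19 this logs:
        //   Invalid character in header content ["content-disposition"]
        console.error(err.message);
        res.destroy();
      }
    });

    server.listen(0, () => {
      const req = http.get(
        { port: server.address().port, agent: false },
        (res) => res.resume()
      );
      req.on('error', () => {});            // socket is destroyed on the failing path
      req.on('close', () => server.close());
    });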

@ronag ronag added the http label Apr 8, 2023
@ronag
Member Author

ronag commented Apr 8, 2023

@nodejs/http

@ronag
Member Author

ronag commented Apr 8, 2023

I just noticed nodejs/undici#2020, but it seems like no one has a solution. Help would be appreciated.

@mcollina
Member

mcollina commented Apr 8, 2023

I think this is a change we might want to revert.

cc @nodejs/tsc

@targos
Member

targos commented Apr 8, 2023

Which change?

@marco-ippolito
Member

I think we should revert #46528 and rethink a better solution; the problem that PR tried to solve is smaller than the one it created.

@bnoordhuis
Member

https://httpwg.org/specs/rfc9110.html#fields.values still says:

Field values are usually constrained to the range of US-ASCII characters. Fields needing a greater range of characters can use an encoding, such as the one defined in RFC8187.

Thus, technically, Node is in the clear. The spec also says this:

Historically, HTTP allowed field content with text in the ISO-8859-1 charset

But the å in the example is UTF-8, not ISO-8859-1. The spec then goes on to say:

Specifications for newly defined fields SHOULD limit their values to visible US-ASCII octets (VCHAR), SP, and HTAB. A recipient SHOULD treat other allowed octets in field content (i.e., obs-text) as opaque data.

So I suppose we could transmit the character as C3 A5 (UTF-8), and well-behaved recipients will accept it, but then it's ambiguous how the recipient decodes it. Probably as ISO-8859-1.

We could also transcode from UTF-8 to ISO-8859-1, but that's lossy, and it still doesn't really resolve the ambiguity.
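
A quick illustration of that ambiguity (not from the thread), using Node's Buffer: the same å is two bytes in UTF-8 but one byte in ISO-8859-1, and transcoding to ISO-8859-1 silently mangles anything outside that charset.

    const utf8 = Buffer.from('å', 'utf8');     // <Buffer c3 a5>
    const latin1 = Buffer.from('å', 'latin1'); // <Buffer e5>
    console.log(utf8, latin1);

    // Lossy: '€' (U+20AC) has no ISO-8859-1 code point; Node keeps only the low byte.
    console.log(Buffer.from('€', 'latin1'));   // <Buffer ac>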

RFC 8187 defines a configurable character encoding scheme for HTTP header values, but practically nothing supports it AFAIK, so: good idea, but not very practical.
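
For what it's worth, Content-Disposition specifically does have an RFC 8187-style form: the filename* parameter defined by RFC 6266. A rough sketch of what the header from the report could look like in that form (contentDisposition below is a hypothetical helper, not a Node or undici API):

    // Hypothetical helper: percent-encode the UTF-8 bytes per RFC 8187 and keep
    // a plain-ASCII filename as a fallback for recipients that ignore filename*.
    function contentDisposition(filename) {
      // encodeURIComponent leaves ' ( ) * unescaped, but RFC 8187 requires
      // them to be percent-encoded as well.
      const extValue = encodeURIComponent(filename)
        .replace(/['()*]/g, (c) => '%' + c.charCodeAt(0).toString(16).toUpperCase());
      const fallback = filename.replace(/[^\x20-\x7e]/g, '_');
      return `attachment; filename="${fallback}"; filename*=UTF-8''${extValue}`;
    }

    console.log(contentDisposition('år.pdf'));
    // attachment; filename="_r.pdf"; filename*=UTF-8''%C3%A5r.pdf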

@ShogunPanda
Contributor

I tend to agree with Ben: if the spec allows it, we allow it as well.
The biggest issue I see is that we don't technically support RFC 9110; llhttp, for instance, follows RFC 7230. So if we start this trend we might end up with conflicts. What shall we do?

@bnoordhuis
Member

Newer RFCs generally obsolete older ones, so unless there's a very good reason not to, Node should follow the newer one.

@ShogunPanda
Contributor

I absolutely agree. I'm also willing to follow the newer RFC.
I'm just wondering whether we're going to introduce conflicts if we slowly migrate to RFC 9110.

@mcollina
Member

mcollina commented May 8, 2023

Should we close this then?

@marco-ippolito
Member

Closing this, as we already implement the latest RFC.
