Add HTTP header well known type and C++ UUID #297
Signed-off-by: Asra Ali <asraa@google.com>
The Java parts look fine. Why do we need a new validation type for HTTP headers?
For Envoy's fields like |
htuch
left a comment
I'm supportive of the idea of well known regex validators, provided we base them on some universal standard and they are likely to have broad applicability. I think header names/values pass these criteria, so great to see this coming to PGV.
/wait
java/pgv-java-stub/src/main/java/io/envoyproxy/pgv/StringValidation.java
htuch
left a comment
Looks great. Other than the minor feedback, it would be good to get a review from someone for Go style and someone to nitpick the regexes themselves (I haven't done this yet).
htuch
left a comment
@securityinsanity agree, but I think for the purpose of config validation, being stricter is fine. We primarily want this for Envoy config (rather than data plane content, which I agree might need to be more permissive).
I agree, @htuch, even more so since Envoy isn’t doing it already, but if it’s easy it would definitely be nice to fix.
htuch
left a comment
Yep, much clearer now.
/wait
In rewriting this logic for the ASF HTTP Server, I adopted a very strict treatment of all request line and header line input, which has been the behavior since 2016 with little complaint. Please note that while obs-fold was once valid and is now firmly disallowed to further guard against HTTP request/response splitting attack vectors, that implementation strictly follows the original advice to unfold any obs-fold into a single space before ever storing the header line. (Actually, all whitespace is collapsed to single spaces.) The revised header line logic is here; note it is especially critical to disallow all whitespace in the header line for the token, prior to the acceptance of the ':' delimiter. LWS can then be ignored prior to the header value. Across the entire spectrum of httpd consumers, we bumped into only one edge case of an application injecting obs-fold into the decoded HTTP value, and deemed that a defect of the application itself. I would strongly encourage the envoyproxy implementation to perform obs-fold unfolding as a distinct regex when the bytestream is decoded, and reject it from that demarcation onwards. Finally, note that it is wise to guard against many other patterns in the specific header values which are prone to newline insertion via reflection attacks, e.g. %00-%1F etc.
Just to clarify, in httpd we accomplished this by running all of the outbound headers through the same parser to ensure that a decoded \r or \n etc. hadn't been injected from untrusted input.
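For readers following along, here is a rough Go sketch of the two steps described above: collapsing any whitespace run (including an obs-fold) to a single space at decode time, then checking a decoded value for leftover control characters. It is not httpd's or Envoy's actual code. The helper names `unfoldHeaderValue` and `hasInjectedControl` are made up for illustration, and the sketch assumes the input has already been isolated as one logical header value.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// whitespaceRun matches any run of SP, TAB, CR, or LF. That covers an
// obs-fold (CRLF followed by SP or TAB) as well as ordinary repeated spaces.
var whitespaceRun = regexp.MustCompile(`[ \t\r\n]+`)

// unfoldHeaderValue is a hypothetical helper: it collapses every whitespace
// run to a single space and trims leading and trailing whitespace.
func unfoldHeaderValue(raw string) string {
	return strings.TrimSpace(whitespaceRun.ReplaceAllString(raw, " "))
}

// hasInjectedControl is a hypothetical helper: it reports whether a decoded
// value contains a control character other than TAB (0x00-0x08, 0x0A-0x1F,
// or 0x7F), for example a CR or LF reflected from untrusted input.
func hasInjectedControl(value string) bool {
	for _, r := range value {
		if (r < 0x20 && r != '\t') || r == 0x7F {
			return true
		}
	}
	return false
}

func main() {
	folded := "verylong\r\n\tvalue"
	fmt.Printf("%q\n", unfoldHeaderValue(folded))

	// A value that was never unfolded and still carries CRLF is flagged.
	fmt.Println(hasInjectedControl("ok\r\nSet-Cookie: x=1"))
}
```

In this sketch the unfolding happens once, at the decode boundary, and everything after that point only needs the reject check.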
Thanks @wrowe! Just want to address the points you made to make sure they're covered. The PGV validation is for config level changes (so this would apply to any Envoy configuration fields that end up as headers, for example,
In the PR the
Knowing that it will later be rejected, I'm leaning towards disallowing TAB/SPC in the regex now. What do you think about disallowing configuration fields from adding obs-fold? Again, these regexes apply to Envoy's configuration fields that may populate or modify headers, and those come from a trusted source.
In Envoy, both the H/1 codec (by default, though this can be overridden at runtime) and the H/2 codec validate header values with a check that they contain no \r or \n characters (envoyproxy/envoy#7306). This isn't the case for header names, though. Is it worth creating an issue?
I concur. Since this is YAML, they can break value: > into multiple lines for legibility. We can simply replace all sequences of whitespace, including tabs, CR, and LF, with a single space following the RFC guideline, which would render any obs-fold into a space. But it isn't truly an obs-fold; it's a YAML fold, and we want to encourage legible config. Although I find it really unusual that a filter would use untrusted input in composing a header name, it is probably worth testing, yes.
I did a little testing, and a YAML fold with value: >- is parsed with single spaces, e.g. if you made a YAML config with
The value field would be parsed as "verylong value", and that's what PGV would run the validation on.
Sorry if that was unclear. A filter doesn't; this was just referring to protocol codecs validating header values but not validating header names before passing them to parsers.
Bump on this? With summary that
I was skimming through this and noticed the likely unwanted addition of |
Thanks!
akonradi
left a comment
Minor nits, otherwise LGTM
var unknown = ""
var httpHeaderName = "^:?[0-9a-zA-Z!#$%&'*+-.^_|~\x60]+$"
var httpHeaderValue = "^[^\u0000-\u0008\u000A-\u001F\u007F]*$"
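As a quick way to see what the two patterns under review accept, here is a small Go sketch that compiles them and runs a few sample inputs. The pattern strings are copied from the snippet under review. The variable names `headerName` and `headerValue` and the sample inputs are assumptions made for illustration; they are not taken from the PR's test suite.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Pattern strings copied from the PR; the variable names are hypothetical.
	headerName  = regexp.MustCompile("^:?[0-9a-zA-Z!#$%&'*+-.^_|~\x60]+$")
	headerValue = regexp.MustCompile("^[^\u0000-\u0008\u000A-\u001F\u007F]*$")
)

func main() {
	// Illustrative header names: a pseudo header, a regular header, and
	// three inputs that are expected to be rejected.
	for _, n := range []string{":authority", "x-request-id", "bad name", "foo/bar", ""} {
		fmt.Printf("name  %q -> %v\n", n, headerName.MatchString(n))
	}

	// Illustrative header values: SPC and TAB pass, other control characters fail.
	for _, v := range []string{"text/html; q=0.9", "tab\tok", "line\nbreak", "del\x7f", ""} {
		fmt.Printf("value %q -> %v\n", v, headerValue.MatchString(v))
	}
}
```

One detail that might be worth a look when nitpicking the regexes: inside a character class, `+-.` reads as a range from `+` to `.`, so the name pattern as written also appears to accept `,`, which is not a token character in RFC 7230.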
It looks like these regular expressions are duplicated here and in the Python code. Is it possible to unify those so we can have a single source of truth?
The Python code doesn't go through the Go code checkers, but possibly I could have a const file with these regexes that Python parses? Or is there a way that I can define constants in a protobuf file (validate.proto)? I think default values aren't in proto3, and that's probably an issue.
I think defining these in a separate file is heading towards the right solution, but instead of a .proto file it can be a textproto or YAML file that we check in. The contents would be of a new message type that is basically just map<KnownRegex, string> and contains the single canonical definition of the regex for each well-known regex. Then we can refer to that during Go/C++/Java code generation and include it in the Python code (not completely clear on how, but it should be possible).
This might be too much for this PR, though. Up to you whether you want to do that here; if not, please open another issue to track cleaning up the tech debt.
Oh, interesting. I like the textproto. It would make sense to include UUID patterns and other regexes that are duplicated (although the UUID duplication might be removable in code).
I filed an issue, #316, and I'm happy to work on it next week (after I use this PR in Envoy!).
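To make the single-source-of-truth idea from this thread concrete, here is a minimal Go sketch of a `map<KnownRegex, string>`-style lookup. It is only an illustration of the shape being discussed, not what issue #316 ended up implementing. The enum constant names, the `knownPatterns` table, and the `patternFor` helper are all hypothetical, and in the proposal the table would be loaded from a checked-in textproto or YAML file rather than written inline.

```go
package main

import (
	"fmt"
	"regexp"
)

// KnownRegex stands in for the well-known regex enum mentioned in the thread.
// The constant names below are assumptions made for this sketch.
type KnownRegex int

const (
	Unknown KnownRegex = iota
	HTTPHeaderName
	HTTPHeaderValue
)

// knownPatterns is the single canonical table. Code generators for each
// target language would read from this one place instead of keeping copies.
var knownPatterns = map[KnownRegex]string{
	HTTPHeaderName:  "^:?[0-9a-zA-Z!#$%&'*+-.^_|~\x60]+$",
	HTTPHeaderValue: "^[^\u0000-\u0008\u000A-\u001F\u007F]*$",
}

// patternFor is a hypothetical lookup helper that compiles the canonical
// pattern for a well-known regex, or returns an error if none is registered.
func patternFor(k KnownRegex) (*regexp.Regexp, error) {
	p, ok := knownPatterns[k]
	if !ok {
		return nil, fmt.Errorf("no pattern registered for well-known regex %d", k)
	}
	return regexp.Compile(p)
}

func main() {
	re, err := patternFor(HTTPHeaderName)
	if err != nil {
		panic(err)
	}
	fmt.Println(re.MatchString(":path"))

	_, err = patternFor(Unknown)
	fmt.Println(err != nil)
}
```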
* add HTTP header well known type and C++ UUID
* update validate.pb.go
* simplify conditions
* fix java build
* bad test
* java
* check for slashes in hdr name
* add more testcases
* empty
* define regex for http header name/value
* make patterns in to well known regex
* fix bazel
* fix bazel maven_jar defn
* remove fixes for build
* fix merge mistake
* encode in utf-8 when writing out
* cleanup
* remove unused regex definitions
* add backtick
* simplify header value regex as blacklist
* fixup to match entire string + also fixup dependency
* python regex
* remove pyc
* there's gotta be another way
* remove autosaved pgs

Signed-off-by: Asra Ali <asraa@google.com>
Co-authored-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Maxim Chechel <hexdigest@gmail.com>
RFC 7230 says:
The regex for header names is:
"^:?[0-9a-zA-Z!#$%&'*+-.^_|~\u0060]+$". It includes the alphanumerics and whitelisted characters defined in the standard's token, along with the optional leading colon for pseudo headers that Envoy uses.
The regex for header values is:
"^[^\u0000-\u0008\u000A-\u001F\u007F]*$", which blacklists control characters while still allowing SPC and TAB.
Signed-off-by: Asra Ali <asraa@google.com>