Skip to content

Add HTTP header well known type and C++ UUID#297

Merged
akonradi merged 31 commits intobufbuild:masterfrom
asraa:header-valid
Feb 14, 2020
Merged

Add HTTP header well known type and C++ UUID#297
akonradi merged 31 commits intobufbuild:masterfrom
asraa:header-valid

Conversation

@asraa
Copy link
Copy Markdown
Contributor

@asraa asraa commented Dec 16, 2019

RFC 7230 says:

     field-name     = token
     field-value    = *( field-content / obs-fold )
     field-content  = field-vchar [ 1*( SP / HTAB ) field-vchar ]
     field-vchar    = VCHAR / obs-text (%x80-FF)
     obs-fold       = CRLF 1*( SP / HTAB )
                           ; obsolete line folding

The regex for header names is: "^:?[0-9a-zA-Z!#$%&'*+-.^_|~\u0060]+$". Includes the alphanums/whitelisted characters defined in the standard's token, along with the optional colon for psuedo headers that Envoy uses.

The regex for header fields is: "'^[^\u0000-\u0008\u000A-\u001F\u007F]*$", which blacklists control characters except for SPC and TAB.

Signed-off-by: Asra Ali asraa@google.com

Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 17, 2019

/review @rmichela @rodaine This is ready for review, would either you be able to take a first pass? Thank you so much!!

@asraa asraa changed the title add HTTP header well known type and C++ UUID Add HTTP header well known type and C++ UUID Dec 17, 2019
@htuch htuch self-requested a review December 17, 2019 21:12
Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 18, 2019

@antoniovicente

@rmichela
Copy link
Copy Markdown
Contributor

The java parts look fine. Why do we need a new validation type for http header?

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Dec 18, 2019

For Envoy's fields like response_headers_to_add or response_headers_to_remove. Right now they're base type is a string, and there's no API level validations to enforce that they don't contain NUL characters or other characters that trip up ASSERTS in code to be valid. Originally, a regex pattern validator could be used, but since that would be a lot of repetition for the regex, Lizan and others suggested a well-known type since it's an RFC standard.

Copy link
Copy Markdown
Contributor

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm supportive of the idea of well known regex validators, providing we base on them on some universal standard and they are likely to have applicability. I think header names/values pass these criteria, so great to see this coming to PGV.
/wait

asraa added 10 commits December 23, 2019 08:46
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
Copy link
Copy Markdown
Contributor

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Other than the minor feedback, would be good to get a review from someone for Go style and someone to nitpick the regexes themselves ( haven't done this yet).

Signed-off-by: Asra Ali <asraa@google.com>
Copy link
Copy Markdown
Contributor

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@securityinsanity agree, but I think for the purpose of config validation, being stricter is fine. We primarily want this for Envoy config (rather than data plane content, which I agree might need to more permissive).

@Mythra
Copy link
Copy Markdown

Mythra commented Jan 24, 2020

I agree @htuch since envoy isn’t doing it already even more so, but if it’s easy it would definitely be nice to fix.

asraa added 2 commits January 24, 2020 09:10
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
htuch
htuch previously requested changes Jan 24, 2020
Copy link
Copy Markdown
Contributor

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, much clearer now.
/wait

Signed-off-by: Asra Ali <asraa@google.com>
@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Jan 24, 2020

In rewriting this logic for the ASF HTTP Server, I adopted the very strict treatment of all request line and header lines input which has been the behavior since 2016 with little complaint.
http://httpd.apache.org/security/vulnerabilities_24.html#CVE-2016-8743

Please note, while obs-fold was once valid, and now firmly disallowed to further guard against HTTP request/response splitting attack vectors, that implementation strictly follows original advise to unfold any obs-fold into a single space before ever storing the header line. (Actually, all white space is collapsed to single-spaces). The revised header line logic is here; note it is especially critical to disallow all whitespace in the header line for the token, prior to the acceptance of the ':' delimiter. LWS can then be ignored prior to the header value.
http://svn.apache.org/viewvc/httpd/httpd/trunk/server/protocol.c?annotate=1869223#l1073

We bumped into only one edge case of an application injecting obs-fold into the decoded http value across the entire spectrum of httpd consumers, and deemed that a defect of the application itself. I would strongly encourage the envoyproxy implementation to perform obs-fold unfolding as a distinct regex when the bytestream is decoded, and reject it from that demarcation onwards.

Note finally it is wise to guard against many other patterns in the specific header values which are prone to newline insertion via reflection attacks, e.g. %00-%1F etc.

@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Jan 24, 2020

Note finally it is wise to guard against many other patterns in the specific header values which are prone to newline insertion via reflection attacks, e.g. %00-%1F etc.

Just to clarify, in httpd we accomplished this by running all of the outbound headers through the same parser to ensure that a decoded \r or \n etc hadn't been injected from untrusted input.

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Jan 27, 2020

Thanks @wrowe! Just want to address the points you made to make sure they're covered. The PGV validation is for config level changes (so this would apply to any Envoy configuration fields that end up as headers, for example, request_headers_to_add in some filters. These are either string fields that are used to populate header values, or [HeaderValue](https://github.com/envoyproxy/envoy/blob/ac88316892cd47b6a9b58e3736e20e8863cd0d27/api/envoy/config/core/v3/base.proto#L247)). So these won't need to be parsed to include the : delimeter, nor are they outbound headers from untrusted inputs or backends.

The revised header line logic is here; note it is especially critical to disallow all whitespace in the header line for the token, prior to the acceptance of the ':' delimiter. LWS can then be ignored prior to the header value.

In the PR the HTTP_HEADER_NAME regex consisting of tokens does not include any whitespace, only alphanum and the whitelisted characters !#$%&\'*+-.^_|~\x2C. In Envoy's codecs there is some LWS trimming done in the codecs.

I would strongly encourage the envoyproxy implementation to perform obs-fold unfolding as a distinct regex when the bytestream is decoded, and reject it from that demarcation onwards.

Knowing that it will later be rejected, I'm leaning towards disallowing TAB/SPC in the regex now. What do you think about disallowing configuration fields to add obs-fold? Again these regexes apply to Envoy's configuration fields that may populate or modify headers, and are coming from a trusted source.

httpd we accomplished this by running all of the outbound headers through the same parser to ensure that a decoded \r or \n etc hadn't been injected from untrusted input.

In Envoy, both H/1(by default but can be runtime overridden) and H/2 codecs validate header values with a check to make sure that there were no \r or \ns. envoyproxy/envoy#7306 This isn't the case for header names though, worth creating an issue?

Signed-off-by: Asra Ali <asraa@google.com>
@wrowe
Copy link
Copy Markdown
Contributor

wrowe commented Jan 27, 2020

I concur, since this is yaml, they can break value: > into multiple lines for legibility. We can simply replace all sequences of whitespace including tabs, CR, LF with single space following the RFC guideline, which would render any obs-fold into a space. But it isn't truly an obs-fold, it's a yaml fold and we want to encourage legible config.

Although I find it really unusual that a filter would use untrusted input in composing a header name, it is probably worth testing, yes.

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Jan 28, 2020

they can break value: > into multiple lines for legibility. We can simply replace all sequences of whitespace including tabs, CR, LF with single space following the RFC guideline, which would render any obs-fold into a space

I did a little testing and YAML fold with value: >- is parsed with single spaces, eg if you made a YAML config with

headers_to_add:
  key: x-authz-header
  value: >-
      verylong
      value

The value field would be parsed as "verylong value" and that's what PGV would run the validation on.
It seems like YAML folding is already replacing CR, LF with single space.

a filter would use untrusted input in composing a header name, it is probably worth testing, yes.

Sorry if that was unclear, a filter doesn't, this was just referring to protocol codecs validating header values but not validating header names before passing to parsers

@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Jan 31, 2020

Bump on this? With summary that

@gertvdijk
Copy link
Copy Markdown
Contributor

I was skimming through this and noticed the likely unwanted addition of validate/validator.pyc binary file.

(GH doesn't allow me to make that a proper review comment.)

Signed-off-by: Asra Ali <asraa@google.com>
@asraa
Copy link
Copy Markdown
Contributor Author

asraa commented Feb 3, 2020

I was skimming through this and noticed the likely unwanted addition of validate/validator.pyc binary file.

Thanks!

Copy link
Copy Markdown
Contributor

@akonradi akonradi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, otherwise LGTM

Comment on lines +17 to +19
var unknown = ""
var httpHeaderName = "^:?[0-9a-zA-Z!#$%&'*+-.^_|~\x60]+$"
var httpHeaderValue = "^[^\u0000-\u0008\u000A-\u001F\u007F]*$"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these regular expressions are duplicated here and in the Python code. Is it possible to unify those so we can have a single source of truth?

Copy link
Copy Markdown
Contributor Author

@asraa asraa Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The python code doesn't go through the go code checkers, but possibly I could have a const file with these regexes that python parses? Or is there a way that i can define constants in a protobuf file (validate.proto)? I think default values aren't in proto3 and that's probably an issue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think defining these in a separate file is heading towards the right solution, but instead of a .proto file it can be a textproto or YAML file that we check in. The contents would be of a new message type that is basically just map<KnownRegex, string> and contains the single canonical definition of the regex for each well-known regex. Then we can refer to that during Go/C++/Java code generation and include it in the Python code (not completely clear on how, but it should be possible).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be too much for this PR, though. Up to you whether you want to do that here; if not, please open another issue to track cleaning up the tech debt.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. I like the textproto. It would make sense to include uuid patterns / other regexes that are duped (although the uuid dupe might be able to removed in code).

I filed an issue #316 and I'm happy to work on next week (after I use this PR in envoy!)

asraa added 3 commits February 4, 2020 13:43
Signed-off-by: Asra Ali <asraa@google.com>
Signed-off-by: Asra Ali <asraa@google.com>
@rodaine rodaine dismissed htuch’s stale review February 14, 2020 20:00

Unblocking this change

@akonradi akonradi merged commit 61843ae into bufbuild:master Feb 14, 2020
hexdigest pushed a commit to hexdigest/protoc-gen-validate that referenced this pull request Mar 26, 2020
* add HTTP header well known type and C++ UUID

Signed-off-by: Asra Ali <asraa@google.com>

* update validate.pb.go

Signed-off-by: Asra Ali <asraa@google.com>

* simplify conditions

Signed-off-by: Asra Ali <asraa@google.com>

* fix java build

Signed-off-by: Asra Ali <asraa@google.com>

* bad test

Signed-off-by: Asra Ali <asraa@google.com>

* java

Signed-off-by: Asra Ali <asraa@google.com>

* check for slashes in hdr name

Signed-off-by: Asra Ali <asraa@google.com>

* add more testcases

Signed-off-by: Asra Ali <asraa@google.com>

* empty

Signed-off-by: Asra Ali <asraa@google.com>

* define regex for http  header name/value

Signed-off-by: Asra Ali <asraa@google.com>

* make patterns in to well known regex

Signed-off-by: Asra Ali <asraa@google.com>

* fix bazel

Signed-off-by: Asra Ali <asraa@google.com>

* fix bazel maven_jar defn

Signed-off-by: Asra Ali <asraa@google.com>

* remove fixes for build

Signed-off-by: Asra Ali <asraa@google.com>

* fix merge mistake

Signed-off-by: Asra Ali <asraa@google.com>

* encode in utf-8 when writing out

Signed-off-by: Asra Ali <asraa@google.com>

* cleanup

Signed-off-by: Asra Ali <asraa@google.com>

* remove unused regex definitions

Signed-off-by: Asra Ali <asraa@google.com>

* add backtick

Signed-off-by: Asra Ali <asraa@google.com>

* simplify header value regex as blacklist

Signed-off-by: Asra Ali <asraa@google.com>

* fixup to match entire string + also fixup dependency

Signed-off-by: Asra Ali <asraa@google.com>

* python regex

Signed-off-by: Asra Ali <asraa@google.com>

* remove pyc

Signed-off-by: Asra Ali <asraa@google.com>

* there's gotta be another way

Signed-off-by: Asra Ali <asraa@google.com>

* remove autosaved pgs

Signed-off-by: Asra Ali <asraa@google.com>

Co-authored-by: Alex Konradi <akonradi@google.com>
Signed-off-by: Maxim Chechel <hexdigest@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants