router: add config to reject requests with invalid envoy headers#7323
Conversation
Description: Before this change, Envoy would silently ignore the `x-envoy-*` header if a client specifies an invalid value for this header (e.g. `x-envoy-max-retries: 3.0`). Introduce a `strict_check_headers` config option for `envoy.router` that adds optional support to reject requests with invalid values for the following headers: - x-envoy-upstream-rq-timeout-ms - x-envoy-upstream-rq-per-try-timeout-ms - x-envoy-max-retries - x-envoy-retry-on - x-envoy-retry-grpc-on On rejection, Envoy responds with HTTP status 400 and sets a new response flag `IH` to indicate the reason was due to an invalid header. Risk Level: Low/medium Testing: unit tests - unit test: `FilterUtility::StrictHeaderChecker` - test that router rejects request with HTTP status 400 + setting the `IH` response flag - test that config validation rejects unsupported values - manual end-to-end test `client -> envoy -> upstream server` to verify that Envoy returns a 400 and sets the response flag in the logs Docs Changes: - add inline docs to `router.proto` for `strict_check_headers` - add inline docs to `accesslog.proto` for `IH` response flag Release Notes: updated for router and accesslog Fixes envoyproxy#6482 Signed-off-by: Xiao Yu <xyu@stripe.com>
|
(6/20)
Test failure output: |
|
/retest |
|
🔨 rebuilding |
|
/retest |
|
🔨 rebuilding |
snowp
left a comment
There was a problem hiding this comment.
Thanks, this seems like a reasonable contribution.
I think you got this answered in Slack but it seems like this change caused an increase the amount of memory used per cluster, so we'll want to understand where that's coming from and ideally avoid this increase unless this feature is explicitly enabled.
| UpstreamRetryLimitExceeded = 0x8000, | ||
| // Request hit the stream idle timeout, triggering a 408. | ||
| StreamIdleTimeout = 0x10000, | ||
| // Request specified headers that were invalid under strict header checking |
There was a problem hiding this comment.
nit: end comments with period
source/common/json/config_schemas.cc
Outdated
| "properties" : { | ||
| "dynamic_stats" : {"type" : "boolean"}, | ||
| "start_child_span" : {"type" : "boolean"} | ||
| "start_child_span" : {"type" : "boolean"}, |
There was a problem hiding this comment.
the JSON API is deprecated, no need to add anything here
There was a problem hiding this comment.
Oh that's great to know! Will remove this.
Curious where I could have found this -- I don't think I saw anything in the release notes about the JSON API.
There was a problem hiding this comment.
This is used for the V1 JSON conversion, V1 was deprecated in 1.9: https://github.com/envoyproxy/envoy/blob/master/docs/root/intro/version_history.rst#190-dec-20-2018
The V2 JSON api is just using the protobuf provided utilities for serialization, so this code isn't needed anymore
source/common/router/router.h
Outdated
| const Http::LowerCaseString& target_header); | ||
|
|
||
| private: | ||
| static HeaderCheckResult has_valid_retry_fields(Http::HeaderEntry* header_entry, |
There was a problem hiding this comment.
camelCase for function names
source/common/router/router.cc
Outdated
| FilterUtility::StrictHeaderChecker::test(Http::HeaderMap& headers, | ||
| const Http::LowerCaseString& target_header) { | ||
| HeaderCheckResult r; | ||
| auto all = Http::Headers::get(); |
There was a problem hiding this comment.
I would just inline the occurrences of Http::Headers here, the indirection is making this a bit harder to read imo (and we're making an extraneous copy here).
source/common/router/router.cc
Outdated
|
|
||
| if (!config_.strict_check_headers_.empty()) { | ||
| for (const auto& header : config_.strict_check_headers_) { | ||
| auto res = FilterUtility::StrictHeaderChecker::test(headers, header); |
There was a problem hiding this comment.
nit: const where possible (here and for the body below)
source/common/router/router.h
Outdated
| uint32_t (*parseFn)(absl::string_view)) { | ||
| HeaderCheckResult r; | ||
| if (header_entry) { | ||
| auto flags = parseFn(header_entry->value().getStringView()); |
source/common/router/router.h
Outdated
|
|
||
| private: | ||
| static HeaderCheckResult has_valid_retry_fields(Http::HeaderEntry* header_entry, | ||
| uint32_t (*parseFn)(absl::string_view)) { |
There was a problem hiding this comment.
using std::function tends to make passing functions around more readable
There was a problem hiding this comment.
How about the following:
using ParseRetryFlagsFunc = std::function<uint32_t(absl::string_view header)>;
I don't love the name, so would be open to consider other suggestions!
There was a problem hiding this comment.
@snowp Thanks for your feedback! I have a few follow-up questions, but am good to re-push after I hear from you.
Re: MemoryLargeClusterSizeWithStats test failure indicating a cluster size increase by 64 bytes...
It seems that the culprit is adding a new std::vector<Http::LowerCaseString> member to Envoy::Router::FilterConfig
Per @bobby-stripe and @jmarantz's suggestion: I made EXPECT_MEMORY_EQ log the value in the skipped and non-skipped cases and verified m_per_cluster locally on macosx:
Notes:
* "baseline" : before this PR
* "char[N]" : add a `char buf_[N]` member to Envoy::FilterConfig
* "+value" : change in consumed bytes compared to baseline
baseline char[21] char[22] PR#7323
-------- -------- -------- -----------
ipv4: 49391 +0 +64 49453 (+64)
ipv6: 49572 +0 +64 49636 (+64)
How might we move forward with this? Should we move this state into the StrictHeaderChecker ctor and only instantiate one if the config.strict_check_headers() is non-empty? Open to suggestions on clean ways to approach this.
|
I think one thing you could try is to use |
- JSON API is deprecated: remove `strict_check_headers` from
`source/common/json/config_schemas.cc` and related test
- docs: document the specific demands of strict header validation
- router.{cc, h}:
- `const` where applicable
- use `std::function`
- inline `Http::Headers::get()`
Signed-off-by: Xiao Yu <xyu@stripe.com>
… + assert - clear RETRY_UNKNOWN_FIELD_ERROR when parsing `retry_on` config field - `StrictHeaderChecker::test` assert failure if target header is unsupported Signed-off-by: Xiao Yu <xyu@stripe.com>
…ithStats - Use std::unique_ptr for Envoy::Router::FilterConfig::strict_check_headers_ and only allocate it when we parse values from the config. Signed-off-by: Xiao Yu <xyu@stripe.com>
Failing error: [readability-container-size-empty,-warnings-as-errors] - source/common/router/router.h:163:9: error: the 'empty' method should be used to check for emptiness instead of 'size' - call strict_check_headers.empty() rather than Protobuf::RepeatedPtrField::size() Signed-off-by: Xiao Yu <xyu@stripe.com>
|
@snowp Thanks for your feedback -- repushed to address your feedback. This latest push makes You had suggested:
I think the default decision is whether to keep
|
|
Another option might be to store the strict headers in a bitmap (similar to how the retry flags are tracked). That way we're tracking which flags are set with minimal memory overhead (no redundant string copies). It does come at some code complexity, but it might be the right thing to avoid storing a new list of strings per router? |
I don’t feel strongly about this implementation choice and am happy to do whatever makes the most sense. I don’t think I have enough context to make this judgment call though — do you think it would make sense to tag someone who could help us tie-break on approach here? I'm glad the memory consumption test is serving its purpose of helping us consciously think about memory footprint and to consider alternatives here! |
|
The memory increase is making this change somewhat involved, so just trying to get some consensus on approach before investing too much time into one solution: For context: the PR originally stored the headers to sanitize as a To solve this two options have been suggested:
@mattklein123 @jmarantz thoughts here? |
|
IMO (2) sounds good to me, but (1) is easy and also fine if we want to start there with a TODO. |
Since the feature is off by default, I propose we start with (1) + add a TODO comment. Once this PR is merged, I can make a small follow-up PR to optimize the representation via (2). WDYT? @snowp @mattklein123 |
|
👍 |
|
re unique_ptr vs bitset...there are less than 10 supported headers, so I like the idea of a bitset (or maybe even a uint8_t and some bit-mask constants is likely enough). But am also fine to check in the unique_ptr<vector> and iterate. |
…bitset - add TODO comment - alias the unique_ptr type Signed-off-by: Xiao Yu <xyu@stripe.com>
|
@snowp Re-pushed to add a TODO comment to change to a bitset (will do in next PR). |
snowp
left a comment
There was a problem hiding this comment.
Thanks for iterating on this!
| @@ -1,4 +1,4 @@ | |||
| load("@envoy_api//bazel:api_build_system.bzl", "api_proto_library_internal") | |||
| load("@envoy_api//bazel:api_build_system.bzl", "api_go_proto_library", "api_proto_library_internal") | |||
There was a problem hiding this comment.
Added this because we want to validate values set for strict_check_headers in the config in api/envoy/config/filter/http/router/v2/router.proto#L14, which requires inclusion of com_envoyproxy_protoc_gen_validate/validate/validate.proto.
There was a problem hiding this comment.
I'm no bazel expert but I thought this was only necessary if you directly referenced the api_go_proto_library rule in this BUILD file which it doesn't seem like you're doing? I could be totally wrong here, so feel free to leave it if it's necessary for the build
There was a problem hiding this comment.
Will repush with this removed and add it back if it turns out to actually result in failed builds.
Hmm.. I rebuilt //source/exe:envoy-static (not from a clean build though) with this change removed and things seemed to work. There are other places in api/envoy/config/filter/http/** that reference proto validation without this declaration in their BUILD files.
docs/root/intro/version_history.rst
Outdated
| * access log: added a new field for route name to file and gRPC access logger. | ||
| * access log: added a new field for response code details in :ref:`file access logger<config_access_log_format_response_code_details>` and :ref:`gRPC access logger<envoy_api_field_data.accesslog.v2.HTTPResponseProperties.response_code_details>`. | ||
| * access log: added several new variables for exposing information about the downstream TLS connection to :ref:`file access logger<config_access_log_format_response_code_details>` and :ref:`gRPC access logger<envoy_api_field_data.accesslog.v2.AccessLogCommon.tls_properties>`. | ||
| * access log: added a new flag for request rejected due to failed strict header check |
source/common/router/router.h
Outdated
| const ParseRetryFlagsFunc& parseFn) { | ||
| HeaderCheckResult r; | ||
| if (header_entry) { | ||
| const auto flags = parseFn(header_entry->value().getStringView()); |
There was a problem hiding this comment.
Thinking about this some more, maybe it would make more sense to have the parseFn return std::pair<uint32_t, bool> (or something to that effect, potentially a named struct), containing both the retry flags and whether there were any unknown fields? That way we avoid having to clear the bit in the other case and the interface might be a bit self explanatory?
Sorry for the churn here, hopefully we can get this to a place where we're all happy with it
There was a problem hiding this comment.
No worries re: churn. I think a named struct would be clearer at calling sites, but going to repush with std::pair and see how it reads during review. I agree that the clearing of the UNKNOWN flag is an implementation detail callers shouldn't need to remember.
source/common/router/router.cc
Outdated
| const FilterUtility::StrictHeaderChecker::HeaderCheckResult | ||
| FilterUtility::StrictHeaderChecker::test(Http::HeaderMap& headers, | ||
| const Http::LowerCaseString& target_header) { | ||
| HeaderCheckResult r; |
There was a problem hiding this comment.
This doesn't seem to be used besides returning? Might be easier to just inline it e.g. return HeaderCheckResult();
There was a problem hiding this comment.
Good catch; this is was from a time when we always returned "valid" to ignore a target_header didn't have an associated validator. This in combination with the assert + NOT_REACHED_GCOVR_EXCL_LINE below means we ought to be able to remove both this declaration and its return statement.
source/common/router/router.cc
Outdated
| &Router::RetryStateImpl::parseRetryGrpcOn); | ||
| } else { | ||
| // Should only validate headers for which we have implemented a validator | ||
| ASSERT(false); |
There was a problem hiding this comment.
you can use NOT_REACHED_GCOVR_EXCL_LINE to indicate that we'll never hit this and exclude it from coverage
source/common/router/router.cc
Outdated
| ENVOY_STREAM_LOG(debug, "cluster '{}' match for URL '{}'", *callbacks_, | ||
| route_entry_->clusterName(), headers.Path()->value().getStringView()); | ||
|
|
||
| if (config_.strict_check_headers_ != nullptr && !config_.strict_check_headers_->empty()) { |
There was a problem hiding this comment.
I think you can omit the empty check since the for loop will just noop in that case
source/common/router/router.h
Outdated
| const Http::HeaderEntry* entry_; | ||
| }; | ||
|
|
||
| static const HeaderCheckResult test(Http::HeaderMap& headers, |
There was a problem hiding this comment.
Mind adding a comment to this explaining what it does? A more descriptive name might be useful too, maybe checkHeader?
source/common/router/router.h
Outdated
| } | ||
| } | ||
| using HeaderVector = std::vector<Http::LowerCaseString>; | ||
| using StrictCheckHeadersPtr = std::unique_ptr<HeaderVector>; |
There was a problem hiding this comment.
since this is just a unique ptr to HeaderVector, i'd probably call it just HeaderVectorPtr
| "x-envoy-max-retries", "x-envoy-retry-on", | ||
| "x-envoy-retry-grpc-on")); | ||
|
|
||
| // Each test param instantiates a router that strict-checks one particular header. |
There was a problem hiding this comment.
Maybe add a test where multiple headers fail the check?
…s, all_valid) Signed-off-by: Xiao Yu <xyu@stripe.com>
…nits Signed-off-by: Xiao Yu <xyu@stripe.com>
Signed-off-by: Xiao Yu <xyu@stripe.com>
|
@snowp PTAL? This latest push addresses the following:
|
snowp
left a comment
There was a problem hiding this comment.
Thanks, this is coming along nicely
| @@ -1,4 +1,4 @@ | |||
| load("@envoy_api//bazel:api_build_system.bzl", "api_proto_library_internal") | |||
| load("@envoy_api//bazel:api_build_system.bzl", "api_go_proto_library", "api_proto_library_internal") | |||
There was a problem hiding this comment.
I'm no bazel expert but I thought this was only necessary if you directly referenced the api_go_proto_library rule in this BUILD file which it doesn't seem like you're doing? I could be totally wrong here, so feel free to leave it if it's necessary for the build
| bool wouldRetryFromReset(const Http::StreamResetReason reset_reason); | ||
| RetryStatus shouldRetry(bool would_retry, DoRetryCallback callback); | ||
|
|
||
| static uint32_t parseRetryOn_(absl::string_view config, bool flag_parse_failures); |
There was a problem hiding this comment.
Are these used for anything anymore?
There was a problem hiding this comment.
Oops good catch. Removed.
source/common/router/router.cc
Outdated
| } | ||
| NOT_REACHED_GCOVR_EXCL_LINE | ||
| // Should only validate headers for which we have implemented a validator. | ||
| ASSERT(false); |
There was a problem hiding this comment.
NOT_REACH_GCOVR_EXCL_LINE already calls abort(), so no need for this assert
| EXTERNAL_SERVICE = 1; | ||
| // The request was denied because a header value failed a :ref:`strict header check | ||
| // <envoy_api_field_config.filter.http.router.v2.Router.strict_check_headers>` failed. | ||
| STRICT_HEADER_CHECK_FAILED = 2; |
There was a problem hiding this comment.
Not sure if this belongs here, this seems to be dealing with authorization of an endpoint, not about whether the request is valid. If there isn't a similar flag for "invalid request reason" we can probably just drop this
There was a problem hiding this comment.
I think you're right. It looks like I should instead move this to be a bool field to match treatment of the other response flags in accesslog.proto
…d `parseRetry*_` Signed-off-by: Xiao Yu <xyu@stripe.com>
…lag, not an authorization reason Signed-off-by: Xiao Yu <xyu@stripe.com>
|
@snowp PTAL? Thanks for your continued patience and feedback! Repushed to address the following:
|
|
/retest ( |
|
🤷♀️ nothing to rebuild. |
|
/azp run |
|
Commenter does not have sufficient privileges for PR 7323 in repo envoyproxy/envoy |
|
Some details on envoy-macosx build failures:
Is this a known source of flakiness? Is there a way we could have |
mattklein123
left a comment
There was a problem hiding this comment.
Thank you. Really nicely done. Great work!
Description:
Before this change, Envoy would ignore invalid
x-envoy-*header values from aclient request (e.g.
x-envoy-max-retries: 3.0), silently falling back to thedefault value for that header. Introduce a
strict_check_headersconfig optionfor
envoy.routerthat adds optional support to explicitly reject requests withinvalid values for the following headers:
On rejection, Envoy responds with HTTP status 400 and sets a new response flag
IHto indicate the reason was due to an invalid header.Risk Level: Low/medium
Testing: unit tests
FilterUtility::StrictHeaderCheckerIHresponse flagclient -> envoy -> upstream serverto verify thatEnvoy returns a 400 and sets the response flag in the logs
Docs Changes:
router.protoforstrict_check_headersaccesslog.protoforIHresponse flagRelease Notes: updated for router and accesslog
Fixes #6482