ratelimit: add support for failure_mode_allow configuration#4073
ratelimit: add support for failure_mode_allow configuration#4073ggreenway merged 14 commits intoenvoyproxy:masterfrom
Conversation
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
|
@mattklein123 can you PTAL? |
|
Assigning to @junr03 for first pass. |
junr03
left a comment
There was a problem hiding this comment.
thanks for taking this @ramaraochavali!
| // Request was unauthorized by external authorization service. | ||
| UnauthorizedExternalService = 0x1000, | ||
| // Unable to call Ratelimiting service. | ||
| RateLimitingServiceError = 0x1200, |
| Stats::Scope& scope_; | ||
| Runtime::Loader& runtime_; | ||
| Upstream::ClusterManager& cm_; | ||
| bool failure_mode_allow_; |
| std::vector<RateLimit::Descriptor> descriptors_; | ||
| const InstanceStats stats_; | ||
| Runtime::Loader& runtime_; | ||
| bool failure_mode_allow_; |
| .value()); | ||
| } | ||
|
|
||
| TEST_F(HttpRateLimitFilterTest, ErrorResponseWithFailureAllowModeOff) { |
There was a problem hiding this comment.
nit: FailureModeAllow for naming consistency
| Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json); | ||
| envoy::config::filter::network::rate_limit::v2::RateLimit proto_config{}; | ||
| Envoy::Config::FilterJson::translateTcpRateLimitFilter(*json_config, proto_config); | ||
| // TODO(ramaraochavali): move the config to yaml and use a different config for failure_mode. |
There was a problem hiding this comment.
I would go ahead and do this, and change these tests to use a SetUpTest(const std::string& yaml) setup function. Prevents the subclass addition here, as you point out in the TODO.
| Json::ObjectSharedPtr json_config = Json::Factory::loadFromString(json); | ||
| envoy::config::filter::http::rate_limit::v2::RateLimit proto_config{}; | ||
| Config::FilterJson::translateHttpRateLimitFilter(*json_config, proto_config); | ||
| // TODO(ramaraochavali): move filter_config_ to yaml and use a different config for |
There was a problem hiding this comment.
I would go ahead and do this real quick.
|
|
||
| const uint64_t request_size_ = 1024; | ||
| const uint64_t response_size_ = 512; | ||
| bool failure_mode_ = false; |
There was a problem hiding this comment.
the integration test and the unit tests use the same varaible name failure_mode_ in opposite ways. In the unit tests we use it directly for the failure_mode_allow config value, and here we use it logically opposite, i.e if failure_mode_ is true then failure_mode_allow is false. Can we make it consistent between the tests?
| std::string result; | ||
|
|
||
| static_assert(ResponseFlag::LastFlag == 0x1000, "A flag has been added. Fix this code."); | ||
| static_assert(ResponseFlag::LastFlag == 0x1200, "A flag has been added. Fix this code."); |
There was a problem hiding this comment.
For this, and all the other asserts related to the LastFlag, we not only need to fix the value. There is accompanying code that needs to be fixed. For example here we need to add the new flag to the mapping between the enum and the string.
Any suggestions appreciated on how to make this maintenance more obvious.
There was a problem hiding this comment.
Oh. Sorry..did not realize that. Will change
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
|
@junr03 addressed the comments. PTAL. |
| RequestInfo::ResponseFlag::FaultInjected, | ||
| RequestInfo::ResponseFlag::RateLimited, | ||
| RequestInfo::ResponseFlag::UnauthorizedExternalService, | ||
| RequestInfo::ResponseFlag::RateLimitServiceError, |
There was a problem hiding this comment.
I'm going to take a look in the morning.
There was a problem hiding this comment.
Great Catch. I see the problem. The hex value I gave for RateLimitServiceError is incorrect so the intersectResponseFlags is returning wrong value. I have corrected it.
| const uint64_t request_size_ = 1024; | ||
| const uint64_t response_size_ = 512; | ||
| bool failure_mode_ = false; | ||
| bool failure_mode_ = true; |
There was a problem hiding this comment.
nit: sorry I don't think I was clear with my previous comment. Now that you changed the logic of this var, can we change the name to match the config name failure_mode_allow. I think that adds clarity.
Signed-off-by: Rama <rama.rao@salesforce.com>
|
@junr03 addressed review comments. PTAL. |
|
@junr03 can you PTAL when you get time? |
|
lgtm, @ggreenway can you take a look? |
Signed-off-by: Rama <rama.rao@salesforce.com>
| // not respond back. When it is set to true, Envoy will also allow traffic in case of | ||
| // communication failure between rate limiting service and the proxy. | ||
| // Defaults to true. | ||
| google.protobuf.BoolValue failure_mode_allow = 5; |
There was a problem hiding this comment.
It's preferable to use bool instead of g.p.BoolValue. Can you change this to type bool and name failure_mode_deny (so that the default value is false)? Or is there a good reason for making this a tri-state?
There was a problem hiding this comment.
There is no other reason except to be consistent with external_authz proto naming. I will change.
| // not respond back. When it is set to true, Envoy will also allow traffic in case of | ||
| // communication failure between rate limiting service and the proxy. | ||
| // Defaults to true. | ||
| google.protobuf.BoolValue failure_mode_allow = 5; |
Signed-off-by: Rama <rama.rao@salesforce.com>
Signed-off-by: Rama <rama.rao@salesforce.com>
| [this](Http::HeaderMap& headers) { addHeaders(headers); }); | ||
| callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited); | ||
| } else if (status == RateLimit::LimitStatus::Error) { | ||
| if (config_->failureModeAllow()) { |
There was a problem hiding this comment.
I did not change the method name in config because it seems more readable like this. LMK if you think otherwise
ggreenway
left a comment
There was a problem hiding this comment.
This is looking pretty good. A few more nits, but nothing major.
| option go_package = "v2"; | ||
|
|
||
| import "google/protobuf/duration.proto"; | ||
| import "google/protobuf/wrappers.proto"; |
There was a problem hiding this comment.
nit: this isn't needed anymore
|
|
||
| import "envoy/api/v2/ratelimit/ratelimit.proto"; | ||
| import "google/protobuf/duration.proto"; | ||
| import "google/protobuf/wrappers.proto"; |
| If the rate limit service is called, and the response for any of the descriptors is over limit, a | ||
| 429 response is returned. | ||
|
|
||
| If there is an error in calling rate limit service or rate limit service returns an error and :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` is |
| error, Counter, Total errors contacting the rate limit service | ||
| over_limit, Counter, total over limit responses from the rate limit service | ||
| failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because | ||
| of :ref:`failure_mode_allow <envoy_api_msg_config.filter.http.rate_limit.v2.RateLimit>` set to true." |
| ok, Counter, Total under limit responses from the rate limit service | ||
| cx_closed, Counter, Total connections closed due to an over limit response from the rate limit service | ||
| active, Gauge, Total active requests to the rate limit service | ||
| failure_mode_allowed, Counter, "Total requests that were error(s) but were allowed through because |
There was a problem hiding this comment.
failure_mode_deny: I'm fine with the stat being named either way, but the doc link needs to be updated.
| callbacks_->requestInfo().setResponseFlag(RequestInfo::ResponseFlag::RateLimited); | ||
| } else if (status == RateLimit::LimitStatus::Error) { | ||
| if (config_->failureModeAllow()) { | ||
| cluster_->statsScope().counter("ratelimit.failure_mode_allowed").inc(); |
There was a problem hiding this comment.
Do we need this new counter at all? We already have a counter for ratelimit service failures, and presumably the user knows which way they have this configured (failure_mode_deny value). In one case this counter will always be zero, in the other it will be equal to ratelimit.error. So is there value in this new counter?
There was a problem hiding this comment.
I think this would help in general, while looking at stats user do not have to check back on what his configuration is.
Signed-off-by: Rama <rama.rao@salesforce.com>
…nvoyproxy#4073)" This reverts commit ac0bd74.
|
@ramaraochavali @ggreenway, @danielhochman spotted a problem in this patch here https://github.com/envoyproxy/envoy/pull/4073/files#diff-12a885082b482d2c6be37666e04013b0R153. We noticed that in the new flow, we continueDecoding without checking initiating_call_. This is causing crashes at Lyft, so we'd like to revert. |
…nvoyproxy#4073)" This reverts commit ac0bd74.
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Description: Adds supports for specifying
failure_mode_allowconfiguration in rate limit filter config. It is defaulted totrueto be sync with current behaviour.Risk Level: Low
Testing: Automated tests
Docs Changes: Updated
Release Notes: Updated
Fixes #4023