
Per connection local rate limiting #15843

Merged
htuch merged 374 commits into envoyproxy:main from gokulnair:perconn_rl_15637
May 25, 2021

Conversation

@gokulnair
Contributor

@gokulnair gokulnair commented Apr 5, 2021

Commit Message:
This is a PR for scoping token buckets in the local rate limiting flow on a per connection basis as opposed to scoping it on the entire envoy instance. More details in #15637

Additional Description:
Currently, the HTTP local rate limiter's token bucket is shared across all workers, so the rate limits are applied per Envoy instance/process. This could allow bad actors to quickly exhaust the limits on a given Envoy instance before legitimate users have had a fair chance. Per-connection scoping is achieved by attaching an instance of LocalRateLimit::LocalRateLimiterImpl to each connection object, if there isn't one already, via FilterState data.

Risk Level: Low

Testing:
Added unit tests to local_ratelimit
Manually tested via curl'ing against a locally patched envoy instance.
One can send multiple requests on the same connection via curl using the following:
curl -vI example.com example.com

Docs Changes:
Added new toggle to local rate limit configuration to enable per connection local rate limiting

// Specifies the scope of the rate limiter's token bucket.
// If set to false, the token bucket is shared across all worker threads,
// thus the rate limits are applied per Envoy process.
// If set to true, a token bucket is allocated for each connection,
// thus the rate limits are applied per connection, allowing requests
// to be rate limited on a per-connection basis.
// If unspecified, the default value is false.
bool local_rate_limit_per_downstream_connection = 11;

Sample configuration

typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
  stat_prefix: http_local_rate_limiter
  token_bucket:
    max_tokens: 10000
    tokens_per_fill: 1000
    fill_interval: 1s
  filter_enabled:
    runtime_key: local_rate_limit_enabled
    default_value:
      numerator: 100
      denominator: HUNDRED
  filter_enforced:
    runtime_key: local_rate_limit_enforced
    default_value:
      numerator: 100
      denominator: HUNDRED
  response_headers_to_add:
    - append: false
      header:
        key: x-local-rate-limit
        value: 'true'
  local_rate_limit_per_downstream_connection: true

Fixes #15637

@gokulnair gokulnair requested a review from mattklein123 as a code owner April 5, 2021 20:38
@repokitteh-read-only

Hi @gokulnair, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #15843 was opened by gokulnair.


@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #15843 was opened by gokulnair.


@gokulnair
Contributor Author

cc @mattklein123 @rgs1

@gokulnair gokulnair marked this pull request as draft April 5, 2021 20:45
@mattklein123 mattklein123 assigned rgs1 and mattklein123 and unassigned lizan Apr 6, 2021
@gokulnair gokulnair changed the title WIP: Per connection local rate limiting [WIP] Per connection local rate limiting Apr 6, 2021
@rgs1
Member

rgs1 commented Apr 6, 2021

@gokulnair mind addressing the format error:

trailing whitespace: source/extensions/filters/http/local_ratelimit/local_ratelimit.h

I'll review the rest in the meanwhile, thanks!

Member

@rgs1 rgs1 left a comment


Did a first pass, let's also add docs and tests. Thanks!

Member

@rgs1 rgs1 Apr 6, 2021


Let's put a long description here explaining what this knob does, when it might be a good idea to use it, etc, etc.

Contributor Author


Would it make sense to make this name a little more descriptive, such as rate_limiter_per_connection maybe?

Member


Yes, I think that would help. I think rate_limiter_per_connection is good, given that we have connection_pool_per_downstream_connection in the cluster msg definition. Alternatively apply_per_connection is a suggestion if you are unconvinced.

Member


Alternatively, per_connection_local_rate_limiter is also good. I think I like that one the most.

@gokulnair gokulnair marked this pull request as ready for review April 23, 2021 23:44
@gokulnair gokulnair changed the title [WIP] Per connection local rate limiting Per connection local rate limiting Apr 26, 2021
@gokulnair
Contributor Author

Thanks for the reviews! Let me know if there's anything else that's needed to get this going.

@alyssawilk alyssawilk self-assigned this May 20, 2021
Contributor

@alyssawilk alyssawilk left a comment


Oops, I'm sorry about review lag - Matt would usually take the lead here but his leave got unexpectedly extended. I'll pick it up instead!

@gokulnair
Contributor Author

> Oops, I'm sorry about review lag - Matt would usually take the lead here but his leave got unexpectedly extended. I'll pick it up instead!

Thanks @alyssawilk!

Contributor

@alyssawilk alyssawilk left a comment


Looks totally solid! Just a few minor nits, and thanks for your patience both with review lag and me getting a handle on use case :-)

EXPECT_EQ(1U, findCounter("test.http_local_rate_limit.rate_limited"));
}

TEST_F(FilterTest, RequestRateLimitedPerConnection) {
Contributor


WDYT of adding an integration test? I think you can crib off of existing tests in ratelimit_integration_test.cc plus some of the h2 tests, to make sure that if you set a limit of one stream, a second stream on that connection won't (immediately) go upstream but a stream on a separate connection will pass through.

Contributor Author


Sure, I can take a look at adding an integration test for the above scenario. The remaining comments have been addressed and should be good to go.

Contributor Author


Ah! After taking a closer look, it seems like the tests in ratelimit_integration_test.cc all target the "rate limiter as an external service" use case, and it makes complete sense to have integration tests for that.
For local rate limiting within Envoy, however, it doesn't look like we have any integration tests, and I'm wondering whether one would buy us anything beyond what the unit tests cover currently. Thoughts? cc @rgs1

Contributor


I think integration tests generally have value - I wasn't convinced that setting connection data on the per-request StreamInfo would work as expected (there are currently no filters which do this, but I went ahead and tweaked some existing code to verify it worked for my own satisfaction =P). Given the review delay this PR has already suffered and the lack of local rate limit tests to crib off of, I'm inclined to let it go this once, though if you want to do a follow-up you'd totally earn brownie points :-)

Contributor Author


Brownie points it is then :-) I do have a draft up, but I'm fighting the integration test API a bit at the moment, specifically with getting it to send multiple requests over the same connection while still ensuring that the first one generates an upstream request over its fake_upstreams_ while the subsequent locally rate limited ones don't. It's holding this up more than necessary, so a follow-up task makes more sense at this point.

Thanks for taking a look!!

const Filters::Common::LocalRateLimit::LocalRateLimiterImpl& Filter::getRateLimiter() {
if (!decoder_callbacks_->streamInfo().filterState()->hasData<PerConnectionRateLimiter>(
PerConnectionRateLimiter::key())) {

Contributor


nit: extra whitespace?

}

if (config->requestAllowed(descriptors)) {
const bool is_request_allowed = config->rateLimitPerConnection()
Contributor


I'd either put the config->requestAllowed vs. getRateLimiter().requestAllowed switch in Filter::requestAllowed, or rename requestAllowed to indicate it only applies to per-connection-rate-limited requests.

ditto for getRateLimiter: getConnectionRateLimiter? with an assert on config->rateLimitPerConnection()?

Contributor

@antoniovicente antoniovicente left a comment


Looks good, just some minor comments / nits.

// If set to true, a token bucket is allocated for each connection.
// Thus the rate limits are applied per connection thereby allowing
// one to rate limit requests on a per connection basis.
bool local_rate_limit_per_downstream_connection = 11;
Contributor


Is it possible for a proxy to have both per-process and per downstream connection rate limits, or just one or the other?

Another potential question is if we should consider some additional rate limit criteria in the future like downstream IP or HTTP Cookie. If we expect additional criteria in the future, we may want to make this an enum field.

Contributor Author

@gokulnair gokulnair May 20, 2021


It's definitely one or the other as it stands. I'm not sure the added complexity of dealing with potentially conflicting token bucket quotas between the per-process and per-connection configurations, and the precedence rules we'd have to handle, buys us much in the way of functionality.

We did indeed consider an enum initially, but there didn't seem to be many realistic use cases, mainly because a lot of toggles based on request characteristics such as IP, Cookie, etc. can be handled today by rate limiting on request descriptors ...

if (config->requestAllowed(descriptors)) {
const bool is_request_allowed = config->rateLimitPerConnection()
? requestAllowed(descriptors)
: config->requestAllowed(descriptors);
Contributor


Consider moving this branch to the requestAllowed method and call the getRateLimiter() or config requestAllowed method based on config.

It would be fine to pass config as an argument to Filter::requestAllowed

return getRateLimiter().requestAllowed(request_descriptors);
}

const Filters::Common::LocalRateLimit::LocalRateLimiterImpl& Filter::getRateLimiter() {
Contributor


naming nit: getPerConnectionRateLimiter

header:
key: x-local-ratelimited
value: 'true'
local_rate_limit_per_downstream_connection: {}
Contributor


Seems like the {} is used in substitutions below. It would be good to add an end-of-line comment explaining the {}. I think that the yaml comment character is #

Contributor Author

@gokulnair gokulnair May 21, 2021


Added an explanation below.

filter_->decodeHeaders(request_headers, false));
EXPECT_EQ(Http::FilterHeadersStatus::Continue, filter_2_->decodeHeaders(request_headers, false));
EXPECT_EQ(Http::FilterHeadersStatus::StopIteration,
filter_2_->decodeHeaders(request_headers, false));
Contributor


A comment explaining the differences between this test and FilterTest.RequestRateLimited may be helpful for future readers. I think this shows that the limit is applied by connection by verifying that filter_2_ allows requests after filter_ hits the rate limit.

Contributor Author


Added a brief explanation to the test.

@gokulnair
Contributor Author

> Looks good, just some minor comments / nits.

Thanks!

Signed-off-by: Gokul Nair <gnair@twitter.com>
@gokulnair
Contributor Author

I've addressed all comments/nits. Will take a look at adding the integration test shortly.

Signed-off-by: Gokul Nair <gnair@twitter.com>
Contributor

@antoniovicente antoniovicente left a comment


Looks good. I'm sure that the integration test you're looking into would make it even better.

Thanks!

@antoniovicente
Contributor

/retest

@repokitteh-read-only

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: #15843 (comment) was created by @antoniovicente.


Contributor

@alyssawilk alyssawilk left a comment


LGTM

Could you please update the PR description with the appropriate fields (testing: unit tests, docs: inline etc) and then I'll let @antoniovicente merge when he's set.


@antoniovicente
Contributor

> LGTM
>
> Could you please update the PR description with the appropriate fields (testing: unit tests, docs: inline etc) and then I'll let @antoniovicente merge when he's set.

I don't have remaining comments. The remaining question is whether or not integration tests are in the works as part of this PR or a follow-up. A follow-up makes the most sense to me at this point; we should go ahead and merge.

@gokulnair gokulnair closed this May 24, 2021
@gokulnair gokulnair reopened this May 24, 2021
@gokulnair
Contributor Author

gokulnair commented May 24, 2021

> I don't have remaining comments. The remaining question is whether or not integration tests are in the works as part of this PR or a follow-up. A follow-up makes the most sense to me at this point; we should go ahead and merge.

I do agree that a follow-up task makes the most sense at this time (I do have a draft up but it's still WIP).
Also, I've updated the PR description with more details. Looks like we should be good to go for the merge but let me know if there's anything else that's needed from my end.
Thanks! 👍

@gokulnair
Contributor Author

gokulnair commented May 24, 2021

/lgtm api

@htuch Mind approving the api change once again, please? (we updated the doc in the proto file)
Thanks!

Member

@htuch htuch left a comment


/lgtm api

@htuch htuch merged commit 151aa0c into envoyproxy:main May 25, 2021
leyao-daily pushed a commit to leyao-daily/envoy that referenced this pull request Sep 30, 2021

Labels

deps (Approval required for changes to Envoy's external dependencies), v2-freeze
