Add descriptor support in HTTP Local Rate Limiting #14172
gargnupur wants to merge 4 commits into envoyproxy:master
Conversation
A common use case is to have local overrides for the service account of the client. Local and global RL configurations are unexpectedly different. Local rate limit can be thought of as global rate limit with the …
Taking the yaml from the test and editing it a little bit, this is the scenario I think we should enable:
What this says is that for any route we allow 1000 requests/min, but if "client_id" is "foo" and path is "foo/bar" we only allow 10 requests per min, and if just "client_id" is "foo" then we allow 100 requests per minute. As said in the description above, I am thinking of mapping requests_per_unit to tokens_per_fill and unit to fill_interval in the token bucket implementation. Also, @mattklein123: https://github.com/envoyproxy/envoy/blob/master/api/envoy/extensions/common/ratelimit/v3/ratelimit.proto mentions the use of "client_id"; how can we get that from RateLimit actions (https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/rate_limit_filter#composing-actions)? I feel this is something we should add using ssl info...
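A sketch of that scenario in config form. This is illustrative only: field names mirror the local_ratelimit v3 filter API and type.v3.TokenBucket as the descriptor support eventually shaped up, not necessarily the exact syntax of this PR's intermediate revisions.

```yaml
typed_config:
  "@type": type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
  stat_prefix: http_local_rate_limiter
  # Default for any route: 1000 requests per minute.
  token_bucket:
    max_tokens: 1000
    tokens_per_fill: 1000
    fill_interval: 60s
  descriptors:
    # client_id == "foo" AND path == "foo/bar": 10 requests per minute.
    - entries:
        - key: client_id
          value: foo
        - key: path
          value: foo/bar
      token_bucket:
        max_tokens: 10
        tokens_per_fill: 10
        fill_interval: 60s
    # client_id == "foo" alone: 100 requests per minute.
    - entries:
        - key: client_id
          value: foo
      token_bucket:
        max_tokens: 100
        tokens_per_fill: 100
        fill_interval: 60s
```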
rgs1 left a comment:
left a few comments, deferring to @mattklein123 in terms of aligning this with the network local rate limiter from an API's PoV. Thanks!
I think if per_route is true, we should throw an error here instead of just warning.
we can avoid returning a copy of this, given it's an instance member
I actually removed it as an instance member now..
if you really want this helper, return a const ref instead of a copy
It's an enum, so I'm not passing it in as a const ref..
mattklein123 left a comment:
Thanks for working on this. I have a vague idea of what is going on here but can you flesh out the docs a bit and I think that will help the review?
/wait
- bool shouldRatelimit = false;
+ bool should_rate_limit = false;
This is removed now..
It would be nice to somehow gate this code entirely on whether there are any descriptors configured at all.
api/envoy/extensions/filters/http/local_ratelimit/v3/local_rate_limit.proto (outdated; thread resolved)
/retest
Retrying Azure Pipelines:
Could you use the same syntax as the token bucket above here? It feels inconsistent to have two ways to specify an upper bound in the same config. The max tokens is also missing.
It's reusing the existing Descriptor object that HTTP global rate limit uses, and hence there is no max tokens or token bucket here...
You can define a new message if you want, up to you. I agree with @kyessenov that it would be better to be consistent and the message is very simple so should be fine?
okay.. so how about adding TokenBucket as a message in RateLimitDescriptor?
The global one can use override units and the local one the token bucket.. I updated just the API proto.. ptal and I will update the rest of the code using it.
No, I don't think that makes sense. Please add a new LocalRateLimitDescriptor. That can include a RateLimitDescriptor and a token bucket config for example.
/wait
Added... LocalRateLimitDescriptor with entries and a token bucket... I don't think it makes sense to include the whole RateLimitDescriptor and get RateLimitOverride, which is specific to global?
No wait, I would need the whole RateLimitDescriptor in LocalRateLimitDescriptor if I want to keep the code for populating these descriptors from RouteEntry common between global and local.
updated the API change.. ptal
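For reference, the message shape being discussed could look something like the sketch below: a descriptor that reuses RateLimitDescriptor.Entry and pairs it with a token bucket. Field numbers and validation rules are illustrative, not the exact proto in this revision.

```proto
// Sketch of a local rate limit descriptor (illustrative).
message LocalRateLimitDescriptor {
  // Descriptor entries, reusing RateLimitDescriptor.Entry from
  // envoy/extensions/common/ratelimit/v3/ratelimit.proto.
  repeated RateLimitDescriptor.Entry entries = 1;

  // Token bucket applied to requests matching this descriptor.
  type.v3.TokenBucket token_bucket = 2;
}
```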
docs/root/configuration/http/http_filters/local_rate_limit_filter.rst (outdated; thread resolved)
@mattklein123, @rgs1: Can you please review this again and see if the design makes sense now?
Can you fix CI and I will look?
mattklein123 left a comment:
Before I actually review any code let's sort out the API, thank you!
/wait
docs/root/configuration/http/http_filters/local_rate_limit_filter.rst (outdated; thread resolved)
mattklein123 left a comment:
At a high level I think this is the right approach. I left some API comments and some very quick code review comments. I will do a full pass once it's in a more final state. Thanks!
/wait
In looking at this more I would just inline key and value in here or use RateLimitDescriptor.Entry
Very quick drive by pre-full code review: This code is complicated and should not be duplicated here and elsewhere. Refactor to have a single copy of the complicated logic.
More useful error message.
include/envoy/ratelimit/ratelimit.h (outdated)
Don't you have to hash on key and value here? I think it would be better to have the hash operator on DescriptorEntry if you need that. Also, remove the operator== you don't need, since I don't see you hashing on all of it.
just hashing on entries, as we don't care about matching on the limit here
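A minimal sketch of what "hash on entries only" means in practice. This uses std::unordered_map in place of absl::flat_hash_map so the example is self-contained, and the type names are illustrative rather than Envoy's exact API. Two descriptors with the same key/value entries land in the same bucket regardless of any attached limit.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative descriptor types; Envoy's real ones live in
// include/envoy/ratelimit/ratelimit.h.
struct DescriptorEntry {
  std::string key_;
  std::string value_;
  bool operator==(const DescriptorEntry& o) const {
    return key_ == o.key_ && value_ == o.value_;
  }
};

struct Descriptor {
  std::vector<DescriptorEntry> entries_;
  bool operator==(const Descriptor& o) const { return entries_ == o.entries_; }
};

// Hash combines each entry's key and value (hash_combine-style mixing);
// any limit attached to the descriptor is deliberately ignored.
struct DescriptorHash {
  size_t operator()(const Descriptor& d) const {
    std::hash<std::string> sh;
    size_t h = 0;
    for (const auto& e : d.entries_) {
      h ^= sh(e.key_) + 0x9e3779b9 + (h << 6) + (h >> 2);
      h ^= sh(e.value_) + 0x9e3779b9 + (h << 6) + (h >> 2);
    }
    return h;
  }
};

using TokenMap = std::unordered_map<Descriptor, long, DescriptorHash>;
```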
- absl::flat_hash_map<Envoy::RateLimit::Descriptor, long int> last_refill_per_descriptor_;
+ absl::flat_hash_map<Envoy::RateLimit::Descriptor, uint32_t> last_refill_per_descriptor_;
This will definitely change now as we would be using tokens only...
Left as long int because otherwise we lose precision in the count of the current time.
mattklein123 left a comment:
Flushing out some more high level comments. Will take another pass once we finish the API changes, thanks!
/wait
api/envoy/extensions/filters/http/local_ratelimit/v3/local_rate_limit.proto (outdated; thread resolved)
Is this useful anymore since we now require LocalRateLimitDescriptor? I think I would remove all of this for now until we need it? If/when we need it I think this should be provided by route config override anyway?
Yeah, thinking about the config it makes it very complicated.. agreed and removed.
It was just easy to implement as it existed in the global one.
include/envoy/ratelimit/ratelimit.h (outdated)
- TokenBucket token_bucket_ = {};
+ TokenBucket token_bucket_;
This can be uint64_t or better just std::chrono::milliseconds if you are actually storing a time in here (or some other monotonic time point type)
using std::chrono::time_point
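A sketch of why a monotonic time_point beats a raw integer here: per-descriptor refill state keyed on std::chrono::steady_clock::time_point loses no precision and cannot go backwards. Names and struct layout are illustrative, not Envoy's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

using MonotonicTime = std::chrono::steady_clock::time_point;

// Illustrative per-descriptor token bucket state.
struct TokenBucketState {
  uint32_t max_tokens_;
  uint32_t tokens_per_fill_;
  std::chrono::milliseconds fill_interval_;
  uint32_t tokens_;
  MonotonicTime last_fill_; // time_point, not a long int tick count
};

// Refill any whole intervals that have elapsed, then try to consume a token.
bool requestAllowed(TokenBucketState& b, MonotonicTime now) {
  const auto elapsed = now - b.last_fill_;
  if (elapsed >= b.fill_interval_) {
    const auto intervals = elapsed / b.fill_interval_; // whole intervals
    const uint64_t refill = static_cast<uint64_t>(intervals) * b.tokens_per_fill_;
    b.tokens_ = static_cast<uint32_t>(
        std::min<uint64_t>(b.max_tokens_, b.tokens_ + refill));
    b.last_fill_ += intervals * b.fill_interval_; // no precision lost
  }
  if (b.tokens_ == 0) {
    return false;
  }
  --b.tokens_;
  return true;
}
```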
Where is this documented? Do we need this?
Removed this whole file.. as there is very little common code left now..
Hey Matt,
I am restructuring the config descriptors to be inside rate_limit_impl itself, as you suggested, to remove fillDescriptor since it's used only once.. does that sound ok?
Will update the whole thing at once..
On Wed, Dec 16, 2020 at 12:12 PM Matt Klein ***@***.***> wrote:
Please merge main to fix CI and then I will take another pass.
/wait
--
Thanks,
Nupur
Force-pushed: b1e2d29 to e54d90f
@mattklein123, @kyessenov: All tests are passing but coverage is failing with this error: Is it possible to see where coverage is missing and get this number quickly locally?
@gargnupur Coverage is uploaded in CI logs. https://storage.googleapis.com/envoy-pr/0cc63b2/coverage/index.html
mattklein123 left a comment:
Flushing out the next round of comments, thanks.
/wait
You should be able to update docs in other places. Please try again and figure out what the issue is assuming you are syncing the protos correctly.
Feel free to update the examples
Why is this only done for the per_route case? I'm not sure why per_route is passed to this in general. I don't think this object should care where it's used? It should either have descriptor settings or not?
Because we will get descriptors in the route config in the filter.. hence this was added..
Sorry I don't understand. The filter can either have a global config or a per route config with a limiter defined for either, right? Why do we need to understand per-route here?
And descriptors will only work for the per-route config, as at runtime we get them through rate limit actions in the route components proto.
That doesn't make sense. I don't think you need per_route_config for this to work. You can have a global config for the filter which still references a route table. I think this should be removed here.
Can this happen? Isn't this blocked by validations? Please double check you have coverage for all error handling, and remove error handling that can't happen.
This is the validation we added.. it was previously happening in the local rate limit filter but we moved parsing here...
? You have proto validations that check this, right? How can this logic be hit?
sure.. makes sense.. removing these
Is this logic copied from somewhere else? Share with the parent object the logic for filling out token buckets, etc.
We are only filling token buckets here now.. not shared with anything else..
This doesn't make sense. This is inside the descriptor loop. The same error checking must happen for the global token bucket.
we are making sure that the descriptor token bucket's fill interval is a multiple of the global token bucket's, as the timer runs on the global token bucket
Ah OK I see. Please document this somewhere on the actual proto fields. This is not clear from the docs.
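The constraint discussed here, that a per-descriptor fill interval must be a whole multiple of the global bucket's interval because the single refill timer fires at the global interval, could be checked with something like the sketch below (function name and signature are illustrative).

```cpp
#include <cassert>
#include <chrono>

// Returns true when the descriptor's fill interval is a positive whole
// multiple of the global token bucket's fill interval.
bool validDescriptorFillInterval(std::chrono::milliseconds descriptor_interval,
                                 std::chrono::milliseconds global_interval) {
  return descriptor_interval.count() > 0 && global_interval.count() > 0 &&
         descriptor_interval.count() % global_interval.count() == 0;
}
```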
I don't think this function is needed. It would be simpler to just have the requestAllowed() function return false or an enum saying no descriptors if that is actually required.
Added this because we wanted to check descriptors only if they were configured, and as we moved parsing of descriptors to local rate limit, this function moved here too.
I don't think this flow makes sense and you can simplify. The filter should be able to pass all (optional) data into the limiter and have it do what it needs to do. Please simplify.
let me look at this again...
Can you just collapse this into a single function? It's not clear to me why we have to have separate functions here, + another findDescriptor function. Can you just pass in all the state needed to do the requestAllowed() computation and have this object sort it out?
so combine requestAllowed with findDescriptor? We need requestAllowedHelper so that requestAllowed and requestAllowed-for-descriptor can share the same synchronization code.
The only public function should be requestAllowed(...). Pass in any data needed to compute this, even if it's an empty list of descriptors. Then the function can do what it needs to do and should still be efficient. This will greatly simplify the overall logic.
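The single-entry-point shape being asked for can be sketched as below. Token-bucket mechanics are elided: each limit is a plain remaining-request counter so the control flow stays visible. Class and member names are hypothetical, not Envoy's actual interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Sketch: the filter always calls requestAllowed() with whatever descriptors
// it produced (possibly none); the limiter picks a matching per-descriptor
// allowance or falls back to the global one.
class LocalRateLimiter {
public:
  LocalRateLimiter(unsigned global_limit,
                   std::map<std::string, unsigned> descriptor_limits)
      : global_remaining_(global_limit),
        descriptor_remaining_(std::move(descriptor_limits)) {}

  // Only public function: all state needed for the decision is passed in.
  bool requestAllowed(const std::vector<std::string>& descriptor_keys) {
    for (const auto& key : descriptor_keys) {
      auto it = descriptor_remaining_.find(key);
      if (it != descriptor_remaining_.end()) {
        return consume(it->second); // configured descriptor match wins
      }
    }
    return consume(global_remaining_); // no configured descriptor matched
  }

private:
  static bool consume(unsigned& remaining) {
    if (remaining == 0) {
      return false;
    }
    --remaining;
    return true;
  }

  unsigned global_remaining_;
  std::map<std::string, unsigned> descriptor_remaining_;
};
```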
@mattklein123, @kyessenov: This is ready for review again!
@gargnupur I think maintainers are on vacation till next week.
Force-pushed: 787c320 to 6aadc14
/retest
@kyessenov: Thanks a lot! The failed test in the coverage run is passing locally and in another test target on CI too, so retesting.
Retrying Azure Pipelines:
Please check CI. Also, DCO needs to be fixed and I'm going to review the whole thing again anyway, so please just squash the whole thing, rebase on main, and force push. @kyessenov if you want to open a fresh PR since you said you were going to finish it, let me know. That's fine also. Thank you!
/wait
Squashed commit messages (each signed off by gargnupur <gargnupur@google.com>):
- Add descriptors to HTTP Local Rate Limit
- fix build file
- Add descriptors to HTTP Local Rate Limit
- refactor code and use rate limit
- cleanup
- Fix docs build
- fix clang err
- add one more test case
- Add more test
- fix fmt
- Add api change
- add LocalRateLimitDescriptor
- add LocalRateLimitDescriptor with RateLimitDescriptor
- LocalRateLimitDescriptor with RateLimitDescriptor impl
- fix clang err
- Use time_point in local rate limit
- Run proto format
- Apply suggestions from code review (Co-authored-by: Matt Klein <mattklein123@gmail.com>)
- fixed based on feedback
- fixed based on feedback
- fixing build
- fix test
- add more test for coverage
- add more test for coverage
- fixed based on feedback
- fix test and add more test and add a note
- fix fmt
- add comment about local rate limit in route_components.proto
Force-pushed: 6aadc14 to 7bf86b8
Signed-off-by: gargnupur <gargnupur@google.com>
Closing in favor of the other PR.
Commit Message: Add descriptor support in HTTP Local Rate Limiting
This adds descriptor support in HTTP Local Rate Limiting so that rate limiting can happen on certain attributes.
It tries to reuse the features in HTTP Global Rate Limiting, so that it's easier for users to configure.
Sending this as a draft pull request as I want to get feedback on the design below.
Also, I want to add the ability to override, as I feel that would be really useful, but it would have to be combined with the token bucket algorithm.
I was thinking of extending rate_limiter_impl to support this. We can use RateLimitOverride.requests_per_unit as token_bucket.tokens_per_fill and ask users to set type.v3.RateLimitUnit to a multiple of fill_interval, then maintain rate limit state for these descriptors in rate_limiter_impl.
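The unit-to-interval mapping described above could be as simple as the sketch below. The enum mirrors type.v3.RateLimitUnit but is redeclared here so the example is self-contained; treat names as illustrative.

```cpp
#include <cassert>
#include <chrono>

// Mirrors type.v3.RateLimitUnit for this self-contained sketch.
enum class RateLimitUnit { Second, Minute, Hour, Day };

// Maps a rate limit unit to the token bucket fill_interval it implies.
std::chrono::seconds toFillInterval(RateLimitUnit unit) {
  switch (unit) {
  case RateLimitUnit::Second:
    return std::chrono::seconds(1);
  case RateLimitUnit::Minute:
    return std::chrono::seconds(60);
  case RateLimitUnit::Hour:
    return std::chrono::seconds(3600);
  case RateLimitUnit::Day:
    return std::chrono::seconds(86400);
  }
  return std::chrono::seconds(0); // unreachable for valid units
}
```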
Please let me know what you guys think...