Handle Paraswap rate limiting #168

Merged
merged 13 commits into from
Apr 29, 2022

MartinquaXD
Contributor

Fixes #114

Because the Paraswap API sometimes rate limits us, we need a mechanism that automatically backs off until we get proper results again.
While we are rate limited, every request to the Paraswap API returns immediately with the error "rate limited" instead of being sent.
Each successive 429 response exponentially increases the period during which requests are dropped.
As soon as we receive a non-429 response, we stop dropping requests.
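
A minimal single-threaded sketch of the mechanism described above, written in Rust like the crates this PR touches. BackOff, should_drop and record_response are hypothetical names for illustration, not the PR's actual code. This version stores the next back off and grows it incrementally with Duration::mul_f64, capped at max_back_off.

```rust
use std::time::{Duration, Instant};

/// Hypothetical back off state (illustrative names, not the PR's actual types).
pub struct BackOff {
    growth_factor: f64,
    min_back_off: Duration,
    max_back_off: Duration,
    /// Back off applied when the next 429 response arrives.
    next_back_off: Duration,
    /// Requests issued before this instant are dropped without being sent.
    drop_until: Option<Instant>,
}

impl BackOff {
    pub fn new(growth_factor: f64, min_back_off: Duration, max_back_off: Duration) -> Self {
        Self {
            growth_factor,
            min_back_off,
            max_back_off,
            next_back_off: min_back_off,
            drop_until: None,
        }
    }

    /// Returns true if a request issued at `now` should fail immediately
    /// with a "rate limited" error instead of being sent to the API.
    pub fn should_drop(&self, now: Instant) -> bool {
        matches!(self.drop_until, Some(until) if now < until)
    }

    /// Updates the state from the HTTP status code of a completed request.
    pub fn record_response(&mut self, status: u16, now: Instant) {
        if status == 429 {
            // Drop requests for the current back off, then grow it for the
            // next consecutive 429, capped at `max_back_off`.
            self.drop_until = Some(now + self.next_back_off);
            let increased = self.next_back_off.mul_f64(self.growth_factor);
            self.next_back_off = std::cmp::min(increased, self.max_back_off);
        } else {
            // Any non-429 response ends the back off and resets it.
            self.drop_until = None;
            self.next_back_off = self.min_back_off;
        }
    }
}
```

With a growth factor of 2.0, consecutive 429s drop requests for 10 ms, 20 ms, then the 35 ms cap if min_back_off is 10 ms and max_back_off is 35 ms (example values only).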

Note that once we hit the rate limit, it usually takes about 5 minutes to recover. That is already much better than the status quo, since there were times when Paraswap rate limited us for hours on end.

CLI help:

--paraswap-rate-limiter <PARASWAP_RATE_LIMITER>
            Configures the back off strategy for the paraswap API when our requests get rate
            limited.
            Requests issued while back off is active get dropped entirely.
            Needs to be passed as "<back_off_growth_factor>,<min_back_off>,<max_back_off>".
            back_off_growth_factor: f64 > 1.0
            min_back_off: Duration in milliseconds
            max_back_off: Duration in milliseconds

            [env: PARASWAP_RATE_LIMITER=]
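
A sketch of how the "<back_off_growth_factor>,<min_back_off>,<max_back_off>" value from the help text could be parsed with FromStr. The RateLimitingStrategy type name is hypothetical, and so are the extra checks that the growth factor is finite and that min_back_off does not exceed max_back_off; the help text itself only states the types and that the growth factor must be greater than 1.0.

```rust
use std::str::FromStr;
use std::time::Duration;

/// Hypothetical config type for "<back_off_growth_factor>,<min_back_off>,<max_back_off>".
#[derive(Debug, Clone, PartialEq)]
pub struct RateLimitingStrategy {
    pub back_off_growth_factor: f64,
    pub min_back_off: Duration,
    pub max_back_off: Duration,
}

impl FromStr for RateLimitingStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split(',').map(str::trim).collect();
        let (factor, min_ms, max_ms) = match parts.as_slice() {
            [f, min, max] => (*f, *min, *max),
            _ => return Err(format!("expected 3 comma-separated values, got {s:?}")),
        };

        let back_off_growth_factor: f64 = factor
            .parse()
            .map_err(|e| format!("invalid growth factor {factor:?}: {e}"))?;
        // The negated comparison also rejects NaN; the finiteness check is an
        // extra assumption so that later multiplications cannot panic.
        if !(back_off_growth_factor > 1.0 && back_off_growth_factor.is_finite()) {
            return Err(format!("growth factor must be a finite f64 > 1.0, got {factor:?}"));
        }

        // Both durations are given in milliseconds.
        let parse_ms = |v: &str| {
            v.parse::<u64>()
                .map(Duration::from_millis)
                .map_err(|e| format!("invalid duration {v:?} (milliseconds): {e}"))
        };
        let min_back_off = parse_ms(min_ms)?;
        let max_back_off = parse_ms(max_ms)?;
        // Assumed sanity check, not stated in the help text.
        if min_back_off > max_back_off {
            return Err("min_back_off must not exceed max_back_off".to_string());
        }

        Ok(Self { back_off_growth_factor, min_back_off, max_back_off })
    }
}
```

For example, "2.0,1000,30000" (illustrative values, not the project's defaults) parses to a factor of 2.0, a 1 s minimum and a 30 s maximum.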

Test Plan

Manually tested parsing of the CLI argument
Added an ignored unit test that shows the exponential back off in action

Logs
200 OK
200 OK
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
200 OK
200 OK
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
200 OK
error: rate limited
sleeping for 15 milliseconds
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
200 OK
200 OK
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
error: rate limited
sleeping for 63 milliseconds
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
200 OK
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
200 OK
error: rate limited
sleeping for 15 milliseconds
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
error: rate limited
sleeping for 63 milliseconds
200 OK
200 OK
200 OK
error: rate limited
sleeping for 15 milliseconds
error: rate limited
sleeping for 31 milliseconds
error: rate limited
sleeping for 63 milliseconds
error: rate limited
sleeping for 127 milliseconds
error: rate limited
sleeping for 255 milliseconds
error: rate limited
sleeping for 511 milliseconds
error: rate limited
sleeping for 1023 milliseconds
error: rate limited
sleeping for 2047 milliseconds

MartinquaXD requested a review from a team as a code owner on April 25, 2022 14:02
codecov-commenter commented Apr 25, 2022

Codecov Report

Merging #168 (1e5c209) into main (412f36d) will decrease coverage by 0.16%.
The diff coverage is 24.59%.

@@            Coverage Diff             @@
##             main     #168      +/-   ##
==========================================
- Coverage   64.81%   64.64%   -0.17%     
==========================================
  Files         185      185              
  Lines       38398    38577     +179     
==========================================
+ Hits        24889    24940      +51     
- Misses      13509    13637     +128     

crates/shared/src/arguments.rs (outdated, resolved)
crates/shared/src/http_client.rs (outdated, resolved)
crates/shared/src/http_client.rs (resolved)
Comment on lines 70 to 71
let increased_back_off = self.next_back_off.mul_f64(self.back_off_growth_factor);
self.next_back_off = std::cmp::min(increased_back_off, self.max_back_off);
Contributor

Would it be cleaner to store a rate_limited_responses_count: u64 and calculate next_back_off from scratch every time? I'm not sure.
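
For comparison, a sketch of this suggestion: keep only a count of consecutive 429 responses and derive the back off from it on demand. The function name and parameters are illustrative. Multiplying step by step and stopping once max_back_off is reached keeps the loop short and only ever multiplies values below the cap.

```rust
use std::time::Duration;

/// Recomputes the back off from the number of consecutive 429 responses.
/// Hypothetical helper; names are illustrative.
pub fn back_off_from_count(
    rate_limited_responses_count: u64,
    growth_factor: f64,
    min_back_off: Duration,
    max_back_off: Duration,
) -> Duration {
    let mut back_off = min_back_off;
    for _ in 0..rate_limited_responses_count {
        // Once the cap is reached further growth is pointless. This bounds
        // the iterations, and mul_f64 only sees values below the cap, so it
        // cannot overflow unless max_back_off * growth_factor itself does.
        if back_off >= max_back_off {
            break;
        }
        back_off = back_off.mul_f64(growth_factor);
    }
    std::cmp::min(back_off, max_back_off)
}
```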

crates/shared/src/paraswap_api.rs (resolved)
pub fn response_rate_limited(&mut self, previous_rate_limits: u64) -> Option<Duration> {
if self.times_rate_limited != previous_rate_limits {
// Don't increase back off if somebody else already updated it in the meantime.
return None;
Contributor

Instead of returning None, should we return the current backoff here, even if we don't increment the counter?

Contributor Author

In line with the comment above, we need some way to avoid logging duplicate messages.
In the first version of this implementation I logged messages while holding the Mutex lock. That code was much cleaner and easier to understand, but it also made the critical section very long.
Now I only return Some(Duration) when we actually increased the back off and therefore need to log a message.
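
A sketch of the pattern described here, assuming the limiter state sits behind a std::sync::Mutex: the caller snapshots the counter before sending a request, and after a 429 only the caller whose snapshot is still current grows the back off and receives Some(..) to log. RateLimiter, State and times_rate_limited() are hypothetical names, and to keep the example short it doubles the back off with Duration::saturating_mul instead of using the configurable f64 growth factor.

```rust
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical shared state guarded by the mutex.
struct State {
    times_rate_limited: u64,
    back_off: Duration,
}

pub struct RateLimiter {
    state: Mutex<State>,
    max_back_off: Duration,
}

impl RateLimiter {
    pub fn new(min_back_off: Duration, max_back_off: Duration) -> Self {
        Self {
            state: Mutex::new(State { times_rate_limited: 0, back_off: min_back_off }),
            max_back_off,
        }
    }

    /// Snapshot of the counter, taken before sending a request.
    pub fn times_rate_limited(&self) -> u64 {
        self.state.lock().unwrap().times_rate_limited
    }

    /// Called after a 429. Only a caller whose snapshot is still current grows
    /// the back off; callers that raced with it get None, so exactly one of
    /// them logs. The lock is held only for the comparison and updates.
    pub fn response_rate_limited(&self, previous_rate_limits: u64) -> Option<Duration> {
        let mut state = self.state.lock().unwrap();
        if state.times_rate_limited != previous_rate_limits {
            // Somebody else already updated the back off in the meantime.
            return None;
        }
        state.times_rate_limited += 1;
        let applied = state.back_off;
        // Simplification: fixed growth factor of 2, capped at max_back_off.
        state.back_off = std::cmp::min(applied.saturating_mul(2), self.max_back_off);
        Some(applied)
    }
}
```

Logging then happens outside the lock, for example with if let Some(back_off) = limiter.response_rate_limited(snapshot) { ... }, so the critical section stays a comparison and a few assignments.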

Contributor

nlordell left a comment

LGTM.

Can we add some instrumentation to count how many times we get rate limited and how many times we don't send a request because of back-off? It would be a good way to measure how often we actually get rate limited.
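
One possible shape for the requested instrumentation, sketched with plain atomic counters. RateLimitMetrics and its methods are hypothetical; a real service would more likely report these values through the metrics system it already uses.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counters for rate limiting behaviour.
#[derive(Default)]
pub struct RateLimitMetrics {
    /// 429 responses received from the API.
    rate_limited_responses: AtomicU64,
    /// Requests dropped locally because back off was active.
    dropped_requests: AtomicU64,
}

impl RateLimitMetrics {
    pub fn record_rate_limited(&self) {
        // Relaxed is enough: each counter is independent and only summed.
        self.rate_limited_responses.fetch_add(1, Ordering::Relaxed);
    }

    pub fn record_dropped(&self) {
        self.dropped_requests.fetch_add(1, Ordering::Relaxed);
    }

    /// Returns (rate_limited_responses, dropped_requests).
    pub fn snapshot(&self) -> (u64, u64) {
        (
            self.rate_limited_responses.load(Ordering::Relaxed),
            self.dropped_requests.load(Ordering::Relaxed),
        )
    }
}
```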

Contributor Author

MartinquaXD commented Apr 29, 2022

> Can we add some instrumentation to count how many times we get rate limited and how many times we don't send a request because of back-off?

Makes sense. 👍
To keep the scope of this PR manageable, I will do it in a follow-up.
I opened an issue so we don't forget about it.

Comment on lines 73 to 74
let mut back_off = self.min_back_off;
for _ in 0..self.times_rate_limited {
Contributor

This should use f64::powi or f64::powf instead of a loop. You can't get overflow panics with f64 because in the worst case the value saturates at +INF.

Contributor Author

I wanted to avoid an overflow in Duration::mul_f64().
Computing the combined factor so that we call mul_f64() only once wouldn't help us much if mul_f64() then panics because the result overflows Duration.

Contributor

Is there a checked_mul_f64?

Contributor

It could look like this:

let factor = back_off_growth_factor.powi(times_rate_limited as i32);
back_off = back_off.mul_f64(factor);

I see now that Duration, oddly, doesn't have a non-panicking version of mul_f64. In this case I feel it is reasonable to extract the f64 out of the duration, do the math, put it back.
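
A sketch of this approach: convert the Durations to f64 seconds, use f64::powi instead of a loop, clamp to max_back_off, and only then convert back with Duration::from_secs_f64. Because the value is clamped first, the conversion never receives an infinite or out-of-range value, even when powi overflows to +INF. The function name is illustrative, and it assumes a finite growth factor > 1.0 and min_back_off <= max_back_off.

```rust
use std::time::Duration;

/// Computes min_back_off * growth_factor^times_rate_limited, capped at
/// max_back_off, doing the arithmetic on f64 seconds (hypothetical helper).
pub fn back_off_via_f64(
    times_rate_limited: u64,
    growth_factor: f64,
    min_back_off: Duration,
    max_back_off: Duration,
) -> Duration {
    // powi takes an i32; saturating larger counts is harmless because the
    // result is clamped to max_back_off anyway.
    let exponent = i32::try_from(times_rate_limited).unwrap_or(i32::MAX);
    // f64 overflows to +INF instead of panicking.
    let secs = min_back_off.as_secs_f64() * growth_factor.powi(exponent);
    // Clamp before converting back so from_secs_f64 cannot panic.
    Duration::from_secs_f64(secs.min(max_back_off.as_secs_f64()))
}
```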

Contributor Author

MartinquaXD Apr 29, 2022

Unfortunately there are only checked_mul and saturating_mul, which both take a u32 as the argument.
I could have made the growth_factor argument a u32, but that felt overly restrictive.
I don't feel strongly about it, though, so I can change it if you prefer.
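
For comparison, a sketch of the integer alternative mentioned here, with the growth factor restricted to u32 so that u32::checked_pow and Duration::checked_mul can be used without panicking. Treating any overflow as "at least max_back_off" assumes min_back_off * u32::MAX already exceeds max_back_off, which holds whenever min_back_off is at least 1 ms and max_back_off is under roughly 49 days; the function name is illustrative.

```rust
use std::time::Duration;

/// Hypothetical variant with an integer growth factor.
pub fn back_off_u32(
    times_rate_limited: u32,
    growth_factor: u32,
    min_back_off: Duration,
    max_back_off: Duration,
) -> Duration {
    growth_factor
        .checked_pow(times_rate_limited)
        .and_then(|factor| min_back_off.checked_mul(factor))
        // Overflow in either step is treated as exceeding the cap
        // (see the assumption in the lead-in).
        .map_or(max_back_off, |back_off| back_off.min(max_back_off))
}
```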

Contributor Author

> In this case I feel it is reasonable to extract the f64 out of the duration, do the math, put it back.

I hadn't considered that before. Good idea. 👍

MartinquaXD merged commit a9eb00a into main on Apr 29, 2022
MartinquaXD deleted the paraswap-rate-limit branch on April 29, 2022 10:41
github-actions bot locked and limited conversation to collaborators on Apr 29, 2022
Development

Successfully merging this pull request may close these issues.

Exponential Backoff on 429s For Price APIs
4 participants