Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Accept 429 response codes as an indication of the secondary rate limit being exceeded #1895

Merged
merged 8 commits into from
Aug 18, 2024

Conversation

holly-cummins
Copy link
Contributor

Resolves #1842.
Resolves #1805.

Description

If a user uses this library to drive the GitHub API hard, there will quite often be failures caused by secondary rate limits being breached (#1842 and #1805). There’s been some discussion of headers on #1842, and also of pluggable 'handle the rate limit' modules, but I think there’s a more fundamental problem with the handling of rate limits. (The good news is this means there's a simpler solution, too.)

The AbuseLimitHandler checks for a rate limit violation by looking for a 403 header and some headers:



@Override
boolean isError(@Nonnull GitHubConnectorResponse connectorResponse) {
    return isForbidden(connectorResponse) && hasRetryOrLimitHeader(connectorResponse);
}

HTTP_FORBIDDEN is a 403. The GitHub docs indicate that limits from GitHub are signalled with either 403 or TOO_MANY_REQUESTS responses (429). In my experience, I've only seen 429s.

What about headers?

The GitHub docs docs suggest that any headers are optional, and that in the absence of headers the code should just wait a minute (which, luckily, is what the current implementation already does):

If the retry-after response header is present, you should not retry your request until after that many seconds has elapsed. If the x-ratelimit-remaining header is 0, you should not retry your request until after the time, in UTC epoch seconds, specified by the x-ratelimit-reset header. Otherwise, wait for at least one minute before retrying. If your request continues to fail due to a secondary rate limit, wait for an exponentially increasing amount of time between retries, and throw an error after a specific number of retries.

Based on these docs, I think it's sufficient to count a 429 as a rate limit response, even if headers are missing. Semantically, a 429 is an indication of an abuse detection/rate limit scenario, and should be handled as such.

What about the rate limit handler?

GitHubRateLimitHandler also requires a 403 to 'count' a response as rate limited:

@Override
boolean isError(@NotNull GitHubConnectorResponse connectorResponse) throws IOException {
connectorResponse.statusCode() == HttpURLConnection.HTTP_FORBIDDEN
            && "0".equals(connectorResponse.header("X-RateLimit-Remaining"));

I didn't update this one, since the secondary rate limit checks are now correctly being handled on the abuse path with my changes. It might be worth updating it, or also even considering whether the distinction between the abuse and rate limit paths is so fuzzy that having both is just maintenance overhead. But that’s a different discussion.

Before submitting a PR:

  • [ X] Changes must not break binary backwards compatibility. If you are unclear on how to make the change you think is needed while maintaining backward compatibility, CONTRIBUTING.md for details.
  • [ X] Add JavaDocs and other comments explaining the behavior.
  • When adding or updating methods that fetch entities, add @link JavaDoc entries to the relevant documentation on https://docs.github.com/en/rest .
  • Add tests that cover any added or changed code. This generally requires capturing snapshot test data. See CONTRIBUTING.md for details.
  • Run mvn -D enable-ci clean install site locally. If this command doesn't succeed, your change will not pass CI.
  • [ X] Push your changes to a branch other than main. You will create your PR from that branch.

When creating a PR:

  • Fill in the "Description" above with clear summary of the changes. This includes:
  • [X ] All lines of new code should be covered by tests as reported by code coverage. Any lines that are not covered must have PR comments explaining why they cannot be covered. For example, "Reaching this particular exception is hard and is not a particular common scenario."
  • Enable "Allow edits from maintainers".

@holly-cummins holly-cummins changed the title Accept 429 as an indication of the rate limit Accept 429 as an indication of the secondary rate limit being exceeded Jul 4, 2024
@holly-cummins holly-cummins changed the title Accept 429 as an indication of the secondary rate limit being exceeded Accept 429 response codes as an indication of the secondary rate limit being exceeded Jul 4, 2024
Copy link
Member

@bitwiseman bitwiseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a definite improvement.

It still needs tests. Look at the existing abuse limit tests, which use manually constructed test data. Maybe just add one more step to the existing test rather than adding a whole new test. Up to you.

@@ -89,7 +89,7 @@ public void onError(IOException e, HttpURLConnection uc) throws IOException {
private long parseWaitTime(HttpURLConnection uc) {
String v = uc.getHeaderField("Retry-After");
if (v == null)
return 60 * 1000; // can't tell, return 1 min
return 61 * 1000; // can't tell, return 1 min plus 1 second for luck
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a functional reason for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GitHub docs say "Otherwise, wait for at least one minute before retrying." I never know if the check is < 60s or <=60s, but it seems like waiting something that is unambiguously more than a minute is safest.

Copy link

codecov bot commented Jul 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.36%. Comparing base (7db2ec0) to head (caa5167).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1895      +/-   ##
============================================
+ Coverage     81.32%   81.36%   +0.03%     
- Complexity     2458     2463       +5     
============================================
  Files           239      239              
  Lines          7464     7469       +5     
  Branches        400      401       +1     
============================================
+ Hits           6070     6077       +7     
+ Misses         1147     1146       -1     
+ Partials        247      246       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@holly-cummins
Copy link
Contributor Author

Oh good, I'm glad it looks sensible. I've added some tests and squashed.

@bitwiseman
Copy link
Member

Please don't squash/force-push. We can squash when merging the PR.

@bitwiseman bitwiseman self-requested a review July 17, 2024 06:41
Copy link
Member

@bitwiseman bitwiseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test failure needs fixing.

@holly-cummins
Copy link
Contributor Author

Test failure needs fixing.

Oops, sorry about that. And noted about the squashing - different orgs have different requirements, so I have a squash habit, but I'll keep the commits separate. I agree it does make subsequent reviews easier!

…ere retry-after is a date, and light refactoring
@holly-cummins
Copy link
Contributor Author

The CI's failing because I didn't have enough coverage, even with the new test. To get the parseWaitTime method to be testable, I had to do a bit of refactoring to make it visible to the test code. I also added a new test to cover both paths, and then I realised that the original code could fail with a NumberFormatException, because Retry-After can be a date. So I added new code and tests to handle that case. I also did a bit of refactoring of the test code and removing assertions about the mock that I was pretty sure we definitely didn't care about.

Copy link
Member

@bitwiseman bitwiseman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow response cycle.

} catch (NumberFormatException nfe) {
// The retry-after header could be a number in seconds, or an http-date
ZonedDateTime zdt = ZonedDateTime.parse(v, DateTimeFormatter.RFC_1123_DATE_TIME);
return ChronoUnit.MILLIS.between(ZonedDateTime.now(), zdt);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHubClient has a method to get Date (or Instant).
https://github.com/hub4j/github-api/blob/main/src/main/java/org/kohsuke/github/GitHubClient.java#L916

Unfortunately, diff from local machine time is not reliable. The local machine may not be synchronized with the server time. We have to use the diff from the response time.
https://github.com/hub4j/github-api/blob/main/src/main/java/org/kohsuke/github/GHRateLimit.java#L543-L571

However, this PR is a huge step forward and I'd rather have this less than perfect code added than wait for the next release.

I'll approve and file an issue to fix this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! (And thanks for explaining about the diff to the response time, that makes a lot of sense.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing. If you have time to circle back and address #1909 , that would be awesome. Thanks for the great contribution!

src/main/java/org/kohsuke/github/AbuseLimitHandler.java Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants