
Github Repository Search resulting in Secondary Rate Limit #1842

Closed
vikas-maersk opened this issue May 2, 2024 · 7 comments · Fixed by #1895


vikas-maersk commented May 2, 2024

Describe the bug
Getting the error below when searching for repositories with a keyword that matches a large number of repositories. My use case would be satisfied by just the top 50 results, say, but I see no way to limit the results on the server side; I can only truncate them after fetching.

org.kohsuke.github.HttpException: {
  "documentation_url": "https://docs.github.com/free-pro-team@latest/rest/overview/rate-limits-for-the-rest-api#about-secondary-rate-limits",
  "message": "You have exceeded a secondary rate limit. Please wait a few minutes before you try again. If you reach out to GitHub Support for help, please include the request ID ...."
}

I was searching for repositories within my org, and keywords that matched even around 850 repositories were hitting this rate limit. The GitHub documentation quotes a much larger number.

I even tried in:name and in:displayName within the query, but even that did not seem to help.

To Reproduce
Steps to reproduce the behavior:

new GitHubBuilder().withAppInstallationToken(token).build()
                .searchRepositories()
                .org(orgName)
                .q(repoName)
                .list();

Just pass a repo name like "test", or anything that matches a lot of repositories, into the code above.

Expected Behaviour
It should return the repositories, or else provide a clear reason for the error. Alternatively, there should be a way to limit the search results on the GitHub side itself.

Additional Info
Version: org.kohsuke:github-api:1.316

@vikas-maersk
Author

I was able to achieve this using:

new GitHubBuilder().withAppInstallationToken(token).build()
                .searchRepositories()
                .org(orgName)
                .q(query)
                .list()
                .withPageSize(appConfig.getGithubRepositorySearchLimit());

And then, instead of calling toList on the PagedSearchIterable, I iterated the result using nextPage, like this:

repositoriesByNameQuery.iterator().nextPage();
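The same "stop after N results" idea can be expressed as a small generic helper. This is a hypothetical sketch, not part of the github-api library; it works with any Iterator, including the one returned by the PagedSearchIterable above, and stops consuming before further (rate-limited) pages would be requested:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorUtil {
    /**
     * Collects at most {@code limit} elements from the iterator and stops,
     * so no further pages are fetched from the underlying paged source.
     */
    public static <T> List<T> takeFirst(Iterator<T> iterator, int limit) {
        List<T> results = new ArrayList<>();
        while (iterator.hasNext() && results.size() < limit) {
            results.add(iterator.next());
        }
        return results;
    }
}
```

Applied to `repositoriesByNameQuery.iterator()`, this caps the walk at `limit` repositories instead of draining every page of the search result.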

@bitwiseman
Member

Reopening as this is an interesting case for Secondary rate limits.

@bitwiseman bitwiseman reopened this May 18, 2024
@bitwiseman bitwiseman added the bug label Jun 6, 2024
@holly-cummins
Contributor

See also #1805

@bitwiseman
Member

This docs page describes secondary rate limit behavior:
https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#about-secondary-rate-limits

As of this writing, it says:

You may encounter a secondary rate limit if you:

  • Make too many concurrent requests. No more than 100 concurrent requests are allowed. This limit is shared across the REST API and GraphQL API.
  • Make too many requests to a single endpoint per minute. No more than 900 points per minute are allowed for REST API endpoints, and no more than 2,000 points per minute are allowed for the GraphQL API endpoint. For more information about points, see "Calculating points for the secondary rate limit."
  • Make too many requests per minute. No more than 90 seconds of CPU time per 60 seconds of real time is allowed. No more than 60 seconds of this CPU time may be for the GraphQL API. You can roughly estimate the CPU time by measuring the total response time for your API requests.
  • Create too much content on GitHub in a short amount of time. In general, no more than 80 content-generating requests per minute and no more than 500 content-generating requests per hour are allowed. Some endpoints have lower content creation limits. Content creation limits include actions taken on the GitHub web interface as well as via the REST API and GraphQL API.

These secondary rate limits are subject to change without notice. You may also encounter a secondary rate limit for undisclosed reasons.

These are very loosely defined guidelines, and you cannot query them ahead of time. 👎 It looks like we need to take the path some users have suggested and make rate limiting much more resilient, potentially allowing users to write their own rate limit strategies for handling secondary rate limits.
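A minimal sketch of what such a pluggable strategy could look like. All names here are hypothetical illustrations, not the library's actual API: the strategy decides how long to back off after the n-th secondary-rate-limit hit, and an exponential-backoff implementation is one obvious default.

```java
import java.time.Duration;

/** Hypothetical strategy interface: decides how long to wait after the n-th hit. */
interface SecondaryRateLimitStrategy {
    Duration waitFor(int attempt);
}

/** Exponential backoff: base * 2^attempt, capped at a maximum wait. */
class ExponentialBackoffStrategy implements SecondaryRateLimitStrategy {
    private final Duration base;
    private final Duration max;

    ExponentialBackoffStrategy(Duration base, Duration max) {
        this.base = base;
        this.max = max;
    }

    @Override
    public Duration waitFor(int attempt) {
        // Cap the shift so the doubling cannot overflow a long.
        long millis = base.toMillis() << Math.min(attempt, 20);
        return millis > max.toMillis() ? max : Duration.ofMillis(millis);
    }
}
```

A library hook could then consult the strategy each time a secondary-limit response is detected, leaving the "how slow is slow enough" judgment to the user.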

@holly-cummins
Contributor

From my perspective, I'd be happy with a default strategy which just detects the error and sleeps. In the worst case, detection could be based on pattern matching against the "message": "You have exceeded a secondary rate limit. Please wait a few minutes before you try again." text in the response. For the sleep, a minute's sleep, perhaps with an exponential backoff, seems reasonable.

That's more or less what I'll have to implement in the client, but it would be convenient if it were implemented centrally. A customisable strategy might be a nice-to-have on top of that, but I think "go slow, but make the request succeed" is probably what most clients would want. Or maybe that's an assumption I'm making?
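The "detect and sleep" default described above could be sketched roughly like this. This is a hypothetical helper, not the library's implementation, and it assumes the secondary-limit error surfaces as an exception whose message contains the phrase from the response body:

```java
import java.util.concurrent.Callable;

public class SecondaryLimitRetry {
    /**
     * Runs the task, sleeping with exponential backoff (starting at baseMillis)
     * whenever the failure message looks like a secondary rate limit.
     * Rethrows immediately on any other failure, or once maxAttempts is reached.
     */
    public static <T> T callWithRetry(Callable<T> task, int maxAttempts, long baseMillis)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                String msg = e.getMessage();
                boolean secondary = msg != null && msg.contains("secondary rate limit");
                if (!secondary || attempt + 1 >= maxAttempts) {
                    throw e; // not a secondary limit, or out of retries
                }
                Thread.sleep(baseMillis << attempt); // crude exponential backoff
            }
        }
    }
}
```

Pattern matching on the message text is brittle if GitHub rewords the response, which is one argument for having the detection live centrally in the library rather than in every client.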

@bitwiseman
Member

bitwiseman commented Jun 21, 2024

@holly-cummins
The waiting should work as of the latest release v1.322 via the fix to AbuseLimit. Please give it a try.

Unfortunately, GitHub has previously made it clear that it is the responsibility of clients to be well-behaved. Triggering the secondary rate limit too often may eventually get accounts or apps flagged for "abuse", resulting in even longer waits.

The problem is that "good behavior" is poorly defined and subject to change without notice. "Go slow," sure, but how slow? They have a metric, but they expect the client to track it rather than reporting it the way they do with the primary rate limit.

@holly-cummins
Contributor

holly-cummins commented Jun 21, 2024

@holly-cummins The waiting should work as of the latest release v1.322 via the fix to AbuseLimit. Please give it a try.

I was really excited to try 1.322, but am still seeing fairly regular failures because of the rate limit even with 1.322.

I have some Node.js code that uses the GitHub API, and I did manage to work around the secondary errors with promise-retry and an exponential backoff. It doesn't translate directly, because it's JavaScript, but it gives me confidence in the principle. I agree that exponential backoff is pretty crude compared to the clear "retry after this time" guidance you get when the main rate limiter is triggered, and it does carry the risk of waiting too long, or of not waiting long enough and having to wait even longer.
