Skip to content

Conversation

@artem-shelkovnikov
Copy link
Member

@artem-shelkovnikov artem-shelkovnikov commented Apr 14, 2025

Closes #2636

Closes https://github.com/elastic/search-team/issues/8624

This PR attempts to fix a problem with 403 returned from GitHub being interpreted as UNAUTHORIZED when in some cases it can mean that the secondary rate limit quota has been exceeded, see https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#exceeding-the-rate-limit and changesets/action#192

We use gidgethub package that allows to interact with Github with a chosen HTTP client. We use aiohttp so we pass the aiohttp client into gitgethub library:

    @cached_property
    def _get_client(self):
        return GitHubAPI( # this is `gidgethub`
            session=self._get_session, # this is aiohttp session
            requester="",
            base_url=self.base_url,
        )

    @cached_property
    def _get_session(self):
        connector = aiohttp.TCPConnector(ssl=self.ssl_ctx)
        timeout = aiohttp.ClientTimeout(total=None)
        return aiohttp.ClientSession(
            timeout=timeout,
            raise_for_status=True, # <<<<< this got changed to False
            connector=connector,
        )

Before this PR aiohttp.ClientSession was created with raise_for_status=True. This led to the problem that gidgethub would not be able to handle exceptions - their status handling is build on reading the requests/responses coming from Github, if raise_for_status=True is set then aiohttp would just bubble up.

Because of this, we've never actually received exceptions from gidgethub library at all. In this PR I've changed raise_for_status to False and that allowed gidgethub handle and wrap status errors. Therefore I had to change the code around to catch gidgethub.HTTPException instead of aiohttp.ClientResponseError.

gidgethub.RateLimitExceeded is a subclass of gidgethub.HTTPException so I had to move its except statements up.

I've checked the code for throwing gidgethub.RateLimitExceeded and it handles 403s:

            if status_code == 403:
                rate_limit = RateLimit.from_http(headers)
                if rate_limit and not rate_limit.remaining:
                    raise RateLimitExceeded(rate_limit, message)

Why we cannot handle it on our side:

aiohttp with raise_for_status=True will raise an instance aiohttp.ClientResponseError. This will cause the body of response to be lost - aiohttp.ClientResponseError does not store it and just returns a message that represents the status. See this SO question:

The newer versions unfortunately recycle the connection once raise_for_status() is called hence you can't fetch the error body later.

We could work around it, but to do so we need to make an HTTP call ourselves, which we cannot do - gidgethub does it.

So TLDR: I've swapped our exception handling fro handling aiohttp exceptions to handling gidgethub exceptions to be more precise and allow properly catching 403 that is a secondary rate-limit.

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

@artem-shelkovnikov artem-shelkovnikov changed the title Attempt to retry 403 errors when they are actually secondary throttli… Attempt to retry 403 errors when they are actually secondary throttling limit Apr 14, 2025
@artem-shelkovnikov artem-shelkovnikov marked this pull request as ready for review April 18, 2025 15:44
@artem-shelkovnikov artem-shelkovnikov requested a review from a team as a code owner April 18, 2025 15:44
seanstory
seanstory previously approved these changes Apr 18, 2025
Copy link
Member

@seanstory seanstory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@artem-shelkovnikov artem-shelkovnikov enabled auto-merge (squash) April 22, 2025 08:55
seanstory
seanstory previously approved these changes Apr 22, 2025
@artem-shelkovnikov artem-shelkovnikov merged commit 4b940dd into main Apr 22, 2025
2 checks passed
@artem-shelkovnikov artem-shelkovnikov deleted the artem/fix-github-403-throttling branch April 22, 2025 14:42
@github-actions
Copy link

💔 Failed to create backport PR(s)

Status Branch Result
9.0 Commit could not be cherrypicked due to conflicts
8.x Commit could not be cherrypicked due to conflicts
8.18 Commit could not be cherrypicked due to conflicts
8.17 Commit could not be cherrypicked due to conflicts

To backport manually run:
backport --pr 3358 --autoMerge --autoMergeMethod squash

artem-shelkovnikov added a commit that referenced this pull request Apr 22, 2025
…ng limit (#3358)

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 4b940dd)

# Conflicts:
#	connectors/sources/github.py
artem-shelkovnikov added a commit that referenced this pull request Apr 22, 2025
…ng limit (#3358)

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 4b940dd)

# Conflicts:
#	connectors/sources/github.py
artem-shelkovnikov added a commit that referenced this pull request Apr 22, 2025
…ng limit (#3358)

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 4b940dd)

# Conflicts:
#	connectors/sources/github.py
artem-shelkovnikov added a commit that referenced this pull request Apr 22, 2025
…ng limit (#3358)

Co-authored-by: Elastic Machine <[email protected]>
(cherry picked from commit 4b940dd)

# Conflicts:
#	connectors/sources/github.py
@artem-shelkovnikov
Copy link
Member Author

💚 All backports created successfully

Status Branch Result
9.0
8.x
8.18
8.17

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation

artem-shelkovnikov added a commit that referenced this pull request Apr 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Github] Error while checking for inaccessible repositories. Exception: 403 when trying to sync private repositories

4 participants