Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub rate limit hit when running MFE image building in CI/CD pipelines #161

Closed
gabor-boros opened this issue Oct 30, 2023 · 17 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@gabor-boros
Copy link
Contributor

It seems that the tutor-mfe plugin uses the GitHub API 1 that is limited to 60 requests per hour for unauthenticated requests (and 5k for authenticated ones).

Adding credentials to these requests seems to require tutor-mfe PR to add a curl or wget command call instead of the ADD defined in the Dockerfile at the moment. Docker documentation says:

If your URL files are protected using authentication, you need to use RUN wget, RUN curl or use another tool from within the container as the ADD instruction does not support authentication. – reference

Also, this would require adding optional build arguments to the Dockerfile to set the credentials.

-- related forum discussion: https://discuss.openedx.org/t/tutor-mfe-build-github-api-authentication/11542

@gabor-boros
Copy link
Contributor Author

gabor-boros commented Oct 31, 2023

@regisb @arbrandes

In my opinion a possible solution could be (what I already mentioned in the issue description) to use RUN curl ... instead of ADD in the Dockerfile. We could have extra arguments for the Dockerfile to define the auth credentials for the call.

By default the auth credential argument could be empty, hence being backward compatible.

@arbrandes
Copy link
Collaborator

arbrandes commented Nov 1, 2023

Presuming curl or wget is already in the image, it could work. Note the implied doubt: we're trying to second-guess the caching mechanism by forcing the file to be re-fetched via ADD at every single run. I don't know if RUN will have the same effect - it pretty much doesn't for the git clone line immediately following.

To test the change, you have to re-run the build after pushing a single commit to the target branch. If the commit makes it into the build, it worked. If it didn't... it didn't.

@gabor-boros
Copy link
Contributor Author

@arbrandes It probably won't as the RUN command will be cached. The only way to work that around (and not disabling the cache) is to pass the commit reference directly to the Dockerfile instead of the branch name (i.e. do the api call right before building the image). That way we would still have the possibility to authenticate the API calls and to have proper layer caching.

FTR: I don't know how many Tutor plugins (or even the main Tutor repo) are using GitHub APIs like this, but the issue should be fixed at all occurrences.

@gabor-boros
Copy link
Contributor Author

@regisb & @arbrandes Would you mind I'm opening a PR to work this around using the method I mentioned above?

@regisb
Copy link
Contributor

regisb commented Nov 7, 2023

@gabor-boros the whole point of this ADD statement is that it is not cached. Cache for subsequent steps will be cleared if there was a change upstream. This feature has proved very useful in re-building the mfe image when we target a branch, and not a tag. Such a use case happens for instance when running on nightly, or building the candidate release branch (open-release/quince.master).

Please see this PR for more information: #131

I understand that hitting the GitHub rate limit is an issue, and we need to address that. Can we do that in a way that preserves this feature?

@regisb regisb added the bug Something isn't working label Nov 7, 2023
@gabor-boros
Copy link
Contributor Author

@regisb how about what I wrote above?

pass the commit reference directly to the Dockerfile instead of the branch name (i.e. do the api call right before building the image). That way we would still have the possibility to authenticate the API calls and to have proper layer caching.

@regisb
Copy link
Contributor

regisb commented Nov 7, 2023

I usually lean very strongly towards pinned dependencies, and this includes explicit commit sha1 or tags. But in the present case we do not have access to the commit sha1, precisely because we are building the image for a branch. This is unusual, but a useful feature nonetheless. Does that make sense? If not, maybe we can talk about it in a call.

@gabor-boros
Copy link
Contributor Author

[..] in the present case we do not have access to the commit sha1 [...]

@regisb the API response explicitly contains the commit SHA1. At least the call to https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master returns the following:

{
  "ref": "refs/heads/master",
  "node_id": "MDM6UmVmMTY1MTA2NDMwOnJlZnMvaGVhZHMvbWFzdGVy",
  "url": "https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master",
  "object": {
    "sha": "c6c5521ecf720dafe292874327716bf3b70ac7c0",
    "type": "commit",
    "url": "https://api.github.com/repos/openedx/frontend-app-profile/git/commits/c6c5521ecf720dafe292874327716bf3b70ac7c0"
  }
}

So the following would do the trick to retrieve the SHA1 from the API:

$ curl -s https://api.github.com/repos/openedx/frontend-app-profile/git/refs/heads/master | jq -r '.object.sha'
c6c5521ecf720dafe292874327716bf3b70ac7c0

Did I misunderstand something? If you feel a call would be beneficial, it is good for me too.

@regisb
Copy link
Contributor

regisb commented Nov 7, 2023

Just got out of a chat with @gabor-boros. Thanks for taking the time Gábor!

@arbrandes The ADD statement is causing two important issues:

  1. GitHub API rate limits during build.
  2. Breaks the build with private GitHub repositories.

For these reasons, I would lean towards the solution proposed by Gábor: move the API call outside of the build, run it on the host (only for openedx/* repos and open.release/*.master branches) and pass the git commit sha1 to the Docker build as an argument that would bust the cache.

Am I making sense? @gabor-boros would you like to open a PR or should we do it?

@gabor-boros
Copy link
Contributor Author

@regisb It was a really useful discussion I believe!

Am I making sense?

Absolutely, just one question (I probably missed something): Why only for openedx/* repos and open.release/*.master? Although that's not an issue yet, that wouldn't fix the second point: private GitHub repositories.

So if the private repo is github.com/example/frontend-app-something you wouldn't do an (optionally) authenticated call? How would you handle those cases?

@regisb
Copy link
Contributor

regisb commented Nov 7, 2023

Why only for openedx/* repos and open.release/*.master?

The goal would be to address the case of GitHub private repositories. Basically, we would use that caching feature only for open source repositories and branches.

So if the private repo is github.com/example/frontend-app-something you wouldn't do an (optionally) authenticated call?

That is correct.

How would you handle those cases?

I wouldn't. Private repositories would not benefit from the caching feature.

@gabor-boros
Copy link
Contributor Author

I wouldn't. Private repositories would not benefit from the caching feature.

@regisb Fair enough, though it would mean that the API calls can still be authenticated, right?

@regisb
Copy link
Contributor

regisb commented Nov 7, 2023

I wouldn't even try to make authenticated calls. It's just not worth the trouble. Would you still be affected by the rate limit though?

@gabor-boros
Copy link
Contributor Author

So you mean to build the image, it will be the responsibility of the caller to provide the commit sha1 as a build argument for the Dockerfile. Right? By caller, I mean the CI job or developer who is calling tutor MFE build using CLI.

If that's the case, then I guess that covers everyone's needs as authentication can be done when getting the commit sha1 from GitHub API.

@gabor-boros
Copy link
Contributor Author

would you like to open a PR or should we do it?

I'm happy to do it.

@regisb I'm back to work. Would you have some time tomorrow to sync about the exact solution to be on the same page before getting started on the implementation?

@regisb regisb self-assigned this Nov 14, 2023
@regisb regisb moved this from Backlog to In Progress in Tutor project management Nov 14, 2023
@regisb
Copy link
Contributor

regisb commented Nov 14, 2023

We agreed to change the refs dynamically in plugin.py, such that GitHub API calls are only made when we build the "open-release/*.master" or "master" branches from the openedx upstream repos. Thus, something like:

@MFE_APPS.add(priority=tutor_hooks.priorities.LOW)
def _add_mfe_apps_refs(apps: dict[str, MFE_ATTRS_TYPE]) -> dict[str, MFE_ATTRS_TYPE]:
    for mfe, app in apps.items():
        if app["repository"].startswith("https://github.com/openedx/") and app["version"] == "master":
            app["refs"] = app["repository"].replace("github.com", "api.github.com/repos") + "/git/refs/heads/" + app["version"]
    return apps

@regisb regisb removed their assignment Nov 14, 2023
regisb pushed a commit that referenced this issue Dec 8, 2023
See the extended conversation here:
#163

Close #161
@regisb
Copy link
Contributor

regisb commented Dec 9, 2023

Closed by c4ed70c

@regisb regisb closed this as completed Dec 9, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Tutor project management Dec 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
Development

Successfully merging a pull request may close this issue.

3 participants