[vcpkg_from_github]: add support for fine-grained GH tokens #44241
We are facing failures. @jimwang118, do you have some tips for us so that we can resolve this blocking issue? A previous build, which was functionally identical, had many more failures:
proxygen baseline regression: #44232
After merging
function(vcpkg_from_github)
    cmake_parse_arguments(PARSE_ARGV 0 "arg"
        ""
        "USE_TARBALL_API"
It doesn't seem correct for every port to need to be explicitly told to do this.
- Is the tarball we get substantially different or should we apply this change all the time?
- Alternately, should this be based on something else being set, like there being a token in one of the other values?
Thanks!
- We could detect the fact that we have one of these new tokens by parsing the token string.
- The problem is that the hash code of those tarballs is different. So doing that would break backward compatibility.
I am 100% open to alternative solutions.
The name USE_TARBALL_API relates to the implementation and not to the purpose.
Do you have better ideas?
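To illustrate the token-parsing idea mentioned above, here is a minimal shell sketch. It relies on the documented token prefixes (`github_pat_` for fine-grained personal access tokens, `ghp_` for classic ones); the helper name `gh_token_kind` is hypothetical and is not part of vcpkg.

```shell
# Hypothetical helper (not part of vcpkg): classify a GitHub token by prefix.
# "github_pat_" marks fine-grained personal access tokens, "ghp_" marks
# classic ones; an empty value is "none" and anything else is "other".
gh_token_kind() {
    case "${1:-}" in
        github_pat_*) echo "fine-grained" ;;
        ghp_*)        echo "classic" ;;
        "")           echo "none" ;;
        *)            echo "other" ;;
    esac
}
```

Calling `gh_token_kind "$GH_TOKEN"` would let a script branch on the result. As noted above, switching endpoints automatically on that result would still change the archive hash, so this only sketches the detection step.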
Maybe USE_FINE_GRAINED_ACCESS_TOKENS? The feature is documented here:
The fact is that it is the tarball API that produces tarballs with slightly different content, but their contents are 100% equivalent after unpacking them with vcpkg's CMake code.
I did some digging on this and it looks like the token goes in a header rather than the URI? https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#download-a-repository-archive-tar
- Yes, the token goes in the HTTP header. Wasn't that also the case before?
- If not, we could maybe fix the older GitHub API call and keep backward compatibility.
I also dug a little more into the API we currently use here:
With the current API, binary stability is guaranteed only for the tarball contents after decompression (maybe you already expect that, not sure):
The exact compression settings used to generate a zipball or tarball may change over time. The extracted contents won't change if the branch or tag doesn't change, but the outer compressed archive may have a different byte layout. GitHub will give at least six months' notice before changing compression settings.
And later on the same page, the owners of this API suggest using the other endpoint instead, the one I'm introducing in this PR:
If you rely on stability of source code archives for reproducibility (ensuring you always get identical files inside the archive), we recommend using the archives REST API with a commit ID for :ref. Using the commit ID ensures you'll always get the same file contents inside the archive and you’ll be immune to repositories rewriting tags or moving branch heads.
With these two things in mind, I would suggest defining a deprecation strategy for the older endpoint.
This PR can be the first step of that strategy: it adds the new feature without breaking existing behavior, ahead of a later migration.
But it is, of course, up to you.
Whatever solution you decide on, all we need is support for fine-grained tokens, so that we can keep our fork of this really nice project as close as possible to upstream.
So far it is almost identical.
Note that the REST API has rate limits. Unauthenticated users might quickly hit the limit, which is currently 60 requests per hour.
https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#primary-rate-limit-for-unauthenticated-users
- This probably means we could use this new API only when we have a token.
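A rough sketch of that selection logic in shell, for illustration only. The two URLs are the ones quoted in the PR description; the helper name `gh_archive_url` and its argument order are assumptions, not vcpkg code.

```shell
# Hypothetical helper (not part of vcpkg): choose the archive URL.
# With a token, use the REST tarball endpoint from the PR description;
# without one, keep the existing archive URL so that anonymous downloads
# stay off the rate-limited REST API.
gh_archive_url() {
    owner="$1"; repo="$2"; ref="$3"; token="${4:-}"
    if [ -n "$token" ]; then
        echo "https://api.github.com/repos/${owner}/${repo}/tarball/${ref}"
    else
        echo "https://github.com/${owner}/${repo}/archive/${ref}.tar.gz"
    fi
}
```

One caveat raised earlier in this thread still applies: the two endpoints return archives with different hashes, so a port's SHA512 would then depend on whether a token is present.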
And later on the same page, the owners of this API suggest using the other endpoint instead, the one I'm introducing in this PR:
I think they mean using a commit rather than a head or tag.
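If a port wanted to enforce that advice, a check along these lines could tell a full commit ID from a branch or tag name. This is only a sketch: the helper name `is_commit_sha` is hypothetical, and it assumes 40-character lowercase hexadecimal SHA-1 commit IDs.

```shell
# Hypothetical helper (not part of vcpkg): succeed only when the argument
# looks like a full commit ID, assumed here to be exactly 40 lowercase
# hexadecimal characters. Branch and tag names fail the check.
is_commit_sha() {
    case "${1:-}" in
        *[!0-9a-f]*) return 1 ;;   # contains a non-hex character
    esac
    [ "${#1}" -eq 40 ]
}
```

For example, `is_commit_sha "$GH_REF" || echo "REF is not a full commit ID"` would flag a tag such as `v1.2.3`.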
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Thanks for the new feature! I submitted a documentation change here: MicrosoftDocs/vcpkg-docs#476 |
* [vcpkg_from_github] Document USE_TARBALL_API. See microsoft/vcpkg#44241
* Fix #use_tarball_api
Description
The problem
With the CMake utility function vcpkg_from_github, we can pass a GitHub API token through AUTHORIZATION_TOKEN to access private repositories.
The current implementation invokes curl like this:
    curl --fail \
        --location "https://github.com/${GH_OWNER}/${GH_REPO}/archive/${GH_REF}.tar.gz" \
        --output "${GH_REF}.tar.gz" \
        --header "Authorization: token ${GH_TOKEN}"
The problem is that this endpoint does not seem to work well with fine-grained tokens:
The solution
This solution does the following:
- Adds a new opt-in option, USE_TARBALL_API.
The alternative endpoint looks like this:
    curl --fail \
        --location "https://api.github.com/repos/${GH_OWNER}/${GH_REPO}/tarball/${GH_REF}" \
        --output "${GH_REF}.tar.gz" \
        --header "Authorization: token ${GH_TOKEN}"
Notes
The tarball downloaded with the new API endpoint:
As a consequence:
SHA512 values need to be updated.
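For anyone updating a port by hand, a minimal sketch of recomputing the hash of a downloaded archive. It assumes `sha512sum` from GNU coreutils is installed; the helper name `archive_sha512` is hypothetical and not part of vcpkg.

```shell
# Hypothetical helper (not part of vcpkg): print the SHA512 of a file as a
# hex string. Assumes sha512sum from GNU coreutils, which prints
# "<hash>  <filename>"; cut keeps only the hash field.
archive_sha512() {
    sha512sum "$1" | cut -d ' ' -f 1
}
```

Running `archive_sha512 "${GH_REF}.tar.gz"` on the file fetched from the tarball endpoint gives the value to put in the port's SHA512 argument.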