Skip to content

[vcpkg_from_github]: add support for fine-grained GH tokens#44241

Merged
BillyONeal merged 6 commits intomicrosoft:masterfrom
globusmedical:vcpkg_from_github/github-fine-grained-tokens
Apr 25, 2025
Merged

[vcpkg_from_github]: add support for fine-grained GH tokens#44241
BillyONeal merged 6 commits intomicrosoft:masterfrom
globusmedical:vcpkg_from_github/github-fine-grained-tokens

Conversation

@rressi-at-globus
Copy link
Contributor

@rressi-at-globus rressi-at-globus commented Mar 7, 2025

Description

The problem

With CMake utility function vcpkg_from_github:

With AUTHORIZATION_TOKEN we can pass the GitHub API token to access private repositories.

The current implementation is invoking curl like this:

curl --fail \
     --location "https://github.com/${GH_OWNRER}/${GH_REPO}/archive/${GH_REF}.tar.gz" \
     --output "${GH_REF}.tar.gz" \
     --header "Authorization: token ${GH_TOKEN}"

The problem is that the used endpoint seems to not work well with fine-grained tokens:

  • either it doesn't support them at all
  • either it support them, but not with SSO authorization

The solution

This solution does the following:

  • add the CMake argument option USE_TARBALL_API.
  • if set to yes use an alternate endpoint to perform the checkout.
  • otherwise it uses the same endpoint than before.

The alternative andpoint looks like the following one:

curl --fail \
     --location "https://api.github.com/repos/${GH_OWNRER}/${GH_REPO}/tarball/${GH_REF}" \
     --output "${GH_REF}.tar.gz" \
     --header "Authorization: token ${GH_TOKEN}"

Notes

The tarball downloaded with the new API endpoiunt:

  • is compatible with the older one
  • but it is not binary identical.

As a consecuence of that:

  • the value for parameter SHA512 need to be updated.
  • which prevents us to just switch to the new API without adding the new CMake option argument.

@jimwang118 jimwang118 self-assigned this Mar 10, 2025
@jimwang118 jimwang118 added the category:tool-update The issue is with build tool or build script, which requires update or should be executed correctly label Mar 10, 2025
@rressi-at-globus
Copy link
Contributor Author

We are facing failures with x64_linux but it is not clear to us the reason for it:

SUMMARY FOR x64-linux
  SUCCEEDED: 2101
  BUILD_FAILED: 1
REGRESSIONS:
REGRESSION: proxygen:x64-linux failed with BUILD_FAILED. If expected, add proxygen:x64-linux=fail to /mnt/vss/_work/1/s/scripts/azure-pipelines/../ci.baseline.txt.
/mnt/vss/_work/1/s/scripts/azure-pipelines/test-modified-ports.ps1 : vcpkg ci testing failed; this is usually a bug in a port. Check for failure logs attached to the run in Azure Pipelines.
At /mnt/vss/_work/_temp/azureclitaskscript1741602799652_inlinescript.ps1:9 char:1
+ & scripts/azure-pipelines/test-modified-ports.ps1 -Triplet x64-linux  …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
+ FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,test-modified-ports.ps1

@jimwang118 do you have some tips for us so that we can resolve this blocking issue?

A build with a previous, which were functionally identical, had much more failures:

  • maybe we are facing ephemeral failures here?

@dg0yt
Copy link
Contributor

dg0yt commented Mar 11, 2025

proxygen baseline regression: #44232

@rressi-at-globus
Copy link
Contributor Author

After merging master into this port, the build is still failing because of proxygen on x64_linux:

SUMMARY FOR x64-linux
  SUCCEEDED: 2348
  BUILD_FAILED: 1
REGRESSIONS:
REGRESSION: proxygen:x64-linux failed with BUILD_FAILED. If expected, add proxygen:x64-linux=fail to /vcpkg/scripts/azure-pipelines/../ci.baseline.txt.
Write-Error: vcpkg ci testing failed; this is usually a bug in a port. Check for failure logs attached to the run in Azure Pipelines.

function(vcpkg_from_github)
cmake_parse_arguments(PARSE_ARGV 0 "arg"
""
"USE_TARBALL_API"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem correct for every port to need to be explicitly told to do this.

  1. Is the tarball we get substantially different or should we apply this change all the time?
  2. Alternately, should this be based on something else being set, like there being a token in one of the other values?

Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • We could detect the fact that we have one of these new tokens by parsing the token string.
  • The problem is that the hash code of those tarballs is different. So doing that would break backward compatibility.

I am 100% open for alterantive solutions.

The name USE_TARBALL_API relates to the implementation and not to the purpose.

Do you have better ideas?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, USE_FINE_GRAINED_ACCESS_TOKENS, the feature is documented here:

The fact is that is the tarball API that produces tarballs whith a slightly different content, but their content are 100% equivalent after unpacking them with VCPKG's CMake code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some digging on this and it looks like the token goes in a header rather than the URI? https://docs.github.com/en/rest/repos/contents?apiVersion=2022-11-28#download-a-repository-archive-tar

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Yes, the token goes in the HTTP header, wasn't the case also before?
  • If not we could maybe fix the older GitHub-API call and keep the backward compatibility.

Copy link
Contributor Author

@rressi-at-globus rressi-at-globus Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also digging a little bit more on the API we use currently here:

Binary stability by the current API is guaranteed only on the tarball package after the decompression (maybe you are already expecting that, not sure):

The exact compression settings used to generate a zipball or tarball may change over time. The extracted contents won't change if the branch or tag doesn't change, but the outer compressed archive may have a different byte layout. GitHub will give at least six months' notice before changing compression settings.

And late in the same page, the owners of this API are suggesting to rather use the other endpoint I'm introducing with this API:

If you rely on stability of source code archives for reproducibility (ensuring you always get identical files inside the archive), we recommend using the archives REST API with a commit ID for :ref. Using the commit ID ensures you'll always get the same file contents inside the archive and you’ll be immune to repositories rewriting tags or moving branch heads.

With these 2 things in mind, I would suggest you to define a deprecation strategy for the older endpoint.

And we can introduce with this PR the first step of this strategy where we add the new feature without breaking all of the existing looking forward for the migration.

But it is, of course, up to you.

Whatever solution you decide, what we need is just supporting fine-grainded tokens and keep our fork of this really nice project as close as possible with upstream.

So far it is almost identical.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that on the REST API, there are rate limits. Unauthenticated users might quickly hit the limit of currently 60 requests per hour.
https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28#primary-rate-limit-for-unauthenticated-users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This probably means we could use this new API only when we have a token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And late in the same page, the owners of this API are suggesting to rather use the other endpoint I'm introducing with this API:

I think they mean using a commit rather than a head or tag.

@PhoebeHui PhoebeHui assigned Cheney-W and unassigned jimwang118 Apr 2, 2025
@rressi-at-globus rressi-at-globus requested a review from dg0yt April 14, 2025 21:26
@BillyONeal
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@BillyONeal BillyONeal added the info:needs-maintainer-attention Lets the current 'on rotation' vcpkg maintainer know they need to look at this. label Apr 22, 2025
@BillyONeal
Copy link
Member

Thanks for the new feature! I submitted a documentation change here: MicrosoftDocs/vcpkg-docs#476

@BillyONeal BillyONeal merged commit 11972bd into microsoft:master Apr 25, 2025
18 checks passed
vicroms pushed a commit to MicrosoftDocs/vcpkg-docs that referenced this pull request May 1, 2025
* [vcpkg_from_github] Document USE_TARBALL_API.

See microsoft/vcpkg#44241

* Fix #use_tarball_api
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

category:tool-update The issue is with build tool or build script, which requires update or should be executed correctly info:needs-maintainer-attention Lets the current 'on rotation' vcpkg maintainer know they need to look at this.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants