-
Notifications
You must be signed in to change notification settings - Fork 963
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Trusted publishing: possible race condition related to GH repo move #16843
Comments
Thanks for the detailed timeline @webknjaz!
Hmm, this part seems unlikely: there's no logic anywhere in Warehouse for transferring settings between different Trusted Publishers. That includes internal IDs, i.e. there's no copying of the user or repo IDs between publishers connected to the same project. I'll double check the deletion logic, though, and see if there's anything funky there. Another potential candidate here is inconsistency on GitHub's side during repo migrations: I'd be very surprised if it was the case, but it's possible that the initial TP registration for |
Just checked, and we always delete trusted publishers by unique ID, meaning that there should be no confusion/cross-contamination between publishers at that stage: warehouse/warehouse/manage/views/__init__.py Lines 1890 to 1961 in 9b82274
Could you check the project's event log on PyPI as well? I'm curious if it has any other hints/possible discrepancies. |
That's a good point. I wouldn't be surprised, OTOH, since we're regularly observing GitHub's eventual consistency causing surprising effects in large projects like Ansible.
If the owner ID is important to check, perhaps Warehouse could detect such cases and send email notifications to the project owners when such changes happen? And maybe reflect that in the security log? |
I was starting at that to compose the table posted above. I didn't include events of users being removed/re-added because of the project to org migration, since TP is bound to projects, not accounts. The last event preceding the move to the PyPI org was the successful sdist upload via TP 3 days prior to the move. And right after the move, there's removal of @bdraco (due to the #13558 bug). And re-adding him as a maintainer after the first TP config attempt (so he'd still show up on the public project page). I extended the table with the GH repo move time. |
That's the deletion code, though, my concern was about adding a new one while the previous TP exists: warehouse/warehouse/manage/views/__init__.py Lines 1446 to 1557 in 9b82274
|
Maybe, it'd be helpful to log warehouse/warehouse/oidc/forms/github.py Line 124 in 9b82274
|
Or extend warehouse/warehouse/manage/views/__init__.py Lines 1536 to 1545 in 9b82274
|
I'll think about it some more, but I think this will be difficult with the current architecture: lookups are currently abstracted/have a uniform interface across different TP providers, so we'd need to break that interface or add some kind of reporting knob to it in the GitHub case. But that may not be as bad as I'm thinking, especially if we don't do the reporting at the user level but instead at the Sentry level (since these should be very exceptional cases anyways).
Yeah, logging those in their associated event makes sense to me -- I'll think a bit about how to best do that in a way that doesn't break the genericity of the current events. |
I spent a bit more time trying to figure this out, and my best theory so far is that GitHub took the 'eventual' in 'eventually consistent' a little too literally here, and continued returning the old owner ID in their API ~40 minutes after the repo transfer. That's pretty much the only thing that could explain how the inconsistency happened here without any of the user-controlled TP configuration changing. I'll look into adding some Sentry or similar alerting, to see if we can catch it happening again. |
I would expect that your guess has high probability of being true, based on my experience with GitHub API, Apps, platform in general. In big repos, we observe inconsist states of GitHub that don't clear for weeks sometimes. |
Describe the bug
So there was a repository initially created at https://github.com/Bluetooth-Devices/propcache with trusted publishing configured and working. We then moved it to https://github.com/aio-libs/propcache, and so I had to create a new trust link. The project also got moved to the PyPI org before the trusted publishing reconfiguration.
What happened was that after these moves and reconfig, the CI pipeline that worked before stopped being able to publish.
I strongly suspect a race condition somewhere in the logic for adding a new trusted publisher.
So let me start with the timeline:
Publisher: GitHub
URL:
https://github.com/aio-libs/propcache
Workflow: ci-cd.yml
Publisher: GitHub
URL:
https://github.com/Bluetooth-Devices/propcache
Workflow: ci-cd.yml
Publisher: GitHub
URL:
https://github.com/aio-libs/propcache
Workflow: ci-cd.yml
Publisher: GitHub
URL:
https://github.com/aio-libs/propcache
Workflow: ci-cd.yml
Note
The log does not show this, but in all cased the GitHub Environment was set to
pypi
.Perhaps, this should be improved.
Both failures look like this (except for the tag number, which is irrelevant):
The trusted publisher configs show up exactly the same before and after re-creation.
My observation is that when I was reconfiguring the trust, I first created the new trust link, and only then deleted the old one. The suspicion is that the old link's properties got somehow transferred to the new trust config, causing PyPI to still expect the old GH repo URL.
But later, when I deleted the trusted publishing right away and created a new one, those properties weren't inherited, and it became possible to look it up with what OIDC includes.
Can this be related to the resurrection prevention logic? Could you check this @woodruffw? Also, perhaps you @di could find anything in the platform logs that isn't surfaced in the common UI?
The above explanation is the only idea I came up with. And if that's the case, this is a bug.
I'm starting to think that something like this might be happening when several GH repos are configured in the same project. This is something to check. But beyond this, I don't know how to narrow this down to a smaller repro...
Expected behavior
Migrating projects, followed by proper reconfiguration, shouldn't let the trust end up in an inconsistent state.
To Reproduce
🤷♂️ See the ideas above.
My Platform
GHA
Additional context
A few CI jobs are linked in the table above.
The text was updated successfully, but these errors were encountered: