Publish a list of malicious packages that have been taken down #4703
Comments
@di How do I find packages that have been taken down - from the database point of view? Is there any flag (
@waseem18 Nope, there isn't, so we would have to add that flag and manually infer it from the comments of previously removed packages.
Okay, so we add the flag to the respective table and we set it to If I understand you correctly, as the data of already removed packages doesn't exist on our database, we would need to infer it from the Warehouse GH So after we add the flag, the API call would return any packages that are flagged as malicious + the list we have of already removed packages. Please do correct me if I'm wrong.
There is a What I meant was that once we add the ability to mark a
Gotcha! So we can add the And the API end point would query for the entities of
Please be sure to provide the reason for each takedown case - e.g. DMCA request, government/security services involvement, somebody's whim, etc.
This issue is only about malicious packages, which are taken down by the PyPI administrators at their discretion.
Is anyone working on this? I would like to work toward this during the Bloomberg Sprint. If nothing else, figuring out how this works/is exposed from Warehouse's side should be a good start.
I'm not working on this @pradyunsg. Feel free to pick it up.
Hey I will pick it up at the Bloomberg NYC Sprint!
#4962 relates to this issue
Blocked on #5117.
Not necessarily, we manually remove malicious packages sometimes and the ability to automatically detect malicious packages shouldn't prevent us from publishing which packages we've manually taken down.
#4962 mostly implements the first step towards this, but wasn't finished.
Per #7840, this list should include all "blocked" packages along with the reason for blocking, if applicable.
To be clear, you mean providing a publicly accessible list/table of all blocked packages and why they were blocked; and not changing/putting up new releases on that name. Correct?
Thanks, my comment was unclear, updated to 'this list should include all "blocked" packages', I'm not suggesting we actually publish (create releases for) these packages.
Is there a need for the flag in the database to distinguish between "blocked" reasons? As long as we're preserving PyPI admin discretion (which I agree with), it seems like that additional sort of information doesn't need to be exposed at this level. And in my understanding, that would simplify this down to an API and possibly a formatted page (though I'm not totally convinced) that would return the list of blocked names. So all of the previous PR is not needed. Though given #7840, perhaps we can also return a different status code for blocked names on install (rather than 404)? That would allow installers to handle an exceptional case directly, rather than having to maintain a list from our new API. Guessing this just needs someone to work on it?
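To make the status-code idea above concrete, here is a minimal sketch of how an installer might tell a blocked name apart from a name that simply does not exist. This is an illustration only: the thread does not say which status code would be used, so `BLOCKED_STATUS = 410` is a placeholder assumption, and `NameState` and `classify_response` are hypothetical names, not part of Warehouse or pip.

```python
from enum import Enum

# Assumption: the thread only proposes "a different status code" than 404.
# 410 is a placeholder chosen for this sketch, not a decision made by PyPI.
BLOCKED_STATUS = 410


class NameState(Enum):
    AVAILABLE = "available"
    BLOCKED = "blocked"
    NOT_FOUND = "not found"


def classify_response(status_code: int) -> NameState:
    """Map an index response status to a project-name state (hypothetical helper)."""
    if status_code == 200:
        return NameState.AVAILABLE
    if status_code == BLOCKED_STATUS:
        # The name exists in the index's blocklist; an installer could warn here.
        return NameState.BLOCKED
    if status_code == 404:
        return NameState.NOT_FOUND
    raise ValueError(f"unexpected status code: {status_code}")
```

With a distinct code, an installer could surface a specific warning for blocked names without keeping its own copy of the blocklist.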
For anyone interested, my PR in #8533 works but is probably stalled on having a good path for the API. All the existing JSON APIs are under Happy to receive any suggestions either here or there. I don't have nearly enough insight into PyPI's routing design to make a confident decision myself.
Probably blocked on #284.
If we're going to wait for a complete API redesign and potential technology change, can we just manually dump the list of banned names into a public text file somewhere until that's ready?
As far as I understand, PyPI still does not provide any reasonable way to check whether a package has been taken down (I hit a package name that is not listed on PyPI but prohibited). From the viewpoint of a package developer, this is not a good situation; the only way to check whether a package name was already taken by package name squatters and then taken down by admins is to try to squat the package name.
@di what do you think about feeding these into the PyPA Advisory Database? Then it would feed into OSV and anything else consuming those data sources. Of course it'd also be great to get pip-audit able to detect these. I realise that it would involve a few changes since I think currently everything depends on the PyPI package JSON existing and it won't for these removed packages, but I think it'd be worth trying. I'm happy to have a go at getting a couple of initial advisory entries created and see what happens from there.
@westonsteimel I had the same thought. We could either do that, which is a bit circuitous but will work, or if we decided OSV or the Advisory Database is not the right place for these types of things, we could just take the easier route of including these in the Let's raise an issue at https://github.com/pypa/advisory-db to decide whether malware/spam/etc should generate advisories.
Noting here that there's now an OSV database hosted by the OpenSSF that tracks this information: https://github.com/ossf/malicious-packages
Also, note that that database only includes a fraction of the packages being taken down; there is currently no PyPI -> OSV link that populates that database.
See here for an update on this topic: https://discuss.python.org/t/pypi-malware-observation-report-outcomes-private-preview/49060
What's the problem this feature will solve?
Users who may have possibly installed malicious packages don't have insight into what packages have been taken down by PyPI administrators.
Describe the solution you'd like
PyPI should publish both a human-readable and machine-readable (API) list of malicious packages that have been taken down. Ideally the human-readable list would be sortable by package name, or by the date it was created/taken down.
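As a rough illustration of what a consumer of the machine-readable list could do, the sketch below checks locally installed distributions against a takedown feed. No such feed exists in the issue, so the JSON layout (`packages`, `name`, `taken_down`), the sample entry, and the helper names are all assumptions made up for this example; only the PEP 503 name normalization and `importlib.metadata` are real.

```python
import json
import re
from importlib import metadata

# Hypothetical feed: the schema and the entry below are invented for this sketch.
SAMPLE_FEED = """
{"packages": [
  {"name": "example-malicious-pkg", "taken_down": "2018-09-01"}
]}
"""


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_taken_down(installed, feed_json):
    """Return (installed name, takedown date) pairs for names found in the feed."""
    entries = json.loads(feed_json)["packages"]
    by_name = {normalize(entry["name"]): entry for entry in entries}
    return sorted(
        (name, by_name[normalize(name)]["taken_down"])
        for name in installed
        if normalize(name) in by_name
    )


def installed_names():
    """List names of distributions installed in the current environment."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return [name for name in names if name]


if __name__ == "__main__":
    for name, date in find_taken_down(installed_names(), SAMPLE_FEED):
        print(f"{name} was taken down on {date}")
```

Names are normalized before comparison so that spellings such as `Example_Malicious.Pkg` and `example-malicious-pkg` match the same feed entry.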
Additional context
Feature request to automatically uninstall packages via this API in pip: pypa/pip#5777