Should malware/spam/etc which gets removed from pypi generate advisories here? #45
Comments
Perhaps it should pass some sort of threshold for number of downloads or something to make it more worthwhile?
@oliverchang I'm curious if there is precedent for this in other advisory dbs.
I do think npm at least publishes advisories for malicious packages (I think theirs is entirely via GitHub's advisories now if I remember correctly?) https://github.com/advisories?page=5&query=malicious+ecosystem%3Anpm
I think it's fair game to include these, and the reporting can re-use the existing infrastructure / tooling (i.e. pip-audit). As @westonsteimel mentioned, other vuln DBs like GHSA also track these.
@oliverchang, I did attempt using the existing analysis on one of these (I think one of the ones from this jfrog article), and it does cause some failures because, of course, these packages have been removed from pypi, so when it attempts to extract versions from pypi project JSON it fails. I did notice we do appear to have the version info in the input pypi_versions.json from the BigQuery query though.
Do we need some mechanism for "this entire project is malicious, regardless of version"?
Hey all 👋 Just want to chime in on this in hopes of reviving the conversation and to share some of GitHub's thinking. npm does indeed make a point of publishing advisories for malware packages. Those packages are also pulled, and the namespace for the package is dead forevermore. Pulling the packages prevents future exploitation, of course, and the alert informs users who have already downloaded the package, which minimizes an attacker's window of opportunity. So to that end I think it would 100% be valuable to publish malware takedown advisories.
It might be nice, but this can be achieved with uncapped version ranges in any advisory. e.g.
Something else I would like to ask, which may be more controversial, is that when a package is taken down, its namespace also be taken down, reserved, or otherwise made permanently unusable. The rationale here is so that these advisories need not be invalidated over time as new users re-use package names.
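As a rough illustration of the "uncapped version range" idea mentioned above, here is a minimal sketch of an advisory whose range starts at version 0 and has no upper bound. The field names follow the OSV schema as I understand it; the ID, package name, and the `is_uncapped` helper are hypothetical and not part of any existing tool.

```python
# Hypothetical advisory covering every version of a malicious project.
# The ID and package name are made up for illustration.
advisory = {
    "id": "EXAMPLE-0000-0",
    "summary": "Malicious package removed from PyPI",
    "affected": [
        {
            "package": {"ecosystem": "PyPI", "name": "example-malicious-pkg"},
            "ranges": [
                {
                    "type": "ECOSYSTEM",
                    # "introduced": "0" with no closing event leaves the
                    # range open-ended, so every version matches.
                    "events": [{"introduced": "0"}],
                }
            ],
        }
    ],
}


def is_uncapped(range_entry):
    """Return True if a range starts at "0" and never closes."""
    events = range_entry["events"]
    starts_at_zero = any(e.get("introduced") == "0" for e in events)
    has_upper_bound = any(
        key in e for e in events for key in ("fixed", "last_affected", "limit")
    )
    return starts_at_zero and not has_upper_bound
```

With a record like this, no separate "whole project is malicious" flag is needed; a consumer only has to treat an open-ended range as matching every release.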
Hey, thanks for that insight!
As great as this sounds I think it's not possible in practice for an ecosystem like Python that (currently) only has a single global namespace and has an open registration system for projects. A very common occurrence is that a legitimate maintainer publishes a source repo, and before they get a chance to publish this to PyPI, an attacker beats them to it and publishes a malicious version of that project name. We'd want to take down the project and inform people that a specific release was malicious, but we wouldn't want to block the legitimate maintainers from eventually publishing that name on PyPI.
Totally fair. Certainly for one-off malware I would not suggest burning the namespace, but maybe the specific version. Anyway, advisories on malware is very much a 👍 from me 😄
It's probably worth framing this in the context of this work: 144K packages is a pretty wild number. The list exists, though. Looks like there are 7824 PyPI packages. I think in the case of malicious packages, there's not much to say other than "don't use this". It differs from a traditional security advisory, which tends to need some additional details for the people on the receiving end. I think for something like GitHub or OSV, it should be trivial to create the data for this from something as simple as a CSV file. One doesn't need a lot of details, just a list of bad versions.
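To make the "simple CSV file" suggestion concrete, here is a minimal sketch that groups rows into one advisory-like record per package. The column names (`package`, `version`), the sample rows, and the record layout are assumptions for illustration, loosely modelled on OSV's `affected` entries rather than taken from any real export.

```python
import csv
import io
from collections import defaultdict


def advisories_from_csv(csv_text):
    """Group (package, version) rows into one record per package."""
    versions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        versions[row["package"].strip()].append(row["version"].strip())

    return [
        {
            "summary": f"Malicious package: {name}",
            "affected": [
                {
                    "package": {"ecosystem": "PyPI", "name": name},
                    # Plain string sort; good enough for a listing,
                    # not a real version ordering.
                    "versions": sorted(vers),
                }
            ],
        }
        for name, vers in sorted(versions.items())
    ]


# Hypothetical input: the package names and versions are made up.
sample = (
    "package,version\n"
    "example-bad-pkg,0.1\n"
    "example-bad-pkg,0.2\n"
    "other-bad-pkg,1.0\n"
)
records = advisories_from_csv(sample)
```

Each record carries nothing beyond the package name and its bad versions, which matches the point that malware advisories need far less detail than traditional ones.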
Which should authenticate these package sources and their Git references so that only the package maintainer could use their own repo (an idea by @jossef). Regarding naming, that's of course a wider discussion.
I suspect package to repo authentication is out of scope for this topic. I'd love to see it even if opt-in only, but for the moment it's not in place and malware advisories would be useful with or without it. @di let me know if there's anything I can do from the github side to help 👍
One comment re the info, a package might be malware, or specific versions might have malware (but older/newer versions are fine). So unless the package is removed, some source of data saying which ones are safe/unsafe to use is advisable. It doesn't necessarily have to live in the pypa database though.
So for now, I think the key bit here is having a way for PyPI to communicate the list of packages you shouldn't be getting from PyPI. While I think that eventually it likely should be part of an API definition (perhaps as an alternative 'list packages' interface), PyPI and its mirrors are often used as the root, so it makes sense to communicate at least what you shouldn't expect to be getting from PyPI. It would also serve as an advisory for any metadata mirrors, so that they at least have a stance on how they should react when packages are added to or removed from the pypi-removed list.
I'd like to propose an idea here for a format, though I can understand if something more structured like JSON might be preferable: a
So the process would be: on a schedule and/or on change, a job would generate a new blocklist file from the existing DB table and, if it differs from the existing file, open a pull request against the advisory database to update the file. What do others think? I'd be willing to spike a little work on it if this seems like a reasonable approximation.
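The generate-and-compare step of the process described above could look roughly like the sketch below. The proposed file format did not survive in this thread, so the tab-separated `name<TAB>reason` layout here is purely an assumption, as are the function names; opening the pull request is left to the caller.

```python
from pathlib import Path


def render_blocklist(rows):
    """Render (project, reason) rows as sorted, tab-separated lines.

    The line format is hypothetical, chosen only for this sketch.
    """
    lines = sorted(f"{name}\t{reason}" for name, reason in set(rows))
    return "\n".join(lines) + "\n"


def update_blocklist(rows, path):
    """Rewrite the file only when its content changed; return True if so."""
    path = Path(path)
    new_text = render_blocklist(rows)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True  # the caller would open the pull request at this point
```

Sorting and de-duplicating before rendering keeps the output deterministic, so the job only reports a change when the set of removed projects actually changes, not when the DB returns rows in a different order.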
Reviving this topic a bit, I think it makes sense to have packages which have been removed from PyPI listed in this database. My thinking is:
I'm unsure how automated this process could be, but I think a PYSEC that contains the removed package name, its versions (assuming that versions can't be re-used post-deletion; please check this assumption), and hashes of the files would be enough for pip-audit to detect a malicious package?
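A minimal sketch of the name, version, and hash check suggested above follows. It does not reflect how pip-audit actually works; the `KNOWN_BAD` table, the package name, and the file content are made up, and name normalization is reduced to lowercasing.

```python
import hashlib


def sha256_of(data):
    """Return the hex SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical advisory data: (name, version) -> set of bad file hashes.
KNOWN_BAD = {
    ("example-bad-pkg", "0.1"): {sha256_of(b"malicious payload")},
}


def is_known_malicious(name, version, file_bytes, known_bad=KNOWN_BAD):
    """Match on name and version, then confirm with the file hash."""
    hashes = known_bad.get((name.lower(), version))
    return hashes is not None and sha256_of(file_bytes) in hashes
```

Including the hash means a later, legitimate upload under the same name and version (if that were ever possible) would not be flagged, which is why the version re-use assumption matters.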
I can unequivocally say that the GSD project (https://gsd.id) would like to do IDs for these issues, especially as we can properly tag them. Currently they would get tagged as "concern": { But we can definitely look at adding a "malware" category (I suspect there are enough of these across multiple ecosystems to make it worth doing). We are also happy to support automation in order to get you GSD IDs quickly and easily, like we do for the Linux Kernel already (several thousand per year).
Following from some discussion in pypi/warehouse#4703, do we think that packages removed from PyPI due to being classified as malware, etc. should cause advisories to be generated here?