Description
We want to enable the collect/ endpoint for PyPI, so that collecting a PyPI PURL like pkg:pypi/[email protected] no longer fails with:
{
"status": "cannot fetch Package data for pkg:pypi/[email protected]: no available handler"
}
We need a handler similar to the npm handler:
- https://github.com/aboutcode-org/purldb/blob/main/minecode/collectors/npm.py
The overall process:
- For a PURL with a version, fetch JSON from the PyPI API. For instance, the page https://pypi.org/project/Jinja2/3.1.5/ has a JSON counterpart at https://pypi.org/pypi/Jinja2/3.1.5/json, or at https://pypi.org/pypi/Jinja2/json, which is not tied to a specific version.
- Then map it to a ScanCode Toolkit packagedcode object, and finally push it to be saved in the DB and queued for scanning.
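The fetch step can be sketched with the standard library alone. This is a minimal sketch, not purldb's collector API: `pypi_json_url` and `fetch_pypi_json` are hypothetical names, and only the two URL shapes above are taken from the issue.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

PYPI_JSON_BASE = "https://pypi.org/pypi"


def pypi_json_url(name: str, version: Optional[str] = None) -> str:
    """Build the PyPI JSON API URL, version-specific when a version is given."""
    if version:
        return f"{PYPI_JSON_BASE}/{quote(name)}/{quote(version)}/json"
    # Project-level URL, not tied to a specific version.
    return f"{PYPI_JSON_BASE}/{quote(name)}/json"


def fetch_pypi_json(name: str, version: Optional[str] = None, timeout: float = 10.0) -> dict:
    """Fetch and decode the JSON document (performs a network request)."""
    with urlopen(pypi_json_url(name, version), timeout=timeout) as response:
        return json.load(response)
```

A real handler would more likely reuse purldb's existing HTTP helpers and error handling; the point here is only how the PURL name and version map onto the API URL.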
Later, also support a PURL without a version, to collect all versions.
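For a PURL without a version, one approach is to read the project-level JSON and enumerate its releases. This sketch assumes the project-level payload carries a `releases` mapping of version string to a list of file entries; `list_release_versions` is a hypothetical helper.

```python
from typing import Dict, List


def list_release_versions(project_json: Dict) -> List[str]:
    """Return versions that have at least one distribution file.

    Assumes project_json["releases"] maps version -> list of file entries.
    Order is whatever the API returned; a proper version sort would need
    PEP 440 comparison (e.g. packaging.version), not string sorting.
    """
    releases = project_json.get("releases") or {}
    return [version for version, files in releases.items() if files]
```

Each returned version could then be collected the same way as a PURL that was given with an explicit version. Skipping versions with no files avoids creating packages for releases that were emptied or yanked of all artifacts.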
There are a few twists:
A) There are multiple packages for one version, so we need to use the file_name qualifier for each of the sdists and wheels and create one package per file.
For instance, https://pypi.org/project/lxml/5.3.1/#files lists hundreds of wheels, and we should create one package for each, such as pkg:pypi/lxml@5.3.1?file_name=lxml-5.3.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl for the file lxml-5.3.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl.
B) Because of this, we likely have odd data in the main PurlDB, with invalid PURLs, since we were not doing this before. The package set may not work correctly with these qualifiers.
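Twist A can be sketched as building one PURL per distribution file. This assumes each file entry has a `filename` key, as in the `urls` list of the version-specific PyPI JSON; `normalize_pypi_name` and `purls_for_release` are hypothetical helpers. The name normalization (lowercase, `_` to `-`) follows the purl-spec rules for the pypi type. The strings are built by hand to stay dependency-free; the packageurl-python `PackageURL` class is the more robust option in real code.

```python
from typing import Dict, List
from urllib.parse import quote


def normalize_pypi_name(name: str) -> str:
    """Lowercase and replace underscores with dashes (purl-spec pypi rules)."""
    return name.lower().replace("_", "-")


def purls_for_release(name: str, version: str, files: List[Dict]) -> List[str]:
    """Build one PURL per sdist/wheel, distinguished by the file_name qualifier."""
    base = f"pkg:pypi/{normalize_pypi_name(name)}@{quote(version, safe='')}"
    # Percent-encode the qualifier value; typical sdist/wheel names pass through unchanged.
    return [f"{base}?file_name={quote(entry['filename'], safe='')}" for entry in files]
```

For example, the lxml wheel above yields `pkg:pypi/lxml@5.3.1?file_name=lxml-5.3.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl`.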
We also have code we can reuse for this:
- purldb/minecode/miners/pypi.py, line 106 in b257482: class PypiPackageReleaseVisitor(HttpJsonVisitor)
- The main interesting code is in purldb/minecode/miners/pypi.py, line 146 in b257482: class PypiPackageMapper(Mapper)
- The tests are at https://github.com/aboutcode-org/purldb/blob/main/minecode/tests/miners/test_pypi.py
The "legacy" code can likely be left as-is; we can create a new https://github.com/aboutcode-org/purldb/blob/main/minecode/collectors/pypi.py module instead, copying select parts as needed.
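The existing PypiPackageMapper is the code to reuse for mapping; the sketch below only illustrates the per-file shape the new collector would need. It assumes the PyPI JSON field names `info` (`name`, `version`, `summary`, `home_page`) and per-file `urls` entries (`filename`, `url`, `size`, `digests`). `map_file_to_package` is hypothetical, and the output is a plain dict whose keys loosely mirror ScanCode's PackageData fields, not the real model class.

```python
from typing import Any, Dict


def map_file_to_package(info: Dict[str, Any], file_entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map release-level "info" plus one file entry to a flat package record.

    Hypothetical intermediate shape; missing fields come back as None.
    """
    digests = file_entry.get("digests") or {}
    return {
        "type": "pypi",
        "name": info.get("name"),
        "version": info.get("version"),
        # One package per file: the file name becomes the qualifier.
        "qualifiers": {"file_name": file_entry.get("filename")},
        "download_url": file_entry.get("url"),
        "sha256": digests.get("sha256"),
        "md5": digests.get("md5"),
        "size": file_entry.get("size"),
        "description": info.get("summary"),
        "homepage_url": info.get("home_page"),
    }
```

Calling this once per entry of a release's file list gives one record per sdist or wheel, which is the granularity twist A asks for.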