Add on-demand package data collection for PyPI #468

@pombredanne

We want to enable the collect/ endpoint for PyPI so that collecting a PyPI PURL like pkg:pypi/[email protected] no longer fails:

{
  "status": "cannot fetch Package data for pkg:pypi/[email protected]: no available handler"
}
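The error above reads like a dispatch on the PURL that finds no entry for the pypi type. Here is a minimal sketch of that idea, assuming a pattern-based registry. The names `HANDLERS`, `route`, `collect`, `collect_npm` and `collect_pypi` are hypothetical and are not the actual purldb router API.

```python
import re

# Hypothetical registry of (compiled pattern, handler) pairs.
HANDLERS = []


def route(pattern):
    """Register a handler for PURLs matching the given regex (illustrative only)."""
    def decorator(func):
        HANDLERS.append((re.compile(pattern), func))
        return func
    return decorator


@route(r"pkg:npm/.*")
def collect_npm(purl):
    # Placeholder for the existing npm collection logic.
    return f"collected {purl}"


def collect(purl):
    """Dispatch a PURL to the first matching handler, or report that none exists."""
    for pattern, handler in HANDLERS:
        if pattern.match(purl):
            return {"status": handler(purl)}
    return {"status": f"cannot fetch Package data for {purl}: no available handler"}


# Before a pypi handler is registered, the lookup falls through to the error.
print(collect("pkg:pypi/lxml@5.3.1"))


@route(r"pkg:pypi/.*")
def collect_pypi(purl):
    # Placeholder for the new PyPI collection logic.
    return f"collected {purl}"


# After registration, the same PURL is handled.
print(collect("pkg:pypi/lxml@5.3.1"))
```
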

We need a handler that will be similar to the npm handler:

Later, also support a PURL without a version to collect all the versions.
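A rough sketch of the version-less case, assuming the release data arrives as a mapping of version to files, shaped like the `releases` object of the PyPI JSON API. The helper names `split_pypi_purl` and `versions_to_collect` are hypothetical, and the sample data is hand-written rather than fetched.

```python
def split_pypi_purl(purl):
    """Return (name, version or None) for a simple pkg:pypi PURL without qualifiers."""
    prefix = "pkg:pypi/"
    if not purl.startswith(prefix):
        raise ValueError(f"not a pypi PURL: {purl}")
    name, sep, version = purl[len(prefix):].partition("@")
    return name, (version if sep else None)


def versions_to_collect(purl, releases):
    """List the versions to collect: the one requested, or all known versions."""
    _name, version = split_pypi_purl(purl)
    if version is not None:
        return [version] if version in releases else []
    # No version in the PURL: collect every release, in the order given.
    return list(releases)


# Hand-written sample data, not a real API response.
sample_releases = {
    "5.3.0": [{"filename": "lxml-5.3.0.tar.gz"}],
    "5.3.1": [{"filename": "lxml-5.3.1.tar.gz"}],
}

print(versions_to_collect("pkg:pypi/lxml@5.3.1", sample_releases))
print(versions_to_collect("pkg:pypi/lxml", sample_releases))
```
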

There are a few twists:

A) There are multiple packages for one version, and we need to use the file_name qualifier for each of the multiple sdists and wheels and create all packages, one for each file.

For instance, https://pypi.org/project/lxml/5.3.1/#files has 100's of wheels, and we should create one package for each. For example, lxml-5.3.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl becomes pkg:pypi/lxml@5.3.1?file_name=lxml-5.3.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl
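The one-package-per-file idea can be sketched as below. It assumes each file entry carries a `filename` key, as in the `urls` list of the PyPI JSON API. The helper names are hypothetical, the sample data is hand-written, and a real collector would probably build PURLs with the packageurl-python library rather than by string formatting.

```python
from urllib.parse import quote


def normalize_pypi_name(name):
    # Assumption: the purl "pypi" type lowercases names and replaces "_" with "-".
    return name.lower().replace("_", "-")


def purls_for_release(name, version, release_files):
    """Return one PURL string per distribution file (sdist or wheel) of a release."""
    base = f"pkg:pypi/{normalize_pypi_name(name)}@{quote(version, safe='')}"
    purls = []
    for entry in release_files:
        # Percent-encode the qualifier value; typical file names pass through unchanged.
        file_name = quote(entry["filename"], safe="")
        purls.append(f"{base}?file_name={file_name}")
    return purls


# Hand-written sample data, not a real API response.
sample_files = [
    {"filename": "lxml-5.3.1.tar.gz", "packagetype": "sdist"},
    {
        "filename": "lxml-5.3.1-pp310-pypy310_pp73-manylinux_2_28_aarch64.whl",
        "packagetype": "bdist_wheel",
    },
]

for purl in purls_for_release("lxml", "5.3.1", sample_files):
    print(purl)
```
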

B) Because of this, we likely have weird data in the main PurlDB, with invalid PURLs, because we were not doing that before. The package set may not work well with these qualifiers.
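One way to find the affected records is to flag existing pypi PURLs that have no file_name qualifier. This is only a sketch over plain PURL strings: `lacks_file_name` is a hypothetical helper, and how such records would be queried or fixed in PurlDB is not covered here.

```python
from urllib.parse import parse_qs


def lacks_file_name(purl):
    """Return True for a pkg:pypi PURL string that has no file_name qualifier."""
    if not purl.startswith("pkg:pypi/"):
        return False
    # Drop any "#subpath" part, then look at the qualifiers after "?".
    without_subpath = purl.split("#", 1)[0]
    _, _, qualifiers = without_subpath.partition("?")
    return "file_name" not in parse_qs(qualifiers)


# Hand-written sample data for illustration.
existing = [
    "pkg:pypi/lxml@5.3.1",
    "pkg:pypi/lxml@5.3.1?file_name=lxml-5.3.1.tar.gz",
    "pkg:npm/foo@1.0.0",
]

print([p for p in existing if lacks_file_name(p)])
```
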

Also we have code we can reuse for this:

The "legacy" code can likely be left as-is, and we can create a new https://github.com/aboutcode-org/purldb/blob/main/minecode/collectors/pypi.py module, copying select parts as needed.

Labels: enhancement

Status: Validated