Create PyPI packages dataset #38

Open
nightlark opened this issue Jan 9, 2025 · 2 comments

@nightlark
Collaborator

nightlark commented Jan 9, 2025

Question: how do we handle PyPI packages that ship native libraries (e.g. numpy)?

The script that creates the PyPI dataset (a SQLite database mapping each package's name on PyPI to its "import names") is pushed to a branch; still need to figure out how to process the data.
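A minimal sketch of what such a database could look like, for discussion only. The table name `import_names`, its columns, and the helper functions are hypothetical and are not taken from the script on the branch; the sample rows are illustrative.

```python
import sqlite3


def build_db(records, path=":memory:"):
    """Create a (hypothetical) table mapping PyPI package names to import names.

    records: iterable of (package_name, [import_name, ...]) pairs.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_names ("
        " package TEXT NOT NULL,"
        " import_name TEXT NOT NULL,"
        " PRIMARY KEY (package, import_name))"
    )
    # Flatten to one row per (package, import name) pair.
    rows = [(pkg, name) for pkg, names in records for name in names]
    conn.executemany("INSERT OR IGNORE INTO import_names VALUES (?, ?)", rows)
    conn.commit()
    return conn


def packages_for_import(conn, import_name):
    """Return all packages that provide the given import name (exact match)."""
    cur = conn.execute(
        "SELECT package FROM import_names WHERE import_name = ? ORDER BY package",
        (import_name,),
    )
    return [row[0] for row in cur]


# Illustrative data: the PyPI name and the import name can differ.
conn = build_db([("numpy", ["numpy"]), ("Pillow", ["PIL"]), ("PyYAML", ["yaml"])])
print(packages_for_import(conn, "PIL"))
```

One row per (package, import name) pair keeps lookups in either direction simple, and allows several packages to claim the same import name.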

@nightlark
Collaborator Author

nightlark commented Jan 16, 2025

  • Any normalization/lowercasing of import names?
    • Maybe they are already in the form needed for importing? Mostly need to check whether any platform can import with arbitrary case while another can't
    • casefold vs lowercase: German ẞ casefolds to ss but lowercases to ß?
  • Submodules that get installed?
    • odoo
  • Binary files in wheels?
    • More interesting for scanning a specific wheel/Python package installation, rather than capturing in dataset
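For the "binary files in wheels" point, a rough sketch of how a single wheel could be scanned. It relies only on the fact that a wheel is a zip archive; the function name `scan_wheel`, the suffix list, and the rule for what counts as a top-level import name are assumptions made for illustration, not how the dataset script works.

```python
import zipfile
from pathlib import PurePosixPath

# Assumed list of suffixes that indicate a native library or extension module.
NATIVE_SUFFIXES = (".so", ".pyd", ".dll", ".dylib")


def scan_wheel(wheel):
    """Return (top_level_names, native_files) for a wheel path or file object."""
    top_level = set()
    native = []
    with zipfile.ZipFile(wheel) as zf:
        for entry in zf.namelist():
            if entry.endswith("/"):
                continue  # explicit directory entry
            parts = PurePosixPath(entry).parts
            first, name = parts[0], parts[-1]
            if first.endswith((".dist-info", ".data")):
                continue  # packaging metadata, not importable code
            if name.endswith(NATIVE_SUFFIXES) or ".so." in name:
                native.append(entry)
            if len(parts) > 1:
                # Directory at the wheel root; skip names that cannot be
                # imported (e.g. bundled-library folders containing a dot).
                if first.isidentifier():
                    top_level.add(first)
            elif name.endswith(".py"):
                top_level.add(name[:-3])
            elif name.endswith((".so", ".pyd")):
                # Single-file extension module, e.g. mod.cpython-312-....so
                top_level.add(name.split(".", 1)[0])
    return sorted(top_level), sorted(native)
```

This treats any importable-looking directory at the wheel root as a top-level name, so submodules installed under it are not listed separately.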

@nightlark
Collaborator Author

1/23 - using lowercase probably works; Windows treats corner cases such as the German characters as separate paths (Windows does not casefold directory names)
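A small sketch of the difference between the two normalizations, using only Python's built-in `str.lower` and `str.casefold`. The helper `find_collisions` and the sample names are made up for illustration.

```python
def find_collisions(names, key=str.lower):
    """Group names by a normalization key and return groups with more than one member."""
    groups = {}
    for name in names:
        groups.setdefault(key(name), set()).add(name)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


# Hypothetical import names, including a German sharp-s corner case.
names = ["PIL", "pil", "straße", "strasse"]

# Lowercasing only merges names that differ by case.
print(find_collisions(names, key=str.lower))

# Casefolding additionally merges "straße" with "strasse".
print(find_collisions(names, key=str.casefold))
```

With lowercase as the key, `straße` and `strasse` stay distinct, which matches the observation above that Windows treats them as separate paths.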
