Improve identification of conda package files #4083

Open
mjherzog opened this issue Jan 9, 2025 · 4 comments

@mjherzog
Member

mjherzog commented Jan 9, 2025

Working with SCTK v32.3.1 (running in SCIO v34.9.3), SCTK does not currently identify the installed files of a conda package in the for_packages field of Resources. This data seems to be readily available in a set of .json files in the conda-meta/ directory of the conda installation, typically /opt/conda/conda-meta/. The file names are in the format <package name>-<package-version>.json.
This pattern is present in both the Anaconda and Miniconda distributions.
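To illustrate the idea, here is a minimal sketch of reading those manifests. It assumes each conda-meta/*.json file has "name", "version" and "files" keys, with "files" listing paths relative to the conda installation root; the function name read_conda_meta is hypothetical and is not ScanCode's actual implementation. It reads name and version from the JSON content rather than the file name, so it does not depend on the exact file-name format.

```python
import json
from pathlib import Path


def read_conda_meta(conda_root):
    """Map (name, version) -> installed file paths, using conda-meta/*.json.

    Hypothetical helper. Assumes each manifest holds "name", "version" and
    "files" keys, where "files" are paths relative to conda_root.
    """
    packages = {}
    meta_dir = Path(conda_root) / "conda-meta"
    for manifest in sorted(meta_dir.glob("*.json")):
        data = json.loads(manifest.read_text(encoding="utf-8"))
        # Skip any JSON file that does not look like a package manifest.
        if "name" not in data or "version" not in data:
            continue
        key = (data["name"], data["version"])
        # Resolve each relative path against the installation root.
        packages[key] = [str(Path(conda_root) / f) for f in data.get("files", [])]
    return packages
```

Calling read_conda_meta("/opt/conda") on a conda install would return one entry per installed package. Non-JSON files in conda-meta, such as history, are ignored by the *.json glob.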

AyanSinhaMahapatra added a commit that referenced this issue Jan 13, 2025
Parse conda metadata JSON manifests and use the package data
and files information present to improve conda package assembly.

Reference: #4083
Signed-off-by: Ayan Sinha Mahapatra <[email protected]>
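The commit message describes using the files information in the manifests to assign resources to packages. A minimal sketch of that assignment step follows. It assumes the packages mapping has the shape (name, version) -> list of installed paths. The "pkg:conda/name@version" string is a simplified, hypothetical package identifier, not ScanCode's actual package_uid format.

```python
def assign_for_packages(resource_paths, packages):
    """Return {path: [package identifiers]} for each scanned resource path.

    Hypothetical helper. `packages` maps (name, version) -> installed paths.
    """
    # Build an inverted index: installed file path -> owning package ids.
    owners = {}
    for (name, version), files in packages.items():
        uid = f"pkg:conda/{name}@{version}"  # simplified identifier
        for path in files:
            owners.setdefault(path, []).append(uid)
    # Resources not listed in any manifest get an empty for_packages list.
    return {path: owners.get(path, []) for path in resource_paths}
```

An inverted index keeps the lookup cheap when an image has tens of thousands of files, since each resource is resolved with one dictionary lookup.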
@AyanSinhaMahapatra
Member

Before this change, scanning the Docker image docker://continuumio/miniconda3 resulted in:

218 packages
388 dependencies
18355 files

  • 9213 files in a package
  • 9142 files not in a package

With #4089:

295 packages
388 dependencies
18355 files

  • 16469 files in a package
  • 1886 files not in a package

So conda resource assignment should be much better once this PR is merged and released to SCIO.
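As a quick check on the counts reported above, the share of files assigned to a package can be computed directly from them. This small sketch uses only the numbers in this comment. The helper name coverage is hypothetical.

```python
def coverage(in_package, not_in_package):
    """Return (total files, percent of files assigned to a package)."""
    total = in_package + not_in_package
    return total, 100.0 * in_package / total


# Counts reported above for docker://continuumio/miniconda3
before = coverage(9213, 9142)
after = coverage(16469, 1886)
print(f"before: {before[0]} files, {before[1]:.1f}% in a package")  # 18355 files, 50.2%
print(f"after:  {after[0]} files, {after[1]:.1f}% in a package")   # 18355 files, 89.7%
```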

@mjherzog
Member Author

Excellent

@simrancharde

simrancharde commented Jan 18, 2025

How should issue #4083 be solved? Do we need to add a for_package step in the scanner.py file and then parse the JSON files? Is anything else needed?

@AyanSinhaMahapatra AyanSinhaMahapatra self-assigned this Jan 20, 2025
@AyanSinhaMahapatra
Member

@simrancharde thanks for your interest, but this already has a fix in #4089. Could you check out our open good first issues instead? https://github.com/aboutcode-org/scancode-toolkit/issues?q=is%3Aissue%20state%3Aopen%20label%3A%22good%20first%20issue%22 is where we need help most.
