Design and use a proper "universal" and unique package identifier #805
Comments
The naming conventions set forth by Grafeas are at least partially in use by Grafeas partners. |
@R2wenD2 Thank you for chiming in! Any pointer to some public or open source reference? or additional details you can share? Any documentation beyond your readme? |
@mnonnenmacher @sschuberth ping, as this is a topic that is of interest to you based on this discussion aboutcode-org/aboutcode#6 (comment) |
The best public pointer I can share is the jFrog xray component identifiers. They have small differences from Grafeas (docker and generic files), but the others match. Does that help? We're using the identifiers suggested by the package manager where applicable and just prefixing them with info about which package manager has specified them. I believe the open question here is about what we use to specify the package manager. What is currently missing from the Grafeas list that you'd like to see? |
@pombredanne FYI, the Grafeas spec is at https://github.com/Grafeas/Grafeas |
@pombredanne Also, right now we use the tuple |
@sschuberth this approach works well for Maven and NPMs but may not work for other package managers/formats/identifiers. Some remarks:
So one question is whether we make everything fit in a schema with a fixed number of segments OR have a variable number of segments depending on the package manager/format/repo technology. I tend to think the latter is more flexible and provides more resilience to future changes. |
I tend to agree with @pombredanne that the latter is more resilient and preferred. |
Correct. We simply use an empty thing in such cases.
|
@R2wenD2 you wrote:
Thank you for the xray pointer! Anyone there that you could ping?
At least Rubygems, Composer, Golang and CPAN for a start and many more ;) I can suggest/contribute conventions back to Grafeas FWIW |
Yes, our 'ecosystem' is basically a 'package manager'. |
@pombredanne we don't have something similar to So really it needs to be something like For our purposes, especially considering Go (which has no registry), we've been treating the fully qualified url to the canonical package page as the unique identifier, although some package managers (like Bower) don't have individual urls for packages, so we just make one up from the registry domain. |
@andrew Thanks! you wrote:
That's a clean and mostly universal approach too! But it does not always convey what package "format" a given URL points to, unless this is for well known registries? @R2wenD2 this brings up a possible ambiguity in the grafeas/xray approach:
It does not include a notion of which repo/registry this package lives in, and furthermore each package name/version may have more than one "artifact", e.g. an sdist and many wheels for Python or an mri and a jruby gem for Ruby, etc. So in some cases you identify actual exact files or stacks of files, and in some other cases you identify some pointers to the primary public package repo for this format. This may be OK, but this may also be a source of confusion? |
@chen-keinan re #805 (comment) I may assume wrongly that you may be involved with Jfrog xrays? or not? |
@elad165 Should be able to help here. |
@elad165 any feedback? |
So here is my proposal for ScanCode at least and ABC Data in general: Now about the parts of a package identifier: 1. First is a part to identify what is called ecosystem (openshift), package_manager (here.com ORT), package_type (as today in ScanCode) or URL scheme (in the Grafeas or Jfrog/xray URLs). The name used does not matter much, but what this means matters: it captures in a short string a lot of info:
There is no best attribute name to capture all this, but the meaning is clear enough. Each
2. After the 3. Next, we have 4. And then we have 5. The last important part is the package " 6. I left aside anything about content-based identifiers (e.g. checksums): this is a separate and solved topic that does not need much discussion IMHO... though I like to be able to identify a plain file that is effectively a package using only a checksum like in xray (with a " So in recap the package id is either a string, or discrete fields using this convention:
|
And here is a short summary: I propose we use five parts/fields to identify a package:
Most are optional and can also be composed in some URL as in: The exact format to use for this URL is not fully specified yet. And we add an extra field to point to an alternative package |
Here is a revised design: A package identifier is defined by six parts or fields that form a hierarchy, from the least specific to the most specific identifying information:
At the minimum a type and a name are required. A package identifier is either discrete fields or a URL string using these conventions:
For instance:
The string would be UTF-8 encoded, with percent-encoding where needed (https://en.wikipedia.org/wiki/Percent-encoding), with these rules for each part:
The parsing approach would be:
And to get an exact download, we either provide an optional registry base url if this is not on the standard public registry and/or an optional full direct download URL. I cannot think of any case I know of that would not work with this approach. |
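To make the parsing idea concrete, here is a minimal Python sketch of splitting a `type:namespace/name@version?qualifiers#path` string into its six fields. It is only an illustration under assumptions: the exact per-part rules are not spelled out in this thread, so the function name `parse_package_id`, the `PackageId` tuple, the tolerance for leading slashes after the type, and the example values are all hypothetical, and this is not the packageurl-python implementation.

```python
from typing import Dict, NamedTuple, Optional
from urllib.parse import unquote


class PackageId(NamedTuple):
    """Hypothetical container for the six fields discussed above."""
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]
    qualifiers: Dict[str, str]
    path: Optional[str]


def parse_package_id(text: str) -> PackageId:
    # Peel off the most specific parts first: "#path", then "?qualifiers".
    rest, _, path = text.partition("#")
    rest, _, query = rest.partition("?")

    # The type is everything before the first colon and is required.
    ptype, sep, rest = rest.partition(":")
    if not sep or not ptype:
        raise ValueError("a type is required")

    # Assumption: tolerate leading slashes after the type ("type://...").
    rest = rest.lstrip("/")

    # The last slash-separated segment holds "name@version";
    # everything before it is the (optional) namespace.
    namespace, _, last = rest.rpartition("/")
    name, _, version = last.partition("@")
    if not name:
        raise ValueError("a name is required")

    # Qualifiers are assumed to be "key=value" pairs joined by "&".
    qualifiers = {}
    for pair in query.split("&"):
        if pair:
            key, _, value = pair.partition("=")
            qualifiers[key] = unquote(value)

    return PackageId(
        type=ptype,
        namespace=unquote(namespace) or None,
        name=unquote(name),
        version=unquote(version) or None,
        qualifiers=qualifiers,
        path=unquote(path) or None,
    )


# Illustrative values only, not taken from the thread.
example = parse_package_id(
    "maven:org.example/demo-lib@1.0.0?classifier=sources#src/main"
)
```

Splitting on the last `/` before looking for `@` keeps a namespace such as an npm scope from being mistaken for a version separator; whether that matches the final rules is an open assumption here.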
* Add new PackageIdentifier class for #805 as Package property and as discrete type:namespace/name@version?qualifiers#path fields * Improved DependentPackage definitions using a package identifier and simpler flags. Do not use a mapping per scope anymore. * Improve related packages definitions with a PackageRelationship class using from/to package identifiers * Add OrderedDictType for schematics * Remove unused Package methods for versions Signed-off-by: Philippe Ombredanne <[email protected]>
* Support PackageIdentifier class for #805 as Package property and as discrete type:namespace/name@version?qualifiers#path fields * Improved DependentPackage definitions using a package identifier and simpler flags. Do not use a mapping per scope anymore. * Improve related packages definitions with a PackageRelationship class using from/to package identifiers Signed-off-by: Philippe Ombredanne <[email protected]>
Signed-off-by: Philippe Ombredanne <[email protected]>
OK, I did not receive much pushback on the details, so I am assuming this is a good thing! Aesthetically, the // looks much better I guess! I have a Python implementation in the #275 branch here: It should be trivial to have a Go or Ruby or JS implementation. @R2wenD2 I would like to also contribute this spec to Grafeas FWIW. |
Can you add a proposal issue to Grafeas? Feel free to copy your design proposal above - I just want to make sure folks interested in Grafeas have a chance to review. |
* This is a first rough implementation using https://github.com/package-url/packageurl-python * Based on package-url/purl-spec#1 Signed-off-by: Philippe Ombredanne <[email protected]>
Package URL is now implemented in develop and works well. Next step is to call the purl spec a 1.0 |
I am closing this now. The Package URL lives its own life now at https://github.com/package-url ... and is heavily used in ScanCode and other places. |
We need a proper package id that is universal and unique: the difficulty is that each package management technology uses more or fewer parts in an identifier beyond the basic name+version: Maven GAVs, RPM NEVRA, etc.
A simple solution is to have a single string ID with a prefix that describes what this is about and have a variable number of slash or colon-separated segments URN/URI-style such as used in:
This should probably be defined in ABC Data and with aboutcode-org/aboutcode#6
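As a rough illustration of the "prefix plus variable number of segments" idea, the following Python sketch builds a single string ID from a prefix and however many parts a given technology needs. The helper name `make_package_id`, the choice of `:` after the prefix and `/` between segments, and the sample values are assumptions made for this example, not an agreed format.

```python
from urllib.parse import quote


def make_package_id(prefix: str, *segments: str) -> str:
    """Join a prefix and any number of segments into one string ID."""
    if not prefix:
        raise ValueError("a prefix is required")
    # Percent-encode every segment so that a "/" or ":" inside a value
    # cannot be confused with a segment boundary.
    encoded = [quote(str(segment), safe="") for segment in segments]
    return prefix + ":" + "/".join(encoded)


# Maven GAV: group, artifact, version (hypothetical values).
maven_id = make_package_id("maven", "org.example", "demo-lib", "1.0.0")

# RPM NEVRA: name, epoch, version, release, architecture (hypothetical values).
rpm_id = make_package_id("rpm", "demo", "0", "1.0.0", "1", "x86_64")
```

The same function handles three segments for Maven and five for RPM, which is the flexibility a variable-length scheme offers; the cost is that a consumer must know each prefix's conventions to interpret the segments.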