Contributor

@prabhu prabhu commented Aug 23, 2025

First fix PR out of several.

Methodology

To arrive at the fixes, I trained up and utilised three separate LLMs. (Fixes are simply not possible with traditional linting and jsonschema validation tools.)

  1. Qwen3-Coder - The student model, which struggles to understand the PURL specification based on the current JSON schemas. (Spec bugs and ambiguities are among the reasons.)
  2. GLM-4.5 - Tasked with creating an extended version of the Package URL specification that will make it easy for all machine learning models to understand PURL.
  3. Gemini-2.5-Pro - Tasked with reviewing and identifying critical flaws in the enhancements proposed by GLM-4.5.

A human was in the loop throughout, steering and restarting the conversation threads to reduce mistakes. The result is captured in this repo.

Once the enhanced version of PURL was created, Gemini-2.5-Pro was trained with this enhanced version and asked to fix the current version, strongly adhering to the current JSON schema specification.

This methodology section is FYI and will not be included in subsequent PRs.

Fixes for alpm

  • Fixed $id
  • alpm names could include characters such as @, ., _, +, -. Enhanced the note to make the percent-encoding obvious.
  • name and version have requirement=required
  • subpath is explicitly prohibited
  • Added extra distro and repository_url qualifiers. The note clearly recommends these two qualifiers but they weren't included in the definition section.
  • Improved examples
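
To illustrate the percent-encoding point, here is a minimal Python sketch, not code from this PR. The helper name `build_alpm_purl`, the `arch` namespace, and the sample package names are assumptions made for the example; only the `distro` and `repository_url` qualifier keys come from the list above.

```python
from urllib.parse import quote


def build_alpm_purl(namespace, name, version=None, qualifiers=None):
    """Assemble an alpm purl string (hypothetical helper, for illustration only)."""
    # The name is lowercased, then percent-encoded.
    # quote(..., safe="") keeps letters, digits, ".", "_", "-" and "~" as-is
    # and encodes everything else, so "@" becomes %40 and "+" becomes %2B.
    purl = f"pkg:alpm/{quote(namespace.lower(), safe='')}/{quote(name.lower(), safe='')}"
    if version:
        purl += "@" + quote(version, safe="")
    if qualifiers:
        # Qualifier keys are emitted in sorted order; values are percent-encoded.
        pairs = [f"{key}={quote(value, safe='')}" for key, value in sorted(qualifiers.items())]
        purl += "?" + "&".join(pairs)
    return purl


# Sample names below are made up for the example.
print(build_alpm_purl("arch", "libstdc++", "13.2.1-1"))
# pkg:alpm/arch/libstdc%2B%2B@13.2.1-1
print(build_alpm_purl("arch", "pacman", "6.0.1-1", {"distro": "arch", "arch": "x86_64"}))
# pkg:alpm/arch/pacman@6.0.1-1?arch=x86_64&distro=arch
```

The "." and "-" in a name pass through unchanged, while "@" and "+" are the characters that need encoding.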

NOTE: creating purl4ml and each such PR is a time-consuming activity. It will take a while for me to send all PRs due to my travel.

prabhu added 2 commits August 23, 2025 14:44
Signed-off-by: Prabhu Subramanian <[email protected]>
Signed-off-by: Prabhu Subramanian <[email protected]>
Member

@pombredanne pombredanne left a comment

Thanks. See some comments. This is not schema valid for now.

},
"name_definition": {
"note": "The name is the package name. It is not case sensitive and must be lowercased.",
"requirement": "required",
Member

This is not a property defined in the schema for name, because a purl without a name is mostly harmless in all cases. If you run make conf && make check, you will get:

$ make check
-> Validate JSON schemas
ok -- validation done
The following files were checked:
  schemas/purl-test.schema.json
  schemas/purl-type-definition.schema.json
  schemas/purl-types-index.schema.json
-> Validate JSON data files against the schemas
ok -- validation done
The following files were checked:
  purl-types-index.json
Schema validation errors were encountered.
  types/alpm-definition.json::$.qualifiers_definition[0]: Additional properties are not allowed ('examples' was unexpected)
  types/alpm-definition.json::$.qualifiers_definition[1]: Additional properties are not allowed ('examples' was unexpected)
  types/alpm-definition.json::$.qualifiers_definition[2]: Additional properties are not allowed ('examples' was unexpected)
make: *** [Makefile:53: checkjson] Error 1
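
The errors above are what a schema that disallows extra properties reports for an unknown key. A rough standard-library Python sketch of that check follows; it is not the project's validator. The allowed key set is an assumed subset (the authoritative list is in schemas/purl-type-definition.schema.json), and the sample descriptions are hypothetical.

```python
# Assumed subset of the keys allowed on a qualifier definition; the real
# list is defined by schemas/purl-type-definition.schema.json.
ALLOWED_QUALIFIER_KEYS = {"key", "description"}


def find_unexpected_properties(qualifiers, allowed=ALLOWED_QUALIFIER_KEYS):
    """Return one message per key that a closed schema would reject."""
    errors = []
    for index, qualifier in enumerate(qualifiers):
        # Any key outside the allowed set is reported, in sorted order.
        for extra in sorted(set(qualifier) - set(allowed)):
            errors.append(
                f"$.qualifiers_definition[{index}]: "
                f"Additional properties are not allowed ('{extra}' was unexpected)"
            )
    return errors


# Hypothetical sample data: only the first entry carries the extra key.
qualifiers = [
    {"key": "distro", "description": "The distribution name.", "examples": ["arch"]},
    {"key": "repository_url", "description": "The repository URL."},
]
for message in find_unexpected_properties(qualifiers):
    print(message)
# $.qualifiers_definition[0]: Additional properties are not allowed ('examples' was unexpected)
```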

Member

Actually that part was schema-valid

Contributor Author

If name is not mandatory, we risk the proliferation of non-actionable purls for given types, so some strictness is better. I consciously avoided regex, min lengths and other properties that were suggested by Gemini, so I did apply some filtering, but I completely forgot to validate. Will fix the validation errors and push later this evening.

Contributor Author

prabhu commented Aug 23, 2025

Edited and pushed changes from my phone! Ran validation with codespaces.

Contributor Author

prabhu commented Aug 23, 2025

All frontier models think that an alpm purl with a subpath is not valid! We are reinforcing the incorrect learning by only using tests with subpath: null. https://github.com/package-url/purl-spec/blob/main/tests/types/alpm-test.json
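
For reference, a test case with a non-null subpath would exercise a split like the one below. This is a minimal Python sketch assuming the subpath is whatever follows the last "#"; the helper name `split_subpath` and the sample purls are hypothetical and are not taken from alpm-test.json.

```python
def split_subpath(purl):
    """Split a purl string into (purl_without_subpath, subpath_or_None)."""
    if "#" not in purl:
        return purl, None
    # Split once from the right so only the final "#" separates the subpath.
    base, subpath = purl.rsplit("#", 1)
    # Leading and trailing slashes are dropped from the subpath.
    subpath = subpath.strip("/")
    return base, (subpath or None)


# Hypothetical sample purls.
print(split_subpath("pkg:alpm/arch/pacman@6.0.1-1?arch=x86_64"))
# ('pkg:alpm/arch/pacman@6.0.1-1?arch=x86_64', None)
print(split_subpath("pkg:alpm/arch/pacman@6.0.1-1#/usr/bin/"))
# ('pkg:alpm/arch/pacman@6.0.1-1', 'usr/bin')
```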

@prabhu prabhu marked this pull request as draft August 24, 2025 20:16
Member

@pombredanne pombredanne left a comment

Some bits for your review.

{
"$schema": "https://packageurl.org/schemas/purl-type-definition.schema-1.0.json",
- "$id": "https://packageurl.org/types/github-definition.json",
+ "$id": "https://packageurl.org/types/alpm-definition.json",
Member

This is a valid and important change.

},
"name_definition": {
"note": "The name is the package name. It is not case sensitive and must be lowercased.",
"requirement": "required",
]
},
"version_definition": {
"requirement": "required",
Member

Versions are never required, so this change must be removed.

"description": "The distribution name when using multiple distributions."
},
{
"key": "repository_url",
Member

This is a standard qualifier, so it is not really needed here IMHO, but it does not hurt either... @johnmhoran what do you think?
