Skip to content

feat: rework validation schema options#216

Merged
mmarseu merged 12 commits intomainfrom
exclusive-validate-schema-options
Aug 8, 2024
Merged

feat: rework validation schema options#216
mmarseu merged 12 commits intomainfrom
exclusive-validate-schema-options

Conversation

@mmarseu
Copy link
Copy Markdown
Collaborator

@mmarseu mmarseu commented Jul 1, 2024

The mutual exclusivity implemented through argparse's add_mutually_exclusive_group() method has a subtle and undocumented bug (or call it an unexpected behavior, if you like) that affects options with default values.
If an option with a default value is part of such a group, the option doesn't count as "present on the command-line" if the user explicitly passed its default value.

So in our case, --schema-type and --schema-path were meant to be mutually exclusive but --schema-type has a default value of default. So, this invocation correctly raises an error:

cdx-ev validate --schema-type custom --schema-path myschema.json bom.json

but this invocation does not:

cdx-ev validate --schema-type default --schema-path myschema.json bom.json

because for options with default values, argparse can't tell the difference between

  • "the option isn't there", and
  • "the option has been specified with its default value".

This PR fixes this minor problem by forgoing argparse's built-in default mechanism and replacing it with a hand-coded fallback to the default value in the invoke_validate() function.

It includes a few other minor housekeeping changes:

  • Update help text for both related command-line options
  • Delete unused schema files
  • Add strict schema type to documentation
  • Make --schema-path option a proper Path-typed object
  • Refactor schema loading code

@mmarseu mmarseu requested a review from italvi July 1, 2024 14:24
@mmarseu mmarseu self-assigned this Jul 1, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request unittests labels Jul 1, 2024
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Jul 1, 2024

Coverage

Coverage Report •
FileStmtsMissCoverMissing
__main__.py3212093%217–218, 235, 245, 655–656, 660–665, 667, 670, 680–683, 687, 854
error.py230100% 
validator
   helper.py43295%36, 38
   validate.py94792%35, 56, 75–76, 115, 131, 166
TOTAL16979694% 

Tests Skipped Failures Errors Time
299 2 💤 0 ❌ 0 🔥 5.860s ⏱️

@mmarseu mmarseu mentioned this pull request Jul 2, 2024
5 tasks
Copy link
Copy Markdown
Collaborator

@italvi italvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmarseu some observations from my side.

Further: If I provide a JSON-file, e.g. an SBOM as schema-path the program crashes with:
image

Maybe a sanity check, e.g. with checking if something like $schema": "http://json-schema.org/draft-07/schema#" is within the provided schema (without the specific schema, so only json-schema.org), is a good idea here?

Comment thread cdxev/__main__.py
Comment thread cdxev/validator/validate.py Outdated
Comment thread cdxev/validator/validate.py Outdated
Comment thread cdxev/validator/validate.py
Comment thread docs/available_commands.md
@mmarseu
Copy link
Copy Markdown
Collaborator Author

mmarseu commented Aug 6, 2024

@mmarseu some observations from my side.

Further: If I provide a JSON-file, e.g. an SBOM as schema-path the program crashes with: image

Maybe a sanity check, e.g. with checking if something like $schema": "http://json-schema.org/draft-07/schema#" is within the provided schema (without the specific schema, so only json-schema.org), is a good idea here?

This must be what you meant when you said you found an error that has always been there. I was dumb enough to promise I'd fix it 🙄
Anyways, it's done. User-provided schema are now validated using the proper jsonschema function.

For some reason I started getting a whole lot of mypy errors in validate.py unrelated to my changes so I had to fix those, too.
There is a single new mypy ignore comment which is due to an error in the types-jsonschema package. If I can get python/typeshed#12484 merged, we can remove it.

@mmarseu mmarseu requested a review from italvi August 6, 2024 13:25
Copy link
Copy Markdown
Collaborator

@italvi italvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mmarseu after some testing, I found two "bugs", where I think this one is the most critical one: if the JSON-Schema is e.g. an SBOM without dependencies-array or just an empty JSON, i.e. {}, the validation is successful as jsonschema fallbacks to Draft7Validator if no $schema is found in the provided schema.

But as this is something for the lib itself, I approve the MR. The only thing I could think of: Add a disclaimer in the documentation to be aware of this behavior. But I won't insist on this one, so I leave the decision to you: Merge or change - I can approve it again 😉

Just for documentation purposes: The other bug was that the check for the mutual exclusive usage of schema-type and schema-path only works if a valid choice for schema-type is provided, otherwise an error is thrown that schema-type is not one of (default, custom, strict). But this is an "issue" with argparse.

@mmarseu
Copy link
Copy Markdown
Collaborator Author

mmarseu commented Aug 8, 2024

@mmarseu after some testing, I found two "bugs", where I think this one is the most critical one: if the JSON-Schema is e.g. an SBOM without dependencies-array or just an empty JSON, i.e. {}, the validation is successful as jsonschema fallbacks to Draft7Validator if no $schema is found in the provided schema.

But as this is something for the lib itself,

That's not even a problem of the library, that's just the way JSON Schema is specified. A completely empty JSON object is a valid schema according to the spec. Also, there are no restrictions on additional properties, you can put anything in a schema (probably to support custom extensions).
So, basically, almost any JSON object could be a valid schema. The only exception are properties with the same names as something found in the JSON Schema meta schema but which don't validate against it — for instance a field named dependencies which exists in both JSON Schema and CycloneDX.

I approve the MR. The only thing I could think of: Add a disclaimer in the documentation to be aware of this behavior. But I won't insist on this one, so I leave the decision to you: Merge or change - I can approve it again 😉

MERGE!!!

Just for documentation purposes: The other bug was that the check for the mutual exclusive usage of schema-type and schema-path only works if a valid choice for schema-type is provided, otherwise an error is thrown that schema-type is not one of (default, custom, strict). But this is an "issue" with argparse.

argparse is far from perfect. That may be of the areas where it could be slightly improved. On the other hand, I imagine it could be very hard to separate between multiple errors in the CLI where each one is legitimate in their own right and simple follow-up errors where you really only have to fix the first and the rest falls into place.

@mmarseu mmarseu merged commit cf7a671 into main Aug 8, 2024
@mmarseu mmarseu deleted the exclusive-validate-schema-options branch August 8, 2024 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request settings_changes unittests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants