Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema for defer_call contains invalid JSON Schema, causing some schema validators to fail #1474

Closed
sirosen opened this issue Jan 17, 2024 · 2 comments · Fixed by #1476
Closed
Labels
area: json schema Changes related to the JSON schema.

Comments

@sirosen
Copy link
Contributor

sirosen commented Jan 17, 2024

This was encountered and reported against check-jsonschema (a JSON Schema CLI): python-jsonschema/check-jsonschema#376

That bug report covers two issues:

  1. there is invalid/malformed data in the taskfile JSON Schema
  2. the python-jsonschema projects do not handle the malformed data with a clear error

I'm pursuing (2) independently. Regarding (1), here's the malformed JSON Schema data:

"anyOf": [
"string",
{

"string" is not a valid subschema. Possibly this was meant to be {"type": "string"} (?) but it's not obvious.

Aside: Fixing the structure of definitions

The current schema has all definitions nested under an intermediate key, i.e. definitions/3/foo rather than definitions/foo. That weakens the ability of downstream validation of the definitions section of the schema.

definitions is defined under the Draft 7 metaschema as an object whose values are all, themselves, valid schemas.
The current structure of the schema means that this is validating #/definitions/3 rather than #/definitions/defer_call (which would be much more useful).

It would be nice to move all of the definitions up one level so that metaschema validation can catch issues like this.
(Shameless plug: check-jsonschema --check-metaschema schema.json can do this for you. 🙂 )

Under the latest draft definition of $defs, this kind of structure is more clearly spelled out, using "MUST"s to clarify that a schema which doesn't follow this structure is invalid. Under the Draft 7 spec, I think it's pretty clear that this was the intent based on the metaschema, but it wasn't called out explicitly.


I have no direct experience with task, so I might not be the right person to modify the schema appropriately.

But I'm happy to submit one or two PRs to

  • remove the errant "string" in defer_call
  • remove the 3 from definitions nesting (If this is versioning, we could rename all of the defs to end with v3, as in env -> env_v3)

Just let me know how to proceed!

@task-bot task-bot added the state: needs triage Waiting to be triaged by a maintainer. label Jan 17, 2024
@pd93
Copy link
Member

pd93 commented Jan 17, 2024

Hi @sirosen. Thanks for the report. This is actually a duplicate of #1471, reported just a few days before, but since there is a bit more detail here, I'll keep this issue open and close the other one.

Our schema is definitely in need of a bit of TLC. It started as a 3rd-party contribution to the schemastore repo and was brought in-house not too long ago. I can't speak for @andreynering, but besides the small bits of work I've done to maintain it here, I'm not hugely experienced with the details of JSON Schema, so any recommendations/advice you have for cleaning it up are very welcome - PRs are also very welcome.

"string" is not a valid subschema. Possibly this was meant to be {"type": "string"} (?) but it's not obvious.

defer can be defined in one of two ways (docs):

cmds:
  - defer: echo "foo"

or

cmds:
  - defer: { task: "print-foo" }

so I believe your assertion that it should in fact be {"type": "string"} is correct.

The current schema has all definitions nested under an intermediate key, i.e. definitions/3/foo rather than definitions/foo. That weakens the ability of downstream validation of the definitions section of the schema.

If this is versioning, we could rename all of the defs to end with v3, as in env -> env_v3

This was originally intended for versioning yes. However, we are not currently maintaining any other versions in the schema and we've actually just dropped support for v2 schemas (good timing), so I'm personally happy to accept a PR that drops this intermediate key. We can always revisit this if we need to support multiple versions again in the future.

definitions is defined under the Draft 7 metaschema as an object whose values are all, themselves, valid schemas. The current structure of the schema means that this is validating #/definitions/3 rather than #/definitions/defer_call (which would be much more useful).

It would be nice to move all of the definitions up one level so that metaschema validation can catch issues like this. (Shameless plug: check-jsonschema --check-metaschema schema.json can do this for you. 🙂 )

Under the latest draft definition of $defs, this kind of structure is more clearly spelled out, using "MUST"s to clarify that a schema which doesn't follow this structure is invalid. Under the Draft 7 spec, I think it's pretty clear that this was the intent based on the metaschema, but it wasn't called out explicitly.

This is really useful info. I have a couple of questions if you don't mind:

  1. You mentioned draft 7 and the latest draft (2020-12). Is there any reason to update our schema (I'm mainly thinking about the $schema key at the top) and more importantly, is there any reason we shouldn't?
  2. Regarding your shameless plug. I'd actually like to set up some meta schema validation in our CI to catch issues like this in the future before they get merged and your tool seems pretty useful for this. Is there a GitHub action for it or do you have a workflow template we could use?

But I'm happy to submit one or two PRs

Please feel free. If you have any Task-specific questions, feel free to reach out here on over on our Discord.

@pd93 pd93 added type: bug Something not working as intended. area: json schema Changes related to the JSON schema. and removed state: needs triage Waiting to be triaged by a maintainer. labels Jan 17, 2024
@sirosen
Copy link
Contributor Author

sirosen commented Jan 17, 2024

Thanks for the reply! Let me work up a (single) PR with the improvements in the next day or two and we can take it from there.

And sorry for not seeing the earlier issue!

I have a couple of questions if you don't mind

I'll do my best! As a small caveat, I wouldn't consider myself an expert in JSON Schema -- maybe an "intermediate user". My tool is a frontend for an implementation by a true expert. 😄

  1. You mentioned draft 7 and the latest draft (2020-12). Is there any reason to update our schema (I'm mainly thinking about the $schema key at the top) and more importantly, is there any reason we shouldn't?

There isn't strong consensus about this, as there are pros and cons. I would advise keeping it at draft 7 at least until you have more time to consider these notes. But probably I would just stick with draft 7 until you have a reason to use something newer.

  • Some implementations may reject/error on drafts which they don't recognize as "too new". Draft 7 is old enough that any reasonable validator supports it. But the 2019 and 2020 drafts have been slow to arrive in all of the implementations, so you may lose support for certain clients if you use them. I would want to research some mainstream implementations and think about supported use cases before using anything newer than Draft 7.

  • The newer drafts do have some stricter validation, but JSON Schema itself is defined to be extensible in such a way that the metaschemas will never be able to catch all of the "obvious errors" which a human author would recognize as invalid. So you might get some improved validation of the schema, but I don't think it's enough to justify a change.

  • There are some cleanups and improvements to the spec in the newer versions, in terms of specificity in corner cases and spec clarity/documentation. So if you're writing something new, there's value in using the newer drafts for your experience as an author.

It's all a rolling version situation and everyone is left on their own to decide what to support. I generally think the new drafts are nicer to use, but you'll need to weigh that against what clients you would drop by using the 2019 draft. In my own job/life I haven't had to think very hard about this, so it's possible that there are significant benefits to the newer drafts that I don't know about.

  1. Regarding your shameless plug. I'd actually like to set up some meta schema validation in our CI to catch issues like this in the future before they get merged and your tool seems pretty useful for this. Is there a GitHub action for it or do you have a workflow template we could use?

I don't have a template or GitHub Action/Workflow, but since it's a CLI app I can provide a quick primer on installing it and then I could offer up a PR or draft PR to add it.

check-jsonschema started its life as a pre-commit hook (naming is hard! 😉 ), so CI and workflow integration is very much part of what it's meant for. If you're familiar with pre-commit, you could set it up for this repo, but I'll assume not and/or that you don't want to require a Python toolchain for contributors.

First, for local usage, if you care:

long-winded thing about how to install a Python CLI application

The main distribution channel is pypi.org (the Python Package Index), so if you have Python background, you can pick it up with pip install check-jsonschema. Homebrew has a recipe too, so there's brew install check-jsonschema.

If you aren't a Python developer, I'd very, very strongly recommend using pipx for the install. (I've been singing its praises at work for years, and have been happy with the effects on everyone's workflows, including some less technical personnel.) The pipx install docs should hopefully cover you. The idea with pipx is that it will manage virtualenvs in ~/.local/pipx/, so you get isolated installs of, primarily, CLI applications.

Python currently/still has a pretty weak story around building single-file binaries, and I haven't been able to invest in trying some of the build tools which address this. So for now, pipx install check-jsonschema is the best experience I can offer -- once you get pipx, that is. 😄


Second, for use in GitHub Actions:

GitHub provides actions for setting up python environments, and I believe that the latest worker images ship with pipx preinstalled. So I'm pretty sure (read: 70% sure) you can add a script step which does

pipx install check-jsonschema
check-jsonschema --check-metaschema foo/bar/schema.json

but I haven't tested this out yet. Let me try and see what the worker images provide (I'll do this soon).

@pd93 pd93 removed the type: bug Something not working as intended. label Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: json schema Changes related to the JSON schema.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants