Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: generate YAML schema for config #701

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

Nerixyz
Copy link

@Nerixyz Nerixyz commented Oct 11, 2024

This PR extends generate-config-info.py to generate a YAML/JSON schema for the config file. As explained in the issue, this can be used to provide autocompletion and linting. The "YAML schema" is just a JSON schema.

Currently, the schema is committed to the repo. I'm not sure if that's desired. The main upside of this is, that users can link to the schema using (raw) GitHub links. This allows linking to tags, but most importantly it allows linking to specific commits. However, there's now duplicate information in the repo. CI ensures the schema is always up-to-date, though.

Alternatively, this could be done as part of the Antora build (probably, I'm not that familiar with it).

Closes #700.

Copy link
Collaborator

@alandefreitas alandefreitas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's very interesting. I'll research how IDEs integrate that and include it in the documentation. Is this a standard format for validating the fields, or are there competing formats? It's the same format as the schema for vcpkg manifest files, right?

Currently, the schema is committed to the repo. I'm not sure if that's desired. The main upside of this is, that users can link to the schema using (raw) GitHub links. This allows linking to tags, but most importantly it allows linking to specific commits. However, there's now duplicate information in the repo. CI ensures the schema is always up-to-date, though.

Yes, that's the first thing that came to mind. And I'm still trying to figure it out. If we follow the pattern of other artifacts from generate-config-info.py, then it wouldn't be checked in. It could be added as an extra file in the releases.

However, we ensure these other artifacts from generate-config-info.py are always correct because the code that uses them still has to compile and pass the tests. We can't guarantee the same for mrdocs.schema.json. Is there a way to check that the resulting file is at least valid? Some tool or something?

Or should the original config.json file follow the specification directly at some point in the future? Is there a way to associate categories with options in the specification?

Unless there's evidence a specific design is much better, we can keep the file directly in the repository unless there's a lot of evidence this is a bad idea.

Alternatively, this could be done as part of the Antora build (probably, I'm not that familiar with it).

Yes. I would rule that out.

@@ -383,6 +383,10 @@ jobs:
package-generators: ${{ matrix.mrdocs-package-generators }}
package-artifact: false

# If this fails, the config was updated without updating the schema
- name: Check YAML schema
run: git --no-pager diff --exit-code -- docs/mrdocs.schema.json
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry for my ignorance. How will this command know if the config JSON file changed?

As far as I understand, this command checks for changes only in the docs/mrdocs.schema.json file and compares it to the last committed version. But this file not changing is not an error by itself.

Also, if I understand correctly, docs/mrdocs.schema.json only contains a subset of the config file so not all changes in the config file require changes in docs/mrdocs.schema.json. For instance, if some field changes from string to path, docs/mrdocs.schema.json will not change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that generate-config-info.py runs during the build and updates docs/mrdocs.schema.json. Then, after the build, CI checks if the file was updated (i.e. the schema changed, but wasn't committed).

Copy link
Collaborator

@alandefreitas alandefreitas Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, OK. So, it returns 0 if the contents are the same, and changing the contents is always an error because it should have been committed.

In CI, we're also having an issue with checkout not constantly downloading the source files from the repository. This is fixable by setting up git first.

Or we could remove this test since it does not test much—only that the file has been touched.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CI, we're also having an issue with checkout not constantly downloading the source files from the repository. This is fixable by setting up git first.

Or we could remove this test since it does not test much—only that the file has been touched.

Then it's easier to have a separate script, and remove the test from CI, I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm... Yes. We don't strictly need a separate script, but that's the best idea (Unless the logic for the schema reuses a lot of the logic for generating the C++ files but I doubt that.) because the CMake scripts would be generating a file we don't need to compile the library.

I'm looking at https://www.schemastore.org/json/, and they suggest a MegaLinter action to check files. It could replace the whole step. The only problem is this MegaLinter is way more than we need.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at schemastore.org/json, and they suggest a MegaLinter action to check files. It could replace the whole step. The only problem is this MegaLinter is way more than we need.

ajv can validate YAML files. We use it at Chatterino to validate JSON files in CI. Here, the command-line would be

npx ajv validate -s mrdocs.schema.json -d mrdocs.yml

Through this, it also validates the schema. Seems it doesn't like https:// in the schema, but expects http://.

Copy link
Collaborator

@alandefreitas alandefreitas Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. This could make a better GHA step here. Using npx makes it all much more manageable.

However, we want to validate mrdocs.schema.json against the meta schema https://json-schema.org/draft/2020-12/schema.

What's essential for now is this PR can succeed in CI.

@@ -12,6 +12,8 @@ include::partial$mrdocs-example.yml[]

The xref:usage.adoc[Usage] page provides a detailed explanation of what to combine options from the configuration file and the command line.
The <<config-options-reference>> section provides a detailed explanation of the options available.
To get linting and autocompletion in the config file, the https://github.com/redhat-developer/yaml-language-server[YAML language server] can be used.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves a section including where the file can be found, the specification, etc...

If I understand correctly, as the file follows that json spec, many IDEs support this format even without the user having to know about anything.

The most important/actionable information is that the user needs to put that comment at the top of the file.

@Nerixyz
Copy link
Author

Nerixyz commented Oct 12, 2024

Is this a standard format for validating the fields, or are there competing formats?

JSON-schema is the de-facto standard for many config-languages (i.e. not just JSON), since it's so versatile. In addition to YAML, it's also used for TOML (e.g. in taplo - schema for Cargo.toml).

It's the same format as the schema for vcpkg manifest files, right?

Yes, in JSON the standard is to specify a "$schema". At least on Microsoft's documentation, they guide users to use the schema from GitHub.

We can't guarantee the same for mrdocs.schema.json. Is there a way to check that the resulting file is at least valid? Some tool or something?

That's mostly what I'm trying to do with the check in CI. An alternative could be to have a separate script, run that one in CI, and check if the output changed (with Git).

Or should the original config.json file follow the specification directly at some point in the future? Is there a way to associate categories with options in the specification?

Unfortunately, there's no way of having categories for options in a JSON schema. Furthermore, the current config.json has a few options that specify "command-line-only".

@@ -0,0 +1,255 @@
{
"$schema": "https://json-schema.org/draft-07/schema",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, we could update to the new draft. Most schemas I saw were using older drafts, but I don't think that should be a problem.

@alandefreitas
Copy link
Collaborator

alandefreitas commented Oct 13, 2024

That's mostly what I'm trying to do with the check-in CI. An alternative could be to have a separate script, run that one in CI, and check if the output changed (with Git).

I see. Unless there's a tool to check if the file is valid, checking if the file has been touched or changed does not provide much additional value. We could remove the test for now.

I considered using the meta schema to check if the resulting file matches it. But we can go with what we have for now. Whoever is pushing a new commit should check if the file changed. I still believe it's possible to use the metaschema of the specification.

I'll at least create an issue. I saw this list of tools, but I haven't investigated it yet.

Unfortunately, there's no way of having categories for options in a JSON schema. Furthermore, the current config.json has a few options that specify "command-line-only".

I see. Some fields could contain extra information, so using the same file for both things could reduce redundancy and implicitly ensure we're testing the file. That's unfortunate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Provide Config Schema
2 participants