Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Schema Publish workflow #4162

Merged
merged 5 commits into from
Nov 7, 2024
Merged

Schema Publish workflow #4162

merged 5 commits into from
Nov 7, 2024

Conversation

ralfhandl
Copy link
Contributor

@ralfhandl ralfhandl commented Oct 25, 2024

Follow-up to

Replace WORK-IN-PROGRESS placeholders with date of last change of YAML source:

  • if meta changes, republish all four files
  • if dialect changes, republish it, schema, and schema_base
  • if schema changes, republish it and schema_base
  • if schema_base changes, republish it
  • new date stamp for each schema is the maximum date of last commit in the dependency chain:
flowchart LR
    schema_base
    schema
    dialect
    meta
    schema --> |default| dialect
    schema_base --> |$ref| schema
    schema_base --> |$ref| dialect
    dialect --> |$ref| meta
Loading
  • GitHub workflow
  • bash script called from workflow
  • Dependencies between schema files
  • Use systematic source schema names

Example PR created by this workflow

@ralfhandl ralfhandl requested review from a team as code owners October 25, 2024 13:56
@ralfhandl ralfhandl added Housekeeping script Pull requests that update Bash or JavaScript code labels Oct 26, 2024
Copy link
Member

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused because this seems to get triggered for any commit to the main branch rather than filtering on the schemas, and it does not seem to check that the $GITHUB_SHA commit includes at least one schema either.

But I'm not very familiar with either github actions or bash scripting, so what am I missing here?

if [ "$lastCommitDate" \> "$maxDate" ]; then
maxDate=$lastCommitDate
fi
datesHash["$schema"]=$maxDate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want each datesHash["$schema"] to have $schema's date, which would be $lastCommitDate and not $maxDate? It's not clear to me what $maxDate is intended to do here.

Copy link
Contributor Author

@ralfhandl ralfhandl Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schemas are visited in the order of their dependency:

  • If meta changes, dialect needs to be updated. If dialect itself has changed, its new date is the maximum of the two change dates.
  • If dialect changes, schema needs to be updated. If schema itself has changed, its new date is the maximum of the two change dates.
  • If schema changes, schema_base needs to be updated. If schema_base itself has changed, its new date is the maximum of the two change dates.

This is what the loop & $maxDate do because we currently have a linear dependency chain. If we get more files and non-linear dependencies, this has to be revisited.

@ralfhandl
Copy link
Contributor Author

this seems to get triggered for any commit to the main branch rather than filtering on the schemas

This action needs to run if

  • one of the YAML files in folder schemas changes,
  • scripts/schema-publish.sh changes,
  • scripts/yaml2json/yaml2json.js changes, or
  • .github/workflows/schema-publish.yaml changes

We could do that with a paths filter in addition to the branches filter, but then the TSC collective mind needs to remember changing this filter if these dependencies change.

Given that the workflow only creates a pull request for publishing the schemas if it actually produces changes compared to gh-pages, I think it is safer and more future-proof to run the workflow on any change to main without a paths filter.

@ralfhandl
Copy link
Contributor Author

it does not seem to check that the $GITHUB_SHA commit includes at least one schema

According to GitHub Docs on the push and workflow_dispatch events GITHUB_SHA is

Tip commit pushed to the ref

or

Last commit on the GITHUB_REF branch or tag

This seems to indicate that if I

  1. change schema.yaml and commit
  2. change README.md and commit
  3. push

Then the GITHUB_SHA will be the commit changing README.md.

It seems safer to refresh the JSON files based on the date of the last change to the corresponding YAML file and its dependent YAML files.

@handrews
Copy link
Member

@ralfhandl

It seems safer to refresh the JSON files based on the date of the last change to the corresponding YAML file and its dependent YAML files.

Yes if we're not filtering the trigger then we need to filter the commits.

@ralfhandl
Copy link
Contributor Author

Yes if we're not filtering the trigger then we need to filter the commits.

We use the date of the last commit for each file to produce its relevant date, in combination with the new dates of all files it references.

The workflow then creates a pull request for publishing all new/changed/renamed/deleted files.

Git figures out which files actually need to be in the pull request, and if there are none, no pull request is created.

This is the same mechanism we use for publishing the spec HTML files.

@handrews
Copy link
Member

@ralfhandl sorry for not finishing up with this yesterday- I was trying to figure out the code and suggest some comments to put in it because I get what you're saying but for some reason I am finding the code really hard to follow. I think the name lastDate is throwing me because while I get why you called it that ,it does not behave like I expect a variable with that name to behave. I expected it to accumulate the very last date across all and then use it, but perhaps you meant "last" in the sense of "previous"?

@ralfhandl
Copy link
Contributor Author

Changed the name to something hopefully less confusing and added a few comments.

Copy link
Member

@handrews handrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I find this easier to follow (and I now realize that my previous comment was misleading as I was confused by maxDate more than lastDate), but I'm finding it hard to come up with an improvement so I suppose I'll just stop standing in the way. I don't know why I find this so baffling.

@ralfhandl ralfhandl merged commit 7e96988 into OAI:main Nov 7, 2024
2 checks passed
@ralfhandl ralfhandl deleted the main-schema-publish branch November 7, 2024 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Housekeeping script Pull requests that update Bash or JavaScript code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants