Local-first + generated CI #4148

Open
jprochazk opened this issue Nov 6, 2023 · 4 comments
Labels
🧑‍💻 dev experience developer experience (excluding CI) enhancement New feature or request 🚢 CI

Comments

@jprochazk
Member

jprochazk commented Nov 6, 2023

We do a lot of work on CI, and it's extremely difficult to keep track of how it all fits together. We've also had to deal with a lot of pain arising from the fact that:

  • Every workflow on CI is written in untyped YAML
  • Most of what CI does can't run locally [1]

There are two big changes we can make to drastically improve the situation:

  1. Make everything on CI runnable locally
  2. Design a custom DSL and transpile it to GHA YAML files

Local-first CI

Every job first installs its dependencies, and then it runs some code [2]. We want to ensure that all of that code is also runnable locally on every developer machine. To achieve this, we have to refactor every CI job to be a wrapper over this basic two-step process (install + run).

The sync release assets workflow is a great example of what we want all of our CI to look like, especially in that the inputs to the script are passed in explicitly.
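A minimal sketch of what "inputs passed in explicitly" could look like. This is not the actual sync release assets script; the flag names and the `run` helper are hypothetical and only illustrate the shape. The point is that the script reads everything from its arguments, so a CI job and a developer's shell can invoke it the same way.

```python
import argparse


def parse_inputs(argv: list[str]) -> argparse.Namespace:
    """Parse every input explicitly; nothing is read from CI-only state."""
    parser = argparse.ArgumentParser(description="Hypothetical release-asset sync")
    # Both flags are illustrative assumptions, not the real script's interface.
    parser.add_argument("--release-tag", required=True, help="release tag to sync")
    parser.add_argument("--dry-run", action="store_true", help="only report actions")
    return parser.parse_args(argv)


def run(args: argparse.Namespace) -> list[str]:
    """Return the actions taken (or planned); the real work is omitted here."""
    verb = "would upload" if args.dry_run else "upload"
    return [f"{verb} assets for {args.release_tag}"]


# The same call works from a CI step or from a local shell.
print(run(parse_inputs(["--release-tag", "0.10.0", "--dry-run"])))
```

The CI job then shrinks to "install dependencies, call the script with these flags", which is the two-step shape described above.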

Some notes for the process of extracting a CI job to run locally:

  • Installing dependencies should be done using our docker image(s) together with pixi.
  • Most third-party actions simply wrap a CLI tool (or multiple), so we should replace their usage with usage of the underlying CLI tool(s)

Codegen GHA away

As long as every job is not much more complex than install + run, it should be possible to ditch the hand-written GHA YAML files entirely, and instead use a custom DSL as the input to a GHA YAML file generator. Even if all this code generator did was take a different configuration file format and transpile it to YAML, it would still be a big improvement in developer experience, but we can do much more than that.

Some (unordered, tentative) goals for this DSL and code generator:

  • Strongly typed inputs/outputs for every step of every job
  • A better mechanism for code reuse
    • Must have the ability to inline code
  • Automatic job sequencing
    • Given the inputs and outputs of each job, we know the dependencies between jobs, so we can schedule jobs to run in parallel if they don't depend on each other.
  • Generate a variant of the same workflow for contributors
    • Sanitized and runs on pull_request with approval
  • Local runner with the ability to execute entire workflows E2E
    • Every job uses a docker image + pixi, so this doesn't seem too far-fetched
    • It should also support --dry-run to display the execution plan
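To make the "automatic job sequencing" and `--dry-run` goals concrete, here is a rough sketch assuming each job declares named inputs and outputs. The `Job` class, the job names, and the `wheel` artifact are all hypothetical; the only real API used is the standard library's `graphlib.TopologicalSorter`.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Job:
    name: str
    inputs: frozenset[str] = frozenset()
    outputs: frozenset[str] = frozenset()


def plan(jobs: list[Job]) -> list[list[str]]:
    """Group jobs into stages; jobs within one stage can run in parallel."""
    # Map each output to the job that produces it.
    producer = {out: job.name for job in jobs for out in job.outputs}
    sorter = TopologicalSorter()
    for job in jobs:
        # A job depends on whichever job produces each of its inputs.
        # An input nobody produces raises KeyError in this sketch.
        sorter.add(job.name, *(producer[i] for i in job.inputs))
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


jobs = [
    Job("build", outputs=frozenset({"wheel"})),
    Job("lint"),
    Job("test", inputs=frozenset({"wheel"})),
]

# Dry run: print the execution plan instead of running anything.
for number, stage in enumerate(plan(jobs), start=1):
    print(f"stage {number}: {', '.join(stage)}")
```

Here `build` and `lint` land in the first stage because neither consumes anything, and `test` follows because it consumes `wheel`. A generator could emit the same dependency information as `needs:` entries in the GHA YAML.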

We don't have to meet all of the above goals. The only strict requirements are that the DSL is not YAML, and that it's possible to author the files without deep knowledge of GHA.

We will likely continue to hand-author some workflows with very specific requirements, but this should be usable for all jobs that perform builds/tests/linting.

Footnotes

  1. Not because it must run on CI; we just haven't put in the work to make it run locally.

  2. Some jobs may install additional dependencies later in their lifecycle, but that's more of a consequence of our job sequencing, and not a requirement for the job to work.

@jprochazk jprochazk added enhancement New feature or request 🧑‍💻 dev experience developer experience (excluding CI) 🚢 CI labels Nov 6, 2023
@teh-cmc
Member

teh-cmc commented Nov 6, 2023

> Design a custom DSL and transpile it to GHA YAML files

I would even prefer no DSL at all: just define a bunch of classes for Workflows/Jobs/Steps/etc and simply work with actual Python code.
Build lists and graphs using good old code then dump everything as YAML.
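A rough sketch of that idea, assuming hand-rolled `Step`/`Job`/`Workflow` dataclasses (hypothetical names, not an existing library) and a hypothetical `pixi run lint` / `pixi run test` task setup. To stay dependency-free it serializes with `json`, relying on JSON being valid YAML 1.2; a real generator would more likely use a YAML library for readable output.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Step:
    run: str
    name: str = ""

    def to_dict(self) -> dict:
        data = {"name": self.name} if self.name else {}
        data["run"] = self.run
        return data


@dataclass
class Job:
    runs_on: str
    steps: list[Step] = field(default_factory=list)
    needs: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        data: dict = {"runs-on": self.runs_on}
        if self.needs:
            data["needs"] = self.needs
        data["steps"] = [step.to_dict() for step in self.steps]
        return data


@dataclass
class Workflow:
    name: str
    on: list[str]
    jobs: dict[str, Job]

    def dump(self) -> str:
        """Serialize to JSON, which is also valid YAML 1.2."""
        jobs = {key: job.to_dict() for key, job in self.jobs.items()}
        return json.dumps({"name": self.name, "on": self.on, "jobs": jobs}, indent=2)


# Plain code builds the workflow; the task names are illustrative only.
workflow = Workflow(
    name="checks",
    on=["pull_request"],
    jobs={
        "lint": Job("ubuntu-latest", [Step("pixi run lint", name="Lint")]),
        "test": Job("ubuntu-latest", [Step("pixi run test")], needs=["lint"]),
    },
)
print(workflow.dump())
```

Because the definitions are ordinary objects, reuse is just a function that returns a `Job` or a list of `Step`s.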

In fact we don't even have to define these classes... they already exist.

Pushing this logic even further: do we even need to go through an intermediate YAML representation at all? Can't PyGithub configure workflows straight from in-memory objects?

@jprochazk
Member Author

> Build lists and graphs using good old code then dump everything as YAML.

I think that qualifies as a DSL 😄. But I agree that it should be "good old code" as much as possible.

> Pushing this logic even further: do we even need to go through an intermediate YAML representation at all? Can't PyGithub configure workflows straight from in-memory objects?

GHA requires the intermediate YAML files. There's no way to dispatch a job for a workflow that doesn't have a workflow ID, and those are given out to every workflow file.

@jprochazk
Member Author

jprochazk commented Nov 6, 2023

I have a bit of a crazy proposal: I think we should use TypeScript for the "DSL" part. The code generator would use deno to create a barebones JS environment in which we'd execute .ts files to produce the high-level definition of each workflow. The high-level definition would then be compiled to the GHA YAML files.

I know, JS does not spark joy. But here's why I think it makes sense:

  • If you ignore everything that makes JavaScript a terrible language, there's still enough language left to use for our DSL needs.
  • TypeScript is the only language that's runtime interpreted with a decent type system.
    • At the very least, the type system is not nearly as insane as Python's.
  • We get to use all the tooling that exists for TS; for us, that would be the LSP and some auto-formatter.
  • We get to write the workflow definitions in an imperative style, a.k.a. good old code.
  • Reusing code would be as simple as refactoring it into a function and calling it. Those functions could also be imported from other files.
  • The code generator would use swc to transpile TS -> JS, and deno_core to execute it. That means the whole code generator could be written in Rust and use no external tools.

I definitely want to be careful not to overengineer this, but I would also like it if we didn't have to use JSON, TOML, or YAML for the DSL, and it doesn't seem like there's a "middle ground" option that wouldn't either have sub-par tooling (such as building a custom language) or be annoying to use (such as Python).

@jprochazk
Member Author

We had a lengthy call about this:

It's clear that the valuable part of codegen is the ability to:

  • Check the inputs/outputs of each job
  • Run workflows locally
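As a rough illustration of the first point, checking inputs/outputs could be as simple as the sketch below. It assumes jobs are listed in execution order and declare their inputs and outputs as name-to-type mappings; the `Job` class, job names, and keys are hypothetical, and nothing here reflects a decided design.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    inputs: dict[str, type] = field(default_factory=dict)
    outputs: dict[str, type] = field(default_factory=dict)


def check(jobs: list[Job]) -> list[str]:
    """Return one error per input that is missing upstream or has the wrong type."""
    produced: dict[str, type] = {}
    errors = []
    for job in jobs:  # assumption: jobs are listed in execution order
        for key, expected in job.inputs.items():
            if key not in produced:
                errors.append(f"{job.name}: input '{key}' is not produced by an earlier job")
            elif produced[key] is not expected:
                errors.append(
                    f"{job.name}: input '{key}' expects {expected.__name__}, "
                    f"got {produced[key].__name__}"
                )
        produced.update(job.outputs)
    return errors


jobs = [
    Job("build", outputs={"wheel_path": str}),
    Job("test", inputs={"wheel_path": str, "shard": int}),
]
for error in check(jobs):
    print(error)
```

The same check would run before generating YAML and before a local run, so a mistyped or missing input fails early instead of partway through a workflow.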

TypeScript is one option for the high-level configuration language, but before we decide on that we need to do more design work to determine what we actually need to be able to specify in the workflow definitions.

For now, we'll be focusing on refactoring our CI to be local-first.


2 participants