Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: report downstream check task results #1385

Merged
merged 5 commits into from
Nov 3, 2022

Conversation

sachindshinde
Copy link
Contributor

This PR does two things:

  • Refactors the logic for check_workflow runners to better handle certain race conditions that can arise.
    • To summarize, the tasks of a check workflow can run in parallel, and all must pass for the workflow to pass. Critically though, a single task failing will immediately cause the check to fail, even if other tasks are still running.
    • In this case, it's possible (but rare) for a downstream task to fail before the operations task fails. When this happens, the workflow fails but there's no operations task result, causing an AdhocError.
    • To avoid this, we pull the logic out check_response.rs for handling other task failures, and take the following general approach when processing workflow results:
      • We iterate through each known task kind, and if any of them has failed, we return those errors.
      • If all known tasks have succeeded, then we check whether the workflow has succeeded. If it has, we return the results of the operations task result (I've generally seen this as returning a changelog).
      • If the workflow has failed, we return E036 to indicate some other check task has failed.
    • The order of task kind iteration is vaguely based on a sense of "if two task kinds fail, whose errors would be more useful to see?", so I went with build, operations, downstream. (Ultimately though you'd see whichever fails first.)
  • Improves the messaging for downstream check task failures, so that it shows blocking downstream variants that are failing.
    • This involves introducing a new error code, E037, for downstream check task failures specifically.
    • Note that downstream check workflows run in parallel, and the downstream check task fails immediately once a single blocking downstream variant has failed. This means that the list of failed blocking downstream variants we show may be a subset of the full list.

A few notes:

  • You could imagine a world where if two tasks run in parallel and both fail, we show both error results, but it's relatively rare for such a thing to happen (they'd both have to fail within the same polling interval).
  • You could imagine a world where we show more information for downstream check workflows; we may go this route in the future, but for now we're just looking to show some basic information.
  • You could imagine a world where Rover waits for tasks and the workflow to fully complete, so e.g. you can see all task failures instead of the first (or all failed blocking downstream variants instead of the first). This is possible via testing for certain conditions while polling, but it would lengthen Rover check times and potentially increase the rate of timeouts (which may be undesirable in CI/CD), so I imagine this would be an optional flag for checks at best (the demand for such a feature is also unclear).

(false, false) => "The operations task",
(true, false) => "The build and operations tasks",
(true, true) => "The build, operations, and downstream tasks",
(false, true) => unreachable!("Can't have a downstream task without a build task"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if this is something we could express in the schema? perhaps downstream tasks are nested under build task results?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could potentially nest dependent tasks under their dependencies, although it becomes a bit trickier if dependents have multiple dependencies/the task graph gets more complicated. (It would be nice to represent the relationship in some programmatic fashion though in the API.)

|| matches!(workflow_status, CheckWorkflowStatus::PASSED)
{
let result = operations_result.ok_or(RoverClientError::AdhocError {
msg: "Operations check task has no result.".to_string(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a better error we can return here/do we think it is unreachable/other changes here make this unlikely?

Copy link
Contributor Author

@sachindshinde sachindshinde Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a check workflow has passed, then all its tasks must have passed, which includes the operations check task (so it must have a result). Similarly, if the operations check task has failed, then it also must have a result.

So if we encounter this message/case, it basically means there's a bug in our API, or the semantics of our task API have changed in a breaking way. (So at the very least, it should be unlikely.) What error type would you recommend throwing in this case? (Could go with unreachable() if that's preferred.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go with the RoverClientError::MalformedResponse error type that indicates the field that was null that shouldn't be

Copy link
Contributor

@EverlastingBugstopper EverlastingBugstopper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤟🏼

@EverlastingBugstopper EverlastingBugstopper changed the title fix: Improve downstream check task messaging and handle race conditions feat: report downstream check task results Nov 3, 2022
@EverlastingBugstopper EverlastingBugstopper added feature 🎉 new commands, flags, functionality, and improved error messages 💅 polish and removed 💅 polish labels Nov 3, 2022
@EverlastingBugstopper EverlastingBugstopper merged commit 5b1263e into main Nov 3, 2022
@EverlastingBugstopper EverlastingBugstopper deleted the sachin/improve-check-task-messaging branch November 3, 2022 20:06
EverlastingBugstopper added a commit that referenced this pull request Nov 14, 2022
# [0.10.0] - 2022-11-10

> Important: 1 potentially breaking change below, indicated by **❗
BREAKING ❗**

## ❗ BREAKING ❗

- **Fix implementation of `--header` argument - @EverlastingBugstopper,
#1369 fixes #1365**

This change tightens up usage of the `--header` argument used for
`introspect` commands by disallowing previously valid (but undocumented)
usage like this: `--header "Header-1: value" "Header-2: value"`. After
this change, you _must_ conform to what we have in the documentation,
which indicates separate instances of the `--header` argument for each
header, like so: `--header "Header-1: value" --header "Header-2:
value"`.

## 🚀 Features

- **Provide prebuilt binaries for ARM devices - @EverlastingBugstopper,
#1356 fixes #582**

As of this release, [`rover.apollo.dev`](https://rover.apollo.dev)
delivers prebuilt binaries from our GitHub release for ARM devices. Most
notably this means that Docker on M1 devices should work out of the box.
You should be able to replace any custom builds in your tooling pipeline
with a call to the [official curl
installer](https://www.apollographql.com/docs/rover/getting-started/#linux--macos-installer).

- **Report downstream check task results - @sachindshinde, #1385**

When running `rover subgraph check` commands, if the proposed schema
would cause downstream failures (i.e. with contracts), those failures
are now reported in the check response.

- **Faster `rover supergraph compose` - @EverlastingBugstopper, #1392
fixes #992**

Rover now resolves all subgraph schemas in parallel when running `rover
supergraph compose` on a `supergraph.yaml` file. This should improve the
speed to compose large supergraphs significantly. This change also
drastically improves error handling by reporting _all_ issues with
resolving subgraph schemas (and informing you which schema(s) failed to
resolve) rather than exiting on the first failed schema resolution.

- **Add `--polling-interval` to `rover dev` - @patrick91, #1377 fixes
#1221**

You can now set `--polling-interval` when running `rover dev` to change
the frequency of introspection poll requests for subgraphs that don't
provide the schema from the file system with the `--schema` argument.

- **Adds `--skip-update-check` to skip the once-per-day update check -
@Tsing, #1396 fixes #1394**

Once per day, Rover checks if there is a new version available for
update and notifies the user if there is. There is now a flag you can
pass to disable this check: `--skip-update-check`.

- **Respect the `NO_COLOR` environment variable - @chnn, #1360**

`rover` will not use color in any output when executed with the
`NO_COLOR` environment variable set to `true`.

## 🛠 Maintenance

- **Updates from clap v3 to clap v4 - @EverlatingBugstopper, #1404 fixes
#1400**

This release updated the command line argument parsing library to major
version 4. There should be no noticeable compatibility issues with this
update, only lighter binaries. The look and feel of the main `rover
--help` output has changed to a neutral color palette along with this
change.

- **Updates Rust to 1.65.0 - @EverlastingBugstopper, #1399**

- **Updates node.js to v18 - @renovate, #1389**

- **Updates node dev-dependencies - @renovate, #1204 and zs#1398**

- **Remove dependency on the `saucer` crate - @EverlastingBugstopper,
#1402**

- **Updates `introspector-gadget` to 0.2.0 - @EverlastingBugstopper,
#1386**

- **Only cache dependencies in CI, not whole `/target` -
@EverlastingBugstopper, #1387**

- **Use `engine@main` instead of `engine@current` to fetch the API
schema - @EverlastingBugstopper, #1368**

- **Use `lychee` as a link checker instead of npm - @ptondereau, #1328
fixes #1306**

We now use a Rust-based link checker to check the links in the Rover
repository instead of a node-based link checker (that was much more
flaky).

- **Describe latest federation versions in
`./latest_plugin_versions.json` - @EverlastingBugstopper, #1363**

When you run `rover supergraph compose`, the latest version of
composition is automatically downloaded to your machine, these latest
version numbers are now stored in `./latest_plugin_versions.json` in the
Rover repo.

- **Rename `apollo-` headers to `apollographql-` headers - @jsegaran,
#1411**

- **Update npm to v9 - @renovate, #1412**

## 📚 Documentation

- **Update studio algolia key to graphos - @trevorblades, #1384**

- **Fix some broken links - @StephenBarlow, #1376**

- **Fix a typo in the migration guide instructing the use of `check`
instead of `publish` - @EverlastingBugstopper, #1364 fixes #1361**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature 🎉 new commands, flags, functionality, and improved error messages 💅 polish
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants