feat: report downstream check task results #1385

sachindshinde · 2022-10-22T03:58:41Z

This PR does two things:

Refactors the logic for check_workflow runners to better handle certain race conditions that can arise.
- To summarize, the tasks of a check workflow can run in parallel, and all must pass for the workflow to pass. Critically though, a single task failing will immediately cause the check to fail, even if other tasks are still running.
- In this case, it's possible (but rare) for a downstream task to fail before the operations task fails. When this happens, the workflow fails but there's no operations task result, causing an AdhocError.
- To avoid this, we pull the logic out check_response.rs for handling other task failures, and take the following general approach when processing workflow results:
  - We iterate through each known task kind, and if any of them has failed, we return those errors.
  - If all known tasks have succeeded, then we check whether the workflow has succeeded. If it has, we return the results of the operations task result (I've generally seen this as returning a changelog).
  - If the workflow has failed, we return E036 to indicate some other check task has failed.
- The order of task kind iteration is vaguely based on a sense of "if two task kinds fail, whose errors would be more useful to see?", so I went with build, operations, downstream. (Ultimately though you'd see whichever fails first.)
Improves the messaging for downstream check task failures, so that it shows blocking downstream variants that are failing.
- This involves introducing a new error code, E037, for downstream check task failures specifically.
- Note that downstream check workflows run in parallel, and the downstream check task fails immediately once a single blocking downstream variant has failed. This means that the list of failed blocking downstream variants we show may be a subset of the full list.

A few notes:

You could imagine a world where if two tasks run in parallel and both fail, we show both error results, but it's relatively rare for such a thing to happen (they'd both have to fail within the same polling interval).
You could imagine a world where we show more information for downstream check workflows; we may go this route in the future, but for now we're just looking to show some basic information.
You could imagine a world where Rover waits for tasks and the workflow to fully complete, so e.g. you can see all task failures instead of the first (or all failed blocking downstream variants instead of the first). This is possible via testing for certain conditions while polling, but it would lengthen Rover check times and potentially increase the rate of timeouts (which may be undesirable in CI/CD), so I imagine this would be an optional flag for checks at best (the demand for such a feature is also unclear).

…race conditions

EverlastingBugstopper · 2022-10-24T20:50:31Z

crates/rover-client/src/error.rs

+        (false, false) => "The operations task",
+        (true, false) => "The build and operations tasks",
+        (true, true) => "The build, operations, and downstream tasks",
+        (false, true) => unreachable!("Can't have a downstream task without a build task"),


i wonder if this is something we could express in the schema? perhaps downstream tasks are nested under build task results?

We could potentially nest dependent tasks under their dependencies, although it becomes a bit trickier if dependents have multiple dependencies/the task graph gets more complicated. (It would be nice to represent the relationship in some programmatic fashion though in the API.)

EverlastingBugstopper · 2022-10-24T20:51:11Z

crates/rover-client/src/operations/graph/check_workflow/runner.rs

+        || matches!(workflow_status, CheckWorkflowStatus::PASSED)
+    {
+        let result = operations_result.ok_or(RoverClientError::AdhocError {
+            msg: "Operations check task has no result.".to_string(),


is there a better error we can return here/do we think it is unreachable/other changes here make this unlikely?

If a check workflow has passed, then all its tasks must have passed, which includes the operations check task (so it must have a result). Similarly, if the operations check task has failed, then it also must have a result.

So if we encounter this message/case, it basically means there's a bug in our API, or the semantics of our task API have changed in a breaking way. (So at the very least, it should be unlikely.) What error type would you recommend throwing in this case? (Could go with unreachable() if that's preferred.)

Let's go with the RoverClientError::MalformedResponse error type that indicates the field that was null that shouldn't be

…ck error

EverlastingBugstopper

🤟🏼

@EverlastingBugstopper

# [0.10.0] - 2022-11-10 > Important: 1 potentially breaking change below, indicated by **❗ BREAKING ❗** ## ❗ BREAKING ❗ - **Fix implementation of `--header` argument - @EverlastingBugstopper, #1369 fixes #1365** This change tightens up usage of the `--header` argument used for `introspect` commands by disallowing previously valid (but undocumented) usage like this: `--header "Header-1: value" "Header-2: value"`. After this change, you _must_ conform to what we have in the documentation, which indicates separate instances of the `--header` argument for each header, like so: `--header "Header-1: value" --header "Header-2: value"`. ## 🚀 Features - **Provide prebuilt binaries for ARM devices - @EverlastingBugstopper, #1356 fixes #582** As of this release, [`rover.apollo.dev`](https://rover.apollo.dev) delivers prebuilt binaries from our GitHub release for ARM devices. Most notably this means that Docker on M1 devices should work out of the box. You should be able to replace any custom builds in your tooling pipeline with a call to the [official curl installer](https://www.apollographql.com/docs/rover/getting-started/#linux--macos-installer). - **Report downstream check task results - @sachindshinde, #1385** When running `rover subgraph check` commands, if the proposed schema would cause downstream failures (i.e. with contracts), those failures are now reported in the check response. - **Faster `rover supergraph compose` - @EverlastingBugstopper, #1392 fixes #992** Rover now resolves all subgraph schemas in parallel when running `rover supergraph compose` on a `supergraph.yaml` file. This should improve the speed to compose large supergraphs significantly. This change also drastically improves error handling by reporting _all_ issues with resolving subgraph schemas (and informing you which schema(s) failed to resolve) rather than exiting on the first failed schema resolution. - **Add `--polling-interval` to `rover dev` - @patrick91, #1377 fixes #1221** You can now set `--polling-interval` when running `rover dev` to change the frequency of introspection poll requests for subgraphs that don't provide the schema from the file system with the `--schema` argument. - **Adds `--skip-update-check` to skip the once-per-day update check - @Tsing, #1396 fixes #1394** Once per day, Rover checks if there is a new version available for update and notifies the user if there is. There is now a flag you can pass to disable this check: `--skip-update-check`. - **Respect the `NO_COLOR` environment variable - @chnn, #1360** `rover` will not use color in any output when executed with the `NO_COLOR` environment variable set to `true`. ## 🛠 Maintenance - **Updates from clap v3 to clap v4 - @EverlatingBugstopper, #1404 fixes #1400** This release updated the command line argument parsing library to major version 4. There should be no noticeable compatibility issues with this update, only lighter binaries. The look and feel of the main `rover --help` output has changed to a neutral color palette along with this change. - **Updates Rust to 1.65.0 - @EverlastingBugstopper, #1399** - **Updates node.js to v18 - @renovate, #1389** - **Updates node dev-dependencies - @renovate, #1204 and zs#1398** - **Remove dependency on the `saucer` crate - @EverlastingBugstopper, #1402** - **Updates `introspector-gadget` to 0.2.0 - @EverlastingBugstopper, #1386** - **Only cache dependencies in CI, not whole `/target` - @EverlastingBugstopper, #1387** - **Use `engine@main` instead of `engine@current` to fetch the API schema - @EverlastingBugstopper, #1368** - **Use `lychee` as a link checker instead of npm - @ptondereau, #1328 fixes #1306** We now use a Rust-based link checker to check the links in the Rover repository instead of a node-based link checker (that was much more flaky). - **Describe latest federation versions in `./latest_plugin_versions.json` - @EverlastingBugstopper, #1363** When you run `rover supergraph compose`, the latest version of composition is automatically downloaded to your machine, these latest version numbers are now stored in `./latest_plugin_versions.json` in the Rover repo. - **Rename `apollo-` headers to `apollographql-` headers - @jsegaran, #1411** - **Update npm to v9 - @renovate, #1412** ## 📚 Documentation - **Update studio algolia key to graphos - @trevorblades, #1384** - **Fix some broken links - @StephenBarlow, #1376** - **Fix a typo in the migration guide instructing the use of `check` instead of `publish` - @EverlastingBugstopper, #1364 fixes #1361**

Improve downstream check task messaging and fix a few bugs regarding …

cbc53a1

…race conditions

sachindshinde requested review from EverlastingBugstopper and lrlna as code owners October 22, 2022 03:58

Merge branch 'main' into sachin/improve-check-task-messaging

9d8d088

EverlastingBugstopper assigned sachindshinde and EverlastingBugstopper Oct 24, 2022

EverlastingBugstopper added the 💅 polish label Oct 24, 2022

EverlastingBugstopper added this to the vNext milestone Oct 24, 2022

EverlastingBugstopper reviewed Oct 24, 2022

View reviewed changes

EverlastingBugstopper approved these changes Oct 24, 2022

View reviewed changes

EverlastingBugstopper and others added 3 commits October 25, 2022 14:30

Merge branch 'main' into sachin/improve-check-task-messaging

6bfd3ad

Fix English issue for this/these in downstream error messaging

fcdec1d

Shift from AdHocError to MalformedResponse for a particular graph che…

4b19264

…ck error

EverlastingBugstopper self-requested a review November 3, 2022 20:02

EverlastingBugstopper approved these changes Nov 3, 2022

View reviewed changes

EverlastingBugstopper changed the title ~~fix: Improve downstream check task messaging and handle race conditions~~ feat: report downstream check task results Nov 3, 2022

EverlastingBugstopper added feature 🎉 new commands, flags, functionality, and improved error messages 💅 polish and removed 💅 polish labels Nov 3, 2022

EverlastingBugstopper merged commit 5b1263e into main Nov 3, 2022

EverlastingBugstopper deleted the sachin/improve-check-task-messaging branch November 3, 2022 20:06

EverlastingBugstopper mentioned this pull request Nov 10, 2022

release: v0.10.0 #1406

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: report downstream check task results #1385

feat: report downstream check task results #1385

sachindshinde commented Oct 22, 2022

EverlastingBugstopper Oct 24, 2022

sachindshinde Nov 2, 2022

EverlastingBugstopper Oct 24, 2022

sachindshinde Nov 3, 2022 •

edited

Loading

EverlastingBugstopper Nov 3, 2022

EverlastingBugstopper left a comment

feat: report downstream check task results #1385

feat: report downstream check task results #1385

Conversation

sachindshinde commented Oct 22, 2022

EverlastingBugstopper Oct 24, 2022

Choose a reason for hiding this comment

sachindshinde Nov 2, 2022

Choose a reason for hiding this comment

EverlastingBugstopper Oct 24, 2022

Choose a reason for hiding this comment

sachindshinde Nov 3, 2022 • edited Loading

Choose a reason for hiding this comment

EverlastingBugstopper Nov 3, 2022

Choose a reason for hiding this comment

EverlastingBugstopper left a comment

Choose a reason for hiding this comment

sachindshinde Nov 3, 2022 •

edited

Loading