Move QP compatibility checks into constructor and add metric by SimonSapin · Pull Request #5811 · apollographql/router

SimonSapin · 2024-08-13T14:14:17Z

This aligns better with experimental_query_planner_mode: both_best_effort falling back to legacy when the QueryPlanner::new constructor returns an error.
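As a rough illustration of the behaviour described above, here is a minimal sketch of how the planner modes could react to a constructor error. `PlannerMode`, `ActivePlanner`, `new_rust_planner` and `init_planner` are hypothetical names used only for this sketch, not the router's actual types; the real constructor is `QueryPlanner::new`.

```rust
/// Hypothetical mirror of the `experimental_query_planner_mode` values named in this PR.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PlannerMode {
    Legacy,
    New,
    Both,
    BothBestEffort,
}

/// Which planner(s) end up active after initialization (illustrative only).
#[derive(Debug, PartialEq)]
pub enum ActivePlanner {
    LegacyOnly,
    RustOnly,
    Both,
}

/// Stand-in for `QueryPlanner::new`: fails when the supergraph uses an
/// unsupported feature. The boolean input is an assumption of this sketch.
pub fn new_rust_planner(schema_is_supported: bool) -> Result<(), String> {
    if schema_is_supported {
        Ok(())
    } else {
        Err("unsupported feature in supergraph".to_string())
    }
}

/// With the compatibility checks inside the constructor, every mode sees the
/// same error; only `both_best_effort` turns it into a fallback.
pub fn init_planner(
    mode: PlannerMode,
    schema_is_supported: bool,
) -> Result<ActivePlanner, String> {
    match mode {
        PlannerMode::Legacy => Ok(ActivePlanner::LegacyOnly),
        PlannerMode::New => {
            new_rust_planner(schema_is_supported).map(|()| ActivePlanner::RustOnly)
        }
        PlannerMode::Both => {
            new_rust_planner(schema_is_supported).map(|()| ActivePlanner::Both)
        }
        PlannerMode::BothBestEffort => match new_rust_planner(schema_is_supported) {
            Ok(()) => Ok(ActivePlanner::Both),
            // A real implementation would log the error before falling back.
            Err(_) => Ok(ActivePlanner::LegacyOnly),
        },
    }
}
```

In this sketch, `new` and `both` propagate the constructor error, while `both_best_effort` silently degrades to the legacy planner.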

github-actions · 2024-08-13T14:14:30Z

@SimonSapin, please consider creating a changeset entry in /.changesets/. These instructions describe the process and tooling.

SimonSapin · 2024-08-13T14:15:10Z

~~Draft while we consider adding a new metric for constructor errors~~

Edit: metrics can be a follow-up PR

goto-bus-stop · 2024-08-13T14:54:59Z

apollo-federation/src/query_plan/query_planner.rs

+                field
+                    .directives
+                    .iter()
+                    .filter(|d| d.name.as_str() == JOIN_FIELD)


Can we look up the correct name of the directive on the supergraph's .metadata()?

On one hand, probably yes. On the other hand, we have enough occurrences of "join__" in the code base that I doubt anything works if a supergraph uses import renames.

Maybe composition supports renames when reading subgraphs but never emits them in a supergraph?
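To make the question in this thread concrete, here is a small sketch of filtering by a resolved directive name instead of a hardcoded constant. The `Directive` struct and the `resolved_name` parameter are hypothetical stand-ins; how the name would actually be obtained from the supergraph's metadata is not shown in this PR and is left as an assumption.

```rust
/// Hypothetical stand-in for a directive application on a field.
pub struct Directive {
    pub name: String,
}

/// Default name of the join spec's field directive in a supergraph.
pub const JOIN_FIELD: &str = "join__field";

/// Returns the directives whose name matches the resolved join-field name.
/// `resolved_name` models a lookup that may have found a renamed import;
/// `None` falls back to the default constant, as the diff above does.
pub fn join_field_directives<'a>(
    directives: &'a [Directive],
    resolved_name: Option<&str>,
) -> Vec<&'a Directive> {
    let name = resolved_name.unwrap_or(JOIN_FIELD);
    directives
        .iter()
        .filter(|d| d.name.as_str() == name)
        .collect()
}
```

If supergraphs never contain renamed imports, as the last comment speculates, the fallback branch is the only one that would ever be taken.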

This adds a function to capture metrics for Rust QP initialisation. We want to capture whether or not initialisation was a success. We also want to capture which unsupported feature triggered a failure: `@context`, progressive `@overrides` or a fed1 supergraph. There is an addition of "internal init error" to capture the remaining possible match arms. In order to classify unsupported features, this commit also adds an `UnsupportedFeatureKind` enum to apollo-federation errors.
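The classification described in this commit message could look roughly like the sketch below. Only the enum name `UnsupportedFeatureKind` and the `"overrides"` label (from a later commit title in this PR) come from the source; the variant names, the `InitError` wrapper and the other label strings are assumptions made for illustration.

```rust
/// Sketch of the feature classification. Variant names are hypothetical.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum UnsupportedFeatureKind {
    ProgressiveOverrides,
    Context,
    Fed1,
}

/// Hypothetical, simplified view of what initialization can fail with.
pub enum InitError {
    UnsupportedFeature(UnsupportedFeatureKind),
    /// Anything else: the "internal init error" bucket.
    Internal(String),
}

/// Maps an initialization error to a low-cardinality metric attribute value.
/// All labels except "overrides" are assumed, not taken from the PR.
pub fn error_kind_attribute(err: &InitError) -> &'static str {
    match err {
        InitError::UnsupportedFeature(UnsupportedFeatureKind::ProgressiveOverrides) => {
            "overrides"
        }
        InitError::UnsupportedFeature(UnsupportedFeatureKind::Context) => "context",
        InitError::UnsupportedFeature(UnsupportedFeatureKind::Fed1) => "fed1",
        InitError::Internal(_) => "internal",
    }
}
```

Returning `&'static str` keeps the set of attribute values fixed at compile time, which bounds the metric's cardinality.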

goto-bus-stop

I like the single attribute!

One thing we should keep in mind: does this give users a scary warning on startup if they're using default configuration (both_best_effort) and new federation features (like prog. override)? It should be clear that they don't need to worry about anything except fed 1 issues, I think.

goto-bus-stop · 2024-08-15T12:05:24Z

apollo-router/src/query_planner/bridge_query_planner.rs

+            }
+        }
+
+        metric_rust_qp_init(None);


This should be in an else branch, or we get duplicate metrics.

This made me realise we are missing the other two kinds of errors in this match: AggregateFederationError and MultipleFederationErrors. They would have been counted as a "success" 🙈
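The two problems raised here (a success metric emitted after an error was already recorded, and error variants that slip through as "success") can both be avoided with one exhaustive match that records exactly once. This is a sketch under simplifying assumptions: the `FederationError` below is a reduced stand-in whose `Single` variant carries a pre-computed label, and a `Vec` replaces the real metric counter.

```rust
/// Simplified stand-in for the federation error type discussed above.
pub enum FederationError {
    /// Hypothetical: a single error already classified into a label.
    Single(&'static str),
    AggregateFederationError,
    MultipleFederationErrors,
}

/// Records exactly one data point per initialization attempt.
/// `None` means success; `Some(kind)` names the failure kind.
pub fn record_init(
    result: &Result<(), FederationError>,
    sink: &mut Vec<Option<&'static str>>,
) {
    match result {
        Ok(()) => sink.push(None),
        Err(FederationError::Single(kind)) => sink.push(Some(*kind)),
        // Without this arm, aggregate errors would fall through and be
        // counted as a success. The "internal" label is an assumption.
        Err(
            FederationError::AggregateFederationError
            | FederationError::MultipleFederationErrors,
        ) => sink.push(Some("internal")),
    }
}
```

Because every arm pushes once and the match has no wildcard, adding a new error variant later becomes a compile error rather than a silently miscounted success.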

Co-authored-by: Renée <renee.kooi@apollographql.com>

additionally adds integration tests with reloading a schema with a broken one for `new`, `both_best_effort` and `both` query planner modes.

lrlna · 2024-08-16T13:52:13Z

@goto-bus-stop at the moment we get a WARN:

2024-08-16T13:45:45.605895Z WARN  Failed to initialize the new query planner, falling back to legacy: The supergraph schema failed to produce a valid API schema: `experimental_query_planner_mode: new` or `both` cannot yet be used with progressive overrides. Remove uses of progressive overrides to try the experimental query planner, otherwise switch back to `legacy` or `both_best_effort`.

Followed by a log saying the reload is complete:

2024-08-16T13:45:45.777923Z INFO  reload complete

Should we soften the wording of the fallback-to-legacy message a bit more?

lrlna · 2024-08-16T14:07:42Z

apollo-router/tests/integration/query_planner.rs

+}
+
+#[tokio::test(flavor = "multi_thread")]
+async fn fed1_schema_with_both_best_effort_qp() {


Added a reload to the both_best_effort integration tests for all three features (fed1, context, progressive overrides) so we can also check that the log differs between setting the config to new and to both_best_effort.

lrlna · 2024-08-16T14:08:33Z

apollo-router/tests/integration/query_planner.rs

+use crate::integration::common::graph_os_enabled;
+use crate::integration::IntegrationTest;
+
+const PROMETHEUS_METRICS_CONFIG: &str = include_str!("telemetry/fixtures/prometheus.router.yaml");


Added Prometheus to the config so we can also check that the metrics are emitted in the reload integration tests.

lrlna · 2024-08-16T16:37:42Z

This is ready for a re-review on Monday @goto-bus-stop. The failing CI is due to a transitive dependency that requires a rustc upgrade.

SimonSapin · 2024-08-19T15:19:04Z

apollo-router/src/query_planner/bridge_query_planner.rs

+    if let Some(init_error_kind) = init_error_kind {
+        u64_counter!(
+            "apollo.router.lifecycle.query_planner.init",
+            "Rust query planner initialization",
+            1,
+            "init.error_kind" = init_error_kind,
+            "init.is_success" = false
+        );
+    } else {
+        u64_counter!(
+            "apollo.router.lifecycle.query_planner.init",
+            "Rust query planner initialization",
+            1,
+            "init.is_success" = true
+        );
+    }


Is it ok for a metric attribute to be sometimes missing?

I think so, but it could maybe be collapsed to a single attribute (init.error_kind = "none"). It's the same cardinality either way, though.
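The cardinality point can be checked with a small sketch that builds the attribute set in both shapes. The function names are hypothetical, and plain tuples stand in for real metric attributes; the attribute keys are the ones from the diff above.

```rust
/// Shape used in the diff above: `init.error_kind` is absent on success.
pub fn two_attribute_shape(kind: Option<&'static str>) -> Vec<(&'static str, String)> {
    match kind {
        Some(kind) => vec![
            ("init.error_kind", kind.to_string()),
            ("init.is_success", "false".to_string()),
        ],
        None => vec![("init.is_success", "true".to_string())],
    }
}

/// Suggested alternative: a single attribute that is always present,
/// with "none" standing for success.
pub fn single_attribute_shape(kind: Option<&'static str>) -> Vec<(&'static str, String)> {
    vec![("init.error_kind", kind.unwrap_or("none").to_string())]
}
```

For any fixed set of error kinds, both functions produce one distinct attribute set per outcome, so the number of time series is the same; the single-attribute shape only avoids the sometimes-missing key.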

SimonSapin · 2024-08-19T15:28:02Z

We can make the warning less scary by downgrading it to info level and rewording the message to start with "Falling back to legacy QP" instead of having that part be in the middle of a long message.

SimonSapin · 2024-08-20T08:02:40Z

apollo-router/tests/integration/query_planner.rs

+    router
+        .assert_log_contains(
+            "could not create router: \
+             The supergraph schema failed to produce a valid API schema: \


The mention of API schema here is not quite correct. It comes from apollo-router/src/error.rs:

impl From<FederationError> for ServiceBuildError {
    fn from(err: FederationError) -> Self {
        ServiceBuildError::ApiSchemaError(err)
    }
}

QP initialization is the only place where this enum variant is used (since API schema generation has moved to apollo_router::spec::Schema creation), so it should be renamed to QueryPlannerInit or something, and the message reworded
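A sketch of what the suggested rename might look like follows. The variant name `QueryPlannerInit` is taken from the "or something" suggestion above and may not match what was merged; the message wording and the simplified `FederationError` wrapper are assumptions made for this example.

```rust
use std::fmt;

/// Simplified stand-in for the federation error type.
#[derive(Debug)]
pub struct FederationError(pub String);

#[derive(Debug)]
pub enum ServiceBuildError {
    /// Hypothetical replacement for `ApiSchemaError`, named after the only
    /// place this variant is produced.
    QueryPlannerInit(FederationError),
}

impl fmt::Display for ServiceBuildError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Assumed wording: no longer mentions the API schema.
            ServiceBuildError::QueryPlannerInit(err) => {
                write!(f, "failed to initialize the query planner: {}", err.0)
            }
        }
    }
}

impl From<FederationError> for ServiceBuildError {
    fn from(err: FederationError) -> Self {
        ServiceBuildError::QueryPlannerInit(err)
    }
}
```

Keeping the `From` impl means call sites using `?` on a `FederationError` would not need to change; only the variant name and the displayed message do.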

blocking comment addressed

SimonSapin added 2 commits August 13, 2024 13:08

Move QP lifecycle tests to a dedicated module

dc408ec

Move QP compat testing to constructor, add integration tests

8f925fb

SimonSapin requested a review from a team August 13, 2024 14:14

SimonSapin requested review from a team, TylerBloom, dariuszkuc, duckki, goto-bus-stop, lrlna and sachindshinde as code owners August 13, 2024 14:14

apollo-bot2 assigned SimonSapin Aug 13, 2024

SimonSapin marked this pull request as draft August 13, 2024 14:14

SimonSapin marked this pull request as ready for review August 13, 2024 14:50

goto-bus-stop reviewed Aug 13, 2024

SimonSapin and others added 5 commits August 13, 2024 17:33

Add TODO comment for new metric

793f91e

Merge branch 'dev' into simon/qp-lifecycle

7e8bbc4

const UNSUPPORTED_OVERRIDES should just be "overrides"

6a90334

remove "failure" from metric definition

009436b

goto-bus-stop previously requested changes Aug 15, 2024

lrlna and others added 4 commits August 15, 2024 15:13

Apply suggestions from code review

64f33e1

Co-authored-by: Renée <renee.kooi@apollographql.com>

add a missing branches when checking for initialisation

503bc89

additionally adds integration tests with reloading a schema with a broken one for `new`, `both_best_effort` and `both` query planner modes.

add metrics asserts to query planner integration tests

30f9d7e

Merge branch 'dev' into simon/qp-lifecycle

b28cbf5

lrlna enabled auto-merge (squash) August 16, 2024 13:29

add integration tests reloading config to both_best_effort

91062d4

lrlna reviewed Aug 16, 2024

lrlna added 3 commits August 16, 2024 16:34

extra asserts for reload logs falling back to legacy qp

4e9890e

add the missing supergraph to context_with_legacy_qp

8dff332

typo

dc2504d

lrlna requested a review from goto-bus-stop August 16, 2024 16:37

SimonSapin commented Aug 19, 2024

lrlna added 2 commits August 19, 2024 18:04

Merge branch 'dev' into simon/qp-lifecycle

ba83265

Merge branch 'dev' into simon/qp-lifecycle

b95b1c7

SimonSapin commented Aug 20, 2024

SimonSapin disabled auto-merge August 20, 2024 09:03

Downgrade QP fallback log to INFO level, remove mention of API schema

49c6a69

SimonSapin enabled auto-merge (squash) August 20, 2024 10:27

SimonSapin changed the title ~~Move QP compatibility checks into constructor~~ Move QP compatibility checks into constructor and add metric Aug 20, 2024

lrlna approved these changes Aug 20, 2024

Merge branch 'dev' into simon/qp-lifecycle

0ba82ad

SimonSapin merged commit 4e6773c into dev Aug 20, 2024

SimonSapin deleted the simon/qp-lifecycle branch August 20, 2024 10:55

abernix mentioned this pull request Aug 28, 2024

prep release: v1.53.0 #5905

Merged

3 participants
