Move QP compatibility checks into constructor and add metric#5811
Move QP compatibility checks into constructor and add metric#5811SimonSapin merged 19 commits intodevfrom
Conversation
|
@SimonSapin, please consider creating a changeset entry in |
|
CI performance tests
|
|
Edit: metrics can be a follow-up PR |
| field | ||
| .directives | ||
| .iter() | ||
| .filter(|d| d.name.as_str() == JOIN_FIELD) |
There was a problem hiding this comment.
Can we look up the correct name of the directive on the supergraph's .metadata()?
There was a problem hiding this comment.
On one hand, probably yes. On the other hand, we have enough occurrences of "join__ in the code base that I doubt anything works if a supregraph uses import renames.
Maybe composition supports renames when reading subgraphs be never emits them in a supergraph?
This adds a function to capture metrics for rust qp intialisation. We want to capture whether or not initialisation was a success. We also want to capture which unsupported feature triggered a failure - `@context`, progressive `@overrides` or a fed1 supergraph. There is an addition of "internal init error" to capture the remaining possible match arms. In order to classify unsupported features, this commit also adds an `UnsupportedFeatureKind` enum to apollo-federation errors.
There was a problem hiding this comment.
I like the single attribute!
One thing we should keep in mind: does this give users a scary warning on startup if they're using default configuration (both_best_effort) and new federation features (like prog. override)? It should be clear that they don't need to worry about anything except fed 1 issues, I think.
| } | ||
| } | ||
|
|
||
| metric_rust_qp_init(None); |
There was a problem hiding this comment.
this should be in an else or we get duplicate metics
There was a problem hiding this comment.
this made me realise we are missing the other two kinds of errors in this match - AggregateFederationError and MultipleFederationErrors. They would have been counted as a "success" 🙈
Co-authored-by: Renée <renee.kooi@apollographql.com>
additionally adds integration tests with reloading a schema with a broken one for `new`, `both_best_effort` and `both` query planner modes.
|
@goto-bus-stop at the moment we get a 2024-08-16T13:45:45.605895Z WARN Failed to initialize the new query planner, falling back to legacy: The supergraph schema failed to produce a valid API schema: `experimental_query_planner_mode: new` or `both` cannot yet be used with progressive overrides. Remove uses of progressive overrides to try the experimental query planner, otherwise switch back to `legacy` or `both_best_effort`.",Followed by a log saying the reload is complete: 2024-08-16T13:45:45.777923Z INFO reload completeShould we soften the wording a bit more in the |
| } | ||
|
|
||
| #[tokio::test(flavor = "multi_thread")] | ||
| async fn fed1_schema_with_both_best_effort_qp() { |
There was a problem hiding this comment.
added reload to both_best_effort integration tests for all 3 features (fed1, context, progressive overrides) so we can also check that the log differs between setting the config to new and both_best_effort
| use crate::integration::common::graph_os_enabled; | ||
| use crate::integration::IntegrationTest; | ||
|
|
||
| const PROMETHEUS_METRICS_CONFIG: &str = include_str!("telemetry/fixtures/prometheus.router.yaml"); |
There was a problem hiding this comment.
added prometheus to the config so we can also check that the metrics are emitted for the reload integration tests.
|
This is ready for a re-review on Monday @goto-bus-stop. The failing CI is due to a transitive dependency that requires a rustc upgrade. |
| if let Some(init_error_kind) = init_error_kind { | ||
| u64_counter!( | ||
| "apollo.router.lifecycle.query_planner.init", | ||
| "Rust query planner initialization", | ||
| 1, | ||
| "init.error_kind" = init_error_kind, | ||
| "init.is_success" = false | ||
| ); | ||
| } else { | ||
| u64_counter!( | ||
| "apollo.router.lifecycle.query_planner.init", | ||
| "Rust query planner initialization", | ||
| 1, | ||
| "init.is_success" = true | ||
| ); | ||
| } |
There was a problem hiding this comment.
Is it ok for a metric attribute to be sometimes missing?
There was a problem hiding this comment.
I think so. but it could maybe be collapsed to a single attribute (init.error_kind = "none"). Though it's the same cardinality either way
|
We can make the warning less scary by downgrading it to |
| router | ||
| .assert_log_contains( | ||
| "could not create router: \ | ||
| The supergraph schema failed to produce a valid API schema: \ |
There was a problem hiding this comment.
The mention of API schema here is not quite correct. It comes from apollo-router/src/error.rs:
impl From<FederationError> for ServiceBuildError {
fn from(err: FederationError) -> Self {
ServiceBuildError::ApiSchemaError(err)
}
}
QP initialization is the only place where this enum variant is used (since API schema generation has moved to apollo_router::spec::Schema creation), so it should be renamed to QueryPlannerInit or something, and the message reworded
This aligns better with
experimental_query_planner_mode: both_best_effortfalling back tolegacywhen theQueryPlanner::newconstructor returns an error.Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩