Merged
carodewig
approved these changes
Jul 1, 2025
🚀 Features
Support JWT audience (`aud`) validation (PR #7578)
Adds support for validating the audience (`aud`) claim of JWTs received by the router. This lets the router confirm that a JWT is intended for the audience it is being used with, which improves security by preventing token misuse across audiences.
For example, a configuration that lists the audiences `https://my.api` and `https://my.other.api` validates the JWT's `aud` claim against them and requires a match with either one. If the `aud` claim does not match any configured audience, the router rejects the request.
By @Velfi in #7578
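The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the check, not the router's implementation: the function name `audience_is_allowed` is hypothetical, and rejecting tokens that carry no `aud` claim is an assumption made for the sketch.

```python
from typing import Iterable, Optional, Union


def audience_is_allowed(
    aud_claim: Optional[Union[str, Iterable[str]]],
    configured_audiences: Iterable[str],
) -> bool:
    """Return True if the `aud` claim names at least one configured audience."""
    if aud_claim is None:
        # Assumption for this sketch: a token without `aud` is rejected
        # once audiences are configured.
        return False
    # RFC 7519 allows `aud` to be a single string or an array of strings.
    token_audiences = {aud_claim} if isinstance(aud_claim, str) else set(aud_claim)
    return not token_audiences.isdisjoint(configured_audiences)


configured = ["https://my.api", "https://my.other.api"]
accepted = audience_is_allowed("https://my.other.api", configured)
rejected = audience_is_allowed("https://someone.else", configured)
```

Here `accepted` is true because the claim matches one configured audience, and `rejected` is false because it matches none.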
Prioritize existing requests over query parsing and planning during "warm up" (PR #7223)
The router warms up its query planning cache after a schema or configuration change. This change decreases the priority
of warm up tasks in the compute job queue, to reduce the impact of warmup on serving requests.
This change adds new values to the `job.type` dimension of the following metrics:
- `apollo.router.compute_jobs.duration` - A histogram of time spent in the compute pipeline by the job, including the queue and query planning.
  - `job.type`: (`query_planning`, `query_parsing`, `introspection`, `query_planning_warmup`, `query_parsing_warmup`)
  - `job.outcome`: (`executed_ok`, `executed_error`, `channel_error`, `rejected_queue_full`, `abandoned`)
- `apollo.router.compute_jobs.queue.wait.duration` - A histogram of time spent in the compute queue by the job.
  - `job.type`: (`query_planning`, `query_parsing`, `introspection`, `query_planning_warmup`, `query_parsing_warmup`)
- `apollo.router.compute_jobs.execution.duration` - A histogram of time spent to execute job (excludes time spent in the queue).
  - `job.type`: (`query_planning`, `query_parsing`, `introspection`, `query_planning_warmup`, `query_parsing_warmup`)
- `apollo.router.compute_jobs.active_jobs` - A gauge of the number of compute jobs being processed in parallel.
  - `job.type`: (`query_planning`, `query_parsing`, `introspection`, `query_planning_warmup`, `query_parsing_warmup`)
By @carodewig in #7223
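To illustrate the idea of serving request work before warm-up work, here is a minimal Python sketch of a priority queue keyed by `job.type`. The `ComputeQueue` class and the numeric priority levels are hypothetical and chosen only for the example; the router's actual queue is not shown in these notes.

```python
import heapq
import itertools

# Hypothetical priority levels for this sketch: a lower number is served first.
PRIORITY = {
    "query_planning": 0,
    "query_parsing": 0,
    "introspection": 0,
    "query_planning_warmup": 1,
    "query_parsing_warmup": 1,
}


class ComputeQueue:
    """Priority queue that serves request jobs before warm-up jobs."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order among jobs of equal priority.
        self._counter = itertools.count()

    def push(self, job_type, payload):
        entry = (PRIORITY[job_type], next(self._counter), job_type, payload)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _, _, job_type, payload = heapq.heappop(self._heap)
        return job_type, payload

    def __len__(self):
        return len(self._heap)


queue = ComputeQueue()
queue.push("query_planning_warmup", "cached operation A")
queue.push("query_planning", "incoming request B")
first = queue.pop()  # the incoming request, although it was queued second
```

The counter in each heap entry avoids comparing payloads and keeps jobs of the same priority in arrival order.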
Persisted queries: Include operation name in `PERSISTED_QUERY_NOT_IN_LIST` error for debuggability (PR #7768)
When persisted query safelisting is enabled and a request has an unknown PQ ID, the GraphQL error now has the extension field `operation_name` containing the GraphQL operation name (if provided explicitly in the request). Note that this only applies to the `PERSISTED_QUERY_NOT_IN_LIST` error returned when manifest-based PQs are enabled, APQs are disabled, and the request contains an operation ID that is not in the list.
By @glasser in #7768
Cooperative cancellation for query planning
This release introduces cooperative cancellation support for query planning operations. This feature allows the router
to gracefully handle query planning timeouts and cancellations, improving resource utilization.
Metrics are emitted for cooperative cancellation:
- `apollo.router.query_planning.plan.duration` metric.
- `query_planning` span.
The `mode` can be set to `measure` or `enforce`. We recommend starting with `measure`. In `measure` mode, the router will measure the time taken for query planning and emit metrics accordingly. In `enforce` mode, the router will cancel query planning operations that exceed the specified timeout.
To configure cooperative cancellation in measure mode:
By @Velfi in #7604
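The difference between the two modes can be sketched in Python as follows. This is a conceptual illustration only, not router configuration or router code: `plan_with_deadline`, `PlanningCancelled`, and the outcome labels `"success"` and `"timeout"` are hypothetical names, and planning is modelled as a list of small steps so the deadline can be checked between them.

```python
import time


class PlanningCancelled(Exception):
    """Raised in enforce mode when planning exceeds its timeout."""


def plan_with_deadline(steps, timeout_s, mode="measure", clock=time.monotonic):
    """Run planning steps, checking the deadline cooperatively between steps.

    Returns (results, outcome). The outcome labels are assumptions of this sketch.
    """
    if mode not in ("measure", "enforce"):
        raise ValueError(f"unknown mode: {mode!r}")
    start = clock()
    outcome = "success"
    results = []
    for step in steps:
        # Cooperative check: the work itself polls the deadline between steps.
        if clock() - start > timeout_s:
            if mode == "enforce":
                raise PlanningCancelled(f"planning exceeded {timeout_s}s")
            outcome = "timeout"  # measure mode: record the overrun, keep planning
        results.append(step())
    if mode == "measure" and clock() - start > timeout_s:
        outcome = "timeout"
    return results, outcome
```

In `measure` mode the function always finishes the plan and only reports that the timeout was exceeded, which is why it is a safe starting point before switching to `enforce`.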
🐛 Fixes
Align `on_graphql_error` selector with `subgraph_on_graphql_error` (PR #7676)
The `on_graphql_error` selector will now return `true` or `false`, in alignment with the `subgraph_on_graphql_error` selector. Previously, the selector would return `true` or `None`.
By @carodewig in #7676
GraphQL responses should remain spec-compliant when coprocessors return invalid payloads (PR #7680)
This change adds checks on GraphQL responses returned from coprocessors to ensure they comply with the GraphQL specification. For subscriptions over WebSocket, the router previously did not return `data`, so the payload was not a valid GraphQL response. The router now always returns a valid GraphQL response during the WebSocket handshake.
By @bnjjj in #7680
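As a rough illustration of what a spec-compliance check on a response payload involves, the following Python sketch tests the top-level shape required by the GraphQL specification's response format. The function name `is_valid_graphql_response` is hypothetical, and the sketch covers only a few top-level rules, not everything the router checks.

```python
def is_valid_graphql_response(payload) -> bool:
    """Check the top-level shape of a GraphQL response payload."""
    if not isinstance(payload, dict):
        return False
    has_data = "data" in payload  # `"data": None` still counts as present
    has_errors = "errors" in payload
    if not (has_data or has_errors):
        return False
    if has_errors:
        errors = payload["errors"]
        # When present, `errors` must be a non-empty list.
        if not isinstance(errors, list) or not errors:
            return False
        # Every error must be a map with a string `message`.
        for error in errors:
            if not isinstance(error, dict) or not isinstance(error.get("message"), str):
                return False
    return True
```

A payload such as `{"data": None}` passes this check, while an empty object or one with an empty `errors` list does not.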
SigV4 configurations are again operable (PR #7726)
Fixed an issue introduced in Router 2.3.0 where some SigV4 configurations would fail to start, preventing communication with SigV4-enabled services.
By @dylan-apollo in #7726
Improve error message for invalid variables (Issue #2984)
When a variable in a GraphQL request is missing or contains an invalid value, the router now returns more useful error messages.
By @SimonSapin in #7567
Labels on metrics emitted via Prometheus (PR #7394)
When `telemetry.exporters.metrics.common.resource` was configured to add labels to metrics globally, these labels were not exported on some Prometheus metrics. The labels are now exported when `resource_selector` is set to `all` (the default is `none`).
The problem only occurred with Prometheus, not OTLP.
By @bnjjj in #7394
Forbid unknown `@link` directives for supergraph schemas where `purpose` is `EXECUTION` or `SECURITY`
The legacy JavaScript query planner forbade any usage of unknown `@link` specs in supergraph schemas with either the `EXECUTION` or `SECURITY` value set for the `for` argument (also known as the spec's "purpose"). This behavior had not previously been ported to the native query planner. This PR implements the expected behavior in the native query planner.
By @duckki in #7587
Supergraph stage correctly receives `on_graphql_error` selector (PR #7669)
The `on_graphql_error` selector will now correctly fire on the supergraph stage; previously it only worked on the router stage.
By @carodewig in #7669
Invalid type condition in `@defer` fetch
The query planner was adding an inline spread (`...`) conditioned on the `Query` type in deferred subgraph fetch queries. Such a query would be invalid in the subgraph when the subgraph schema renamed the root `query` type to something other than `Query`. The fix removes the root type condition from all subgraph queries, so that they stay valid even when root types are renamed.
By @duckki in #7580
Preserve `content-type` for file uploads when Rhai scripts are in use (PR #7559)
If a Rhai script was invoked during file upload processing, the "Content-Type" of the request was not preserved correctly. This would cause the file upload to fail.
The error message would be something like:
By @garypen in #7559
OTLP metric HTTP endpoint behavior (PR #7595)
We made substantial updates to OpenTelemetry when we released router 2.0, but didn't catch that OpenTelemetry changed how it processed "endpoints" (destinations for metrics and traces) until now.
With the undetected change, the router wasn't setting the path correctly, resulting in failure to export metrics over HTTP when using the "default" endpoint. Neither metrics via gRPC nor traces were impacted.
We have fixed our interactions with the dependency and improved our testing to make sure this does not occur again. Additionally, the router now supports setting standard OpenTelemetry environment variables for endpoints.
There is still a known problem when using environment variables to configure endpoints for the HTTP protocol when transmitting to an unencrypted endpoint (i.e., TLS not configured). This affects the following environment variables:
- `OTEL_EXPORTER_OTLP_ENDPOINT`
- `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT`
- `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`
When these environment variables are set to insecure hosts, messages will appear in the logs indicating an error, but the metrics and traces will still be sent correctly.
This is tracked upstream at open-telemetry/opentelemetry-collector#10952.
By @garypen in #7595
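For context on why the path matters, the OpenTelemetry exporter specification treats `OTEL_EXPORTER_OTLP_ENDPOINT` as a base URL for OTLP over HTTP and appends a per-signal path (`/v1/metrics`, `/v1/traces`), while the signal-specific variables are used as given. The Python sketch below illustrates that convention only; `resolve_otlp_http_endpoint` is a hypothetical helper, not the router's code, and the host names in the usage lines are made up.

```python
DEFAULT_OTLP_HTTP_BASE = "http://localhost:4318"  # default from the OTLP exporter spec


def resolve_otlp_http_endpoint(signal: str, env: dict) -> str:
    """Resolve the OTLP/HTTP URL for a signal, following the spec's convention."""
    if signal not in ("metrics", "traces"):
        raise ValueError(f"unsupported signal: {signal!r}")
    # A signal-specific variable is used as-is, with no path appended.
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        return specific
    # The generic variable is a base URL; the per-signal path is appended.
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_OTLP_HTTP_BASE
    return base.rstrip("/") + f"/v1/{signal}"


default_metrics = resolve_otlp_http_endpoint("metrics", {})
custom_traces = resolve_otlp_http_endpoint(
    "traces", {"OTEL_EXPORTER_OTLP_ENDPOINT": "http://collector.example:4318/"}
)
```

Passing the environment as a dictionary keeps the helper easy to test; in practice `dict(os.environ)` could be supplied instead.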
Add `graphql.operation.name` attribute to `apollo.router.opened.subscriptions` counter (PR #7606)
The `apollo.router.opened.subscriptions` metric now has a `graphql.operation.name` attribute applied to identify the named operation of subscriptions which are still open.
By @bnjjj in #7606
🛠 Maintenance
Measure `preview_extended_error_metrics` in Apollo config telemetry (PR #7597)
By @timbotnik in #7597
📚 Documentation
Apollo Runtime Container is documented in Deployment
The Apollo Runtime Container is included in our documentation for Deployment options. It also includes instructions for running Apollo Router with the Apollo MCP Server.
By @jonathanrainer and @lambertjosh in #7734 and #7668
Fix incorrect reference to `apollo.router.schema.load.duration` (PR #7582)
The in-memory cache documentation was referencing an incorrect metric to track schema load times. Previously it was referred to as `apollo.router.schema.loading.time`, whereas the metric being emitted by the router since v2.0.0 is actually `apollo.router.schema.load.duration`. This is now fixed.
By @lrlna in #7582
Re-introduce the "graph artifact" documentation for containers (PR #7752)
Adds back docs that were accidentally overwritten in PR #7734. The missing commit added graph artifact usage information.
By @lambertjosh in #7752