prep release: v2.4.0 #7789

Merged
abernix merged 3 commits into 2.4.0 from prep-2.4.0
Jul 2, 2025

@abernix abernix commented Jul 1, 2025

Note

When approved, this PR will merge into the 2.4.0 branch which will — upon being approved itself — merge into main.

Things to review in this PR:

  • Changelog correctness (There is a preview below, but it is not necessarily the most up to date. See the Files Changed for the true reality.)
  • Version bumps
  • That it targets the right release branch (2.4.0 in this case!).

🚀 Features

Support JWT audience (aud) validation (PR #7578)

Adds support for validating the audience (aud) claim for JWTs received by the router. This allows the router to ensure that the JWT is intended
for the specific audience it is being used with, enhancing security by preventing token misuse across different audiences.

The following configuration will validate the JWT's aud claim against the specified audiences and ensure a match with either https://my.api or https://my.other.api. If the aud claim does not match either of those configured audiences, the router will reject the request.

authentication:
 router:
   jwt:
     jwks: # This key is required.
       - url: https://dev-zzp5enui.us.auth0.com/.well-known/jwks.json
         issuers: # optional list of issuers
           - https://issuer.one
           - https://issuer.two
         audiences: # optional list of audiences
           - https://my.api
           - https://my.other.api
         poll_interval: <optional poll interval>
         headers: # optional list of static headers added to the HTTP request to the JWKS URL
           - name: User-Agent
             value: router
     # These keys are optional. Default values are shown.
     header_name: Authorization
     header_value_prefix: Bearer
     on_error: Error
     # array of alternative token sources
     sources:
       - type: header
         name: X-Authorization
         value_prefix: Bearer
       - type: cookie
         name: authz
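
As a rough illustration of the matching rule described above, the sketch below accepts a token when its aud claim shares at least one value with the configured audiences. It is not the router's implementation: the function name `audience_matches` is hypothetical, and rejecting a token with no aud claim is an assumption. The only outside fact relied on is that RFC 7519 allows aud to be either a single string or an array of strings.

```python
from typing import Iterable, Optional, Union


def audience_matches(
    aud_claim: Optional[Union[str, Iterable[str]]],
    configured: Iterable[str],
) -> bool:
    """Return True when the aud claim shares a value with the configured audiences."""
    if aud_claim is None:
        # Assumption: a token without an aud claim does not match.
        return False
    # RFC 7519 allows aud to be a single string or an array of strings.
    token_audiences = {aud_claim} if isinstance(aud_claim, str) else set(aud_claim)
    return not token_audiences.isdisjoint(configured)


CONFIGURED = ["https://my.api", "https://my.other.api"]
print(audience_matches("https://my.api", CONFIGURED))
print(audience_matches(["https://unrelated.api"], CONFIGURED))
```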

By @Velfi in #7578

Prioritize existing requests over query parsing and planning during "warm up" (PR #7223)

The router warms up its query planning cache after a schema or configuration change. This change decreases the priority
of warm up tasks in the compute job queue, to reduce the impact of warmup on serving requests.
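
The idea of deprioritizing warm-up work can be sketched with an ordinary priority queue. This is only an illustration, not the router's compute job queue: the `ComputeQueue` class and the numeric priority levels are hypothetical, and only the job type names are taken from the metric dimensions in this entry.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower number is served first.
PRIORITY = {
    "query_parsing": 0,
    "query_planning": 0,
    "introspection": 0,
    "query_parsing_warmup": 1,
    "query_planning_warmup": 1,
}


class ComputeQueue:
    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker, so jobs with the same
        # priority are served in arrival order.
        self._order = itertools.count()

    def push(self, job_type, payload):
        entry = (PRIORITY[job_type], next(self._order), job_type, payload)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _, _, job_type, payload = heapq.heappop(self._heap)
        return job_type, payload


queue = ComputeQueue()
queue.push("query_planning_warmup", "warm query A")
queue.push("query_planning", "live request 1")
queue.push("query_parsing_warmup", "warm query B")
queue.push("query_parsing", "live request 2")

# Live requests drain before warm-up jobs, whatever the arrival order.
drained = [queue.pop() for _ in range(4)]
print(drained)
```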

This change adds new values to the job.type dimension of the following metrics:

  • apollo.router.compute_jobs.duration - A histogram of time spent in the compute pipeline by the job, including the queue and query planning.
    • job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
    • job.outcome: (executed_ok, executed_error, channel_error, rejected_queue_full, abandoned)
  • apollo.router.compute_jobs.queue.wait.duration - A histogram of time spent in the compute queue by the job.
    • job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
  • apollo.router.compute_jobs.execution.duration - A histogram of time spent executing the job (excludes time spent in the queue).
    • job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)
  • apollo.router.compute_jobs.active_jobs - A gauge of the number of compute jobs being processed in parallel.
    • job.type: (query_planning, query_parsing, introspection, query_planning_warmup, query_parsing_warmup)

By @carodewig in #7223

Persisted queries: Include operation name in PERSISTED_QUERY_NOT_IN_LIST error for debuggability (PR #7768)

When persisted query safelisting is enabled and a request has an unknown PQ ID, the GraphQL error now has the extension field operation_name containing the GraphQL operation name (if provided explicitly in the request). Note that this only applies to the PERSISTED_QUERY_NOT_IN_LIST error returned when manifest-based PQs are enabled, APQs are disabled, and the request contains an operation ID that is not in the list.
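
To make the shape of the change concrete, here is a minimal sketch of an error object carrying the new extension field. The helper name, the message text, and the placement of the error code under extensions.code are assumptions made for illustration. Only the operation_name extension field and the PERSISTED_QUERY_NOT_IN_LIST code come from this entry.

```python
from typing import Optional


def persisted_query_not_in_list_error(
    operation_id: str, operation_name: Optional[str]
) -> dict:
    # Assumption: the error code is carried under extensions.code.
    extensions = {"code": "PERSISTED_QUERY_NOT_IN_LIST"}
    # The field is only present when the request named the operation explicitly.
    if operation_name is not None:
        extensions["operation_name"] = operation_name
    return {
        # Illustrative message text, not necessarily the router's wording.
        "message": f"Persisted query '{operation_id}' not found in the persisted query list",
        "extensions": extensions,
    }


error = persisted_query_not_in_list_error("abc123", "GetProducts")
print(error["extensions"])
```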

By @glasser in #7768

Cooperative cancellation for query planning

This release introduces cooperative cancellation support for query planning operations. This feature allows the router
to gracefully handle query planning timeouts and cancellations, improving resource utilization.

Metrics are emitted for cooperative cancellation:

  • Records the "outcome" of query planning on the apollo.router.query_planning.plan.duration metric.
  • Records the "outcome" of query planning on the query_planning span.

The mode can be set to measure or enforce. We recommend starting with measure. In measure mode, the router will measure the time taken for query planning and emit metrics accordingly. In enforce mode, the router will cancel query planning operations that exceed the specified timeout.

To configure cooperative cancellation in measure mode:

supergraph:
  query_planning:
    experimental_cooperative_cancellation:
      enabled: true
      mode: measure
      timeout: 1s
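
The difference between the two modes can be sketched as follows. This is a simplified model rather than the router's planner: `plan_query`, the outcome strings, and the injectable `clock` parameter are all hypothetical. Cancellation is cooperative because the planning loop itself checks the deadline between steps instead of being interrupted from outside.

```python
import time


def plan_query(steps, timeout_s, mode, clock=time.monotonic):
    """Run planning steps, checking the deadline after each one."""
    deadline = clock() + timeout_s
    timed_out = False
    steps_run = 0
    for step in steps:
        step()
        steps_run += 1
        if clock() > deadline:
            timed_out = True
            if mode == "enforce":
                # enforce: stop planning once the timeout is exceeded.
                break
            # measure: keep planning, but remember the timeout was hit.
    outcome = "timeout" if timed_out else "success"
    return {"outcome": outcome, "steps_run": steps_run}


def fake_clock(values):
    """Deterministic clock for the example: returns the given readings in order."""
    readings = iter(values)
    return lambda: next(readings)


def noop():
    pass


measured = plan_query([noop] * 3, 1.0, "measure", fake_clock([0.0, 0.5, 1.5, 2.0]))
enforced = plan_query([noop] * 3, 1.0, "enforce", fake_clock([0.0, 0.5, 1.5, 2.0]))
print(measured, enforced)
```

In this model, measure mode finishes all three steps and only records that the timeout was exceeded, while enforce mode stops after the second step.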

By @Velfi in #7604

🐛 Fixes

Align on_graphql_error selector with subgraph_on_graphql_error (PR #7676)

The on_graphql_error selector will now return true or false, in alignment with the subgraph_on_graphql_error selector. Previously, the selector would return true or None.
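
The practical effect is that the selector always yields a boolean. A minimal sketch of that contract, using a hypothetical helper over a plain response dictionary rather than the router's selector code:

```python
def on_graphql_error(response: dict) -> bool:
    # bool() maps both a missing "errors" key and an empty list to False,
    # so the result is never None.
    return bool(response.get("errors"))


with_errors = on_graphql_error({"data": None, "errors": [{"message": "boom"}]})
without_errors = on_graphql_error({"data": {"me": {"id": "1"}}})
print(with_errors, without_errors)
```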

By @carodewig in #7676

GraphQL responses should remain spec-compliant when coprocessors return invalid payloads (PR #7680)

The router checks GraphQL responses returned from coprocessors to ensure they comply with the GraphQL specification. Those checks missed one case: for a subscription over WebSocket, the response did not include any data, so it was not a valid GraphQL response payload. This fix ensures that a valid GraphQL response is always returned during the WebSocket handshake.

By @bnjjj in #7680

SigV4 configurations are again operable (PR #7726)

Fixed an issue introduced in Router 2.3.0 where some SigV4 configurations would fail to start, preventing communication with SigV4-enabled services.

By @dylan-apollo in #7726

Improve error message for invalid variables (Issue #2984)

When a variable in a GraphQL request is missing or contains an invalid value, the router now returns more useful error messages. Example:

-invalid type for variable: 'x'
+invalid input value at x.coordinates[0].longitude: found JSON null for GraphQL Float!
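
The new message pinpoints the offending value with a path into the variable. The sketch below shows one way such a path could be built. It is not the router's validation code: both function names are hypothetical, and the sketch flags every JSON null without consulting the schema, whereas real validation only rejects nulls for non-null types such as Float!.

```python
def format_path(variable, path):
    """Render a path like x.coordinates[0].longitude."""
    text = variable
    for segment in path:
        # List indexes are rendered as [n], object keys as .name
        text += f"[{segment}]" if isinstance(segment, int) else f".{segment}"
    return text


def find_null(value, path=()):
    """Return the path to the first JSON null in a nested value, or None."""
    if value is None:
        return path
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return None
    for key, child in items:
        found = find_null(child, path + (key,))
        if found is not None:
            return found
    return None


variables = {"x": {"coordinates": [{"latitude": 48.85, "longitude": None}]}}
location = format_path("x", find_null(variables["x"]))
print(f"invalid input value at {location}")
```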

By @SimonSapin in #7567

Labels on metrics emitted via Prometheus (PR #7394)

When telemetry.exporters.metrics.common.resource was configured to add labels globally to metrics, those labels were not exported on some Prometheus metrics. They are now exported when you set resource_selector to all (the default is none).

telemetry:
  exporters:
    metrics:
      common:
        resource:
          "test-resource": "test"
      prometheus:
        enabled: true
        resource_selector: all # Adds the resources to every metric

This only occurred with Prometheus and not OTLP.

By @bnjjj in #7394

Forbid unknown @link directives for supergraph schemas where purpose is EXECUTION or SECURITY

The legacy JavaScript query planner forbade any use of unknown @link specs in supergraph schemas with either the EXECUTION or SECURITY value set for the for argument (also known as the spec's "purpose"). This behavior had not previously been ported to the native query planner. This PR implements the expected behavior in the native query planner.

By @duckki in #7587

Supergraph stage correctly receives on_graphql_error selector (PR #7669)

The on_graphql_error selector will now correctly fire on the supergraph stage; previously it only worked on the router stage.

By @carodewig in #7669

Invalid type condition in @defer fetch

The query planner was adding an inline spread (...) conditioned on the Query type in deferred subgraph fetch queries. Such a query would be invalid in the subgraph when the subgraph schema renamed the root query type to something other than Query. The fix removes the root type condition from all subgraph queries, so that they stay valid even when root types are renamed.

By @duckki in #7580

Preserve content-type for file uploads when Rhai scripts are in use (PR #7559)

If a Rhai script was invoked during File Upload processing, then the "Content-Type" of the Request was not preserved correctly. This would cause a File Upload to fail.

The error message would be something like:

"message": "invalid multipart request: Content-Type is not multipart/form-data",

By @garypen in #7559

OTLP metric HTTP endpoint behavior (PR #7595)

We made substantial updates to OpenTelemetry when we released router 2.0, but until now we had not caught that OpenTelemetry changed how it processes "endpoints" (destinations for metrics and traces).

With the undetected change, the router wasn't setting the path correctly, resulting in failure to export metrics over HTTP when using the "default" endpoint. Neither metrics via gRPC nor traces were impacted.

We have fixed our interactions with the dependency and improved our testing to make sure this does not occur again. Additionally, the router now supports setting standard OpenTelemetry environment variables for endpoints.
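
For background, the OpenTelemetry OTLP exporter specification treats the generic endpoint variable as a base URL to which a per-signal path is appended for HTTP, while a signal-specific variable is used as given. The sketch below models that resolution rule only. It is not the router's code, the function name is hypothetical, and it ignores the gRPC protocol entirely.

```python
from typing import Mapping

# Default OTLP/HTTP base URL and per-signal paths from the OpenTelemetry specification.
DEFAULT_HTTP_ENDPOINT = "http://localhost:4318"
SIGNAL_PATHS = {"metrics": "v1/metrics", "traces": "v1/traces"}


def resolve_otlp_http_endpoint(signal: str, env: Mapping[str, str]) -> str:
    """Resolve the export URL for a signal ("metrics" or "traces")."""
    specific = env.get(f"OTEL_EXPORTER_OTLP_{signal.upper()}_ENDPOINT")
    if specific:
        # Signal-specific URLs are used as-is, with no path appended.
        return specific
    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT", DEFAULT_HTTP_ENDPOINT)
    # The generic endpoint is a base URL, so the signal path is appended.
    return base.rstrip("/") + "/" + SIGNAL_PATHS[signal]


default_metrics = resolve_otlp_http_endpoint("metrics", {})
print(default_metrics)
```

Omitting the path on the default endpoint corresponds to the kind of failure described above, where the export request is sent to the base URL rather than the metrics path.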

There is still a known problem when using environment variables to configure endpoints for the HTTP protocol when transmitting to an un-encrypted endpoint (i.e., TLS not configured). This affects the following environment variables:

  • OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_EXPORTER_OTLP_METRICS_ENDPOINT
  • OTEL_EXPORTER_OTLP_TRACES_ENDPOINT

When these environment variables are set to insecure hosts, messages will appear in the logs indicating an error, but the metrics and traces will still be sent correctly:

2025-06-06T15:12:47.992144Z ERROR  OpenTelemetry metric error occurred: Metrics exporter otlp failed with the grpc server returns error (Unknown error): , detailed error message: h2 protocol error: http2 error tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", FRAME_SIZE_ERROR, Library) }))
2025-06-06T15:12:47.992763Z ERROR  OpenTelemetry trace error occurred: Exporter otlp encountered the following error(s): the grpc server returns error (Unknown error): , detailed error message: h2 protocol error: http2 error tonic::transport::Error(Transport, hyper::Error(Http2, Error { kind: GoAway(b"", FRAME_SIZE_ERROR, Library) }))

This is tracked upstream at open-telemetry/opentelemetry-collector#10952.

By @garypen in #7595

Add graphql.operation.name attribute to apollo.router.opened.subscriptions counter (PR #7606)

The apollo.router.opened.subscriptions metric now has a graphql.operation.name attribute that identifies the named operation of subscriptions that are still open.

By @bnjjj in #7606

🛠 Maintenance

Measure preview_extended_error_metrics in Apollo config telemetry. (PR #7597)

By @timbotnik in #7597

📚 Documentation

Apollo Runtime Container is documented in Deployment

The Apollo Runtime Container is included in our documentation for Deployment options. It also includes instructions for running Apollo Router with the Apollo MCP Server.

By @jonathanrainer and @lambertjosh in #7734 and #7668

Fix incorrect reference to apollo.router.schema.load.duration (PR #7582)

The in-memory cache documentation was referencing an incorrect metric to track schema load times. Previously it was referred to as apollo.router.schema.loading.time, whereas the metric being emitted by the router since v2.0.0 is actually apollo.router.schema.load.duration. This is now fixed.

By @lrlna in #7582

Re-introduce the "graph artifact" documentation for containers (PR #7752)

Restores docs that were accidentally overwritten in PR #7734. The missing commit added graph artifact usage information.

By @lambertjosh in #7752

@abernix abernix requested a review from a team July 1, 2025 19:36
@abernix abernix requested review from a team as code owners July 1, 2025 19:36

apollo-librarian bot commented Jul 1, 2025

⚠️ Docs preview not attached to branch

The preview was not built because the PR's base branch 2.4.0 is not in the list of sources.

An Apollo team member can comment one of the following commands to dictate which branch to attach the preview to:

  • !docs set-base-branch 1.x
  • !docs set-base-branch dev

Build ID: 6af35d1126c174e3d4c9e95c

@abernix abernix merged commit 9c2f27b into 2.4.0 Jul 2, 2025
13 of 14 checks passed
@abernix abernix deleted the prep-2.4.0 branch July 2, 2025 04:44