Reliably distinguish GraphQL errors and transport errors in subscriptions #7901

Merged
goto-bus-stop merged 5 commits into dev from renee/ROUTER-1343
Jul 30, 2025

Conversation

@goto-bus-stop
Member

@goto-bus-stop goto-bus-stop commented Jul 16, 2025

This PR uses a GraphQL response extension to mark "fatal" errors, which should be reported under a top-level {errors} key in the subscription chunk, while letting non-fatal errors through as {payload: {errors}} chunks. Previously, we tried to detect this "magically" by checking whether the error was the final response in the stream, which isn't fully accurate and can be flaky depending on when the subgraph closes the connection.

I wanted to use a Result type here to enforce that fatal errors are marked and handled correctly, but that would be a breaking change to our response types and to every plugin. Instead, responses that represent fatal subscription errors carry the GraphQL extension apollo::subscriptions::fatal_error. The multipart serialisation code uses only that extension to decide how to format the response.
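As a rough illustration of the approach described above, here is a minimal sketch of tagging a response and choosing the chunk shape from that tag alone. Only the extension key and the SUBSCRIPTION_ERROR_EXTENSION_KEY constant name come from this PR; the Response and Chunk types, the mark_fatal and to_chunk helpers, and the boolean-only extensions map are hypothetical simplifications, not the router's actual types.

```rust
use std::collections::BTreeMap;

// Extension key taken from the PR description. Everything else below is a
// hypothetical stand-in for the router's real types.
const SUBSCRIPTION_ERROR_EXTENSION_KEY: &str = "apollo::subscriptions::fatal_error";

// Simplified response: real GraphQL extensions hold arbitrary JSON values,
// here they are reduced to booleans to keep the sketch dependency-free.
#[derive(Debug, Default, Clone)]
struct Response {
    data: Option<String>,
    errors: Vec<String>,
    extensions: BTreeMap<String, bool>,
}

// The two shapes a subscription chunk can take.
#[derive(Debug, PartialEq)]
enum Chunk {
    // Would be serialised as a top-level {errors} key.
    Fatal { errors: Vec<String> },
    // Would be serialised as {payload: {data, errors}}.
    Payload { data: Option<String>, errors: Vec<String> },
}

// Tag a response as a fatal subscription error.
fn mark_fatal(mut response: Response) -> Response {
    response
        .extensions
        .insert(SUBSCRIPTION_ERROR_EXTENSION_KEY.to_string(), true);
    response
}

// Choose the chunk shape by looking only at the extension, never at the
// response's position in the stream.
fn to_chunk(response: Response) -> Chunk {
    let is_fatal = response.extensions.get(SUBSCRIPTION_ERROR_EXTENSION_KEY) == Some(&true);
    if is_fatal {
        Chunk::Fatal { errors: response.errors }
    } else {
        Chunk::Payload { data: response.data, errors: response.errors }
    }
}
```

Because the decision depends only on the tag, a response containing errors that happens to arrive last in the stream is still sent as a payload chunk unless something marked it explicitly.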

Todos

  • Update the test expectations
  • test_subscription_ws_passthrough_pure_error_payload appears to (still) be flaky with these changes

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@apollo-librarian

apollo-librarian bot commented Jul 29, 2025

✅ Docs preview ready

The preview is ready to be viewed. View the preview

File Changes

0 new, 1 changed, 0 removed
* graphos/routing/(latest)/observability/telemetry/instrumentation/standard-instruments.mdx

Build ID: 1470f4ce5f4afc4301812492
Build Logs: View logs

URL: https://www.apollographql.com/docs/deploy-preview/1470f4ce5f4afc4301812492

@@ -115,27 +116,33 @@ impl Stream for Multipart {

    match self.mode {
        ProtocolMode::Subscription => {
Member Author

This is where the actual change happens.
We now decide whether to send payload or errors in the response based exclusively on whether the SUBSCRIPTION_ERROR_EXTENSION_KEY extension is true.

The code is reordered a bit (special handling for graceful server-side close is moved earlier) so we aren't creating another special-meaning internal message (SubscriptionPayload with payload being an empty object/null...)

@goto-bus-stop goto-bus-stop marked this pull request as ready for review July 29, 2025 13:45
@goto-bus-stop goto-bus-stop requested a review from a team July 29, 2025 13:45
    if !is_still_open
        && response.data.is_none()
        && response.errors.is_empty()
        && response.extensions.is_empty()
Contributor

What is the significance of extensions being empty? If something other than the subscriptions error key were in extensions, would that break graceful close?

Contributor

I guess if extensions is not empty, then we want to return the response.

Member Author

Essentially I'm trying to mimic what the previous code did. There, resp.payload.is_none() would only be true if both data and extensions were null/none.

I'm not sure this is actually required, to be honest. The graceful-close response for a subscription can only have extensions if a plugin or user code added them.
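To make the question in this thread concrete, here is a small sketch of the graceful-close condition quoted in the diff. The four-part condition is taken from the diff; the Response struct, the Step enum, and the next_step function are hypothetical names used for illustration, and what the router actually does once it detects a graceful close is not shown in this PR excerpt.

```rust
use std::collections::BTreeMap;

// Hypothetical, simplified response type; not the router's real one.
#[derive(Debug, Default)]
struct Response {
    data: Option<String>,
    errors: Vec<String>,
    extensions: BTreeMap<String, String>,
}

// Hypothetical outcome of inspecting one response from the stream.
#[derive(Debug, PartialEq)]
enum Step {
    // Treat the response as a graceful server-side close.
    EndOfStream,
    // Pass the response on to the payload/errors formatting step.
    Forward,
}

// Checked before any payload/errors formatting, mirroring the reordering
// described in the review comment.
fn next_step(is_still_open: bool, response: &Response) -> Step {
    let is_graceful_close = !is_still_open
        && response.data.is_none()
        && response.errors.is_empty()
        && response.extensions.is_empty();
    if is_graceful_close {
        Step::EndOfStream
    } else {
        Step::Forward
    }
}
```

Under this condition, a closing response that a plugin decorated with any extension is no longer treated as a graceful close and is forwarded instead, which is the behaviour the reviewer asked about.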

@goto-bus-stop goto-bus-stop requested a review from a team as a code owner July 30, 2025 09:22
@goto-bus-stop goto-bus-stop enabled auto-merge (squash) July 30, 2025 10:02
@goto-bus-stop goto-bus-stop changed the title from "Fix fatal error / propagated error handling in subscriptions" to "Reliably distinguish GraphQL errors and transport errors in subscriptions" Jul 30, 2025
@goto-bus-stop goto-bus-stop disabled auto-merge July 30, 2025 10:02
@goto-bus-stop goto-bus-stop enabled auto-merge (squash) July 30, 2025 10:03
@goto-bus-stop goto-bus-stop merged commit 93ed732 into dev Jul 30, 2025
15 checks passed
@goto-bus-stop goto-bus-stop deleted the renee/ROUTER-1343 branch July 30, 2025 10:17
@lrlna lrlna mentioned this pull request Aug 25, 2025