Connection cancellation race #8169

Merged
BrynCooke merged 4 commits into dev from bryn/connection_cancellation_race
Sep 5, 2025

@BrynCooke
Contributor

@BrynCooke BrynCooke commented Sep 1, 2025

Previously we were using Notify to tell connections to shut down when a reload happens. However, this creates a race condition: if a connection is being established at the moment that shutdown is called, the new connection will never receive the shutdown notification. Notify only notifies current waiters and will not immediately resolve for waiters that are added afterwards.

This PR switches to CancellationToken, which, once cancelled, will always yield cancelled immediately.

For users this would manifest as ever-growing memory use over hot reloads, because a connection could potentially hold onto a pipeline forever.
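
To make the difference concrete, here is a minimal std-only sketch of the two signalling styles. It is a model of the semantics described above, not the router's code and not tokio's implementation: `EdgeSignal`, `LatchedSignal` and the helper functions are hypothetical names invented for illustration. `EdgeSignal` stands in for a signal that only reaches waiters registered at the time it fires, and `LatchedSignal` stands in for a token that stays cancelled once cancelled.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

/// Edge-triggered signal (hypothetical model): `notify_waiters` reaches only
/// the waiters that are registered at the moment it is called.
struct EdgeSignal {
    waiters: Mutex<Vec<Sender<()>>>,
}

impl EdgeSignal {
    fn new() -> Self {
        Self { waiters: Mutex::new(Vec::new()) }
    }

    /// Register a waiter and return the handle it would wait on.
    fn subscribe(&self) -> Receiver<()> {
        let (tx, rx) = channel();
        self.waiters.lock().unwrap().push(tx);
        rx
    }

    /// Signal every currently registered waiter, then forget them.
    fn notify_waiters(&self) {
        let mut waiters = self.waiters.lock().unwrap();
        for tx in waiters.drain(..) {
            // A waiter that already went away is not an error here.
            let _ = tx.send(());
        }
    }
}

/// Latched signal (hypothetical model): once cancelled it stays cancelled,
/// so observers that arrive later still see the cancellation.
struct LatchedSignal {
    cancelled: AtomicBool,
}

impl LatchedSignal {
    fn new() -> Self {
        Self { cancelled: AtomicBool::new(false) }
    }

    fn cancel(&self) {
        self.cancelled.store(true, Ordering::SeqCst);
    }

    fn is_cancelled(&self) -> bool {
        self.cancelled.load(Ordering::SeqCst)
    }
}

/// The waiter registers before shutdown fires: the signal is delivered.
fn early_waiter_sees_edge_signal() -> bool {
    let signal = EdgeSignal::new();
    let early = signal.subscribe();
    signal.notify_waiters();
    early.try_recv().is_ok()
}

/// Shutdown fires while the connection is still being established,
/// so the waiter registers too late and never hears about it.
fn late_waiter_sees_edge_signal() -> bool {
    let signal = EdgeSignal::new();
    signal.notify_waiters();
    let late = signal.subscribe();
    late.try_recv().is_ok()
}

/// Same ordering as above, but with a latched signal.
fn late_waiter_sees_latched_signal() -> bool {
    let signal = LatchedSignal::new();
    signal.cancel();
    signal.is_cancelled()
}

fn main() {
    println!("edge, early waiter notified: {}", early_waiter_sees_edge_signal());
    println!("edge, late waiter notified: {}", late_waiter_sees_edge_signal());
    println!("latched, late waiter notified: {}", late_waiter_sees_latched_signal());
}
```

In this model the late waiter misses the edge-triggered signal but still observes the latched one. That is the property the switch relies on.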

Note: there is no unit test
The reason is that triggering the race requires an enormous number of iterations, so it is not possible to trigger it as part of a standard CI run.
I have reproduced it by introducing sleeps at critical points, which allows a connection to be created after the point where the connection has shut down.


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • PR description explains the motivation for the change and relevant context for reviewing
  • PR description links appropriate GitHub/Jira tickets (creating when necessary)
  • Changeset is included for user-facing changes
  • Changes are compatible [1]
  • Documentation [2] completed
  • Performance impact assessed and acceptable
  • Metrics and logs are added [3] and documented
  • Tests added and passing [4]
    • Unit tests
    • Integration tests
    • Manual tests, as necessary

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. A lot of (if not most) features benefit from built-in observability and debug-level logs. Please read this guidance on metrics best-practices.

  4. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

@apollo-librarian

apollo-librarian bot commented Sep 1, 2025

✅ Docs preview ready

The preview is ready to be viewed. View the preview

File Changes

0 new, 2 changed, 0 removed
* graphos/routing/(latest)/observability/client-id-enforcement.mdx
* graphos/routing/(latest)/errors.mdx

Build ID: 47fe91528b4560d3b32621a2
Build Logs: View logs

URL: https://www.apollographql.com/docs/deploy-preview/47fe91528b4560d3b32621a2

@BrynCooke BrynCooke force-pushed the bryn/connection_cancellation_race branch from 4c3f00c to dac90b0 Compare September 1, 2025 11:20
@BrynCooke BrynCooke marked this pull request as ready for review September 1, 2025 11:20
@BrynCooke BrynCooke requested a review from a team September 1, 2025 11:20
@BrynCooke BrynCooke requested a review from a team as a code owner September 1, 2025 11:20
@BrynCooke BrynCooke requested review from bnjjj and lrlna September 1, 2025 11:20
@BrynCooke BrynCooke force-pushed the bryn/connection_cancellation_race branch 2 times, most recently from 3d56e5a to bd2106d Compare September 1, 2025 12:22
Previously we were using Notify to indicate to connections that they should shut down when a reload happens. However, this creates a race condition where if a connection is being established at the moment that shutdown is called the new connection will never receive the shutdown notification.

This PR switches to CancellationToken which once cancelled will always yield cancelled immediately.
@BrynCooke BrynCooke force-pushed the bryn/connection_cancellation_race branch from bd2106d to 618eaf7 Compare September 1, 2025 12:27
@abernix
Member

abernix commented Sep 2, 2025

@BrynCooke is this a candidate for @mergifyio backport 2.6.1 ? Or just 2.7.0?

@BrynCooke
Contributor Author

BrynCooke commented Sep 3, 2025

@abernix I think we could backport it. Will ping you separately.

let received_first_request = $received_first_request;
tokio::pin!(connection);
tokio::select! {
// the connection finished first
Member

the order doesn't matter unless biased; is used: https://docs.rs/tokio/latest/tokio/macro.select.html#fairness

did you change this only for readability or did you intend to change the priority of each branch?

Contributor Author

Yeah, I had actually changed it because I had thought that this was important. I'll revert the ordering.

Contributor Author

Also revert the change in order for the select.
@BrynCooke BrynCooke merged commit efa9c72 into dev Sep 5, 2025
15 checks passed
@BrynCooke BrynCooke deleted the bryn/connection_cancellation_race branch September 5, 2025 08:04
@BrynCooke BrynCooke added the backport-1.x Backport this PR to 1.x label Sep 5, 2025
BrynCooke added a commit that referenced this pull request Sep 5, 2025
Co-authored-by: bryn <bryn@apollographql.com>
(cherry picked from commit efa9c72)
@abernix
Member

abernix commented Sep 5, 2025

@mergify backport 2.6.2

@mergify
Contributor

mergify bot commented Sep 5, 2025

backport 2.6.2

✅ Backports have been created

BrynCooke added a commit that referenced this pull request Sep 5, 2025
Co-authored-by: bryn <bryn@apollographql.com>
(cherry picked from commit efa9c72)

Labels

backport-1.x Backport this PR to 1.x

4 participants