Connection cancellation race (backport #8169) #8210
Merged
Co-authored-by: bryn <bryn@apollographql.com> (cherry picked from commit efa9c72)
abernix
approved these changes
Sep 5, 2025
Previously we used Notify to tell connections to shut down when a reload happens. However, this creates a race condition: if a connection is being established at the moment shutdown is called, the new connection never receives the shutdown notification. Notify only notifies current waiters and does not resolve immediately for waiters added afterwards.
This PR switches to CancellationToken, which, once cancelled, always resolves as cancelled immediately, including for waiters added later.
For users, this race manifested as ever-growing memory use across hot reloads, because a connection could hold onto a pipeline forever.
Note: there is no unit test
Triggering the race requires an enormous number of iterations, so it cannot be triggered as part of a standard CI run.
I reproduced it by introducing sleeps at critical points, which makes it possible to create a connection after the point where shutdown has been called.
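The race can be illustrated with a small std-only sketch. This is not the router's code: EdgeSignal and StickyToken are hypothetical stand-ins that mimic the behaviour the description attributes to Notify (only current waiters are woken) and CancellationToken (cancellation is remembered), and a channel gate stands in for the injected sleeps so that the ordering is deterministic.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Common interface so the same scenario can run against both signals.
trait ShutdownSignal: Send + Sync + 'static {
    /// Tell connections to shut down.
    fn shutdown(&self);
    /// Block until shutdown is observed or `timeout` elapses.
    /// Returns true if shutdown was observed.
    fn wait_for_shutdown(&self, timeout: Duration) -> bool;
}

/// Hypothetical edge-triggered signal: wakes only waiters that registered
/// before `shutdown` was called and keeps no memory of past calls.
#[derive(Default)]
struct EdgeSignal {
    generation: Mutex<u64>,
    cv: Condvar,
}

impl ShutdownSignal for EdgeSignal {
    fn shutdown(&self) {
        *self.generation.lock().unwrap() += 1;
        self.cv.notify_all();
    }

    fn wait_for_shutdown(&self, timeout: Duration) -> bool {
        let guard = self.generation.lock().unwrap();
        let seen = *guard;
        // Only a shutdown issued after this point changes the generation.
        let (_guard, result) = self
            .cv
            .wait_timeout_while(guard, timeout, |current| *current == seen)
            .unwrap();
        !result.timed_out()
    }
}

/// Hypothetical sticky token: once cancelled, every current and future
/// waiter observes the cancellation immediately.
#[derive(Default)]
struct StickyToken {
    cancelled: Mutex<bool>,
    cv: Condvar,
}

impl ShutdownSignal for StickyToken {
    fn shutdown(&self) {
        *self.cancelled.lock().unwrap() = true;
        self.cv.notify_all();
    }

    fn wait_for_shutdown(&self, timeout: Duration) -> bool {
        let guard = self.cancelled.lock().unwrap();
        let (guard, _result) = self
            .cv
            .wait_timeout_while(guard, timeout, |cancelled| !*cancelled)
            .unwrap();
        *guard
    }
}

/// Runs the interleaving from the description: shutdown is called while a
/// connection is still being established. Returns true if the connection
/// noticed the shutdown.
fn late_connection_sees_shutdown<S: ShutdownSignal + Default>() -> bool {
    let signal = Arc::new(S::default());
    let (established_tx, established_rx) = mpsc::channel::<()>();

    let conn_signal = Arc::clone(&signal);
    let connection = thread::spawn(move || {
        // Stand-in for the injected sleep: setup does not finish until the
        // main thread has already called `shutdown`.
        established_rx.recv().unwrap();
        conn_signal.wait_for_shutdown(Duration::from_millis(100))
    });

    signal.shutdown();
    established_tx.send(()).unwrap();
    connection.join().unwrap()
}
```

Under these assumptions, late_connection_sees_shutdown::<EdgeSignal>() returns false, because the late connection waits out its timeout and would keep holding whatever it owns, while late_connection_sees_shutdown::<StickyToken>() returns true.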
Notes
[ROUTER-1427]: https://apollographql.atlassian.net/browse/ROUTER-1427?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
This is an automatic backport of pull request #8169 done by Mergify.