fix: delete leftover heartbeat connections#1033

Merged
df-wg merged 8 commits into master from
dave/eng-6234-multipart-write-error-when-router-loses-its-connection-with
Jan 23, 2025

@df-wg df-wg commented Jan 17, 2025

Users reported seeing the segfault below when a connection to a subgraph is interrupted over multipart:

cosmo-router     | 21:07:16 PM ERROR core/graphql_handler.go:380 Unable to write error response {"hostname": "ad1991cdcfd2", "pid": 1, "component": "@wundergraph/router", "service_version": "0.158.0", "request_id": "ad1991cdcfd2/RTnCLg1J8M-000022", "trace_id": "77476c6fe295cd7dafeb71e00183879b", "error": "context canceled"}
cosmo-router     | github.com/wundergraph/cosmo/router/core.(*GraphQLHandler).WriteError
cosmo-router     |      github.com/wundergraph/cosmo/router/core/graphql_handler.go:380
cosmo-router     | github.com/wundergraph/graphql-go-tools/v2/pkg/engine/resolve.(*Resolver).handleHeartbeat.func1
cosmo-router     |      github.com/wundergraph/graphql-go-tools/v2@v2.0.0-rc.136/pkg/engine/resolve/resolve.go:431

After investigating, the cause appears to be several code paths where we deleted the subscription trigger but didn't clean up the heartbeat (which runs in a separate goroutine), so the heartbeat kept writing to a context that no longer existed. This PR cleans that up.

@StarpTech StarpTech left a comment

Great catch, looks reasonable to me. Can we add a test for it here or in the router?

Comment thread v2/pkg/engine/resolve/resolve.go Outdated
Comment thread v2/pkg/engine/resolve/resolve.go Outdated
Comment thread v2/pkg/engine/resolve/resolve.go Outdated
@df-wg df-wg requested review from Noroth and StarpTech January 20, 2025 19:38

@Noroth Noroth left a comment

Looks good. Do you think we should have one more test with 50 subscriptions at once and make sure that all of them are deleted properly?

e.g.:
require.Eventually

@df-wg df-wg requested a review from Noroth January 21, 2025 19:01

df-wg commented Jan 21, 2025

Good point @Noroth, added a test like that.

Comment thread v2/pkg/engine/resolve/resolve_test.go
@df-wg df-wg merged commit f7492d3 into master Jan 23, 2025
@df-wg df-wg deleted the dave/eng-6234-multipart-write-error-when-router-loses-its-connection-with branch January 23, 2025 07:20
df-wg pushed a commit that referenced this pull request Jan 23, 2025
🤖 I have created a release *beep* *boop*
---


## [2.0.0-rc.143](v2.0.0-rc.142...v2.0.0-rc.143) (2025-01-23)


### Bug Fixes

* delete leftover heartbeat connections ([#1033](#1033)) ([f7492d3](f7492d3))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).