Conversation
WalkthroughAdded non-blocking enqueue behavior for unsubscribe paths by deferring event posting to goroutines when the events channel is full; updated AsyncResolveGraphQLSubscription to use a named error return and to perform pre-enqueue shutdown/client-disconnect checks within a select before emitting AddSubscription events. (≤50 words) Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (1)
v2/pkg/engine/resolve/resolve.go (1)
983-997: Same goroutine-fan-out risk; apply the same bounded-forwarder pattern.Mirror the fix from AsyncUnsubscribeSubscription to avoid unbounded goroutine creation here as well. Also tweak comment grammar.
- // In the event we cannot insert immediately, defer insertion a goroutine, this should prevent deadlocks, at the cost of goroutine creation. + // If we cannot insert immediately, defer insertion via a goroutine to avoid deadlocks, at the cost of a goroutine.
🧹 Nitpick comments (1)
v2/pkg/engine/resolve/resolve.go (1)
1155-1171: Also short‑circuit on client ctx cancellation to avoid enqueuing a dead subscription.Add ctx.ctx.Done() to the select to skip creating a subscription if the client already disconnected. Keeps work off the event loop under churn.
select { -case <-r.ctx.Done(): +case <-r.ctx.Done(): // Stop resolving if the resolver is shutting down return r.ctx.Err() +case <-ctx.ctx.Done(): + // Client went away before we could enqueue the subscription + return ctx.ctx.Err() case r.events <- subscriptionEvent{
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
v2/pkg/engine/resolve/resolve.go(4 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and test (go 1.25 / windows-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
- GitHub Check: Build and test (go 1.25 / windows-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
🔇 Additional comments (1)
v2/pkg/engine/resolve/resolve.go (1)
957-969: ```shell
#!/bin/bashSafe, tolerant inspection script (avoids failing if utilities are missing)
echo "PWD: $(pwd)"
echoFILE="v2/pkg/engine/resolve/resolve.go"
if [ -f "$FILE" ]; then
echo "=== Excerpt: $FILE (lines 930-1000) ==="
awk 'NR>=930 && NR<=1000 {printf "%6d %s\n", NR, $0}' "$FILE" || true
else
echo "File not found: $FILE"
fi
echoSearch helper: prefer rg, fall back to grep
run_search() {
pattern="$1"
label="$2"
echo "=== Search ($label): $pattern ==="
if command -v rg >/dev/null 2>&1; then
rg -n --hidden -S --glob 'v2/**' "$pattern" || true
else
# fallback to grep (may produce more noise)
grep -R --line-number -n --exclude-dir=.git -E "$pattern" v2 || true
fi
echo
}run_search '\btype\s+Resolver\b' "type Resolver"
run_search 'type\s+Resolver' "type Resolver (loose)"
run_search '\bevents\b' "events identifier"
run_search '\br.events\b' "r.events usages"
run_search '\bdeferred\b' "deferred identifier"
run_search '\bpostEventNonBlocking\b' "postEventNonBlocking helper"
run_search '\bsubscriptionEvent\b' "subscriptionEvent type/usages"
run_search 'subscriptionEventKindRemoveSubscription' "specific kind"
run_search 'make(\schan\s+subscriptionEvent' "make(chan subscriptionEvent"
run_search 'go func()\s{[^}]r.events\s<' "goroutine sending to r.events (rg-only regex)"</blockquote></details> </blockquote></details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
v2/pkg/engine/resolve/resolve.go (1)
350-359: Unsubscribe on heartbeat/work error — LGTM; consider coalescing repeats.This correctly breaks circular waits. Optional: guard repeated calls (e.g.,
sub.unsubOnce.CompareAndSwap(false, true)) to avoid spamming events while removal is pending.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
v2/pkg/engine/resolve/resolve.go(14 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
v2/pkg/engine/resolve/resolve.go (4)
v2/pkg/engine/resolve/response.go (2)
SubscriptionCloseKindGoingAway(64-64)SubscriptionCloseKind(56-59)v2/pkg/engine/resolve/context.go (1)
Context(16-35)v2/pkg/graphqlerrors/response.go (1)
Response(9-14)pkg/engine/resolve/resolve.go (1)
Fetches(318-318)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
- GitHub Check: Build and test (go 1.25 / windows-latest)
- GitHub Check: Build and test (go 1.25 / windows-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
🔇 Additional comments (9)
v2/pkg/engine/resolve/resolve.go (9)
303-305: workItem returns error — LGTMThe change to
fn func() erroris consistent with the new error‑propagation path from workers.
384-387: Unsubscribe on work error — LGTMErroring work item triggers async removal as intended.
397-406: complete() now returns error — LGTMSignature aligns with
workItem.fn. Returningnilis fine sincewriter.Complete()has no error.
409-418: close(kind) now returns error — LGTMMatches the new worker contract;
writer.Close(kind)is still void, returningnilhere is appropriate.
420-483: Error propagation from subscription update — LGTM; verify metrics on failure.Behavior is correct. Minor: you call
reporter.SubscriptionUpdateSent()even on init/load/resolve errors. Confirm if “sent” should include failed attempts; otherwise introduce a separate metric.
532-562: Heartbeat returns error — LGTM; check metric semantics.Counting heartbeats via
SubscriptionUpdateSent()may inflate “updates sent”. Confirm intent or split metrics.
803-809: Good: closure captures and error propagation.Capturing
c, s := c, savoids the loop‑var trap; returning the error to trigger unsubscribe is correct.
914-916: Close work item wiring — LGTMWrapping
s.close(closeKind)to matchfunc() erroris correct.
1170-1189: Async subscription early‑exit checks — LGTMGood inline send with cancel checks for resolver/client contexts before enqueue.
78dfefe to
02e215f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
v2/pkg/engine/resolve/resolve.go (5)
350-358: Fail-fast heartbeat/work → evented unsubscribe — correct and safe.Scheduling via AsyncUnsubscribeSubscription avoids races and honors the loop’s invariants. Consider a tiny debug hook for reason codes (heartbeat vs write) if you need postmortems.
420-483: executeSubscriptionUpdate now returns errors; unsubscribe on flush error — solid.
- Timeout-scoped ctx, input copy, and worker-threaded writes are correct.
- Minor: SubscriptionUpdateSent is incremented on error paths (init/load/resolve). If this metric means “success,” consider splitting into Sent/Failed (or Attempted).
532-561: handleHeartbeat() returns error and checks ctxs — correct threading discipline.Heartbeat increments SubscriptionUpdateSent; if that metric is intended for payload updates only, consider a separate heartbeat counter to avoid skew.
999-1011: Same non-blocking defer for client-wide unsubscribe — consistent.Consider using the same enqueueEventNonBlocking helper to avoid duplication.
972-984: Approve — de-duplicate non-blocking enqueue into helperPer-call goroutine fallback is fine and matches team guidance. I ran ripgrep for AsyncUnsubscribeSubscription and found calls only from subscription worker goroutines and tests (resolve.go:352, 357, 386, 467; resolve_test.go:5270, 5439) — none inside processEvents/handleEvent, so no self-send deadlock risk.
Replace the duplicated goroutine block with:
default: - // In the event we cannot insert immediately, defer insertion a goroutine, this should prevent deadlocks, at the cost of goroutine creation. - go func() { - select { - case <-r.ctx.Done(): - return - case r.events <- subscriptionEvent{ - id: id, - kind: subscriptionEventKindRemoveSubscription, - }: - } - }() + r.enqueueEventNonBlocking(subscriptionEvent{ + id: id, + kind: subscriptionEventKindRemoveSubscription, + })Add once in this file:
func (r *Resolver) enqueueEventNonBlocking(ev subscriptionEvent) { go func() { select { case <-r.ctx.Done(): return case r.events <- ev: } }() }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
v2/pkg/engine/resolve/resolve.go(14 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:999-1011
Timestamp: 2025-09-19T14:50:19.506Z
Learning: In the graphql-go-tools resolver, when handling backpressure for async unsubscribe operations, the team prefers using individual goroutines over bounded deferrer implementations. The rationale is that bounded implementations can also overflow, and the preferred solution for overload is scaling the deployment rather than internal buffering. The current approach prevents deadlocks at the cost of goroutine creation, which is an acceptable tradeoff.
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:339-347
Timestamp: 2025-09-19T14:51:33.690Z
Learning: In the graphql-go-tools resolver, the event loop follows a strict single-threaded design where all subscription lifecycle management (add/remove from triggers) must happen through the events channel to the main processEvents() goroutine. Worker goroutines should not directly modify subscription state or call unsubscribe methods, as this would break the single-threaded event loop architecture and create race conditions.
📚 Learning: 2025-09-19T14:51:33.690Z
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:339-347
Timestamp: 2025-09-19T14:51:33.690Z
Learning: In the graphql-go-tools resolver, the event loop follows a strict single-threaded design where all subscription lifecycle management (add/remove from triggers) must happen through the events channel to the main processEvents() goroutine. Worker goroutines should not directly modify subscription state or call unsubscribe methods, as this would break the single-threaded event loop architecture and create race conditions.
Applied to files:
v2/pkg/engine/resolve/resolve.go
📚 Learning: 2025-09-19T14:50:19.506Z
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:999-1011
Timestamp: 2025-09-19T14:50:19.506Z
Learning: In the graphql-go-tools resolver, when handling backpressure for async unsubscribe operations, the team prefers using individual goroutines over bounded deferrer implementations. The rationale is that bounded implementations can also overflow, and the preferred solution for overload is scaling the deployment rather than internal buffering. The current approach prevents deadlocks at the cost of goroutine creation, which is an acceptable tradeoff.
Applied to files:
v2/pkg/engine/resolve/resolve.go
🧬 Code graph analysis (1)
v2/pkg/engine/resolve/resolve.go (3)
v2/pkg/engine/resolve/response.go (2)
SubscriptionCloseKindGoingAway(64-64)SubscriptionCloseKind(56-59)v2/pkg/engine/resolve/context.go (1)
Context(16-35)v2/pkg/graphqlerrors/response.go (1)
Response(9-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and test (go 1.25 / windows-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
- GitHub Check: Build and test (go 1.25 / windows-latest)
🔇 Additional comments (7)
v2/pkg/engine/resolve/resolve.go (7)
373-387: No-heartbeat worker mirrors error/close handling — looks good.Parity with heartbeat path maintained.
397-406: complete() now returns error (always nil) — fine for uniformity with work items.Keeps signatures consistent; no behavior change.
803-809: Enqueue update as workItem with error propagation — correct place for writer-only ops.Error path writes via asyncErrorWriter and lets worker trigger unsubscribe.
914-916: Close via closure conforms to new workItem signature — good.Keeps all writer calls on the worker goroutine and marks final.
409-418: Approved — no double-close risk found.complete() and close() both defer close(s.completed), but the code ensures only one terminal action runs: the event-loop enqueues exactly one terminal workItem (complete or close) then closes s.workChan, and if the worker invokes s.close directly (context/resolver done) it returns before processing queued terminal work — no path executes close twice while the worker is active.
303-305: Approve — workItem initializations all wrap func() error.
Verified: in v2/pkg/engine/resolve/resolve.go the workItem usages are workItem{fn, false} (fn := func() error {...}), workItem{s.complete, true} (complete() returns error), and workItem{func() error { return s.close(closeKind) }, true).
1170-1189: Approve: async add-subscription with early-exit is correct.Inline select prevents stale sends and respects resolver/client cancellation. Verified: processEvents/handleEvent do not invoke AsyncUnsubscribeSubscription/AsyncUnsubscribeClient/AsyncResolveGraphQLSubscription; Async* calls live outside the single-threaded event loop (subscription worker/flush paths).
This reverts commit 02e215f.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
v2/pkg/engine/resolve/resolve.go (2)
958-958: Comment grammar nit.Tweak wording for clarity.
- // In the event we cannot insert immediately, defer insertion a goroutine, this should prevent deadlocks, at the cost of goroutine creation. + // If the channel is full, defer insertion to a goroutine to prevent deadlocks (at the cost of goroutine creation).
985-985: Comment grammar nit.Same clarity tweak as above.
- // In the event we cannot insert immediately, defer insertion a goroutine, this should prevent deadlocks, at the cost of goroutine creation. + // If the channel is full, defer insertion to a goroutine to prevent deadlocks (at the cost of goroutine creation).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
v2/pkg/engine/resolve/resolve.go(4 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:999-1011
Timestamp: 2025-09-19T14:50:19.506Z
Learning: In the graphql-go-tools resolver, when handling backpressure for async unsubscribe operations, the team prefers using individual goroutines over bounded deferrer implementations. The rationale is that bounded implementations can also overflow, and the preferred solution for overload is scaling the deployment rather than internal buffering. The current approach prevents deadlocks at the cost of goroutine creation, which is an acceptable tradeoff.
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:339-347
Timestamp: 2025-09-19T14:51:33.690Z
Learning: In the graphql-go-tools resolver, the event loop follows a strict single-threaded design where all subscription lifecycle management (add/remove from triggers) must happen through the events channel to the main processEvents() goroutine. Worker goroutines should not directly modify subscription state or call unsubscribe methods, as this would break the single-threaded event loop architecture and create race conditions.
📚 Learning: 2025-09-19T14:50:19.506Z
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:999-1011
Timestamp: 2025-09-19T14:50:19.506Z
Learning: In the graphql-go-tools resolver, when handling backpressure for async unsubscribe operations, the team prefers using individual goroutines over bounded deferrer implementations. The rationale is that bounded implementations can also overflow, and the preferred solution for overload is scaling the deployment rather than internal buffering. The current approach prevents deadlocks at the cost of goroutine creation, which is an acceptable tradeoff.
Applied to files:
v2/pkg/engine/resolve/resolve.go
📚 Learning: 2025-09-19T14:51:33.690Z
Learnt from: endigma
PR: wundergraph/graphql-go-tools#1298
File: v2/pkg/engine/resolve/resolve.go:339-347
Timestamp: 2025-09-19T14:51:33.690Z
Learning: In the graphql-go-tools resolver, the event loop follows a strict single-threaded design where all subscription lifecycle management (add/remove from triggers) must happen through the events channel to the main processEvents() goroutine. Worker goroutines should not directly modify subscription state or call unsubscribe methods, as this would break the single-threaded event loop architecture and create race conditions.
Applied to files:
v2/pkg/engine/resolve/resolve.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Build and test (go 1.25 / windows-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
- GitHub Check: Build and test (go 1.25 / ubuntu-latest)
- GitHub Check: Build and test (go 1.25 / windows-latest)
🔇 Additional comments (4)
v2/pkg/engine/resolve/resolve.go (4)
957-968: Non-blocking deferral on unsubscribe looks good.This aligns with the single-threaded event-loop design and the team’s stated preference to prevent deadlocks by deferring via per-call goroutines under backpressure.
983-996: Non-blocking deferral on client-unsubscribe looks good.Consistent with the approach above and with the event-loop ownership model.
1155-1173: Pre-enqueue shutdown/disconnect checks are solid; confirm blocking send is intentional.This still blocks on r.events (no default/goroutine fallback). If that’s by design for add-subscription semantics, all good; otherwise consider mirroring the non-blocking pattern.
1101-1105: Verify: sync unsubscribe uses a blocking send.This path directly sends to r.events without a non-blocking fallback. Is this safe in all shutdown/teardown scenarios, or should it also use a select with ctx/r.ctx (or defer via goroutine) to avoid potential stalls?
🤖 I have created a release *beep* *boop* --- ## [2.0.0-rc.227](v2.0.0-rc.226...v2.0.0-rc.227) (2025-09-23) ### Bug Fixes * if blocked, defer async event insertion to prevent deadlocks ([#1298](#1298)) ([df38c31](df38c31)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Resolved potential deadlocks by deferring async event insertion when operations are blocked, improving stability and responsiveness under load. * **Documentation** * Updated changelog for version 2.0.0-rc.227. * **Chores** * Bumped version to 2.0.0-rc.227. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary by CodeRabbit
Bug Fixes
Performance
Chores
Checklist