fix(resolve): fix flaky singleflight deduplication tests#1393
fix(resolve): fix flaky singleflight deduplication tests#1393
Conversation
The tests used a followersEntered channel that signaled before followers actually called GetOrCreate/LoadOrStore. Under the race detector, the leader could finish and delete the singleflight key before followers entered, causing them to start fresh instead of deduplicating. Add followerCount atomic counter to InflightRequest and poll it in tests to confirm all followers have registered before releasing the data source. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The followerCount atomic.Int32 already tracks follower presence. Replace the mutex-guarded HasFollowers bool with followerCount.Load() > 0, removing two struct fields and ~10 lines of lock/unlock code with identical correctness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReplaced mutex-guarded follower flag in InflightRequest with an atomic follower counter and accessor methods; updated GetOrCreate/FinishOk to use the atomic counter. Tests and helpers were modified to poll HasFollowers/waitForFollowerCount or use small gating channels instead of channel-based follower signaling. Minor test timing adjustments. Changes
Sequence Diagram(s)(Skipped — changes are internal synchronization refactors and test synchronization adjustments, not a new multi-component control flow that requires a sequence diagram.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
v2/pkg/engine/resolve/resolve_test.go (1)
169-183: Consider adding a tiny backoff to the polling loop.The tight
Gosched()loop can burn CPU on slow CI. A minimal sleep keeps behavior deterministic while reducing spin.♻️ Suggested tweak
default: - runtime.Gosched() + runtime.Gosched() + time.Sleep(time.Millisecond) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@v2/pkg/engine/resolve/resolve_test.go` around lines 169 - 183, The polling loop in waitForFollowerCount spins with runtime.Gosched(), which can waste CPU on slow CI; change the loop to use a tiny backoff (e.g., time.Sleep for 1–5ms) instead of Gosched() while preserving the deadline and the check against inflight.followerCount.Load() (and the use of findAnyInflight and t.Fatal on timeout) so the test remains deterministic but less CPU-intensive.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@v2/pkg/engine/resolve/resolve_test.go`:
- Around line 169-183: The polling loop in waitForFollowerCount spins with
runtime.Gosched(), which can waste CPU on slow CI; change the loop to use a tiny
backoff (e.g., time.Sleep for 1–5ms) instead of Gosched() while preserving the
deadline and the check against inflight.followerCount.Load() (and the use of
findAnyInflight and t.Fatal on timeout) so the test remains deterministic but
less CPU-intensive.
The "two subscriptions to the same trigger" test was flaky because the data source's emitting goroutine could send counter=0 before sub2's addSubscription event was processed on the unbuffered events channel. Gate the data source start via the onStart callback until sub2 is registered. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Go 1.24+ panics when t.Fail() is called from a goroutine after the test has completed. Two sendChatMutation calls were launched in goroutines that could outlive their subtests. Call them synchronously instead — the HTTP request completes quickly and doesn't need to be async. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dkorittki
left a comment
There was a problem hiding this comment.
Just some minor questions/suggestions, otherwise LGTM
Replace runtime.Gosched() with time.Sleep(10ms) to reduce CPU usage in test polling loops. Increase deadline from 1s to 3s to accommodate slow CI runners. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…quest Wrap the atomic followerCount operations behind AddFollower() and HasFollowers() methods to keep production code cleaner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@v2/pkg/engine/resolve/resolve_test.go`:
- Line 186: This file (resolve_test.go) fails gci/imports formatting; re-run the
import and formatting tools (gci and goimports/gofmt) on
v2/pkg/engine/resolve/resolve_test.go to fix import ordering and formatting so
the CI lint passes—ensure imports are grouped/ordered per gci and files are
formatted (e.g., run gci then goimports/gofmt) and re-run tests to confirm no
other import-related changes are needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
🤖 I have created a release *beep* *boop* --- ## [2.0.0-rc.252](v2.0.0-rc.251...v2.0.0-rc.252) (2026-02-19) ### Features * forward headers to grpc subgraphs ([#1382](#1382)) ([8459b34](8459b34)) ### Bug Fixes * **resolve:** fix flaky singleflight deduplication tests ([#1393](#1393)) ([4105082](4105082)) * **resolve:** guard OnFinished against nil loaderHookContext on skipped fetches ([#1394](#1394)) ([f79d071](f79d071)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [1.8.1](execution/v1.8.0...execution/v1.8.1) (2026-02-19) ### Bug Fixes * **resolve:** fix flaky singleflight deduplication tests ([#1393](#1393)) ([4105082](4105082)) * **resolve:** guard OnFinished against nil loaderHookContext on skipped fetches ([#1394](#1394)) ([f79d071](f79d071)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Summary by CodeRabbit
Refactor
Tests
The inbound request singleflight tests used a channel-based synchronization pattern that signalled "entered" before followers actually called
GetOrCreate. Under-race, the leader could finish and delete the singleflight key before followers registered, causing them to start fresh requests instead of deduplicating — makingTestResolver_ArenaResolveGraphQLResponse_RequestDeduplicationand two related tests flaky.Replace
HasFollowers bool+Mu sync.MutexonInflightRequestwith a singlefollowerCount atomic.Int32, incremented insideGetOrCreateat the exact point a follower joins. Tests now poll this counter to guarantee all followers have registered before the leader is unblocked, eliminating the race entirely.Checklist