fix: fix flaky engine subscription tests#1318
Conversation
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughSynchronizes GraphQL subscription startup in tests by adding startup hooks, a coordination channel, and a wait group. Test logic now waits for both subscriptions to register before emitting messages, restructures fake stream wiring, updates assertions for ordering, and expands cases to cover multi-startup sequences and cross-recorder interactions. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
v2/pkg/engine/resolve/resolve_test.go (2)
5560-5561: Minor: Make UniqueRequestID explicit (consistency with next test)Here uniqueRequestFn returns nil without writing to the digest (produces the same zero-hash). Consider writing a constant (like "unique") to xxh, as you do in the next test, for clarity and to avoid accidental collisions if other cases appear.
Example:
fakeStream.uniqueRequestFn = func(ctx *Context, input []byte, xxh *xxhash.Digest) error { _, err := xxh.WriteString("unique") return err }
5620-5622: Optional: fail fast on final assertsUsing require.True would stop immediately on failure and print recorder messages as you already pass them.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
v2/pkg/engine/resolve/resolve_test.go(4 hunks)
🔇 Additional comments (6)
v2/pkg/engine/resolve/resolve_test.go (6)
5539-5544: Deterministic gating of stream emission: LGTMBlocking message production on streamCanStart and asserting onStart input makes the test deterministic and fixes the race.
Also applies to: 5546-5547
5549-5558: Startup hook handling is correctUsing defer startupHookWaitGroup.Done ensures both hooks count down; atomic flag avoids duplicate Updater calls. Good.
5606-5607: Readability improvement: LGTMCollecting recorders into a slice simplifies the loop below.
5633-5651: Second test synchronization: LGTM
- Gating messageFn on streamCanStart is solid.
- Using subscription-on-start to push the first message to both subscribers verifies proper registration.
- Explicit UniqueRequestID hashing with "unique" is clear.
5686-5692: Wait for initial messages before releasing the stream: LGTMAwaiting any message from both recorders proves both are registered before sending the final stream message.
5696-5701: Order validations after synchronization: LGTMBoth recorders getting [1000, 0] confirms correct sequencing post‑sync.
|
@dkorittki since I am not familiar with |
|
@ysmolski Yeah sorry you got auto selected as the reviewer. The best person to do this is @alepane21 . He´s already on it |
@coderabbitai summary
Checklist
Context
Some notes first:
I spotted two tests failing on Github Actions due to race conditions. They work locally and are CPU timings related.
Those two tests are
SubscriptionOnStart ctx updater only updates the right subscriptionSubscriptionOnStart ctx updater on multiple subscriptions with same trigger workstest 1:
There is a race condition going on. Here is the output of the test on Github runners with engine logs enabled.
As you can see recorder 2 misses its one expected message. The reason is that we update the trigger with the counter=0 message (line 8) before the second subscriber is added (line 9). So it misses the message. This happens because in the test we don't wait for the subscriber to finish registration on the trigger before sending the counter=0 message. Now we actually wait for that.
test 2:
Kind of the same error. Here is the engine debug output from a failing Github Actions run:
As you can see recorder 2 misses the counter=0 message. Both are expected to have the same messages in the same order. Both recorders have the counter=1000 message, which is delivered via subscription-on-start hook but recorder 2 misses the counter=0 message, delivered via fake stream. The count=0 message is delivered (line 8) before recorder 2 is subscribed (line 9). This happens because in this test, like in the other, we don't wait for the recorders to finish subscribing to the trigger, and sending off the counter=0 messages via fake stream early. Its fixed by waiting for a complete subscription.