fix: fix flaky engine subscription tests by dkorittki · Pull Request #1318 · wundergraph/graphql-go-tools

dkorittki · 2025-10-09T16:54:32Z

@coderabbitai summary

Checklist

I have discussed my proposed changes in an issue and have received approval to proceed.
I have followed the coding standards of the project.
Tests or benchmarks have been added or updated.

Context

Some notes first:

It's only happening for tests introduced on the cosmo streams topic branch
It seems to be a race condition in tests rather than actual engine code

I spotted two tests failing on GitHub Actions due to race conditions. They pass locally; the failures depend on CPU timing.

Those two tests are

test 1 SubscriptionOnStart ctx updater only updates the right subscription
test 2 SubscriptionOnStart ctx updater on multiple subscriptions with same trigger works

test 1:

There is a race condition going on. Here is the output of the test on GitHub runners with engine logs enabled.

resolver:trigger:subscription:add:17241709254077376921:1
resolver:create:trigger:17241709254077376921
resolver:trigger:start:17241709254077376921
resolver:subscription_updater:update:17241709254077376921
resolver:trigger:initialized:17241709254077376921
resolver:subscription_updater:update:17241709254077376921
resolver:trigger:subscription:update:17241709254077376921:1,1
resolver:trigger:update:17241709254077376921
resolver:trigger:subscription:add:17241709254077376921:2
resolver:trigger:subscription:added:17241709254077376921:2
resolver:trigger:subscription:update:1
resolver:trigger:subscription:flushed:1
resolver:trigger:subscription:update:1
resolver:trigger:subscription:flushed:1
resolver:trigger:started:17241709254077376921
resolver:subscription_updater:complete:17241709254077376921
resolver:subscription_updater:complete:sent_event:17241709254077376921
resolver:trigger:complete:17241709254077376921
resolver:trigger:complete:17241709254077376921
resolver:trigger:subscription:closed:17241709254077376921:1
resolver:trigger:subscription:closed:17241709254077376921:2

recorder 1 messages: [{"data":{"counter":1000}} {"data":{"counter":0}}]
recorder 2 messages: []

As you can see, recorder 2 misses its one expected message. The reason is that the trigger is updated with the counter=0 message (line 8 of the log) before the second subscriber is added (line 9), so the message is never delivered to it. This happens because the test did not wait for the subscriber to finish registering on the trigger before sending the counter=0 message. The test now waits for that registration.
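The fix described above can be illustrated with a minimal, self-contained sketch. It is not the code from resolve_test.go: the `trigger` and `recorder` types and the `runGated` helper are hypothetical stand-ins, and only the name `streamCanStart` is borrowed from the review summary. The producer blocks on a channel that is closed only after every subscriber has registered, so no subscriber can miss the message.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical stand-in for a test subscription recorder.
type recorder struct {
	mu       sync.Mutex
	messages []string
}

func (r *recorder) write(msg string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.messages = append(r.messages, msg)
}

// trigger is a hypothetical stand-in for the engine trigger: an update is
// delivered only to the subscribers registered at the time of the update.
type trigger struct {
	mu   sync.Mutex
	subs []*recorder
}

func (t *trigger) add(r *recorder) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.subs = append(t.subs, r)
}

func (t *trigger) update(msg string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, r := range t.subs {
		r.write(msg)
	}
}

// runGated registers n recorders concurrently and releases the producer
// only after all of them have finished registering.
func runGated(n int, msg string) []*recorder {
	t := &trigger{}
	recorders := make([]*recorder, n)
	streamCanStart := make(chan struct{})
	done := make(chan struct{})

	// Producer: blocks until the test signals that all subscribers are ready.
	go func() {
		defer close(done)
		<-streamCanStart
		t.update(msg)
	}()

	var registered sync.WaitGroup
	for i := range recorders {
		recorders[i] = &recorder{}
		registered.Add(1)
		go func(r *recorder) {
			defer registered.Done()
			t.add(r)
		}(recorders[i])
	}

	registered.Wait()     // every subscriber is on the trigger
	close(streamCanStart) // only now may the message be sent
	<-done
	return recorders
}

func main() {
	for i, r := range runGated(2, `{"data":{"counter":0}}`) {
		fmt.Printf("recorder %d messages: %v\n", i+1, r.messages)
	}
}
```

Without the `registered.Wait()` call, the producer could run between the two registrations, which is the ordering seen in the failing log.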

test 2:

Kind of the same error. Here is the engine debug output from a failing GitHub Actions run:

resolver:trigger:subscription:add:15889878720417707388:1
resolver:create:trigger:15889878720417707388
resolver:trigger:start:15889878720417707388
resolver:subscription_updater:update:15889878720417707388
resolver:trigger:initialized:15889878720417707388
resolver:subscription_updater:update:15889878720417707388
resolver:trigger:subscription:update:15889878720417707388:1,1
resolver:trigger:update:15889878720417707388
resolver:trigger:subscription:add:15889878720417707388:2
resolver:trigger:subscription:added:15889878720417707388:2
resolver:subscription_updater:update:15889878720417707388
resolver:trigger:subscription:update:15889878720417707388:1,2
resolver:trigger:subscription:update:2
resolver:trigger:started:15889878720417707388
resolver:trigger:subscription:update:1
resolver:trigger:subscription:flushed:2
resolver:trigger:subscription:flushed:1
resolver:trigger:subscription:update:1
resolver:trigger:subscription:flushed:1
resolver:subscription_updater:complete:15889878720417707388
resolver:subscription_updater:complete:sent_event:15889878720417707388
resolver:trigger:complete:15889878720417707388
resolver:trigger:complete:15889878720417707388
resolver:trigger:subscription:closed:15889878720417707388:1
resolver:trigger:subscription:closed:15889878720417707388:2

recorder 1 messages: [{"data":{"counter":1000}} {"data":{"counter":0}}]
recorder 2 messages: [{"data":{"counter":1000}}]

As you can see, recorder 2 misses the counter=0 message. Both recorders are expected to receive the same messages in the same order. Both have the counter=1000 message, which is delivered via the subscription-on-start hook, but recorder 2 misses the counter=0 message, which is delivered via the fake stream. The counter=0 message is delivered (line 8 of the log) before recorder 2 is subscribed (line 9). This happens because this test, like the other, did not wait for the recorders to finish subscribing to the trigger and sent the counter=0 message via the fake stream too early. It is fixed by waiting for both subscriptions to complete.

coderabbitai · 2025-10-09T16:54:59Z

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Synchronizes GraphQL subscription startup in tests by adding startup hooks, a coordination channel, and a wait group. Test logic now waits for both subscriptions to register before emitting messages, restructures fake stream wiring, updates assertions for ordering, and expands cases to cover multi-startup sequences and cross-recorder interactions.

Changes

Cohort / File(s)	Summary
Resolve tests synchronization `v2/pkg/engine/resolve/resolve_test.go`	Added startup synchronization for subscriptions (startup hooks, wait group, `streamCanStart`), introduced `onStart`/`subscriptionOnStart` handlers updating shared state via Updater, delayed first message until all subscriptions are ready, adjusted assertions for ordered delivery, and expanded test coverage around multi-recorder sequencing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: implement missing subscription updater methods #1310 — Implements missing resolve.SubscriptionUpdater methods that the new test startup hooks rely on, aligning with the updater usage added in these resolve tests.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title succinctly captures the primary change, which is fixing flaky engine subscription tests, and directly aligns with the synchronization updates described in the summary. It clearly conveys the PR’s purpose without extraneous details or vague language. A reviewer scanning the list will immediately understand that the focus is on making subscription tests reliable.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment @coderabbitai help to get the list of available commands and usage tips.

dkorittki · 2025-10-13T15:49:16Z

@coderabbitai review

coderabbitai · 2025-10-13T15:49:27Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

v2/pkg/engine/resolve/resolve_test.go (2)
5560-5561: Minor: Make UniqueRequestID explicit (consistency with next test)

Here uniqueRequestFn returns nil without writing to the digest (produces the same zero-hash). Consider writing a constant (like "unique") to xxh, as you do in the next test, for clarity and to avoid accidental collisions if other cases appear.

Example:
fakeStream.uniqueRequestFn = func(ctx *Context, input []byte, xxh *xxhash.Digest) error {
    _, err := xxh.WriteString("unique")
    return err
}
5620-5622: Optional: fail fast on final asserts

Using require.True would stop immediately on failure and print recorder messages as you already pass them.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba9a827 and 62d6e70.

📒 Files selected for processing (1)

v2/pkg/engine/resolve/resolve_test.go (4 hunks)

🔇 Additional comments (6)

v2/pkg/engine/resolve/resolve_test.go (6)

5539-5544: Deterministic gating of stream emission: LGTM

Blocking message production on streamCanStart and asserting onStart input makes the test deterministic and fixes the race.

Also applies to: 5546-5547

5549-5558: Startup hook handling is correct

Using defer startupHookWaitGroup.Done ensures both hooks count down; atomic flag avoids duplicate Updater calls. Good.

5606-5607: Readability improvement: LGTM

Collecting recorders into a slice simplifies the loop below.

5633-5651: Second test synchronization: LGTM

Gating messageFn on streamCanStart is solid.

Using subscription-on-start to push the first message to both subscribers verifies proper registration.

Explicit UniqueRequestID hashing with "unique" is clear.

5686-5692: Wait for initial messages before releasing the stream: LGTM

Awaiting any message from both recorders proves both are registered before sending the final stream message.

5696-5701: Order validations after synchronization: LGTM

Both recorders getting [1000, 0] confirms correct sequencing post‑sync.

ysmolski · 2025-10-14T13:10:48Z

@dkorittki since I am not familiar with [topic/streams-v1](https://github.com/wundergraph/graphql-go-tools/tree/topic/streams-v1) should I even try to review it? If not then who is the good reviewer for it?

dkorittki · 2025-10-15T12:29:09Z

@ysmolski Yeah, sorry, you got auto-selected as the reviewer. The best person to do this is @alepane21. He's already on it.

chore: add debug prints

1f5453f

dkorittki changed the base branch from master to topic/streams-v1 October 9, 2025 16:54

dkorittki added 8 commits October 10, 2025 10:43

chore: temporarily enable resolver debug logs

77a4043

chore: add debug print to cancellation call

f3a2f72

fix: wait for messages in recorder first

0a6a390

fix: Wait for subscription init before sending message

736ec9f

chore: refactor test but keep logic

cce9e26

chore: add debug prints to failing test

93637a3

fix: Wait for hooks to finish before sending messages

5424bd3

chore: Remove debug messages, enrich assert logs

62d6e70

dkorittki marked this pull request as ready for review October 13, 2025 15:48

dkorittki requested review from Noroth, StarpTech, devsergiy, jensneuse and ysmolski as code owners October 13, 2025 15:48

coderabbitai Bot reviewed Oct 13, 2025

Comment thread v2/pkg/engine/resolve/resolve_test.go

chore: time out test when waiting too long

16ab617

dkorittki requested a review from alepane21 October 15, 2025 12:29

alepane21 approved these changes Oct 16, 2025

dkorittki merged commit 1353de9 into topic/streams-v1 Oct 16, 2025
10 checks passed

dkorittki deleted the dominik/eng-8289-fix-flaky-engine-subscription-tests branch October 16, 2025 13:57

coderabbitai Bot mentioned this pull request Nov 10, 2025

feat: allows hook in the subscriptions #1309

Merged

3 tasks

dkorittki merged 10 commits into topic/streams-v1 from dominik/eng-8289-fix-flaky-engine-subscription-tests

3 participants
