
fix(op-service): harden RetryProxy to stabilize TestImplementations #19593

Merged
wwared merged 1 commit into develop from aj/fix/flake-test-implementations on Mar 17, 2026



ajsutton (Contributor) commented Mar 17, 2026

Summary

Fixes #19583

  • Root cause: RetryProxy's timeouts (5s per request, 5 max retries) were too tight for CI conditions, where 12 parallel nodes hit external RPCs. Start() also had a race condition: it waited on a timer instead of a ready signal.
  • Why flaky (not always failing): External RPC response times vary with CI load and rate limiting. Under light load, 5s is enough. Under heavy load (parallel test execution), requests time out intermittently.
  • Fix: Per-request timeout raised from 5s to 30s and max retries from 5 to 10; Start() now uses a channel-based ready signal and returns after a listen failure (the return was previously missing).
  • Flake count: 66 (highest after TestCleanShutdown)

Test plan

  • go build ./op-service/... passes
  • CI passes on this PR

🤖 Generated with Claude Code

…laky test

The TestImplementations test (66 flakes) relies on the devnet RetryProxy
to forward RPC calls to external mainnet/sepolia endpoints. Under CI load
(12 parallel nodes on a 2xlarge), the proxy's tight timeouts caused
intermittent failures:

- Per-request timeout of 5s was insufficient for slow external RPCs under
  load; increased to 30s (matching the Anvil startup timeout bump in #19424)
- Max retries of 5 was too few for sustained rate limiting; increased to 10
- Start() had a race condition: used a 100ms timer instead of a proper
  ready signal, and failed to return after net.Listen errors. Replaced with
  a channel-based ready signal that blocks until the listener is actually
  bound.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ajsutton marked this pull request as ready for review March 17, 2026 07:06
ajsutton requested a review from a team as a code owner March 17, 2026 07:06
geoknee added this pull request to the merge queue Mar 17, 2026
wwared removed this pull request from the merge queue due to a manual request Mar 17, 2026
wwared added this pull request to the merge queue Mar 17, 2026
wwared removed this pull request from the merge queue due to a manual request Mar 17, 2026
wwared added this pull request to the merge queue Mar 17, 2026
wwared removed this pull request from the merge queue due to a manual request Mar 17, 2026
wwared added this pull request to the merge queue Mar 17, 2026
github-merge-queue bot removed this pull request from the merge queue due to no response for status checks Mar 17, 2026
wwared added this pull request to the merge queue Mar 17, 2026
Merged via the queue into develop with commit 13cd061 Mar 17, 2026
97 checks passed
wwared deleted the aj/fix/flake-test-implementations branch March 17, 2026 19:27

Development

Successfully merging this pull request may close these issues.

flaky test: TestImplementations

3 participants