fix(op-service): increase Anvil startup timeout 5s → 30s#19424

Merged
smartcontracts merged 1 commit into develop from fix/anvil-startup-timeout on Mar 6, 2026

@smartcontracts
Contributor

Summary

  • Increases the Anvil startup timeout in op-service/testutils/devnet/anvil.go from 5 seconds to 30 seconds
  • Fixes TestImplementations and TestSuperchain in op-deployer/pkg/deployer/bootstrap, the #2 and #3 most frequent flakes in the repo over the last 7 days (67 and 46 incidences respectively)

Root Cause

Anvil.Start() waits for Anvil to print "Listening on 127.0.0.1" before returning. With a 5s timeout, this races against CI load: when 12 parallel nodes share a 2xlarge box, Anvil occasionally takes >5s to initialize, producing:

failed to start Anvil: anvil did not start in time

The test retries usually recover (job passes), but about 1 in N runs exhausts retries and the job fails hard — as seen in job #4491716.

Verification

Reproduced locally by setting the timeout to 1ms, which reliably produced "anvil did not start in time". With the timeout at 30s, Anvil starts successfully every time.

🤖 Generated with Claude Code

The 5s timeout is too tight under CI load. When 12 parallel test nodes
compete for CPU/IO on a 2xlarge box, Anvil sometimes takes >5s to print
its "Listening on" line, triggering "anvil did not start in time" in
TestImplementations and TestSuperchain. These are the #2 and #3 most
frequent flakes in the repo over the last 7 days (67 and 46 incidences).

30s gives Anvil enough headroom on a loaded machine while still failing
fast on a genuine startup failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@smartcontracts smartcontracts requested a review from a team as a code owner March 6, 2026 02:12
Contributor

@joshklop joshklop left a comment

We really should not be using anvil in unit tests...

@smartcontracts smartcontracts added this pull request to the merge queue Mar 6, 2026
Merged via the queue into develop with commit a3a933a Mar 6, 2026
75 checks passed
@smartcontracts smartcontracts deleted the fix/anvil-startup-timeout branch March 6, 2026 17:42
ajsutton added a commit that referenced this pull request Mar 7, 2026
- Rebase onto develop (picks up anvil 30s timeout fix from #19424)
- Change RPCReplayOrRecord to prefer cached fixtures over live RPC
- Add RPCReplayModePassthrough for fallback without recording side-effects
- Pin all tests using DefaultForkedScriptHost to block 10101510
  (one block past v6.0.0-rc.2 OPCM deployment) for deterministic fixtures
- Add rpc-fixtures-record and rpc-fixtures-verify Makefile targets
- Add rpc-fixture-refresh CI job that records, verifies, and caches
  fixtures on every develop push with Slack notification on failure
- Add fixture cache restore to go-tests job template
- Gitignore fixture JSON files to prevent accidental commits

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ajsutton added a commit that referenced this pull request Mar 17, 2026
…laky test

The TestImplementations test (66 flakes) relies on the devnet RetryProxy
to forward RPC calls to external mainnet/sepolia endpoints. Under CI load
(12 parallel nodes on a 2xlarge), the proxy's tight timeouts caused
intermittent failures:

- Per-request timeout of 5s was insufficient for slow external RPCs under
  load; increased to 30s (matching the Anvil startup timeout bump in #19424)
- Max retries of 5 was too few for sustained rate limiting; increased to 10
- Start() had a race condition: used a 100ms timer instead of a proper
  ready signal, and failed to return after net.Listen errors. Replaced with
  a channel-based ready signal that blocks until the listener is actually
  bound.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 17, 2026
…laky test (#19593)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>