
fix(ci): upgrade rust-e2e-large-kona-sequencer to 2xlarge resource class #19552

Closed
ajsutton wants to merge 1 commit into develop from aj/fix/rust-e2e-large-kona-sequencer-flake

Conversation


@ajsutton ajsutton commented Mar 15, 2026

Summary

Upgrade the rust-e2e-large-kona-sequencer CI job from xlarge (8 vCPU, 16GB) to 2xlarge (16 vCPU, 32GB) to fix a flaky TestSyncSafe failure.

Root Cause

The large-kona-sequencer devnet config spins up more than 20 processes: 9 L2 CL nodes (1 kona sequencer + 4 kona-reth validators + 4 kona-geth validators), 9 L2 EL nodes, plus the L1, batcher, and supervisor. On xlarge, resource contention causes kona-node engine API calls to time out:

ERROR component=kona-node error="Request timeout"
ERROR component=kona-node error="ErrorObject { code: ServerError(13), message: \"Request timeout\", data: None }"
WARN  Failed to insert new payload: server returned an error response: error code 13: Request timeout

When payload insertions fail, the safe head can't advance, and TestSyncSafe exhausts its retry budget (80 attempts * 2s = 160s) with:

operation failed permanently after 80 attempts: expected head to advance: local-safe

This failure was previously invisible at the test level because the rust-e2e jobs didn't report JUnit XML results to CircleCI. It showed up only as a job-level failure. #19548 adds store_test_results to make these visible.

Change

Added a resource_class parameter to rust-e2e-sysgo-tests (defaults to xlarge so existing variants are unchanged) and split large-kona-sequencer out of the matrix with resource_class: 2xlarge.
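
For readers who have not seen the diff, the change described above might look roughly like the CircleCI sketch below. This is an illustration under stated assumptions, not the actual config from the repository: only the job name rust-e2e-sysgo-tests, the resource_class parameter with its xlarge default, and the 2xlarge override for large-kona-sequencer come from this PR. The devnet_config parameter name, the Docker image, the step contents, and the workflow name are hypothetical placeholders.

```yaml
version: 2.1

jobs:
  rust-e2e-sysgo-tests:
    parameters:
      # Hypothetical parameter name for selecting the devnet variant.
      devnet_config:
        type: string
      # New parameter from this PR; the xlarge default keeps existing variants unchanged.
      resource_class:
        type: string
        default: xlarge
    docker:
      - image: cimg/base:stable  # placeholder image, not the real one
    resource_class: << parameters.resource_class >>
    steps:
      # Placeholder step; the real job builds and runs the e2e tests.
      - run: echo "running << parameters.devnet_config >>"

workflows:
  main:  # hypothetical workflow name
    jobs:
      # Remaining variants stay in the matrix and inherit the xlarge default.
      - rust-e2e-sysgo-tests:
          name: rust-e2e-<< matrix.devnet_config >>
          matrix:
            parameters:
              devnet_config: ["simple-kona-sequencer"]
      # large-kona-sequencer is split out of the matrix so it can override the resource class.
      - rust-e2e-sysgo-tests:
          name: rust-e2e-large-kona-sequencer
          devnet_config: large-kona-sequencer
          resource_class: 2xlarge
```

Splitting the variant into its own job entry is what allows a per-variant override: a matrix expands every combination of its listed parameters, so passing resource_class alongside it would apply to all variants rather than just one.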

Test plan

  • rust-e2e-large-kona-sequencer passes on 2xlarge
  • Other rust-e2e variants unaffected (still xlarge)

🤖 Generated with Claude Code

The large-kona-sequencer devnet config spins up 9 nodes (1 kona
sequencer with reth + 4 kona validators with reth + 4 kona validators
with geth). On xlarge (8 vCPU, 16GB RAM) the container is consistently
killed by the OOM killer before any test step runs, producing failures
with no visible failed steps.

Split large-kona-sequencer out of the matrix and give it 2xlarge
(16 vCPU, 32GB RAM) to accommodate the memory requirements of the
full devnet.

Fixes flaky failures on #19548 (job 4581496)
and #19525 (job 4581006).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ajsutton ajsutton closed this Mar 15, 2026
@ajsutton ajsutton reopened this Mar 15, 2026
@ajsutton ajsutton marked this pull request as ready for review March 15, 2026 01:13
@ajsutton ajsutton requested a review from a team as a code owner March 15, 2026 01:13

@theochap theochap left a comment


Let's also reduce the number of validator nodes in the large kona sequencer preset

@ajsutton

Claude: Closing in favor of removing the variant entirely. It runs the same tests as simple-kona-sequencer with the same logic — just more nodes iterating the same slices. The extra coverage isn't meaningful enough to justify 2xlarge or the flake risk.

@ajsutton ajsutton closed this Mar 15, 2026

codecov bot commented Mar 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 75.5%. Comparing base (1e2b976) to head (1aad2d7).
⚠️ Report is 4 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff             @@
##           develop   #19552     +/-   ##
==========================================
- Coverage     75.5%    75.5%   -0.1%     
==========================================
  Files          675      675             
  Lines        71571    71571             
==========================================
- Hits         54079    54073      -6     
- Misses       17348    17354      +6     
  Partials       144      144             
Flag Coverage Δ
cannon-go-tests-64 66.4% <ø> (ø)
contracts-bedrock-tests 80.2% <ø> (ø)
unit 75.5% <ø> (-0.1%) ⬇️

see 5 files with indirect coverage changes

