chore: deflake epochs mbps test #21003

Merged
spalladino merged 1 commit into merge-train/spartan from mr/deflake-epochs-mbps on Mar 2, 2026

Conversation

@mrzeszutko
Contributor

Summary

  • Fixes flaky epochs_mbps.parallel test by adding a retryUntil poll to assertMultipleBlocksPerSlot, closing a race condition between two independently-syncing archivers

Details

The epochs_mbps.parallel test has been flaking in CI (9 recent failures across PRs 20562-20868) on the "checkpointed block" test case. The root cause is a race condition:

  1. waitForTx polls the initial setup node's archiver and returns when it sees the tx as CHECKPOINTED.
  2. assertMultipleBlocksPerSlot then queries the first validator node's (nodes[0]) archiver via archiver.getCheckpoints().
  3. These are different nodes with independent L1 polling cycles (~50ms interval each).
  4. The first validator's archiver may not have indexed the latest checkpoint yet (~200-400ms race window).

CI logs confirm: the checkpoint with the expected block count is always produced and published to L1, but the first validator's archiver hasn't indexed it when the assertion runs.
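
To illustrate the race described above, here is a minimal timing model, not code from the test suite. The `SimArchiver` shape, the phase offsets, and the processing delays are all hypothetical values chosen for illustration; only the 50ms polling interval comes from the description.

```typescript
// Hypothetical model: each archiver polls L1 on a fixed interval with its own
// phase offset, and needs some processing time before a checkpoint is queryable.
interface SimArchiver {
  pollIntervalMs: number;
  phaseOffsetMs: number; // when this archiver's first poll happens (assumed)
  processingMs: number; // time from poll to indexed checkpoint (assumed)
}

// Time at which a checkpoint published to L1 at `publishedAtMs` becomes
// visible through the given archiver.
function visibleAt(a: SimArchiver, publishedAtMs: number): number {
  const ticks = Math.ceil((publishedAtMs - a.phaseOffsetMs) / a.pollIntervalMs);
  const nextPollMs = a.phaseOffsetMs + Math.max(ticks, 0) * a.pollIntervalMs;
  return nextPollMs + a.processingMs;
}

// Illustrative numbers only: both nodes poll every 50ms, but out of phase,
// and the validator is assumed to take longer to index.
const setupNode: SimArchiver = { pollIntervalMs: 50, phaseOffsetMs: 0, processingMs: 20 };
const validator0: SimArchiver = { pollIntervalMs: 50, phaseOffsetMs: 30, processingMs: 300 };

const publishedAtMs = 1000;
const setupSeesAt = visibleAt(setupNode, publishedAtMs);
const validatorSeesAt = visibleAt(validator0, publishedAtMs);

// waitForTx returns as soon as the setup node sees the checkpoint, and the
// assertion then queries the validator immediately.
const assertionRunsAt = setupSeesAt;
const validatorHasCheckpoint = validatorSeesAt <= assertionRunsAt;
const raceWindowMs = validatorSeesAt - setupSeesAt;
```

In this model the checkpoint exists on L1 the whole time; the assertion fails only because it reads from a node that has not caught up yet.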

Fix

Added a retryUntil poll at the start of assertMultipleBlocksPerSlot that waits (up to L2_SLOT_DURATION_IN_S * 3 = 108s, polling every 0.5s) for nodes[0]'s archiver to index a checkpoint with at least targetBlockCount blocks. Once found, the existing validation logic runs as before.
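
For readers unfamiliar with the pattern, a rough sketch of the polling approach follows. This is not the merged diff: `pollUntil` is a hypothetical stand-in for a `retryUntil`-style helper, and `Checkpoint`, `ArchiverLike`, and `waitForCheckpointWithBlocks` are simplified names assumed for illustration.

```typescript
// Hypothetical stand-in for a retryUntil-style helper: calls `fn` until it
// returns a defined value, or throws once the timeout has elapsed.
async function pollUntil<T>(
  fn: () => Promise<T | undefined>,
  timeoutSec: number,
  intervalSec: number,
): Promise<T> {
  const deadline = Date.now() + timeoutSec * 1000;
  while (true) {
    const result = await fn();
    if (result !== undefined) {
      return result;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Timed out after ${timeoutSec}s`);
    }
    await new Promise<void>(resolve => setTimeout(resolve, intervalSec * 1000));
  }
}

// Simplified shapes, assumed for this sketch only.
interface Checkpoint {
  blockCount: number;
}
interface ArchiverLike {
  getCheckpoints(): Promise<Checkpoint[]>;
}

// Waits until the archiver has indexed a checkpoint with at least
// `targetBlockCount` blocks, then returns that checkpoint.
async function waitForCheckpointWithBlocks(
  archiver: ArchiverLike,
  targetBlockCount: number,
  timeoutSec: number,
  intervalSec: number,
): Promise<Checkpoint> {
  return pollUntil(
    async () => {
      const checkpoints = await archiver.getCheckpoints();
      return checkpoints.find(c => c.blockCount >= targetBlockCount);
    },
    timeoutSec,
    intervalSec,
  );
}
```

With the values from this description, the call would use a timeout of 108 and an interval of 0.5, so at most a few hundred polls before giving up. Polling the same node that the assertions read from is what closes the race: the check no longer depends on how far a different node has synced.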

Fixes A-594

Contributor

@spalladino left a comment

Good catch again!

@spalladino enabled auto-merge (squash) March 2, 2026 14:28
@spalladino merged commit 1ccca76 into merge-train/spartan Mar 2, 2026
10 checks passed
@spalladino deleted the mr/deflake-epochs-mbps branch March 2, 2026 14:39
github-merge-queue bot pushed a commit that referenced this pull request Mar 3, 2026
BEGIN_COMMIT_OVERRIDE
fix: track last seen nonce in case of stale fallback L1 RPC node
(#20855)
feat: Validate num txs in block proposals (#20850)
fix(archiver): enforce checkpoint boundary on rollbackTo (#20908)
fix: tps zero metrics (#20656)
fix: handle scientific notation in bigintConfigHelper (#20929)
feat(aztec): node enters standby mode on genesis root mismatch (#20938)
fix: logging of class instances (#20807)
feat(slasher): make slash grace period relative to rollup upgrade time
(#20942)
chore: add script to find PRs to backport (#20956)
chore: remove unused prover-node dep (#20955)
fix: increase minFeePadding in e2e_bot bridge resume tests and harden
GasFees.mul() (#20962)
feat(sequencer): (A-526) rotate publishers when send fails (#20888)
chore: (A-554) bump reth version 1.6.0 -> 1.11.1 for eth devnet (#20889)
chore: metric on how many epochs validator has been on committee
(#20967)
fix: set wallet minFeePadding in BotFactory constructor (#20992)
chore: deflake epoch invalidate block test (#21001)
chore(sequencer): e2e tests for invalid signature recovery in checkpoint
attestations (#20971)
chore: deflake duplicate proposals and attestations (#20990)
chore: deflake epochs mbps test (#21003)
feat: reenable function selectors in txPublicSetupAllowList (#20909)
fix: limit offenses when voting in tally slashing mode by
slashMaxPayloadSize (#20683)
fix(spartan): wire SEQ_L1_PUBLISHING_TIME_ALLOWANCE_IN_SLOT env var
(#21017)
END_COMMIT_OVERRIDE
johnathan79717 pushed a commit that referenced this pull request Mar 4, 2026