fix: (A-649) tx collector bench test #21619

Merged
PhilWindle merged 1 commit into merge-train/spartan from danielntmd/fix-tx-collector-bench
Mar 17, 2026

@danielntmd
Contributor

Wire peerFailedBanTimeMs as a new env variable and lower the tx collector test ban time from 5 minutes to 5 seconds.

The test would flake on timeouts: each subtest spent the full 1 minute waiting for peers without ever obtaining all of them. Peer dialing is serialized and limited to 5 for this test, so a peer may dial repeatedly without success and then get banned for 5 minutes, which means it can never reconnect within the 1 minute wait. This change should let all peers connect in time and shorten the 1 minute wait, resulting in fewer timeouts overall for the test.
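
To make the timing argument concrete, here is a minimal sketch of reading a ban time from an env-like map and checking whether a banned peer can rejoin before the wait ends. The env variable name `P2P_PEER_FAILED_BAN_TIME_MS` and both helper functions are assumptions made for illustration; only `peerFailedBanTimeMs` and the 5 minute, 5 second, and 1 minute figures come from the description above.

```typescript
// Illustrative sketch only: the env variable name and helper names are
// assumptions, not the identifiers used in the actual codebase.
const FIVE_MINUTES_MS = 5 * 60 * 1000; // ban time before the change, per the description
const FIVE_SECONDS_MS = 5 * 1000; // ban time the test uses after the change
const PEER_WAIT_MS = 60 * 1000; // how long each subtest waits for peers

// Reads the ban time from an env-like map, falling back to a default.
function readPeerFailedBanTimeMs(
  env: Record<string, string | undefined>,
  defaultMs: number = FIVE_MINUTES_MS,
): number {
  const raw = env["P2P_PEER_FAILED_BAN_TIME_MS"]; // hypothetical variable name
  if (raw === undefined || raw.trim() === "") {
    return defaultMs;
  }
  const parsed = Number(raw);
  if (!Number.isFinite(parsed) || parsed < 0) {
    throw new Error(`Invalid peer ban time: ${raw}`);
  }
  return parsed;
}

// A peer banned at `bannedAtMs` (measured from the start of the wait) can be
// dialed again only after the ban expires. It helps the subtest only if that
// happens before the wait window closes.
function canRejoinBeforeWaitEnds(bannedAtMs: number, banTimeMs: number, waitMs: number): boolean {
  return bannedAtMs + banTimeMs < waitMs;
}

// Example: a peer that gets banned 10 seconds into the wait.
const oldBanMs = readPeerFailedBanTimeMs({});
const newBanMs = readPeerFailedBanTimeMs({ P2P_PEER_FAILED_BAN_TIME_MS: String(FIVE_SECONDS_MS) });
const rejoinsWithOldBan = canRejoinBeforeWaitEnds(10000, oldBanMs, PEER_WAIT_MS); // false
const rejoinsWithNewBan = canRejoinBeforeWaitEnds(10000, newBanMs, PEER_WAIT_MS); // true
```

With a 5 minute ban, any peer banned during the 1 minute window is lost for the rest of that subtest, whereas a 5 second ban leaves room for several retries.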

@danielntmd danielntmd added the ci-full Run all master checks. label Mar 16, 2026
@danielntmd
Contributor Author

No more 1 minute timeouts, looks like peers can connect quickly now.

http://ci.aztec-labs.com/2ceb66ad519806a9

@danielntmd
Contributor Author

Test went from 20 minutes to 3.5 minutes.

@PhilWindle PhilWindle merged commit e141dbe into merge-train/spartan Mar 17, 2026
24 checks passed
@PhilWindle PhilWindle deleted the danielntmd/fix-tx-collector-bench branch March 17, 2026 08:54
github-merge-queue bot pushed a commit that referenced this pull request Mar 18, 2026
BEGIN_COMMIT_OVERRIDE
fix(p2p): fall back to maxTxsPerCheckpoint for per-block tx validation
(#21605)
chore: fixing M3 devcontainer builds (#21611)
fix: clamp finalized block to oldest available in world-state (#21643)
chore: fix proving logs script (#21335)
fix: (A-649) tx collector bench test (#21619)
fix(validator): process block proposals from own validator keys in HA
setups (#21603)
fix: add bounds when allocating arrays in deserialization (#21622)
fix: skip handleChainFinalized when block is behind oldest available
(#21656)
chore: demote finalized block skip log to trace (#21661)
fix: skip -march auto-detection for cross-compilation presets (#21356)
chore: revert "add bounds when allocating arrays in deserialization"
(#21622) (#21666)
fix: capture txs not available error reason in proposal handler (#21670)
fix: estimate gas in bot and make BatchCall.simulate() return
SimulationResult (#21676)
fix: prevent HA peer proposals from blocking equivocation in duplicate
proposal test (#21673)
fix(p2p): penalize peers for errors during response reading (#21680)
feat(sequencer): add build-ahead config and metrics (#20779)
chore: fixing build on mac (#21685)
fix: HA deadlock for last block edge case (#21690)
fix: process all contract classes in storeBroadcastedIndividualFunctions
(A-683) (#21686)
chore: add slack success post on nightly scenario (#21701)
fix(builder): persist contractsDB across blocks within a checkpoint
(#21520)
fix: only delete logs from rolled-back blocks, not entire tag (A-686)
(#21687)
chore(p2p): lower attestation pool per-slot caps to 2 (#21709)
chore(p2p): remove unused method (#21678)
fix(p2p): penalize peer on tx rejected by pool (#21677)
fix(test): workaround slow mock creation (#21708)
fix(sequencer): fix checkpoint budget redistribution for multi-block
slots (#21692)
fix: batch checkpoint unwinding in handleEpochPrune (A-690) (#21668)
fix(sequencer): add missing opts arg to checkpoint_builder tests
(#21733)
fix: race condition in fast tx collection (#21496)
fix: increase default postgres disk size from 1Gi to 10Gi (#21741)
fix: update batch_tx_requester tests to use RequestTracker (#21734)
chore: replace dead BOOTSTRAP_TO env var with bootstrap.sh build arg
(#21744)
fix(sequencer): extract gas and blob configs from valid requests only
(A-677) (#21747)
fix: deflake attempt for l1_tx_utils (#21743)
fix(test): fix flaky keystore reload test (#21749)
fix(test): fix flaky duplicate_attestation_slash test (#21753)
feat(pipeline): introduce pipeline views for building (#21026)
END_COMMIT_OVERRIDE
AztecBot added a commit that referenced this pull request Mar 20, 2026
…llelize_strict error swallowing

Two interacting bugs caused 255 benchmarks (bb-micro-bench, l1-contracts,
sol verifiers, noir circuit gate counts, p2p, prover-client) to stop being
published since 2026-03-18:

1. PR #21619 changed p2p bench workers to import `jest-mock-extended`, which
   crashes when the workers run outside Jest (as Node worker processes). This
   caused the p2p bench to fail immediately.

2. `parallelize_strict` ran `run_tests | tee` which put `run_tests` in a
   subshell. When the first benchmark failed, the subshell exited but the main
   script saw no jobs and exited 0 — silently swallowing the failure and
   preventing all subsequent benchmarks from running.

Fixes:
- Replace `mock<EpochCache>()` with `createMockEpochCache()` in both bench
  worker files (no Jest dependency needed)
- Use process substitution `> >(tee $output)` instead of pipe so `run_tests`
  stays in the main shell and background jobs remain trackable
- Continue scheduling remaining benchmarks on failure instead of aborting
- Report failures at exit but still collect partial results via bench_engine
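
The first fix swaps a Jest-backed mock for a plain factory. As a rough sketch of that idea, a hand-written stub needs no test-framework import and can therefore run inside a Node worker process. The `EpochCacheLike` interface, its two methods, and `createPlainEpochCacheStub` are hypothetical stand-ins; this is not the real `EpochCache` API or the actual `createMockEpochCache()` implementation.

```typescript
// Hypothetical, heavily simplified stand-in for the real EpochCache interface.
interface EpochCacheLike {
  getCurrentSlot(): number;
  isInCommittee(address: string): Promise<boolean>;
}

// Builds a stub from plain functions. Nothing here imports Jest or
// jest-mock-extended, so it can be loaded outside a Jest environment.
function createPlainEpochCacheStub(overrides: Partial<EpochCacheLike> = {}): EpochCacheLike {
  return {
    getCurrentSlot: () => 0,
    isInCommittee: async () => true,
    // Caller-supplied behavior replaces the defaults above.
    ...overrides,
  };
}

// Usage: default stub, and one with a fixed slot for a specific benchmark.
const defaultStub = createPlainEpochCacheStub();
const slotSevenStub = createPlainEpochCacheStub({ getCurrentSlot: () => 7 });
```

The trade-off is that a plain stub has no call tracking or automatic method generation, which is acceptable for bench workers that only need the dependency to respond.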
