chore: standalone forge broadcast wrapper with retry, timeout, and anvil detection#19824
…port

Replace inline retry logic and direct forge calls with a standalone TypeScript wrapper script (`l1-contracts/scripts/forge_broadcast.ts`).

Key behaviors:
- `--batch-size 8` to prevent forge broadcast hangs
- External timeout (forge's `--timeout` is unreliable for broadcast hangs)
- On anvil: detect via `web3_clientVersion`, retry from scratch on failure (anvil's auto-miner race condition strands txs in the mempool)
- On real chains: retry with `--resume` to pick up unmined transactions
- Buffers stdout per attempt, only emits the successful attempt's output
- Waits for stdout drain before exit to avoid pipe truncation

References:
- foundry-rs/foundry#6796 (batch size hang)
- foundry-rs/foundry#8919 (anvil auto-miner race)
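The retry policy in the list above can be sketched roughly as follows. This is a minimal illustration, not the wrapper's actual code; `runForge` and `isAnvil` are hypothetical stand-ins for spawning forge and for the `web3_clientVersion` detection.

```typescript
// Sketch of the retry policy (hypothetical helper names, not the real wrapper):
// on anvil, retry from scratch so forge recomputes nonces from on-chain state;
// on real chains, pass --resume so unmined transactions are picked up.
type AttemptResult = { ok: boolean; output: string };

async function broadcastWithRetry(
  runForge: (extraArgs: string[]) => Promise<AttemptResult>,
  isAnvil: boolean,
  maxRetries: number,
): Promise<string> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const extraArgs = !isAnvil && attempt > 1 ? ["--resume"] : [];
    const result = await runForge(extraArgs);
    // Output is buffered per attempt; only the successful attempt's is returned.
    if (result.ok) return result.output;
  }
  throw new Error(`forge broadcast failed after ${maxRetries} attempts`);
}
```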
… exit

emitAndExit uses `process.stdout.write` with a callback that calls `process.exit`, which is asynchronous. Without awaiting, execution falls through to subsequent log lines and retry loops before `process.exit` fires. Change the return type to `Promise<never>` and await all call sites.
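A minimal sketch of the fix described here (an assumed shape, not the exact wrapper code): returning a never-resolving Promise makes `await emitAndExit(...)` halt the caller until `process.exit` fires from the write callback.

```typescript
// emitAndExit never resolves: process.exit is invoked from the stdout write
// callback, so buffered output is flushed before the process dies, and
// awaiting callers cannot fall through to later log lines or retry loops.
function emitAndExit(output: string, code: number): Promise<never> {
  return new Promise<never>(() => {
    process.stdout.write(output, () => process.exit(code));
  });
}
```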
Move chain-specific timeout logic into forge_broadcast.ts itself: it queries `eth_chainId` at startup and selects 300s for mainnet/sepolia, 50s for everything else. The `FORGE_BROADCAST_TIMEOUT` env var still works as an override. Remove `getForgeBroadcastTimeout` from callers.

Also fixes issues from code review:
- rpcCall now rejects on JSON-RPC error responses (it was silently resolving undefined, which could cause false positives in verifyBroadcastOnChain)
- rpcCall has a 10s timeout so the wrapper's own RPC calls cannot hang indefinitely
- Remove the incorrect foundry issue #8919 link (it was a code refactoring PR, not about anvil)
- Remove the unnecessary try/catch around rmSync with `force: true`
- Replace the magic exit code 124 with a named `EXIT_TIMEOUT` constant
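The timeout selection can be sketched like this (a simplified standalone version; `selectTimeoutSecs` is a hypothetical name, and only the 300s/50s split and the env override are taken from the description above):

```typescript
// Pick the broadcast timeout from the chain id (1 = mainnet, 11155111 = sepolia),
// letting a FORGE_BROADCAST_TIMEOUT-style override win when it is a valid number.
function selectTimeoutSecs(chainId: number, envOverride?: string): number {
  if (envOverride !== undefined) {
    const parsed = parseInt(envOverride, 10);
    if (Number.isSafeInteger(parsed) && parsed > 0) return parsed;
  }
  const slowChains = new Set([1, 11155111]); // mainnet, sepolia
  return slowChains.has(chainId) ? 300 : 50;
}
```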
Flakey Tests 🤖 says: This CI run detected 1 test that failed but was tolerated due to a .test_patterns.yml entry.
The on-chain nonce verification logic never salvaged an attempt in 46,484 stress test runs. On anvil (where all retries occurred) it is useless, because broadcast artifacts are cleared before retry. The nonce heuristic is also unreliable: a higher nonce doesn't prove those specific transactions were mined. Remove it to simplify.
The script has a shebang that handles --experimental-strip-types, so spawn it directly instead of via node. It also doesn't need to be copied to the temp dir — it just spawns forge which inherits cwd.
Strip `--verify` from broadcast attempts and run it as a separate step after transactions land. Forge runs verification AFTER all receipts are collected (crates/script/src/lib.rs:333-338) and exits non-zero if any verification fails (crates/script/src/verify.rs), even when all transactions succeeded. This caused two problems:

1. The broadcast timeout could kill forge mid-verification, wasting the attempt even though all transactions were already mined.
2. A verification failure (e.g. an Etherscan rate limit) triggered broadcast retries, resubmitting already-mined transactions.

After a successful broadcast, we now run `forge script --resume --verify --broadcast` without a timeout. The `--resume` path re-compiles and re-links using libraries from the broadcast artifacts (crates/script/src/build.rs, CompiledState::resume), so it doesn't need simulation data. The broadcast step is a no-op since all receipts already exist. Verification failure is logged but doesn't affect the exit code, since the critical work (transactions landing) already succeeded.
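The two-phase flow can be sketched as below; `runStep` is a hypothetical stand-in for spawning forge, and the argument lists are abbreviated.

```typescript
// Phase 1 broadcasts under a timeout; phase 2 verifies with --resume and no
// timeout. A verification failure is logged but never changes the result,
// since the transactions have already landed.
async function broadcastThenVerify(
  runStep: (args: string[], timeoutSecs?: number) => Promise<boolean>,
  timeoutSecs: number,
): Promise<boolean> {
  const landed = await runStep(["--broadcast"], timeoutSecs);
  if (!landed) return false;
  // --resume re-links from broadcast artifacts; the broadcast itself is a
  // no-op because all receipts already exist.
  const verified = await runStep(["--resume", "--verify", "--broadcast"]);
  if (!verified) {
    console.error("verification failed, but all transactions already landed");
  }
  return true;
}
```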
```typescript
#!/usr/bin/env -S node --experimental-strip-types

// forge_broadcast.ts - Reliable forge script broadcast with retry and timeout.
//
```
I confirm that I have reviewed this carefully and have high confidence
Node v22 treats .ts files as CJS by default. The forge_broadcast.ts script uses ESM imports, which requires "type": "module" in the nearest package.json. Without copying l1-contracts/package.json, Node finds l1-artifacts/package.json instead (no "type": "module") and fails to parse the imports.
Node.js refuses to load .ts files from node_modules even with --experimental-strip-types. Compile with swc during copy-foundry-artifacts and invoke via process.execPath from deploy code.
Move swc compilation from copy-foundry-artifacts.sh into l1-contracts bootstrap so the .js is built at the source. Add @swc/cli and @swc/core as devDependencies. copy-foundry-artifacts.sh now just copies the pre-built .js file.
Node.js refuses to load .ts files from node_modules, so make the script plain JavaScript instead of compiling with swc. Removes @swc/cli and @swc/core devDependencies from l1-contracts.
```typescript
return new Promise((resolve, reject) => {
  const url = new URL(rpcUrl);
  const body = JSON.stringify({ jsonrpc: "2.0", id: 1, method, params });
  const reqFn = url.protocol === "https:" ? httpsRequest : httpRequest;
```
fetch is available in node. I think it would simplify the code greatly (built-in stream handling, timeouts, promise support etc).
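A fetch-based version of the RPC helper along the lines of this suggestion might look like the sketch below (assuming Node 18+ for global `fetch` and `AbortSignal.timeout`; the 10s budget mirrors the wrapper's, but the function is illustrative, not the PR's code):

```typescript
// JSON-RPC call via the built-in fetch: AbortSignal.timeout replaces a
// hand-rolled timer, and a JSON-RPC error response rejects rather than
// silently resolving undefined.
async function rpcCall(
  rpcUrl: string,
  method: string,
  params: unknown[] = [],
): Promise<unknown> {
  const res = await fetch(rpcUrl, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method, params }),
    signal: AbortSignal.timeout(10_000),
  });
  if (!res.ok) throw new Error(`${method}: HTTP ${res.status}`);
  const json = (await res.json()) as {
    result?: unknown;
    error?: { code: number; message: string };
  };
  if (json.error) throw new Error(`${method}: ${json.error.message}`);
  return json.result;
}
```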
```typescript
const MAX_RETRIES = parseInt(
  process.env.FORGE_BROADCAST_MAX_RETRIES ?? "3",
  10,
);
```
this can still return NaN. Might be worth just adding a quick check
Suggested change:

```diff
 const MAX_RETRIES = parseInt(
   process.env.FORGE_BROADCAST_MAX_RETRIES ?? "3",
   10,
 );
+if (!Number.isSafeInteger(MAX_RETRIES)) {
+  process.stderr.write(`MAX_RETRIES is not a valid integer.\n`);
+  process.exit(1);
+}
```
I guess it's not too bad, because the way it's being used (attempt <= MAX_RETRIES) is safe, i.e. with NaN it will never run an attempt. I was worried this could cause it to retry indefinitely.
```typescript
const timer = setTimeout(() => {
  timedOut = true;
  proc.kill("SIGTERM");
  killTimer = setTimeout(() => proc.kill("SIGKILL"), KILL_GRACE);
}, timeoutSecs * 1000);
```
FWIW spawn takes `signal: AbortSignal`, `timeout: number`, and `killSignal: string | number` to automatically kill the process. It might be enough to just SIGKILL the process if it hangs.
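A sketch of this suggestion (not the PR's code): letting spawn's own `timeout`/`killSignal` options kill a hung child instead of stacked setTimeout timers.

```typescript
import { spawn } from "node:child_process";

// Run a command, letting Node kill it with SIGKILL if it outlives the
// deadline; resolves with the exit code and (if killed) the signal.
function runWithTimeout(
  cmd: string,
  args: string[],
  timeoutMs: number,
): Promise<{ code: number | null; signal: NodeJS.Signals | null }> {
  return new Promise((resolve, reject) => {
    const proc = spawn(cmd, args, {
      stdio: "ignore",
      timeout: timeoutMs,
      killSignal: "SIGKILL",
    });
    proc.on("error", reject);
    proc.on("exit", (code, signal) => resolve({ code, signal }));
  });
}
```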
```typescript
// - Forge computes new nonces from on-chain state
// - New transactions replace any stuck ones with the same nonce
// - The race condition is intermittent (~0.04%), so retries almost always succeed
rmSync("broadcast", { recursive: true, force: true });
```
just preempting flakes here
Suggested change:

```diff
-rmSync("broadcast", { recursive: true, force: true });
+rmSync("broadcast", { recursive: true, force: true, maxRetries: 3, retryDelay: 100 });
```
BEGIN_COMMIT_OVERRIDE
chore: Should fix proving benchmarks by reducing committee lag (#20381)
chore: standalone forge broadcast wrapper with retry, timeout, and anvil detection (#19824)
chore: benchmark tx val (#20227)
fix: cloudflare terraform API (#20387)
fix: HA e2e test order & retries (#20383)
fix: incorporate forge broadcast review feedback (#20390)
refactor(p2p): rewrite FileStoreTxCollection with retry, backoff, and shared worker pool (#20317)
chore(ci): run ci job on draft PRs (#20395)
feat(bot): allow anchoring txs to proposed chain (#20392)
feat(p2p): slot-based soft deletion for TxPoolV2 (#20388)
chore: add setup-container script (#20309)
feat: build aztec-prover-agent with baked-in CRS (#20391)
fix!: change protocol contracts deployer to be the contract address (#20396)
feat(p2p): enforce minimum tx pool age for block building (#20384)
refactor(sentinel): update validator statuses to checkpoint-based naming (#20372)
chore: log tx hash (#20413)
chore: update l1 fee analysis to measure blob count in L1 blocks (#20414)
chore: set up tx file store in next-net (#20418)
chore(ci): gate draft PRs from CI, allow override with ci-draft label (#20426)
fix: stabilize writing_an_account_contract.test.ts (#20420)
END_COMMIT_OVERRIDE
(I confirm I have edited and reviewed this AI summary)
Summary
Replaces ad-hoc `--batch-size`/`--timeout` flags across shell scripts and TypeScript with a single standalone wrapper script (`l1-contracts/scripts/forge_broadcast.ts`) that handles all forge broadcast reliability concerns in one place:

- … `--resume`.
- Strips `--verify` from broadcast attempts so the timeout only covers transaction landing. After broadcast succeeds, runs `forge script --resume --verify --broadcast` with no timeout. Verification failure is logged but doesn't affect the exit code (transactions already landed). This prevents two problems: (1) the timeout killing forge mid-verification even though all transactions mined, and (2) Etherscan failures triggering broadcast retries.

Files changed

- `l1-contracts/scripts/forge_broadcast.ts`
- `l1-contracts/package.json` (`"type": "module"` for ESM)
- `l1-contracts/scripts/run_rollup_upgrade.sh`
- `l1-contracts/scripts/test_rollup_upgrade.sh`
- `yarn-project/ethereum/src/deploy_aztec_l1_contracts.ts`
- `yarn-project/l1-artifacts/scripts/copy-foundry-artifacts.sh`

Stress test results

50,000 deploy+upgrade cycles against local anvil instances (20 parallel workers, half instant-mining, half `--block-time 1`): all 13 retries occurred on instant-mining anvil workers (the automine race condition), and zero retries occurred on `--block-time` workers. All recovered successfully. Previous attempts could not recover from those rare hangs.

Test plan

`l1-contracts/scripts/test_rollup_upgrade.sh` exercises the full deploy+upgrade flow end-to-end.