build: enable LTO for bun-zig.o by Jarred-Sumner · Pull Request #29618 · oven-sh/bun

Jarred-Sumner · 2026-04-23T00:27:21Z

Emit bun-zig.o as LLVM bitcode (obj.lto = .full) so it participates in the same LTO link as the C/C++ side, instead of being a native ELF object that lld can only link, not optimize across.

Why this didn't work before

The only documented Zig→bitcode path was -Dobj_format=bc, which routes through getEmittedLlvmBc() → -femit-llvm-bc. That writes Zig's self-hosted unoptimized bitcode (Producer "zig 0.15.2") before libLLVM ever touches it, and the self-hosted writer has encoding bugs for large modules → Invalid record in lld. The proper -flto path was blocked by use_lld = false on Linux causing LtoRequiresLld or silent fallback.

This PR uses obj.lto = .full + obj.use_lld = true, which routes through libLLVM's WriteBitcodeToFile after the LTO pre-link O3 pipeline. The output (Producer "LLVM20.1.2") is forward-compatible with lld 21/22.

Bumps ZIG_COMMIT_PARALLEL to pick up oven-sh/zig@04e7f6ac1e, which adds EnableSplitLTOUnit=1 + a module summary to Zig's LTO bitcode — required for -fwhole-program-vtables to accept the link (otherwise: inconsistent LTO Unit splitting).

What gets inlined

Boundary	Declared	Eliminated	%
Zig `export fn` → C++	336	142	42%
C `us_*` (usockets) ← Zig	115	79	69%
C++ `uws_*` (uWebSockets wrappers) ← Zig	108	76	70%
`mi_free`	—	all	100% (0 call insns)

Verified by disassembly: e.g. Bun__readOriginTimer body (optional check, clock_gettime, ns math) appears directly inside C++ Process_functionHRTime; symbol is gone.

Measured impact (linux-x64, vs latest canary)

	LTO	no Zig LTO	Δ
`Bun.escapeHTML`	171.3 ns	183.2 ns	6.5%
`TextDecoder.decode`	104.0 ns	106.8 ns	2.6%
`oha -n 1M -c 50`	~200,600 req/s	~193,800 req/s	3.5%

Caveat: canary is a slightly different revision; same-revision A/B would be cleaner.

Notes

zigLto(cfg) gates on usingParallelCompiler — the older Windows ZIG_COMMIT predates the summary fix.
codegenThreads() returns 1 when LTO is on (zig_llvm.cpp gates SplitModule on !lto).

robobun · 2026-04-23T00:27:32Z

^{Updated 5:44 PM PT - Apr 22nd, 2026}

❌ Your commit d963aa46 has 1 failures in Build #47387 (All Failures):

scripts/build/ci.ts - build failed on 🐧 x64-musl - build-bun
scripts/build/ci.ts - build failed on 🍎 aarch64 - build-zig
scripts/build/ci.ts - build failed on 🐧 x64-musl-baseline - build-bun
scripts/build/ci.ts - build failed on 🍎 x64 - build-zig
scripts/build/ci.ts - build failed on 🐧 aarch64-musl - build-bun

🧪 To try this PR locally:

bunx bun-pr 29618

That installs a local version of the PR into your bun-29618 executable, so you can run:

bun-29618 --bun

Emit bun-zig.o as LLVM bitcode (obj.lto=.full) so it participates in the same LTO link as the C/C++ side. Previously the only bitcode path was -Dobj_format=bc, which writes zig's self-hosted (unoptimized) bitcode and is rejected by lld with "Invalid record". Bumps ZIG_COMMIT_PARALLEL to pick up the EnableSplitLTOUnit/summary fix in zig_llvm.cpp, required for -fwhole-program-vtables interop.

claude

Additional findings (outside current diff — PR may have been updated during review):

🟡 .buildkite/Dockerfile-bootstrap.sh:108-115 — This script only installs docker via dnf (line 30), but the new block calls git clone — Amazon Linux 2023 AMIs don't ship git by default, so the clone fails with "command not found" and the if-body is silently skipped (set -e doesn't apply to if-conditions). The test docker images are never pre-pulled on the amazonlinux+docker hosts; fix by changing line 30 to dnf install -y docker git.
Extended reasoning...

What the bug is

.buildkite/Dockerfile-bootstrap.sh runs on Amazon Linux 2023 hosts (the only dnf-based platforms in ci.mjs buildPlatforms with features: ['docker']). Its only package install is line 30: dnf install -y docker. This PR adds a new block at lines 108–115 that runs git clone --depth=1 ... to fetch the repo so it can run test/docker/prepare-ci.sh. But git is never installed, and the al2023-ami-* AMIs selected by aws.getBaseImage() for distro=amazonlinux release=2023 are intentionally minimal and do not include git in the default package set.

Code path that triggers it

machine.mjs uploads Dockerfile-bootstrap.sh to a fresh AL2023 instance and runs it as root. The script runs set -euo pipefail, then dnf install -y docker, builds the buildkite docker image, and reaches the new block:
```
if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
    https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
    ...
fi
```
With git absent, bash prints bash: git: command not found and the command exits 127. Because the command appears as the condition of an if, set -e does not apply (POSIX: "the -e setting shall be ignored when executing the compound-list following the if"). The if evaluates false, the body is skipped, and the script continues to docker container create and exits 0.

Why existing code doesn't prevent it

Unlike bootstrap.sh (which calls git_path="$(require git)" and explicitly installs git via install_common_software), Dockerfile-bootstrap.sh is a minimal host-side script that has never needed git before — it only orchestrates docker. Nothing in this file installs git, and nothing checks for it before the new clone.

Step-by-step proof
1. machine.mjs calls aws.getBaseImage({distro: 'amazonlinux', release: '2023', ...}) → matches al2023-ami-*-x86_64/arm64 (the official AL2023 AMI, which ships without git).
2. Script runs as root, executes dnf install -y docker (line 30) — installs docker only.
3. docker buildx build ... --build-arg BUN_REPO_REF=... succeeds (git inside the container image is installed via apt at Dockerfile line ~25, but that's inside the container, not on the host).
4. Line 108: if git clone ... → command not found, exit 127 → if-condition false.
5. Body (prepare-ci.sh invocation) skipped; rm -rf /tmp/bun-test-docker also skipped (it's inside the if).
6. Script proceeds, prints "Bootstrap complete", exits 0. AMI is snapshotted without postgres/mysql/redis/minio images in /var/lib/docker.
Impact

The bake doesn't fail — the block is explicitly documented as best-effort. The only consequence is that the new "pre-pull test docker images" optimization is silently a no-op on the linux-{aarch64,x64}-amazonlinux-2023-with-docker images. Per ci.mjs, those are build platforms (cpp/zig/link agents), not testPlatforms (which use debian/ubuntu/alpine via bootstrap.sh, where this PR's separate prefetch_build_deps path does have git). So the practical value of pre-pulling test images on these specific hosts is marginal anyway — but the author clearly intended the block to execute, and it silently won't.

Fix

Change line 30 to:
```
dnf install -y docker git
```
🟡 scripts/build/download.ts:161-163 — The new permanent = res.status >= 400 && res.status < 500 check treats all 4xx as non-retryable, but HTTP 429 (Too Many Requests) and 408 (Request Timeout) are transient by definition and were previously retried with backoff. Since prefetch-deps.ts fires concurrent downloads at GitHub and only treats HTTP 404 as an expected miss, a 429 would now hard-fail instead of backing off. Consider && res.status !== 408 && res.status !== 429.
Extended reasoning...

What changed

Before this PR, downloadWithRetry retried on any non-ok response: the if (!res.ok || res.body === null) branch set lastError and continued into the next backoff iteration regardless of status code. After this PR, the new line at scripts/build/download.ts:163 sets permanent = res.status >= 400 && res.status < 500, which causes the loop condition attempt <= maxAttempts && !permanent to terminate after a single attempt for every 4xx code. The inline comment says "4xx is deterministic" — but that is not true for 429 (Too Many Requests) or 408 (Request Timeout), which are explicitly defined as retryable in HTTP semantics (and 429 frequently carries a Retry-After header for exactly this reason).

Concrete trigger path
1. scripts/prefetch-deps.ts enumerates the cross-product of {asan, lto, baseline, musl} variants and pushes ~dozens of GitHub release-asset / archive-tarball URLs into a queue.
2. Four parallel workers call downloadWithRetry(item.url, path, ...) for each item.
3. GitHub's CDN (codeload.github.com / objects.githubusercontent.com) returns 429 under abuse detection or load.
4. res.ok is false, res.status is 429 → permanent = true → loop exits after attempt 1 → throw lastError (the raw HTTP 429 Too Many Requests for ... BuildError).
5. Back in prefetch-deps.ts's fetchOne, the catch block tests /\bHTTP 404\b/.test(err.message) — this does not match 429, so the error is re-thrown.
6. The worker's promise rejects, Promise.all(workers) rejects, and prefetch-deps.ts exits non-zero.
In .buildkite/Dockerfile (lines 145–151), the prefetch step runs inside RUN set -e; ... (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch) with no || true guard, so a non-zero exit fails the entire docker buildx build. (bootstrap.sh and bootstrap.ps1 wrap prefetch-deps in best-effort handling, so on those paths the bake survives — but the Dockerfile path does not.) Regular CI builds calling downloadWithRetry via fetchPrebuilt/fetchZig would also fail immediately on 429 instead of retrying.

Why existing code doesn't prevent it

The only special-casing of non-ok statuses in callers is prefetch-deps.ts's /\bHTTP 404\b/ regex, which intentionally matches 404 only. Nothing in the call chain catches or retries 429. Before this PR the retry loop itself absorbed it; now nothing does.

Impact

This is a minor robustness regression. GitHub release-asset and codeload endpoints rarely return 429 for anonymous downloads, image bakes are infrequent manual operations, and on CI most downloads now hit the prefetch cache and never reach fetch(). So the practical likelihood is low — but when it does happen the failure mode is worse than before (hard fail vs. transparent retry), and the comment asserting "4xx is deterministic" is factually wrong for these codes.

Fix

Exclude the two transient 4xx codes from the permanent check:
```
permanent = res.status >= 400 && res.status < 500 && res.status !== 408 && res.status !== 429;
```
This preserves the intended fast-fail on 404/403/401 (genuinely deterministic) while restoring backoff for rate-limiting and request-timeout responses.

github-actions · 2026-04-23T00:34:21Z

This PR may be a duplicate of:

ci: bake a dep prefetch cache into build images [build images] #29568 - Both implement the same build dependency prefetch cache system (prefetchDir, prefetchPathForUrl, tryPrefetchExtracted, BUN_BUILD_PREFETCH_DIR) with near-identical changes to scripts/build/download.ts, scripts/prefetch-deps.ts, bootstrap scripts, and Dockerfile

🤖 Generated with Claude Code

coderabbitai · 2026-04-23T00:36:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 21916ab5-a00c-43d9-b298-8652f2427785

📥 Commits

Reviewing files that changed from the base of the PR and between d963aa4 and c0e855c.

📒 Files selected for processing (2)

scripts/build/config.ts
scripts/build/zig.ts

Walkthrough

Adds a boolean LTO build option and propagates it through build tooling; LTO is disabled for multi-platform "check" builds and, when enabled, configures full LTO with LLD. Zig/tooling commit selection simplified to a single pinned Zig commit and sharding/generation now respects the LTO setting.

Changes

Cohort / File(s)	Summary
Build option and object config `build.zig`	Adds `BunBuildOptions.lto: bool`; disables `lto` for multi-platform "check" builds; when `opts.lto` is true `configureObj` sets `obj.lto = .full` and forces `obj.use_lld = true`.
Zig build script and sharding behavior `scripts/build/zig.ts`	Pins a single `ZIG_COMMIT` (new hash), removes parallel-compiler-specific commit/selection logic, passes `-Dlto=<bool>` to `zig build`, always sets `ZIG_PARALLEL_SEMA=1`, and makes codegen sharding return 1 when `cfg.lto` is true (removes prior "only parallel compiler enables sharding" dependency).
Config resolution and display `scripts/build/config.ts`	Resolves `cfg.zigCommit` from `versionDefaults.zigCommit` when unset and updates feature display to compare against the new single pinned Zig commit rather than a host-dependent default.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the main change: enabling LTO for bun-zig.o, which is the core objective across all modified files.
Description check	✅ Passed	The description is comprehensive and well-structured, covering the PR objectives, technical motivation, verification methods, and measured impact, though it deviates from the template structure.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

test/docker/index.ts (1)
386-393: 🧹 Nitpick | 🔵 Trivial

Deduplicate the build-required service list.

buildServices() now carries a second copy of the same "postgres_tls" | "mysql_tls" | "redis_unified" membership logic that DockerComposeHelper.up() already hardcodes at Line 113. This PR had to touch both places, and servicesToBuild is only inferred as string[], so a typo here would slip past type-checking and fail at runtime. Please move this to one shared, typed constant/Set<ServiceName> and reuse it in both paths.
♻️ Proposed refactor
+const servicesNeedingBuild: readonly ServiceName[] = ["postgres_tls", "mysql_tls", "redis_unified"];
+
 class DockerComposeHelper {
@@
-    if (service === "mysql_tls" || service === "redis_unified" || service === "postgres_tls") {
+    if (servicesNeedingBuild.includes(service)) {
       const buildResult = await this.exec(["build", service]);
@@
-    const servicesToBuild = ["postgres_tls", "mysql_tls", "redis_unified"];
+    const servicesToBuild = servicesNeedingBuild;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/docker/index.ts` around lines 386 - 393, The services list is duplicated
in buildServices() and DockerComposeHelper.up(); create a single exported, typed
constant (e.g., a readonly array or Set typed as Set<ServiceName> or
ReadonlyArray<ServiceName>) named something like BUILT_SERVICES or
SERVICES_REQUIRING_BUILD and replace the local servicesToBuild in
buildServices() and the hardcoded list in DockerComposeHelper.up() to reference
this constant so the membership is centralized and type-checked; update any
imports/uses to reference the new symbol and adjust types so the compiler
enforces ServiceName membership.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.buildkite/Dockerfile:
- Around line 145-153: The RUN block cloning bun should treat the prefetch step
as best-effort and then harden the output directory before exporting: change the
`if ... bun scripts/prefetch-deps.ts /opt/bun-prefetch` invocation so any
non-zero exit does not fail the build (e.g., run it in a subshell and swallow
errors or append `|| true`), and after the step (only if /opt/bun-prefetch
exists) run `chown -R root:root /opt/bun-prefetch || true` and `chmod -R a-w
/opt/bun-prefetch || true` (and add read/execute bits if needed) so
`/opt/bun-prefetch` is immutable and owned by root; reference the existing RUN
block, `${BUN_REPO_REF}`, `bun scripts/prefetch-deps.ts`, and
`/opt/bun-prefetch` when making the edits.

In @.buildkite/Dockerfile-bootstrap.sh:
- Around line 108-115: The git clone block that attempts to pull the bun repo
for Docker warmup currently swallows clone failures; add an else branch after
the if git clone ...; then print a clear warning (to stderr) including the
attempted ref variable ${BUN_BOOTSTRAP_REPO_REF:-main} and the destination
(/tmp/bun-test-docker) so clone failures are visible; update the block
surrounding the existing prepare-ci.sh checks and rm -rf to include this else
warning so bad refs or transient GitHub failures are diagnosable.

In `@build.zig`:
- Around line 756-760: The build currently allows opts.no_llvm=true with
opts.lto=true leading to obj.use_llvm = false while obj.lto = .full (an
unsupported Zig state); add a guard early after evaluating opts to detect when
opts.no_llvm and opts.lto are both true and fail fast with a clear error
message. Modify the logic around obj.use_llvm / obj.lto (the block setting
obj.use_llvm = !opts.no_llvm; obj.lto = .full; obj.use_lld = true) to check the
conflicting flags first and call a build-failure routine (or std.debug.panic
with a descriptive message) if both are set, so we never set obj.lto when
obj.use_llvm is false.

In `@scripts/bootstrap.ps1`:
- Around line 720-735: The Prefetch-Build-Deps flow still throws when "bun
scripts\prefetch-deps.ts" returns a non-zero exit ($LASTEXITCODE), which
contradicts the intended best-effort behavior; modify the block in the
Prefetch-Build-Deps function so that after running "& bun
scripts\prefetch-deps.ts $prefetchDir" you check $LASTEXITCODE and, if non-zero,
emit a warning/log (do not throw) and continue cleanup and setting of
BUN_BUILD_PREFETCH_DIR; keep the existing try/finally with Pop-Location and
ensure Remove-Item and attrib/Set-Env still run even when prefetch-deps.ts fails
so downstream steps proceed.

In `@scripts/machine.mjs`:
- Around line 1253-1271: The spawn can emit an "error" event (e.g., ENOENT)
which is not handled; update the logic around nodeSpawn/child so you attach an
"error" listener on child in addition to "close", and ensure the Promise waits
for either event: on "error" reject or resolve with a non-zero code and the
error signal/info, and on "close" resolve as before. Also remove/cleanup the
signal handlers and both listeners inside the resolution path (references:
nodeSpawn, child, forward, the process.on("SIGINT"/"SIGTERM") handlers and the
Promise that currently listens only for "close") so you don't leak handlers
after spawn failure or normal exit.

In `@test/docker/prepare-ci.sh`:
- Around line 81-86: The docker cache-save step is missing the newly designated
warmup images; update the docker save manifest (the block that runs docker save
when BUN_DOCKER_LOAD_CACHE=1) to include the same images you added with
pull_if_missing — specifically "ubuntu/squid:5.2-22.04_beta", "postgres:15.13",
and "redis:8-alpine" — so the cache archive contains the full warm set and later
jobs won’t re-pull those images.

In `@test/integration/build-prefetch/prefetch.test.ts`:
- Around line 37-48: The tests assert exact empty stderr
(expect(stderr).toBe("")) which flakily fails on ASAN shards because bun
subprocesses started via bunExe() with bunEnv may emit known ASAN warnings;
remove or relax those exact-empty-stderr assertions in the prefetch test blocks
(the Bun.spawn usage that defines proc, stdout, stderr, exitCode and checks
stdout, file contents and exitCode) and instead rely on the stable assertions
(stdout contains "using prefetch cache", filesystem check Bun.file(...).text()
=== "hi\n", and exitCode === 0); apply the same change to the other test blocks
noted (the other Bun.spawn sections around lines 59-79, 99-116, 145-160).

---

Outside diff comments:
In `@test/docker/index.ts`:
- Around line 386-393: The services list is duplicated in buildServices() and
DockerComposeHelper.up(); create a single exported, typed constant (e.g., a
readonly array or Set typed as Set<ServiceName> or ReadonlyArray<ServiceName>)
named something like BUILT_SERVICES or SERVICES_REQUIRING_BUILD and replace the
local servicesToBuild in buildServices() and the hardcoded list in
DockerComposeHelper.up() to reference this constant so the membership is
centralized and type-checked; update any imports/uses to reference the new
symbol and adjust types so the compiler enforces ServiceName membership.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0c97bbcc-70b7-495d-ad34-456b426f6a30

📥 Commits

Reviewing files that changed from the base of the PR and between 890ef5a and 2a376e285751bc6dd140341d8c0ef1e4dc95a236.

📒 Files selected for processing (16)

.buildkite/Dockerfile
.buildkite/Dockerfile-bootstrap.sh
.buildkite/ci.mjs
build.zig
scripts/bootstrap.ps1
scripts/bootstrap.sh
scripts/build/download.ts
scripts/build/zig.ts
scripts/machine.mjs
scripts/packer/variables.pkr.hcl
scripts/packer/windows-arm64.pkr.hcl
scripts/packer/windows-x64.pkr.hcl
scripts/prefetch-deps.ts
test/docker/index.ts
test/docker/prepare-ci.sh
test/integration/build-prefetch/prefetch.test.ts

coderabbitai · 2026-04-23T00:36:48Z

+RUN set -e; \
+    if git clone --depth=1 --branch ${BUN_REPO_REF} https://github.com/oven-sh/bun.git /tmp/bun-clone \
+       && [ -f /tmp/bun-clone/scripts/prefetch-deps.ts ]; then \
+         (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); \
+       else \
+         echo "warning: prefetch-deps.ts unavailable at ${BUN_REPO_REF}; skipping warm cache"; \
+       fi; \
+    rm -rf /tmp/bun-clone
+ENV BUN_BUILD_PREFETCH_DIR=/opt/bun-prefetch


⚠️ Potential issue | 🟠 Major

Mirror the bootstrap scripts here: make warm-cache best-effort and harden it before exporting it.

This Docker-only path currently fails the whole image build if bun scripts/prefetch-deps.ts exits non-zero, and it exposes /opt/bun-prefetch to later jobs without stripping write bits. That makes the image bake flakier than scripts/bootstrap.sh / scripts/bootstrap.ps1 and leaves a shared cache mutable inside a long-lived agent container.

⚙️ Suggested fix

RUN set -e; \ if git clone --depth=1 --branch ${BUN_REPO_REF} https://github.com/oven-sh/bun.git /tmp/bun-clone \ && [ -f /tmp/bun-clone/scripts/prefetch-deps.ts ]; then \ - (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); \ + if ! (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); then \ + echo "warning: prefetch-deps.ts failed; continuing without warm cache"; \ + rm -rf /opt/bun-prefetch; \ + else \ + chmod -R a-w /opt/bun-prefetch; \ + fi; \ else \ echo "warning: prefetch-deps.ts unavailable at ${BUN_REPO_REF}; skipping warm cache"; \ fi; \ rm -rf /tmp/bun-clone

🧰 Tools

🪛 Hadolint (2.14.0)

[warning] 145-145: Use WORKDIR to switch to a directory

(DL3003)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.buildkite/Dockerfile around lines 145 - 153, The RUN block cloning bun should treat the prefetch step as best-effort and then harden the output directory before exporting: change the `if ... bun scripts/prefetch-deps.ts /opt/bun-prefetch` invocation so any non-zero exit does not fail the build (e.g., run it in a subshell and swallow errors or append `|| true`), and after the step (only if /opt/bun-prefetch exists) run `chown -R root:root /opt/bun-prefetch || true` and `chmod -R a-w /opt/bun-prefetch || true` (and add read/execute bits if needed) so `/opt/bun-prefetch` is immutable and owned by root; reference the existing RUN block, `${BUN_REPO_REF}`, `bun scripts/prefetch-deps.ts`, and `/opt/bun-prefetch` when making the edits.

coderabbitai · 2026-04-23T00:36:48Z

+if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
+    https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
+    if [ -f /tmp/bun-test-docker/test/docker/prepare-ci.sh ]; then
+        (cd /tmp/bun-test-docker/test/docker && sh prepare-ci.sh) || \
+            echo "warning: prepare-ci.sh failed; test docker images not pre-pulled"
+    fi
+    rm -rf /tmp/bun-test-docker
+fi


⚠️ Potential issue | 🟡 Minor

Log clone failures in this best-effort warmup path.

If the clone itself fails, this block currently does nothing and the bake looks successful even though the Docker warm cache was never populated. Please add an else warning here so bad refs/transient GitHub failures are diagnosable.

🪵 Suggested tweak

if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \ https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then if [ -f /tmp/bun-test-docker/test/docker/prepare-ci.sh ]; then (cd /tmp/bun-test-docker/test/docker && sh prepare-ci.sh) || \ echo "warning: prepare-ci.sh failed; test docker images not pre-pulled" fi rm -rf /tmp/bun-test-docker +else + echo "warning: failed to clone oven-sh/bun at ${BUN_BOOTSTRAP_REPO_REF:-main}; test docker images not pre-pulled" fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In @.buildkite/Dockerfile-bootstrap.sh around lines 108 - 115, The git clone block that attempts to pull the bun repo for Docker warmup currently swallows clone failures; add an else branch after the if git clone ...; then print a clear warning (to stderr) including the attempted ref variable ${BUN_BOOTSTRAP_REPO_REF:-main} and the destination (/tmp/bun-test-docker) so clone failures are visible; update the block surrounding the existing prepare-ci.sh checks and rm -rf to include this else warning so bad refs or transient GitHub failures are diagnosable.

coderabbitai · 2026-04-23T00:36:48Z

    obj.use_llvm = !opts.no_llvm;
    obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
+    if (opts.lto) {
+        obj.lto = .full;
+        obj.use_lld = true;


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does Zig support obj.lto = .fullwhenuse_llvm = false / the self-hosted backend is enabled?

💡 Result:

No, Zig does not support obj.lto = .full when use_llvm = false / the self-hosted backend is enabled.

Citations:

1: Disable LTO by default + some LTO fixes ziglang/zig#22230

2: Compilation: Enable LTO for libraries by default ziglang/zig#22228

3: Error (link warning) building minimal Zig executable with LTO enabled for ReleaseSafe mode targeting Windows ziglang/zig#24274

4: zig ld: support LTO ziglang/zig#8680

5: https://mintlify.com/ziglang/zig/compiler/optimization

🏁 Script executed:

# Find the build.zig file and check the relevant sections fd -t f "build.zig" | head -1 | xargs wc -l

Repository: oven-sh/bun

Length of output: 69

🏁 Script executed:

# Check the BunBuildOptions struct and flag parsing grep -n "no_llvm\|\.lto" build.zig | head -30

Repository: oven-sh/bun

Length of output: 869

🏁 Script executed:

# Check how opts is initialized and where defaults are set sed -n '50,70p' build.zig sed -n '200,220p' build.zig sed -n '640,660p' build.zig

Repository: oven-sh/bun

Length of output: 2849

🏁 Script executed:

# Check if there are existing validation/guard patterns elsewhere in build.zig rg "const fail_step|addFail|@panic" build.zig -A 2 -B 2 | head -50

Repository: oven-sh/bun

Length of output: 1783

Reject -Dno_llvm=true together with -Dlto=true.

Full LTO is an LLVM-only path, but the code still allows both flags to be set independently. When both are true, obj.use_llvm becomes false while obj.lto = .full is set, which Zig does not support. Add a guard to fail fast instead of producing confusing Zig errors.

Suggested fix

obj.use_llvm = !opts.no_llvm; obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm; if (opts.lto) { + if (opts.no_llvm) { + const fail_step = b.addFail("LTO requires the LLVM backend"); + obj.step.dependOn(&fail_step.step); + return; + } obj.lto = .full; obj.use_lld = true; }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

obj.use_llvm = !opts.no_llvm;

obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;

if (opts.lto) {

obj.lto = .full;

obj.use_lld = true;

obj.use_llvm = !opts.no_llvm;

obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;

if (opts.lto) {

if (opts.no_llvm) {

const fail_step = b.addFail("LTO requires the LLVM backend");

obj.step.dependOn(&fail_step.step);

return;

}

obj.lto = .full;

obj.use_lld = true;

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@build.zig` around lines 756 - 760, The build currently allows opts.no_llvm=true with opts.lto=true leading to obj.use_llvm = false while obj.lto = .full (an unsupported Zig state); add a guard early after evaluating opts to detect when opts.no_llvm and opts.lto are both true and fail fast with a clear error message. Modify the logic around obj.use_llvm / obj.lto (the block setting obj.use_llvm = !opts.no_llvm; obj.lto = .full; obj.use_lld = true) to check the conflicting flags first and call a build-failure routine (or std.debug.panic with a descriptive message) if both are set, so we never set obj.lto when obj.use_llvm is false.

coderabbitai · 2026-04-23T00:36:48Z

+  Push-Location $cloneDir
+  try {
+    & bun scripts\prefetch-deps.ts $prefetchDir
+    if ($LASTEXITCODE -ne 0) { throw "prefetch-deps.ts failed" }
+  } finally {
+    Pop-Location
+  }
+  Remove-Item -Recurse -Force $cloneDir
+
+  # Read-only: download.ts only ever copies FROM here, and a writable baked
+  # input is something a misbehaving job could corrupt for later jobs on the
+  # same runner.
+  & attrib +R "$prefetchDir\*" /S /D
+
+  Set-Env "BUN_BUILD_PREFETCH_DIR" $prefetchDir
+}


⚠️ Potential issue | 🟠 Major

Prefetch-Build-Deps is still load-bearing on prefetch-deps.ts failure.

Clone failures are downgraded to warnings, but a non-zero exit from bun scripts\prefetch-deps.ts still throws out of this function and aborts the bootstrap. That contradicts the “best-effort” behavior described above and the shell implementation.

🛠️ Suggested fix

Push-Location $cloneDir try { & bun scripts\prefetch-deps.ts $prefetchDir - if ($LASTEXITCODE -ne 0) { throw "prefetch-deps.ts failed" } + if ($LASTEXITCODE -ne 0) { + Write-Output "warning: prefetch-deps.ts failed; skipping warm cache" + Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $prefetchDir + return + } + } catch { + Write-Output "warning: prefetch-deps.ts failed; skipping warm cache" + Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $prefetchDir + return } finally { Pop-Location }

🧰 Tools

🪛 PSScriptAnalyzer (1.25.0)

[warning] Missing BOM encoding for non-ASCII encoded file 'bootstrap.ps1'

(PSUseBOMForUnicodeEncodedFile)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/bootstrap.ps1` around lines 720 - 735, The Prefetch-Build-Deps flow still throws when "bun scripts\prefetch-deps.ts" returns a non-zero exit ($LASTEXITCODE), which contradicts the intended best-effort behavior; modify the block in the Prefetch-Build-Deps function so that after running "& bun scripts\prefetch-deps.ts $prefetchDir" you check $LASTEXITCODE and, if non-zero, emit a warning/log (do not throw) and continue cleanup and setting of BUN_BUILD_PREFETCH_DIR; keep the existing try/finally with Pop-Location and ensure Remove-Item and attrib/Set-Env still run even when prefetch-deps.ts fails so downstream steps proceed.

coderabbitai · 2026-04-23T00:36:49Z

+  const child = nodeSpawn(packerArgs[0], packerArgs.slice(1), {
    stdio: "inherit",
    env: {
      ...process.env,
-      // Packer also reads these env vars
      ARM_CLIENT_ID: clientId,
      ARM_CLIENT_SECRET: clientSecret,
      ARM_SUBSCRIPTION_ID: subscriptionId,
      ARM_TENANT_ID: tenantId,
    },
  });
+  let cancelled = false;
+  const forward = signal => {
+    cancelled = true;
+    console.log(`[packer] received ${signal}, forwarding to packer for Azure cleanup...`);
+    child.kill(signal);
+  };
+  process.on("SIGINT", forward);
+  process.on("SIGTERM", forward);
+  const [code, signal] = await new Promise(done => child.on("close", (c, s) => done([c, s])));


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check the file exists and read the relevant section wc -l scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 79

🏁 Script executed:

# Read the exact lines mentioned in the review sed -n '1253,1271p' scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 538

🏁 Script executed:

# Get more context around this code block sed -n '1240,1280p' scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 1206

🏁 Script executed:

# Search for other uses of nodeSpawn or spawn in this file to understand patterns rg "nodeSpawn|\.spawn\(" scripts/machine.mjs -n -A 5 -B 2

Repository: oven-sh/bun

Length of output: 4997

🏁 Script executed:

# Check if there are error handlers attached to other spawned processes in the file rg '\.on\("error"' scripts/machine.mjs -n -B 3 -A 1

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# Search for the spawnSafe function definition rg "function spawnSafe|const spawnSafe|export.*spawnSafe" scripts/machine.mjs -A 10

Repository: oven-sh/bun

Length of output: 488

🏁 Script executed:

# Get more context around line 1253 - read a larger block to see if nodeSpawn is actually there sed -n '1200,1300p' scripts/machine.mjs | head -150

Repository: oven-sh/bun

Length of output: 2894

🏁 Script executed:

# Check imports at the top of the file head -50 scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 975

🏁 Script executed:

# Search for nodeSpawn usage rg "nodeSpawn" scripts/machine.mjs -n -B 2 -A 2

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# Search for the spawn and spawnSafe definitions in utils.mjs rg "function spawn|export.*spawn|const spawn" scripts/utils.mjs -A 15 | head -100

Repository: oven-sh/bun

Length of output: 2770

🏁 Script executed:

# Look specifically at what spawnSafe does rg "spawnSafe" scripts/utils.mjs -A 20 | head -80

Repository: oven-sh/bun

Length of output: 1961

🏁 Script executed:

# Check if nodeSpawn is defined anywhere in the codebase rg "nodeSpawn" . --type js --type mjs -n

Repository: oven-sh/bun

Length of output: 82

🏁 Script executed:

# Look for the spawn export to understand what nodeSpawn refers to grep -n "^export.*spawn\|^const spawn\|^function spawn" scripts/utils.mjs

Repository: oven-sh/bun

Length of output: 422

🏁 Script executed:

# Look for nodeSpawn import in utils.mjs head -100 scripts/utils.mjs | grep -n "nodeSpawn\|import.*spawn"

Repository: oven-sh/bun

Length of output: 141

🏁 Script executed:

# Search for nodeSpawn definition in utils.mjs rg "nodeSpawn|import.*spawn" scripts/utils.mjs | head -20

Repository: oven-sh/bun

Length of output: 434

🏁 Script executed:

# Read the full spawn function to see how it handles errors sed -n '237,340p' scripts/utils.mjs

Repository: oven-sh/bun

Length of output: 2626

🏁 Script executed:

# Check git status and see if there are pending changes cd scripts && git status

Repository: oven-sh/bun

Length of output: 275

🏁 Script executed:

# Check if there's a git diff that shows the actual changes git diff scripts/machine.mjs | head -200

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# See the git log to understand recent changes git log --oneline scripts/machine.mjs | head -5

Repository: oven-sh/bun

Length of output: 95

🏁 Script executed:

# Search for the exact pattern from the review comment rg "const cancelled = false" scripts/machine.mjs -B 5 -A 10

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# Search for signal forwarding pattern with SIGINT and SIGTERM rg "forward = signal|received.*signal.*forwarding" scripts/machine.mjs -B 5 -A 10

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# Search more broadly for the packer signal handling rg "packer.*received.*forwarding|child\.kill$signal$" scripts/machine.mjs -B 10 -A 10

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# Check if there's a different branch or stashed changes git stash list git branch -a

Repository: oven-sh/bun

Length of output: 152

🏁 Script executed:

# Check if there's an uncommitted draft or stage area git diff --cached scripts/machine.mjs | head -100

Repository: oven-sh/bun

Length of output: 37

🏁 Script executed:

# Look for any pattern matching the review comment more carefully # Search for "packer.*init\|packer.*build" to understand Packer invocation rg "packer.*init|packer.*build" scripts/machine.mjs -B 10 -A 20

Repository: oven-sh/bun

Length of output: 2438

🏁 Script executed:

# Check what the review comment context implies - this looks like a PR review # The snippet shows code that would replace the current spawnSafe pattern # Let's verify the current state is still using spawnSafe grep -n "await spawnSafe.*packer" scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 171

🏁 Script executed:

# Verify Node.js spawn error behavior - get the Node.js documentation context # The key question: does subprocess "close" fire if spawn() itself fails? # According to Node.js docs, if spawn fails to exec, it emits "error", not "close" python3 << 'EOF' # Document the Node.js child_process.spawn() behavior behavior = { "error_event": "Emitted when: spawn() cannot exec the command (ENOENT, EACCES, etc.)", "error_examples": ["ENOENT (file not found)", "EACCES (permission denied)", "EISDIR (is a directory)"], "close_event": "Emitted when: child process has exited AND stdio streams closed", "close_vs_exit": "close fires after exit; multiple processes can share stdio", "if_spawn_fails": "error event fires, close event does NOT fire", "conclusion": "Listening only to 'close' will not catch spawn exec failures" } for key, value in behavior.items(): print(f"{key}: {value}") EOF

Repository: oven-sh/bun

Length of output: 522

Add error handler to catch spawn failures before waiting for process exit.

The code switches from spawnSafe() to raw nodeSpawn() to enable signal forwarding. However, if nodeSpawn() cannot exec Packer (e.g., file not found, permission denied), Node emits an "error" event instead of allowing the process to start. By only listening to the "close" event, spawn failures will not be caught.

Suggested fix

- const [code, signal] = await new Promise(done => child.on("close", (c, s) => done([c, s]))); + const [code, signal] = await new Promise((resolve, reject) => { + child.once("error", reject); + child.once("close", (c, s) => resolve([c, s])); + });

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/machine.mjs` around lines 1253 - 1271, The spawn can emit an "error" event (e.g., ENOENT) which is not handled; update the logic around nodeSpawn/child so you attach an "error" listener on child in addition to "close", and ensure the Promise waits for either event: on "error" reject or resolve with a non-zero code and the error signal/info, and on "close" resolve as before. Also remove/cleanup the signal handlers and both listeners inside the resolution path (references: nodeSpawn, child, forward, the process.on("SIGINT"/"SIGTERM") handlers and the Promise that currently listens only for "close") so you don't leak handlers after spawn failure or normal exit.

coderabbitai · 2026-04-23T00:36:49Z

+pull_if_missing "ubuntu/squid:5.2-22.04_beta"
+# Base images of the locally-built services above — pulled implicitly by
+# `compose build`, listed here so a missing-layer pull surfaces in this step
+# rather than as a slow first test.
+pull_if_missing "postgres:15.13"
+pull_if_missing "redis:8-alpine"


⚠️ Potential issue | 🟡 Minor

The cache-save manifest is now missing part of the warm set.

These images are now treated as required warmups here, but the docker save block at Lines 103-110 still archives only the old set. BUN_DOCKER_LOAD_CACHE=1 will therefore keep missing ubuntu/squid:5.2-22.04_beta, postgres:15.13, and redis:8-alpine, so later jobs still hit the network for exactly the images this change added.

💾 Minimal fix

docker save \ postgres:15 \ + postgres:15.13 \ mysql:8.4 \ mysql:8.0 \ redis:7-alpine \ + redis:8-alpine \ minio/minio:latest \ crossbario/autobahn-testsuite \ + ubuntu/squid:5.2-22.04_beta \ -o "$CACHE_FILE"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/docker/prepare-ci.sh` around lines 81 - 86, The docker cache-save step is missing the newly designated warmup images; update the docker save manifest (the block that runs docker save when BUN_DOCKER_LOAD_CACHE=1) to include the same images you added with pull_if_missing — specifically "ubuntu/squid:5.2-22.04_beta", "postgres:15.13", and "redis:8-alpine" — so the cache archive contains the full warm set and later jobs won’t re-pull those images.

coderabbitai · 2026-04-23T00:36:49Z

+  await using proc = Bun.spawn({
+    cmd: [bunExe(), join(repoRoot, "scripts", "build", "fetch-cli.ts"), "dep", "fake", repo, commit, dest, cache],
+    env: { ...bunEnv, BUN_BUILD_PREFETCH_DIR: prefetch },
+    stdout: "pipe",
+    stderr: "pipe",
+  });
+  const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);
+
+  expect(stderr).toBe("");
+  expect(stdout).toContain("using prefetch cache");
+  expect(await Bun.file(join(dest, "hello.txt")).text()).toBe("hi\n");
+  expect(exitCode).toBe(0);


⚠️ Potential issue | 🟠 Major

These exact stderr === "" assertions will fail on ASAN shards.

All four tests spawn bunExe() with bunEnv, so debug ASAN builds can still emit the known WARNING: ASAN interferes... line on stderr. That makes these assertions fail even when the prefetch behavior is correct; here the stable regression guards are stdout, filesystem state, and exit code.

Based on learnings, bunExe() subprocesses in debug ASAN builds can emit a known warning on stderr even with bunEnv, so empty-stderr assertions are intentionally avoided in this pattern.

Also applies to: 59-79, 99-116, 145-160

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/integration/build-prefetch/prefetch.test.ts` around lines 37 - 48, The tests assert exact empty stderr (expect(stderr).toBe("")) which flakily fails on ASAN shards because bun subprocesses started via bunExe() with bunEnv may emit known ASAN warnings; remove or relax those exact-empty-stderr assertions in the prefetch test blocks (the Bun.spawn usage that defines proc, stdout, stderr, exitCode and checks stdout, file contents and exitCode) and instead rely on the stable assertions (stdout contains "using prefetch cache", filesystem check Bun.file(...).text() === "hi\n", and exitCode === 0); apply the same change to the other test blocks noted (the other Bun.spawn sections around lines 59-79, 99-116, 145-160).

claude

I didn't find any bugs, but this changes how release binaries are built (Zig→bitcode LTO across the C++ boundary) and bumps the pinned Zig compiler commit — worth a human sanity check before it ships.

Extended reasoning...

Overview

This PR enables full LTO for bun-zig.o by emitting it as LLVM bitcode (obj.lto = .full + obj.use_lld = true) so it participates in the same link-time optimization pass as the C++ side. Changes touch build.zig (new -Dlto option wired into BunBuildOptions and configureObj) and scripts/build/zig.ts (new zigLto() gate, -Dlto= build arg, codegenThreads() forced to 1 under LTO, and a ZIG_COMMIT_PARALLEL bump to pick up the EnableSplitLTOUnit/module-summary fix).

Security risks

None identified. This is build-system configuration; no runtime code paths, auth, crypto, or user-input handling are touched. The bumped Zig commit is from oven-sh's own fork.

Level of scrutiny

High. Although the diff is small (~30 lines) and mechanically clean, it changes the object format emitted for every shipped release binary and enables cross-language inlining that eliminates 42–100% of boundary symbols per the PR description. LTO across a Zig/C++ FFI boundary is the kind of thing that can surface latent ODR/visibility/UB issues only at link or runtime, and it also bumps the pinned compiler — both are production-critical build-pipeline decisions a human owner should sign off on.

Other factors

The logic itself looks correct: zigLto is properly gated on usingParallelCompiler (so the older Windows-host compiler is unaffected), addMultiCheck hardcodes lto = false for check-only steps, and codegenThreads() returning 1 under LTO keeps zigObjectPaths() and the no_merge_shards path consistent. CI status is mostly infrastructure failures ("Failed to create agent") unrelated to this change, so a green run hasn't yet validated the new bitcode path across all targets.

coderabbitai

♻️ Duplicate comments (1)

build.zig (1)

204-216: ⚠️ Potential issue | 🟠 Major

Still reject -Dno_llvm=true together with -Dlto=true.

These flags are still accepted independently, so the build can reach configureObj() with obj.use_llvm = false and obj.lto = .full. Fail fast here instead of constructing that contradictory configuration.

Suggested fix

 const no_llvm = b.option(bool, "no_llvm", "Experiment with Zig self hosted backends. No stability guaranteed") orelse false;
 const lto = b.option(bool, "lto", "Emit LLVM bitcode for full LTO instead of a native object") orelse false;
+if (no_llvm and lto) {
+    std.debug.panic("`-Dlto=true` requires the LLVM backend", .{});
+}

Does Zig support setting `obj.lto = .full` while `obj.use_llvm = false` / `-Dno_llvm=true` in `build.zig`?

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@build.zig` around lines 204 - 216, The build accepts a contradictory option
combination (no_llvm true with lto true) and can reach configureObj() with
BunBuildOptions where obj.use_llvm = false and obj.lto = .full; add an early
validation after reading the options (no_llvm and lto) and before
constructing/using build_options to fail fast: detect if no_llvm && lto then
emit a clear error message and stop configuration (e.g., process exit or
b.addError/return) so the contradictory BunBuildOptions state is never created;
reference the no_llvm and lto flags, the BunBuildOptions construction, and
configureObj()/obj.use_llvm/obj.lto when locating where to add the check.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@build.zig`:
- Around line 204-216: The build accepts a contradictory option combination
(no_llvm true with lto true) and can reach configureObj() with BunBuildOptions
where obj.use_llvm = false and obj.lto = .full; add an early validation after
reading the options (no_llvm and lto) and before constructing/using
build_options to fail fast: detect if no_llvm && lto then emit a clear error
message and stop configuration (e.g., process exit or b.addError/return) so the
contradictory BunBuildOptions state is never created; reference the no_llvm and
lto flags, the BunBuildOptions construction, and
configureObj()/obj.use_llvm/obj.lto when locating where to add the check.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 433e7a3c-4d72-477c-a323-d65c655bb9ee

📥 Commits

Reviewing files that changed from the base of the PR and between 2a376e285751bc6dd140341d8c0ef1e4dc95a236 and d963aa4.

📒 Files selected for processing (2)

build.zig
scripts/build/zig.ts

The split existed because COFF shard emission isn't implemented, but codegenThreads() already forces 1 on Windows so sharding never happens. Parallel sema is deterministic and works on all hosts.

claude · 2026-04-23T00:49:01Z

    `-Denable_fuzzilli=${bool(cfg.fuzzilli)}`,
    `-Denable_valgrind=${bool(cfg.valgrind)}`,
    `-Denable_tinycc=${bool(cfg.tinycc)}`,
+    `-Dlto=${bool(cfg.lto)}`,


🟡 The comment at lines 446-448 still says "MUST be 0 on the stable compiler — see codegenThreads()", but this PR removed the stable/parallel compiler split and codegenThreads() no longer ever returns 0 (its minimum is now 1). Consider updating this comment to drop the stale "stable compiler" / "0" reference.

Extended reasoning...

What the issue is

The comment immediately following the new -Dlto=${bool(cfg.lto)} line in zigBuildArgs() reads:

// Sharded LLVM codegen — one shard per host core on the parallel // compiler. Zig has no "auto" value (0 = single-threaded). MUST be 0 // on the stable compiler — see codegenThreads(). `-Dllvm_codegen_threads=${codegenThreads(cfg)}`,

This comment is now stale documentation that this PR's own changes orphaned.

Why it's stale

This PR collapsed the ZIG_COMMIT / ZIG_COMMIT_PARALLEL split into a single ZIG_COMMIT constant and deleted:

usingParallelCompiler()

defaultZigCommit()

the if (!usingParallelCompiler(cfg)) return 0; branch at the top of codegenThreads()

After these removals, codegenThreads() (lines 54-66) returns at minimum 1 — never 0. There is no longer a "stable compiler" vs "parallel compiler" distinction anywhere in the codebase, and the function the comment tells the reader to consult no longer contains any logic about 0 or a stable compiler.

Step-by-step proof

Before this PR, codegenThreads() started with if (!usingParallelCompiler(cfg)) return 0; — the comment's "MUST be 0 on the stable compiler" pointed at that line.

The diff removes that line: - if (!usingParallelCompiler(cfg)) return 0; and replaces it with nothing (the function now starts with if (cfg.windows) return 1;).

The diff also removes the entire "TEMPORARY SPLIT" doc block, ZIG_COMMIT_PARALLEL, defaultZigCommit(), and usingParallelCompiler() — so "the stable compiler" no longer refers to anything that exists.

The comment at 446-448 was not touched and still tells the reader "MUST be 0 on the stable compiler — see codegenThreads()". A reader following that pointer will find no such logic.

Impact

No behavioral impact — this is purely misleading documentation. A future reader trying to understand why -Dllvm_codegen_threads is set the way it is will be sent looking for a "stable compiler" branch that no longer exists.

How to fix

Update the comment to reflect the current state, e.g.:

// Sharded LLVM codegen — see codegenThreads() for when sharding is // gated off (Windows, LTO, non-ASAN CI). Zig has no "auto" value.

Or simply drop the second sentence entirely, since codegenThreads()'s own doc comment now fully explains the gating.

Since 02fbd62 (#29618, enable LTO for bun-zig.o), `linux aarch64 - build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39 GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o), peak memory crosses the 31.5 GiB available (no swap) about 18 min in, and the kernel OOM-kills lld — sometimes taking the BuildKite agent with it (`exit:-1`). This was already failing on the PR's own CI (build 47390) before merge. | target | instance | bitcode in | result | |---|---|---|---| | linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min | | linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min | | **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed, 18.75 min** | x64 has slightly *more* bitcode and survives, so the tipping factor is LLVM's AArch64 backend holding more state during codegen than X86 — not a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding commit away from the same fate. Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra cores also help lld's parallel LTO codegen phase (the single-threaded merge/opt half won't benefit, so expect ~14–15 min rather than half of 19). Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>

Emit `bun-zig.o` as LLVM bitcode (`obj.lto = .full`) so it participates in the same LTO link as the C/C++ side, instead of being a native ELF object that lld can only link, not optimize across. ## Why this didn't work before The only documented Zig→bitcode path was `-Dobj_format=bc`, which routes through `getEmittedLlvmBc()` → `-femit-llvm-bc`. That writes Zig's **self-hosted** unoptimized bitcode (Producer `"zig 0.15.2"`) before libLLVM ever touches it, and the self-hosted writer has encoding bugs for large modules → `Invalid record` in lld. The proper `-flto` path was blocked by `use_lld = false` on Linux causing `LtoRequiresLld` or silent fallback. This PR uses `obj.lto = .full` + `obj.use_lld = true`, which routes through libLLVM's `WriteBitcodeToFile` after the LTO pre-link O3 pipeline. The output (Producer `"LLVM20.1.2"`) is forward-compatible with lld 21/22. Bumps `ZIG_COMMIT_PARALLEL` to pick up oven-sh/zig@04e7f6ac1e, which adds `EnableSplitLTOUnit=1` + a module summary to Zig's LTO bitcode — required for `-fwhole-program-vtables` to accept the link (otherwise: `inconsistent LTO Unit splitting`). ## What gets inlined | Boundary | Declared | Eliminated | % | |---|---|---|---| | Zig `export fn` → C++ | 336 | 142 | 42% | | C `us_*` (usockets) ← Zig | 115 | 79 | 69% | | C++ `uws_*` (uWebSockets wrappers) ← Zig | 108 | 76 | 70% | | `mi_free` | — | all | 100% (0 call insns) | Verified by disassembly: e.g. `Bun__readOriginTimer` body (optional check, `clock_gettime`, ns math) appears directly inside C++ `Process_functionHRTime`; symbol is gone. ## Measured impact (linux-x64, vs latest canary) | | LTO | no Zig LTO | Δ | |---|---|---|---| | `Bun.escapeHTML` | 171.3 ns | 183.2 ns | **6.5%** | | `TextDecoder.decode` | 104.0 ns | 106.8 ns | **2.6%** | | `oha -n 1M -c 50` | ~200,600 req/s | ~193,800 req/s | **3.5%** | Caveat: canary is a slightly different revision; same-revision A/B would be cleaner. ## Notes - `zigLto(cfg)` gates on `usingParallelCompiler` — the older Windows `ZIG_COMMIT` predates the summary fix. - `codegenThreads()` returns 1 when LTO is on (zig_llvm.cpp gates SplitModule on `!lto`). --------- Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>

Since 02fbd62 (oven-sh#29618, enable LTO for bun-zig.o), `linux aarch64 - build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39 GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o), peak memory crosses the 31.5 GiB available (no swap) about 18 min in, and the kernel OOM-kills lld — sometimes taking the BuildKite agent with it (`exit:-1`). This was already failing on the PR's own CI (build 47390) before merge. | target | instance | bitcode in | result | |---|---|---|---| | linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min | | linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min | | **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed, 18.75 min** | x64 has slightly *more* bitcode and survives, so the tipping factor is LLVM's AArch64 backend holding more state during codegen than X86 — not a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding commit away from the same fate. Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra cores also help lld's parallel LTO codegen phase (the single-threaded merge/opt half won't benefit, so expect ~14–15 min rather than half of 19). Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>

Emit `bun-zig.o` as LLVM bitcode (`obj.lto = .full`) so it participates in the same LTO link as the C/C++ side, instead of being a native ELF object that lld can only link, not optimize across. ## Why this didn't work before The only documented Zig→bitcode path was `-Dobj_format=bc`, which routes through `getEmittedLlvmBc()` → `-femit-llvm-bc`. That writes Zig's **self-hosted** unoptimized bitcode (Producer `"zig 0.15.2"`) before libLLVM ever touches it, and the self-hosted writer has encoding bugs for large modules → `Invalid record` in lld. The proper `-flto` path was blocked by `use_lld = false` on Linux causing `LtoRequiresLld` or silent fallback. This PR uses `obj.lto = .full` + `obj.use_lld = true`, which routes through libLLVM's `WriteBitcodeToFile` after the LTO pre-link O3 pipeline. The output (Producer `"LLVM20.1.2"`) is forward-compatible with lld 21/22. Bumps `ZIG_COMMIT_PARALLEL` to pick up oven-sh/zig@04e7f6ac1e, which adds `EnableSplitLTOUnit=1` + a module summary to Zig's LTO bitcode — required for `-fwhole-program-vtables` to accept the link (otherwise: `inconsistent LTO Unit splitting`). ## What gets inlined | Boundary | Declared | Eliminated | % | |---|---|---|---| | Zig `export fn` → C++ | 336 | 142 | 42% | | C `us_*` (usockets) ← Zig | 115 | 79 | 69% | | C++ `uws_*` (uWebSockets wrappers) ← Zig | 108 | 76 | 70% | | `mi_free` | — | all | 100% (0 call insns) | Verified by disassembly: e.g. `Bun__readOriginTimer` body (optional check, `clock_gettime`, ns math) appears directly inside C++ `Process_functionHRTime`; symbol is gone. ## Measured impact (linux-x64, vs latest canary) | | LTO | no Zig LTO | Δ | |---|---|---|---| | `Bun.escapeHTML` | 171.3 ns | 183.2 ns | **6.5%** | | `TextDecoder.decode` | 104.0 ns | 106.8 ns | **2.6%** | | `oha -n 1M -c 50` | ~200,600 req/s | ~193,800 req/s | **3.5%** | Caveat: canary is a slightly different revision; same-revision A/B would be cleaner. ## Notes - `zigLto(cfg)` gates on `usingParallelCompiler` — the older Windows `ZIG_COMMIT` predates the summary fix. - `codegenThreads()` returns 1 when LTO is on (zig_llvm.cpp gates SplitModule on `!lto`). --------- Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>

Since 02fbd62 (oven-sh#29618, enable LTO for bun-zig.o), `linux aarch64 - build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39 GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o), peak memory crosses the 31.5 GiB available (no swap) about 18 min in, and the kernel OOM-kills lld — sometimes taking the BuildKite agent with it (`exit:-1`). This was already failing on the PR's own CI (build 47390) before merge. | target | instance | bitcode in | result | |---|---|---|---| | linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min | | linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min | | **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed, 18.75 min** | x64 has slightly *more* bitcode and survives, so the tipping factor is LLVM's AArch64 backend holding more state during codegen than X86 — not a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding commit away from the same fate. Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra cores also help lld's parallel LTO codegen phase (the single-threaded merge/opt half won't benefit, so expect ~14–15 min rather than half of 19). Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>

Jarred-Sumner force-pushed the jarred/zig-lto branch from 2a376e2 to d963aa4 Compare April 23, 2026 00:33

claude Bot reviewed Apr 23, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

claude Bot reviewed Apr 23, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 23, 2026

View reviewed changes

build: collapse ZIG_COMMIT split, all hosts use parallel compiler

c0e855c

The split existed because COFF shard emission isn't implemented, but codegenThreads() already forces 1 on Windows so sharding never happens. Parallel sema is deterministic and works on all hosts.

claude Bot reviewed Apr 23, 2026

View reviewed changes

Jarred-Sumner merged commit 02fbd62 into main Apr 23, 2026
53 of 58 checks passed

Jarred-Sumner deleted the jarred/zig-lto branch April 23, 2026 01:18

robobun mentioned this pull request Apr 23, 2026

Blob: render empty blobs as zero-byte, not "detached" #29639

Merged

Jarred-Sumner mentioned this pull request Apr 23, 2026

ci: bump linux link box to 2xlarge (64 GiB) for full-LTO #29650

Merged

Conversation

Jarred-Sumner commented Apr 23, 2026

Why this didn't work before

What gets inlined

Measured impact (linux-x64, vs latest canary)

Notes

Uh oh!

robobun commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

What the bug is

Code path that triggers it

Why existing code doesn't prevent it

Step-by-step proof

Impact

Fix

What changed

Concrete trigger path

Why existing code doesn't prevent it

Impact

Fix

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

coderabbitai Bot commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 23, 2026

Choose a reason for hiding this comment

What the issue is

Why it's stale

Step-by-step proof

Impact

How to fix

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

robobun commented Apr 23, 2026 •

edited

Loading

coderabbitai Bot commented Apr 23, 2026 •

edited

Loading