Skip to content

build: enable LTO for bun-zig.o#29618

Merged
Jarred-Sumner merged 2 commits into
mainfrom
jarred/zig-lto
Apr 23, 2026
Merged

build: enable LTO for bun-zig.o#29618
Jarred-Sumner merged 2 commits into
mainfrom
jarred/zig-lto

Conversation

@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

Emit bun-zig.o as LLVM bitcode (obj.lto = .full) so it participates in the same LTO link as the C/C++ side, instead of being a native ELF object that lld can only link, not optimize across.

Why this didn't work before

The only documented Zig→bitcode path was -Dobj_format=bc, which routes through getEmittedLlvmBc()-femit-llvm-bc. That writes Zig's self-hosted unoptimized bitcode (Producer "zig 0.15.2") before libLLVM ever touches it, and the self-hosted writer has encoding bugs for large modules → Invalid record in lld. The proper -flto path was blocked by use_lld = false on Linux causing LtoRequiresLld or silent fallback.

This PR uses obj.lto = .full + obj.use_lld = true, which routes through libLLVM's WriteBitcodeToFile after the LTO pre-link O3 pipeline. The output (Producer "LLVM20.1.2") is forward-compatible with lld 21/22.

Bumps ZIG_COMMIT_PARALLEL to pick up oven-sh/zig@04e7f6ac1e, which adds EnableSplitLTOUnit=1 + a module summary to Zig's LTO bitcode — required for -fwhole-program-vtables to accept the link (otherwise: inconsistent LTO Unit splitting).

What gets inlined

Boundary Declared Eliminated %
Zig export fn → C++ 336 142 42%
C us_* (usockets) ← Zig 115 79 69%
C++ uws_* (uWebSockets wrappers) ← Zig 108 76 70%
mi_free all 100% (0 call insns)

Verified by disassembly: e.g. Bun__readOriginTimer body (optional check, clock_gettime, ns math) appears directly inside C++ Process_functionHRTime; symbol is gone.

Measured impact (linux-x64, vs latest canary)

LTO no Zig LTO Δ
Bun.escapeHTML 171.3 ns 183.2 ns 6.5%
TextDecoder.decode 104.0 ns 106.8 ns 2.6%
oha -n 1M -c 50 ~200,600 req/s ~193,800 req/s 3.5%

Caveat: canary is a slightly different revision; same-revision A/B would be cleaner.

Notes

  • zigLto(cfg) gates on usingParallelCompiler — the older Windows ZIG_COMMIT predates the summary fix.
  • codegenThreads() returns 1 when LTO is on (zig_llvm.cpp gates SplitModule on !lto).

@robobun

robobun commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator
Updated 5:44 PM PT - Apr 22nd, 2026

❌ Your commit d963aa46 has 1 failures in Build #47387 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29618

That installs a local version of the PR into your bun-29618 executable, so you can run:

bun-29618 --bun

Emit bun-zig.o as LLVM bitcode (obj.lto=.full) so it participates in the
same LTO link as the C/C++ side. Previously the only bitcode path was
-Dobj_format=bc, which writes zig's self-hosted (unoptimized) bitcode and
is rejected by lld with "Invalid record".

Bumps ZIG_COMMIT_PARALLEL to pick up the EnableSplitLTOUnit/summary fix
in zig_llvm.cpp, required for -fwhole-program-vtables interop.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 .buildkite/Dockerfile-bootstrap.sh:108-115 — This script only installs docker via dnf (line 30), but the new block calls git clone — Amazon Linux 2023 AMIs don't ship git by default, so the clone fails with "command not found" and the if-body is silently skipped (set -e doesn't apply to if-conditions). The test docker images are never pre-pulled on the amazonlinux+docker hosts; fix by changing line 30 to dnf install -y docker git.

    Extended reasoning...

    What the bug is

    .buildkite/Dockerfile-bootstrap.sh runs on Amazon Linux 2023 hosts (the only dnf-based platforms in ci.mjs buildPlatforms with features: ['docker']). Its only package install is line 30: dnf install -y docker. This PR adds a new block at lines 108–115 that runs git clone --depth=1 ... to fetch the repo so it can run test/docker/prepare-ci.sh. But git is never installed, and the al2023-ami-* AMIs selected by aws.getBaseImage() for distro=amazonlinux release=2023 are intentionally minimal and do not include git in the default package set.

    Code path that triggers it

    machine.mjs uploads Dockerfile-bootstrap.sh to a fresh AL2023 instance and runs it as root. The script runs set -euo pipefail, then dnf install -y docker, builds the buildkite docker image, and reaches the new block:

    if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
        https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
        ...
    fi

    With git absent, bash prints bash: git: command not found and the command exits 127. Because the command appears as the condition of an if, set -e does not apply (POSIX: "the -e setting shall be ignored when executing the compound-list following the if"). The if evaluates false, the body is skipped, and the script continues to docker container create and exits 0.

    Why existing code doesn't prevent it

    Unlike bootstrap.sh (which calls git_path="$(require git)" and explicitly installs git via install_common_software), Dockerfile-bootstrap.sh is a minimal host-side script that has never needed git before — it only orchestrates docker. Nothing in this file installs git, and nothing checks for it before the new clone.

    Step-by-step proof

    1. machine.mjs calls aws.getBaseImage({distro: 'amazonlinux', release: '2023', ...}) → matches al2023-ami-*-x86_64/arm64 (the official AL2023 AMI, which ships without git).
    2. Script runs as root, executes dnf install -y docker (line 30) — installs docker only.
    3. docker buildx build ... --build-arg BUN_REPO_REF=... succeeds (git inside the container image is installed via apt at Dockerfile line ~25, but that's inside the container, not on the host).
    4. Line 108: if git clone ...command not found, exit 127 → if-condition false.
    5. Body (prepare-ci.sh invocation) skipped; rm -rf /tmp/bun-test-docker also skipped (it's inside the if).
    6. Script proceeds, prints "Bootstrap complete", exits 0. AMI is snapshotted without postgres/mysql/redis/minio images in /var/lib/docker.

    Impact

    The bake doesn't fail — the block is explicitly documented as best-effort. The only consequence is that the new "pre-pull test docker images" optimization is silently a no-op on the linux-{aarch64,x64}-amazonlinux-2023-with-docker images. Per ci.mjs, those are build platforms (cpp/zig/link agents), not testPlatforms (which use debian/ubuntu/alpine via bootstrap.sh, where this PR's separate prefetch_build_deps path does have git). So the practical value of pre-pulling test images on these specific hosts is marginal anyway — but the author clearly intended the block to execute, and it silently won't.

    Fix

    Change line 30 to:

    dnf install -y docker git
  • 🟡 scripts/build/download.ts:161-163 — The new permanent = res.status >= 400 && res.status < 500 check treats all 4xx as non-retryable, but HTTP 429 (Too Many Requests) and 408 (Request Timeout) are transient by definition and were previously retried with backoff. Since prefetch-deps.ts fires concurrent downloads at GitHub and only treats HTTP 404 as an expected miss, a 429 would now hard-fail instead of backing off. Consider && res.status !== 408 && res.status !== 429.

    Extended reasoning...

    What changed

    Before this PR, downloadWithRetry retried on any non-ok response: the if (!res.ok || res.body === null) branch set lastError and continued into the next backoff iteration regardless of status code. After this PR, the new line at scripts/build/download.ts:163 sets permanent = res.status >= 400 && res.status < 500, which causes the loop condition attempt <= maxAttempts && !permanent to terminate after a single attempt for every 4xx code. The inline comment says "4xx is deterministic" — but that is not true for 429 (Too Many Requests) or 408 (Request Timeout), which are explicitly defined as retryable in HTTP semantics (and 429 frequently carries a Retry-After header for exactly this reason).

    Concrete trigger path

    1. scripts/prefetch-deps.ts enumerates the cross-product of {asan, lto, baseline, musl} variants and pushes ~dozens of GitHub release-asset / archive-tarball URLs into a queue.
    2. Four parallel workers call downloadWithRetry(item.url, path, ...) for each item.
    3. GitHub's CDN (codeload.github.com / objects.githubusercontent.com) returns 429 under abuse detection or load.
    4. res.ok is false, res.status is 429 → permanent = true → loop exits after attempt 1 → throw lastError (the raw HTTP 429 Too Many Requests for ... BuildError).
    5. Back in prefetch-deps.ts's fetchOne, the catch block tests /\bHTTP 404\b/.test(err.message) — this does not match 429, so the error is re-thrown.
    6. The worker's promise rejects, Promise.all(workers) rejects, and prefetch-deps.ts exits non-zero.

    In .buildkite/Dockerfile (lines 145–151), the prefetch step runs inside RUN set -e; ... (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch) with no || true guard, so a non-zero exit fails the entire docker buildx build. (bootstrap.sh and bootstrap.ps1 wrap prefetch-deps in best-effort handling, so on those paths the bake survives — but the Dockerfile path does not.) Regular CI builds calling downloadWithRetry via fetchPrebuilt/fetchZig would also fail immediately on 429 instead of retrying.

    Why existing code doesn't prevent it

    The only special-casing of non-ok statuses in callers is prefetch-deps.ts's /\bHTTP 404\b/ regex, which intentionally matches 404 only. Nothing in the call chain catches or retries 429. Before this PR the retry loop itself absorbed it; now nothing does.

    Impact

    This is a minor robustness regression. GitHub release-asset and codeload endpoints rarely return 429 for anonymous downloads, image bakes are infrequent manual operations, and on CI most downloads now hit the prefetch cache and never reach fetch(). So the practical likelihood is low — but when it does happen the failure mode is worse than before (hard fail vs. transparent retry), and the comment asserting "4xx is deterministic" is factually wrong for these codes.

    Fix

    Exclude the two transient 4xx codes from the permanent check:

    permanent = res.status >= 400 && res.status < 500 && res.status !== 408 && res.status !== 429;

    This preserves the intended fast-fail on 404/403/401 (genuinely deterministic) while restoring backoff for rate-limiting and request-timeout responses.

@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. ci: bake a dep prefetch cache into build images [build images] #29568 - Both implement the same build dependency prefetch cache system (prefetchDir, prefetchPathForUrl, tryPrefetchExtracted, BUN_BUILD_PREFETCH_DIR) with near-identical changes to scripts/build/download.ts, scripts/prefetch-deps.ts, bootstrap scripts, and Dockerfile

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Apr 23, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 21916ab5-a00c-43d9-b298-8652f2427785

📥 Commits

Reviewing files that changed from the base of the PR and between d963aa4 and c0e855c.

📒 Files selected for processing (2)
  • scripts/build/config.ts
  • scripts/build/zig.ts

Walkthrough

Adds a boolean LTO build option and propagates it through build tooling; LTO is disabled for multi-platform "check" builds and, when enabled, configures full LTO with LLD. Zig/tooling commit selection simplified to a single pinned Zig commit and sharding/generation now respects the LTO setting.

Changes

Cohort / File(s) Summary
Build option and object config
build.zig
Adds BunBuildOptions.lto: bool; disables lto for multi-platform "check" builds; when opts.lto is true configureObj sets obj.lto = .full and forces obj.use_lld = true.
Zig build script and sharding behavior
scripts/build/zig.ts
Pins a single ZIG_COMMIT (new hash), removes parallel-compiler-specific commit/selection logic, passes -Dlto=<bool> to zig build, always sets ZIG_PARALLEL_SEMA=1, and makes codegen sharding return 1 when cfg.lto is true (removes prior "only parallel compiler enables sharding" dependency).
Config resolution and display
scripts/build/config.ts
Resolves cfg.zigCommit from versionDefaults.zigCommit when unset and updates feature display to compare against the new single pinned Zig commit rather than a host-dependent default.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely summarizes the main change: enabling LTO for bun-zig.o, which is the core objective across all modified files.
Description check ✅ Passed The description is comprehensive and well-structured, covering the PR objectives, technical motivation, verification methods, and measured impact, though it deviates from the template structure.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/docker/index.ts (1)

386-393: 🧹 Nitpick | 🔵 Trivial

Deduplicate the build-required service list.

buildServices() now carries a second copy of the same "postgres_tls" | "mysql_tls" | "redis_unified" membership logic that DockerComposeHelper.up() already hardcodes at Line 113. This PR had to touch both places, and servicesToBuild is only inferred as string[], so a typo here would slip past type-checking and fail at runtime. Please move this to one shared, typed constant/Set<ServiceName> and reuse it in both paths.

♻️ Proposed refactor
+const servicesNeedingBuild: readonly ServiceName[] = ["postgres_tls", "mysql_tls", "redis_unified"];
+
 class DockerComposeHelper {
@@
-    if (service === "mysql_tls" || service === "redis_unified" || service === "postgres_tls") {
+    if (servicesNeedingBuild.includes(service)) {
       const buildResult = await this.exec(["build", service]);
@@
-    const servicesToBuild = ["postgres_tls", "mysql_tls", "redis_unified"];
+    const servicesToBuild = servicesNeedingBuild;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/docker/index.ts` around lines 386 - 393, The services list is duplicated
in buildServices() and DockerComposeHelper.up(); create a single exported, typed
constant (e.g., a readonly array or Set typed as Set<ServiceName> or
ReadonlyArray<ServiceName>) named something like BUILT_SERVICES or
SERVICES_REQUIRING_BUILD and replace the local servicesToBuild in
buildServices() and the hardcoded list in DockerComposeHelper.up() to reference
this constant so the membership is centralized and type-checked; update any
imports/uses to reference the new symbol and adjust types so the compiler
enforces ServiceName membership.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.buildkite/Dockerfile:
- Around line 145-153: The RUN block cloning bun should treat the prefetch step
as best-effort and then harden the output directory before exporting: change the
`if ... bun scripts/prefetch-deps.ts /opt/bun-prefetch` invocation so any
non-zero exit does not fail the build (e.g., run it in a subshell and swallow
errors or append `|| true`), and after the step (only if /opt/bun-prefetch
exists) run `chown -R root:root /opt/bun-prefetch || true` and `chmod -R a-w
/opt/bun-prefetch || true` (and add read/execute bits if needed) so
`/opt/bun-prefetch` is immutable and owned by root; reference the existing RUN
block, `${BUN_REPO_REF}`, `bun scripts/prefetch-deps.ts`, and
`/opt/bun-prefetch` when making the edits.

In @.buildkite/Dockerfile-bootstrap.sh:
- Around line 108-115: The git clone block that attempts to pull the bun repo
for Docker warmup currently swallows clone failures; add an else branch after
the if git clone ...; then print a clear warning (to stderr) including the
attempted ref variable ${BUN_BOOTSTRAP_REPO_REF:-main} and the destination
(/tmp/bun-test-docker) so clone failures are visible; update the block
surrounding the existing prepare-ci.sh checks and rm -rf to include this else
warning so bad refs or transient GitHub failures are diagnosable.

In `@build.zig`:
- Around line 756-760: The build currently allows opts.no_llvm=true with
opts.lto=true leading to obj.use_llvm = false while obj.lto = .full (an
unsupported Zig state); add a guard early after evaluating opts to detect when
opts.no_llvm and opts.lto are both true and fail fast with a clear error
message. Modify the logic around obj.use_llvm / obj.lto (the block setting
obj.use_llvm = !opts.no_llvm; obj.lto = .full; obj.use_lld = true) to check the
conflicting flags first and call a build-failure routine (or std.debug.panic
with a descriptive message) if both are set, so we never set obj.lto when
obj.use_llvm is false.

In `@scripts/bootstrap.ps1`:
- Around line 720-735: The Prefetch-Build-Deps flow still throws when "bun
scripts\prefetch-deps.ts" returns a non-zero exit ($LASTEXITCODE), which
contradicts the intended best-effort behavior; modify the block in the
Prefetch-Build-Deps function so that after running "& bun
scripts\prefetch-deps.ts $prefetchDir" you check $LASTEXITCODE and, if non-zero,
emit a warning/log (do not throw) and continue cleanup and setting of
BUN_BUILD_PREFETCH_DIR; keep the existing try/finally with Pop-Location and
ensure Remove-Item and attrib/Set-Env still run even when prefetch-deps.ts fails
so downstream steps proceed.

In `@scripts/machine.mjs`:
- Around line 1253-1271: The spawn can emit an "error" event (e.g., ENOENT)
which is not handled; update the logic around nodeSpawn/child so you attach an
"error" listener on child in addition to "close", and ensure the Promise waits
for either event: on "error" reject or resolve with a non-zero code and the
error signal/info, and on "close" resolve as before. Also remove/cleanup the
signal handlers and both listeners inside the resolution path (references:
nodeSpawn, child, forward, the process.on("SIGINT"/"SIGTERM") handlers and the
Promise that currently listens only for "close") so you don't leak handlers
after spawn failure or normal exit.

In `@test/docker/prepare-ci.sh`:
- Around line 81-86: The docker cache-save step is missing the newly designated
warmup images; update the docker save manifest (the block that runs docker save
when BUN_DOCKER_LOAD_CACHE=1) to include the same images you added with
pull_if_missing — specifically "ubuntu/squid:5.2-22.04_beta", "postgres:15.13",
and "redis:8-alpine" — so the cache archive contains the full warm set and later
jobs won’t re-pull those images.

In `@test/integration/build-prefetch/prefetch.test.ts`:
- Around line 37-48: The tests assert exact empty stderr
(expect(stderr).toBe("")) which flakily fails on ASAN shards because bun
subprocesses started via bunExe() with bunEnv may emit known ASAN warnings;
remove or relax those exact-empty-stderr assertions in the prefetch test blocks
(the Bun.spawn usage that defines proc, stdout, stderr, exitCode and checks
stdout, file contents and exitCode) and instead rely on the stable assertions
(stdout contains "using prefetch cache", filesystem check Bun.file(...).text()
=== "hi\n", and exitCode === 0); apply the same change to the other test blocks
noted (the other Bun.spawn sections around lines 59-79, 99-116, 145-160).

---

Outside diff comments:
In `@test/docker/index.ts`:
- Around line 386-393: The services list is duplicated in buildServices() and
DockerComposeHelper.up(); create a single exported, typed constant (e.g., a
readonly array or Set typed as Set<ServiceName> or ReadonlyArray<ServiceName>)
named something like BUILT_SERVICES or SERVICES_REQUIRING_BUILD and replace the
local servicesToBuild in buildServices() and the hardcoded list in
DockerComposeHelper.up() to reference this constant so the membership is
centralized and type-checked; update any imports/uses to reference the new
symbol and adjust types so the compiler enforces ServiceName membership.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0c97bbcc-70b7-495d-ad34-456b426f6a30

📥 Commits

Reviewing files that changed from the base of the PR and between 890ef5a and 2a376e285751bc6dd140341d8c0ef1e4dc95a236.

📒 Files selected for processing (16)
  • .buildkite/Dockerfile
  • .buildkite/Dockerfile-bootstrap.sh
  • .buildkite/ci.mjs
  • build.zig
  • scripts/bootstrap.ps1
  • scripts/bootstrap.sh
  • scripts/build/download.ts
  • scripts/build/zig.ts
  • scripts/machine.mjs
  • scripts/packer/variables.pkr.hcl
  • scripts/packer/windows-arm64.pkr.hcl
  • scripts/packer/windows-x64.pkr.hcl
  • scripts/prefetch-deps.ts
  • test/docker/index.ts
  • test/docker/prepare-ci.sh
  • test/integration/build-prefetch/prefetch.test.ts

Comment thread .buildkite/Dockerfile Outdated
Comment on lines +145 to +153
RUN set -e; \
if git clone --depth=1 --branch ${BUN_REPO_REF} https://github.com/oven-sh/bun.git /tmp/bun-clone \
&& [ -f /tmp/bun-clone/scripts/prefetch-deps.ts ]; then \
(cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); \
else \
echo "warning: prefetch-deps.ts unavailable at ${BUN_REPO_REF}; skipping warm cache"; \
fi; \
rm -rf /tmp/bun-clone
ENV BUN_BUILD_PREFETCH_DIR=/opt/bun-prefetch

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Mirror the bootstrap scripts here: make warm-cache best-effort and harden it before exporting it.

This Docker-only path currently fails the whole image build if bun scripts/prefetch-deps.ts exits non-zero, and it exposes /opt/bun-prefetch to later jobs without stripping write bits. That makes the image bake flakier than scripts/bootstrap.sh / scripts/bootstrap.ps1 and leaves a shared cache mutable inside a long-lived agent container.

⚙️ Suggested fix
 RUN set -e; \
     if git clone --depth=1 --branch ${BUN_REPO_REF} https://github.com/oven-sh/bun.git /tmp/bun-clone \
        && [ -f /tmp/bun-clone/scripts/prefetch-deps.ts ]; then \
-         (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); \
+         if ! (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); then \
+           echo "warning: prefetch-deps.ts failed; continuing without warm cache"; \
+           rm -rf /opt/bun-prefetch; \
+         else \
+           chmod -R a-w /opt/bun-prefetch; \
+         fi; \
        else \
          echo "warning: prefetch-deps.ts unavailable at ${BUN_REPO_REF}; skipping warm cache"; \
        fi; \
     rm -rf /tmp/bun-clone
🧰 Tools
🪛 Hadolint (2.14.0)

[warning] 145-145: Use WORKDIR to switch to a directory

(DL3003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.buildkite/Dockerfile around lines 145 - 153, The RUN block cloning bun
should treat the prefetch step as best-effort and then harden the output
directory before exporting: change the `if ... bun scripts/prefetch-deps.ts
/opt/bun-prefetch` invocation so any non-zero exit does not fail the build
(e.g., run it in a subshell and swallow errors or append `|| true`), and after
the step (only if /opt/bun-prefetch exists) run `chown -R root:root
/opt/bun-prefetch || true` and `chmod -R a-w /opt/bun-prefetch || true` (and add
read/execute bits if needed) so `/opt/bun-prefetch` is immutable and owned by
root; reference the existing RUN block, `${BUN_REPO_REF}`, `bun
scripts/prefetch-deps.ts`, and `/opt/bun-prefetch` when making the edits.

Comment thread .buildkite/Dockerfile-bootstrap.sh Outdated
Comment on lines +108 to +115
if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
if [ -f /tmp/bun-test-docker/test/docker/prepare-ci.sh ]; then
(cd /tmp/bun-test-docker/test/docker && sh prepare-ci.sh) || \
echo "warning: prepare-ci.sh failed; test docker images not pre-pulled"
fi
rm -rf /tmp/bun-test-docker
fi

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Log clone failures in this best-effort warmup path.

If the clone itself fails, this block currently does nothing and the bake looks successful even though the Docker warm cache was never populated. Please add an else warning here so bad refs/transient GitHub failures are diagnosable.

🪵 Suggested tweak
 if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
     https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
     if [ -f /tmp/bun-test-docker/test/docker/prepare-ci.sh ]; then
         (cd /tmp/bun-test-docker/test/docker && sh prepare-ci.sh) || \
             echo "warning: prepare-ci.sh failed; test docker images not pre-pulled"
     fi
     rm -rf /tmp/bun-test-docker
+else
+    echo "warning: failed to clone oven-sh/bun at ${BUN_BOOTSTRAP_REPO_REF:-main}; test docker images not pre-pulled"
 fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.buildkite/Dockerfile-bootstrap.sh around lines 108 - 115, The git clone
block that attempts to pull the bun repo for Docker warmup currently swallows
clone failures; add an else branch after the if git clone ...; then print a
clear warning (to stderr) including the attempted ref variable
${BUN_BOOTSTRAP_REPO_REF:-main} and the destination (/tmp/bun-test-docker) so
clone failures are visible; update the block surrounding the existing
prepare-ci.sh checks and rm -rf to include this else warning so bad refs or
transient GitHub failures are diagnosable.

Comment thread build.zig
Comment on lines 756 to +760
obj.use_llvm = !opts.no_llvm;
obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
if (opts.lto) {
obj.lto = .full;
obj.use_lld = true;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

Does Zig support obj.lto = .fullwhenuse_llvm = false / the self-hosted backend is enabled?

💡 Result:

No, Zig does not support obj.lto = .full when use_llvm = false / the self-hosted backend is enabled.

Citations:


🏁 Script executed:

# Find the build.zig file and check the relevant sections
fd -t f "build.zig" | head -1 | xargs wc -l

Repository: oven-sh/bun

Length of output: 69


🏁 Script executed:

# Check the BunBuildOptions struct and flag parsing
grep -n "no_llvm\|\.lto" build.zig | head -30

Repository: oven-sh/bun

Length of output: 869


🏁 Script executed:

# Check how opts is initialized and where defaults are set
sed -n '50,70p' build.zig
sed -n '200,220p' build.zig
sed -n '640,660p' build.zig

Repository: oven-sh/bun

Length of output: 2849


🏁 Script executed:

# Check if there are existing validation/guard patterns elsewhere in build.zig
rg "const fail_step|addFail|@panic" build.zig -A 2 -B 2 | head -50

Repository: oven-sh/bun

Length of output: 1783


Reject -Dno_llvm=true together with -Dlto=true.

Full LTO is an LLVM-only path, but the code still allows both flags to be set independently. When both are true, obj.use_llvm becomes false while obj.lto = .full is set, which Zig does not support. Add a guard to fail fast instead of producing confusing Zig errors.

Suggested fix
     obj.use_llvm = !opts.no_llvm;
     obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
     if (opts.lto) {
+        if (opts.no_llvm) {
+            const fail_step = b.addFail("LTO requires the LLVM backend");
+            obj.step.dependOn(&fail_step.step);
+            return;
+        }
         obj.lto = .full;
         obj.use_lld = true;
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
obj.use_llvm = !opts.no_llvm;
obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
if (opts.lto) {
obj.lto = .full;
obj.use_lld = true;
obj.use_llvm = !opts.no_llvm;
obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
if (opts.lto) {
if (opts.no_llvm) {
const fail_step = b.addFail("LTO requires the LLVM backend");
obj.step.dependOn(&fail_step.step);
return;
}
obj.lto = .full;
obj.use_lld = true;
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build.zig` around lines 756 - 760, The build currently allows
opts.no_llvm=true with opts.lto=true leading to obj.use_llvm = false while
obj.lto = .full (an unsupported Zig state); add a guard early after evaluating
opts to detect when opts.no_llvm and opts.lto are both true and fail fast with a
clear error message. Modify the logic around obj.use_llvm / obj.lto (the block
setting obj.use_llvm = !opts.no_llvm; obj.lto = .full; obj.use_lld = true) to
check the conflicting flags first and call a build-failure routine (or
std.debug.panic with a descriptive message) if both are set, so we never set
obj.lto when obj.use_llvm is false.

Comment thread scripts/bootstrap.ps1 Outdated
Comment on lines +720 to +735
Push-Location $cloneDir
try {
& bun scripts\prefetch-deps.ts $prefetchDir
if ($LASTEXITCODE -ne 0) { throw "prefetch-deps.ts failed" }
} finally {
Pop-Location
}
Remove-Item -Recurse -Force $cloneDir

# Read-only: download.ts only ever copies FROM here, and a writable baked
# input is something a misbehaving job could corrupt for later jobs on the
# same runner.
& attrib +R "$prefetchDir\*" /S /D

Set-Env "BUN_BUILD_PREFETCH_DIR" $prefetchDir
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Prefetch-Build-Deps is still load-bearing on prefetch-deps.ts failure.

Clone failures are downgraded to warnings, but a non-zero exit from bun scripts\prefetch-deps.ts still throws out of this function and aborts the bootstrap. That contradicts the “best-effort” behavior described above and the shell implementation.

🛠️ Suggested fix
   Push-Location $cloneDir
   try {
     & bun scripts\prefetch-deps.ts $prefetchDir
-    if ($LASTEXITCODE -ne 0) { throw "prefetch-deps.ts failed" }
+    if ($LASTEXITCODE -ne 0) {
+      Write-Output "warning: prefetch-deps.ts failed; skipping warm cache"
+      Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $prefetchDir
+      return
+    }
+  } catch {
+    Write-Output "warning: prefetch-deps.ts failed; skipping warm cache"
+    Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $prefetchDir
+    return
   } finally {
     Pop-Location
   }
🧰 Tools
🪛 PSScriptAnalyzer (1.25.0)

[warning] Missing BOM encoding for non-ASCII encoded file 'bootstrap.ps1'

(PSUseBOMForUnicodeEncodedFile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/bootstrap.ps1` around lines 720 - 735, The Prefetch-Build-Deps flow
still throws when "bun scripts\prefetch-deps.ts" returns a non-zero exit
($LASTEXITCODE), which contradicts the intended best-effort behavior; modify the
block in the Prefetch-Build-Deps function so that after running "& bun
scripts\prefetch-deps.ts $prefetchDir" you check $LASTEXITCODE and, if non-zero,
emit a warning/log (do not throw) and continue cleanup and setting of
BUN_BUILD_PREFETCH_DIR; keep the existing try/finally with Pop-Location and
ensure Remove-Item and attrib/Set-Env still run even when prefetch-deps.ts fails
so downstream steps proceed.

Comment thread scripts/machine.mjs Outdated
Comment on lines +1253 to +1271
const child = nodeSpawn(packerArgs[0], packerArgs.slice(1), {
stdio: "inherit",
env: {
...process.env,
// Packer also reads these env vars
ARM_CLIENT_ID: clientId,
ARM_CLIENT_SECRET: clientSecret,
ARM_SUBSCRIPTION_ID: subscriptionId,
ARM_TENANT_ID: tenantId,
},
});
let cancelled = false;
const forward = signal => {
cancelled = true;
console.log(`[packer] received ${signal}, forwarding to packer for Azure cleanup...`);
child.kill(signal);
};
process.on("SIGINT", forward);
process.on("SIGTERM", forward);
const [code, signal] = await new Promise(done => child.on("close", (c, s) => done([c, s])));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, check the file exists and read the relevant section
wc -l scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 79


🏁 Script executed:

# Read the exact lines mentioned in the review
sed -n '1253,1271p' scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 538


🏁 Script executed:

# Get more context around this code block
sed -n '1240,1280p' scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 1206


🏁 Script executed:

# Search for other uses of nodeSpawn or spawn in this file to understand patterns
rg "nodeSpawn|\.spawn\(" scripts/machine.mjs -n -A 5 -B 2

Repository: oven-sh/bun

Length of output: 4997


🏁 Script executed:

# Check if there are error handlers attached to other spawned processes in the file
rg '\.on\("error"' scripts/machine.mjs -n -B 3 -A 1

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# Search for the spawnSafe function definition
rg "function spawnSafe|const spawnSafe|export.*spawnSafe" scripts/machine.mjs -A 10

Repository: oven-sh/bun

Length of output: 488


🏁 Script executed:

# Get more context around line 1253 - read a larger block to see if nodeSpawn is actually there
sed -n '1200,1300p' scripts/machine.mjs | head -150

Repository: oven-sh/bun

Length of output: 2894


🏁 Script executed:

# Check imports at the top of the file
head -50 scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 975


🏁 Script executed:

# Search for nodeSpawn usage
rg "nodeSpawn" scripts/machine.mjs -n -B 2 -A 2

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# Search for the spawn and spawnSafe definitions in utils.mjs
rg "function spawn|export.*spawn|const spawn" scripts/utils.mjs -A 15 | head -100

Repository: oven-sh/bun

Length of output: 2770


🏁 Script executed:

# Look specifically at what spawnSafe does
rg "spawnSafe" scripts/utils.mjs -A 20 | head -80

Repository: oven-sh/bun

Length of output: 1961


🏁 Script executed:

# Check if nodeSpawn is defined anywhere in the codebase
rg "nodeSpawn" . --type js --type mjs -n

Repository: oven-sh/bun

Length of output: 82


🏁 Script executed:

# Look for the spawn export to understand what nodeSpawn refers to
grep -n "^export.*spawn\|^const spawn\|^function spawn" scripts/utils.mjs

Repository: oven-sh/bun

Length of output: 422


🏁 Script executed:

# Look for nodeSpawn import in utils.mjs
head -100 scripts/utils.mjs | grep -n "nodeSpawn\|import.*spawn"

Repository: oven-sh/bun

Length of output: 141


🏁 Script executed:

# Search for nodeSpawn definition in utils.mjs
rg "nodeSpawn|import.*spawn" scripts/utils.mjs | head -20

Repository: oven-sh/bun

Length of output: 434


🏁 Script executed:

# Read the full spawn function to see how it handles errors
sed -n '237,340p' scripts/utils.mjs

Repository: oven-sh/bun

Length of output: 2626


🏁 Script executed:

# Check git status and see if there are pending changes
cd scripts && git status

Repository: oven-sh/bun

Length of output: 275


🏁 Script executed:

# Check if there's a git diff that shows the actual changes
git diff scripts/machine.mjs | head -200

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# See the git log to understand recent changes
git log --oneline scripts/machine.mjs | head -5

Repository: oven-sh/bun

Length of output: 95


🏁 Script executed:

# Search for the exact pattern from the review comment
rg "const cancelled = false" scripts/machine.mjs -B 5 -A 10

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# Search for signal forwarding pattern with SIGINT and SIGTERM
rg "forward = signal|received.*signal.*forwarding" scripts/machine.mjs -B 5 -A 10

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# Search more broadly for the packer signal handling
rg "packer.*received.*forwarding|child\.kill\(signal\)" scripts/machine.mjs -B 10 -A 10

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# Check if there's a different branch or stashed changes
git stash list
git branch -a

Repository: oven-sh/bun

Length of output: 152


🏁 Script executed:

# Check if there's an uncommitted draft or stage area
git diff --cached scripts/machine.mjs | head -100

Repository: oven-sh/bun

Length of output: 37


🏁 Script executed:

# Look for any pattern matching the review comment more carefully
# Search for "packer.*init\|packer.*build" to understand Packer invocation
rg "packer.*init|packer.*build" scripts/machine.mjs -B 10 -A 20

Repository: oven-sh/bun

Length of output: 2438


🏁 Script executed:

# Check what the review comment context implies - this looks like a PR review
# The snippet shows code that would replace the current spawnSafe pattern
# Let's verify the current state is still using spawnSafe
grep -n "await spawnSafe.*packer" scripts/machine.mjs

Repository: oven-sh/bun

Length of output: 171


🏁 Script executed:

# Verify Node.js spawn error behavior - get the Node.js documentation context
# The key question: does subprocess "close" fire if spawn() itself fails?
# According to Node.js docs, if spawn fails to exec, it emits "error", not "close"
python3 << 'EOF'
# Document the Node.js child_process.spawn() behavior
behavior = {
    "error_event": "Emitted when: spawn() cannot exec the command (ENOENT, EACCES, etc.)",
    "error_examples": ["ENOENT (file not found)", "EACCES (permission denied)", "EISDIR (is a directory)"],
    "close_event": "Emitted when: child process has exited AND stdio streams closed",
    "close_vs_exit": "close fires after exit; multiple processes can share stdio",
    "if_spawn_fails": "error event fires, close event does NOT fire",
    "conclusion": "Listening only to 'close' will not catch spawn exec failures"
}
for key, value in behavior.items():
    print(f"{key}: {value}")
EOF

Repository: oven-sh/bun

Length of output: 522


Add error handler to catch spawn failures before waiting for process exit.

The code switches from spawnSafe() to raw nodeSpawn() to enable signal forwarding. However, if nodeSpawn() cannot exec Packer (e.g., file not found, permission denied), Node emits an "error" event instead of allowing the process to start. By only listening to the "close" event, spawn failures will not be caught.

Suggested fix
-  const [code, signal] = await new Promise(done => child.on("close", (c, s) => done([c, s])));
+  const [code, signal] = await new Promise((resolve, reject) => {
+    child.once("error", reject);
+    child.once("close", (c, s) => resolve([c, s]));
+  });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/machine.mjs` around lines 1253 - 1271, The spawn can emit an "error"
event (e.g., ENOENT) which is not handled; update the logic around
nodeSpawn/child so you attach an "error" listener on child in addition to
"close", and ensure the Promise waits for either event: on "error" reject or
resolve with a non-zero code and the error signal/info, and on "close" resolve
as before. Also remove/cleanup the signal handlers and both listeners inside the
resolution path (references: nodeSpawn, child, forward, the
process.on("SIGINT"/"SIGTERM") handlers and the Promise that currently listens
only for "close") so you don't leak handlers after spawn failure or normal exit.

Comment thread test/docker/prepare-ci.sh Outdated
Comment on lines +81 to +86
pull_if_missing "ubuntu/squid:5.2-22.04_beta"
# Base images of the locally-built services above — pulled implicitly by
# `compose build`, listed here so a missing-layer pull surfaces in this step
# rather than as a slow first test.
pull_if_missing "postgres:15.13"
pull_if_missing "redis:8-alpine"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

The cache-save manifest is now missing part of the warm set.

These images are now treated as required warmups here, but the docker save block at Lines 103-110 still archives only the old set. BUN_DOCKER_LOAD_CACHE=1 will therefore keep missing ubuntu/squid:5.2-22.04_beta, postgres:15.13, and redis:8-alpine, so later jobs still hit the network for exactly the images this change added.

💾 Minimal fix
     docker save \
         postgres:15 \
+        postgres:15.13 \
         mysql:8.4 \
         mysql:8.0 \
         redis:7-alpine \
+        redis:8-alpine \
         minio/minio:latest \
         crossbario/autobahn-testsuite \
+        ubuntu/squid:5.2-22.04_beta \
         -o "$CACHE_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/docker/prepare-ci.sh` around lines 81 - 86, The docker cache-save step
is missing the newly designated warmup images; update the docker save manifest
(the block that runs docker save when BUN_DOCKER_LOAD_CACHE=1) to include the
same images you added with pull_if_missing — specifically
"ubuntu/squid:5.2-22.04_beta", "postgres:15.13", and "redis:8-alpine" — so the
cache archive contains the full warm set and later jobs won’t re-pull those
images.

Comment on lines +37 to +48
await using proc = Bun.spawn({
cmd: [bunExe(), join(repoRoot, "scripts", "build", "fetch-cli.ts"), "dep", "fake", repo, commit, dest, cache],
env: { ...bunEnv, BUN_BUILD_PREFETCH_DIR: prefetch },
stdout: "pipe",
stderr: "pipe",
});
const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);

expect(stderr).toBe("");
expect(stdout).toContain("using prefetch cache");
expect(await Bun.file(join(dest, "hello.txt")).text()).toBe("hi\n");
expect(exitCode).toBe(0);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

These exact stderr === "" assertions will fail on ASAN shards.

All four tests spawn bunExe() with bunEnv, so debug ASAN builds can still emit the known WARNING: ASAN interferes... line on stderr. That makes these assertions fail even when the prefetch behavior is correct; here the stable regression guards are stdout, filesystem state, and exit code.

Based on learnings, bunExe() subprocesses in debug ASAN builds can emit a known warning on stderr even with bunEnv, so empty-stderr assertions are intentionally avoided in this pattern.

Also applies to: 59-79, 99-116, 145-160

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/integration/build-prefetch/prefetch.test.ts` around lines 37 - 48, The
tests assert exact empty stderr (expect(stderr).toBe("")) which flakily fails on
ASAN shards because bun subprocesses started via bunExe() with bunEnv may emit
known ASAN warnings; remove or relax those exact-empty-stderr assertions in the
prefetch test blocks (the Bun.spawn usage that defines proc, stdout, stderr,
exitCode and checks stdout, file contents and exitCode) and instead rely on the
stable assertions (stdout contains "using prefetch cache", filesystem check
Bun.file(...).text() === "hi\n", and exitCode === 0); apply the same change to
the other test blocks noted (the other Bun.spawn sections around lines 59-79,
99-116, 145-160).

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but this changes how release binaries are built (Zig→bitcode LTO across the C++ boundary) and bumps the pinned Zig compiler commit — worth a human sanity check before it ships.

Extended reasoning...

Overview

This PR enables full LTO for bun-zig.o by emitting it as LLVM bitcode (obj.lto = .full + obj.use_lld = true) so it participates in the same link-time optimization pass as the C++ side. Changes touch build.zig (new -Dlto option wired into BunBuildOptions and configureObj) and scripts/build/zig.ts (new zigLto() gate, -Dlto= build arg, codegenThreads() forced to 1 under LTO, and a ZIG_COMMIT_PARALLEL bump to pick up the EnableSplitLTOUnit/module-summary fix).

Security risks

None identified. This is build-system configuration; no runtime code paths, auth, crypto, or user-input handling are touched. The bumped Zig commit is from oven-sh's own fork.

Level of scrutiny

High. Although the diff is small (~30 lines) and mechanically clean, it changes the object format emitted for every shipped release binary and enables cross-language inlining that eliminates 42–100% of boundary symbols per the PR description. LTO across a Zig/C++ FFI boundary is the kind of thing that can surface latent ODR/visibility/UB issues only at link or runtime, and it also bumps the pinned compiler — both are production-critical build-pipeline decisions a human owner should sign off on.

Other factors

The logic itself looks correct: zigLto is properly gated on usingParallelCompiler (so the older Windows-host compiler is unaffected), addMultiCheck hardcodes lto = false for check-only steps, and codegenThreads() returning 1 under LTO keeps zigObjectPaths() and the no_merge_shards path consistent. CI status is mostly infrastructure failures ("Failed to create agent") unrelated to this change, so a green run hasn't yet validated the new bitcode path across all targets.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
build.zig (1)

204-216: ⚠️ Potential issue | 🟠 Major

Still reject -Dno_llvm=true together with -Dlto=true.

These flags are still accepted independently, so the build can reach configureObj() with obj.use_llvm = false and obj.lto = .full. Fail fast here instead of constructing that contradictory configuration.

Suggested fix
 const no_llvm = b.option(bool, "no_llvm", "Experiment with Zig self hosted backends. No stability guaranteed") orelse false;
 const lto = b.option(bool, "lto", "Emit LLVM bitcode for full LTO instead of a native object") orelse false;
+if (no_llvm and lto) {
+    std.debug.panic("`-Dlto=true` requires the LLVM backend", .{});
+}
Does Zig support setting `obj.lto = .full` while `obj.use_llvm = false` / `-Dno_llvm=true` in `build.zig`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@build.zig` around lines 204 - 216, The build accepts a contradictory option
combination (no_llvm true with lto true) and can reach configureObj() with
BunBuildOptions where obj.use_llvm = false and obj.lto = .full; add an early
validation after reading the options (no_llvm and lto) and before
constructing/using build_options to fail fast: detect if no_llvm && lto then
emit a clear error message and stop configuration (e.g., process exit or
b.addError/return) so the contradictory BunBuildOptions state is never created;
reference the no_llvm and lto flags, the BunBuildOptions construction, and
configureObj()/obj.use_llvm/obj.lto when locating where to add the check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@build.zig`:
- Around line 204-216: The build accepts a contradictory option combination
(no_llvm true with lto true) and can reach configureObj() with BunBuildOptions
where obj.use_llvm = false and obj.lto = .full; add an early validation after
reading the options (no_llvm and lto) and before constructing/using
build_options to fail fast: detect if no_llvm && lto then emit a clear error
message and stop configuration (e.g., process exit or b.addError/return) so the
contradictory BunBuildOptions state is never created; reference the no_llvm and
lto flags, the BunBuildOptions construction, and
configureObj()/obj.use_llvm/obj.lto when locating where to add the check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 433e7a3c-4d72-477c-a323-d65c655bb9ee

📥 Commits

Reviewing files that changed from the base of the PR and between 2a376e285751bc6dd140341d8c0ef1e4dc95a236 and d963aa4.

📒 Files selected for processing (2)
  • build.zig
  • scripts/build/zig.ts

The split existed because COFF shard emission isn't implemented, but
codegenThreads() already forces 1 on Windows so sharding never happens.
Parallel sema is deterministic and works on all hosts.
Comment thread scripts/build/zig.ts
`-Denable_fuzzilli=${bool(cfg.fuzzilli)}`,
`-Denable_valgrind=${bool(cfg.valgrind)}`,
`-Denable_tinycc=${bool(cfg.tinycc)}`,
`-Dlto=${bool(cfg.lto)}`,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 The comment at lines 446-448 still says "MUST be 0 on the stable compiler — see codegenThreads()", but this PR removed the stable/parallel compiler split and codegenThreads() no longer ever returns 0 (its minimum is now 1). Consider updating this comment to drop the stale "stable compiler" / "0" reference.

Extended reasoning...

What the issue is

The comment immediately following the new -Dlto=${bool(cfg.lto)} line in zigBuildArgs() reads:

// Sharded LLVM codegen — one shard per host core on the parallel
// compiler. Zig has no "auto" value (0 = single-threaded). MUST be 0
// on the stable compiler — see codegenThreads().
`-Dllvm_codegen_threads=${codegenThreads(cfg)}`,

This comment is now stale documentation that this PR's own changes orphaned.

Why it's stale

This PR collapsed the ZIG_COMMIT / ZIG_COMMIT_PARALLEL split into a single ZIG_COMMIT constant and deleted:

  • usingParallelCompiler()
  • defaultZigCommit()
  • the if (!usingParallelCompiler(cfg)) return 0; branch at the top of codegenThreads()

After these removals, codegenThreads() (lines 54-66) returns at minimum 1 — never 0. There is no longer a "stable compiler" vs "parallel compiler" distinction anywhere in the codebase, and the function the comment tells the reader to consult no longer contains any logic about 0 or a stable compiler.

Step-by-step proof

  1. Before this PR, codegenThreads() started with if (!usingParallelCompiler(cfg)) return 0; — the comment's "MUST be 0 on the stable compiler" pointed at that line.
  2. The diff removes that line: - if (!usingParallelCompiler(cfg)) return 0; and replaces it with nothing (the function now starts with if (cfg.windows) return 1;).
  3. The diff also removes the entire "TEMPORARY SPLIT" doc block, ZIG_COMMIT_PARALLEL, defaultZigCommit(), and usingParallelCompiler() — so "the stable compiler" no longer refers to anything that exists.
  4. The comment at 446-448 was not touched and still tells the reader "MUST be 0 on the stable compiler — see codegenThreads()". A reader following that pointer will find no such logic.

Impact

No behavioral impact — this is purely misleading documentation. A future reader trying to understand why -Dllvm_codegen_threads is set the way it is will be sent looking for a "stable compiler" branch that no longer exists.

How to fix

Update the comment to reflect the current state, e.g.:

// Sharded LLVM codegen — see codegenThreads() for when sharding is
// gated off (Windows, LTO, non-ASAN CI). Zig has no "auto" value.

Or simply drop the second sentence entirely, since codegenThreads()'s own doc comment now fully explains the gating.

@Jarred-Sumner Jarred-Sumner merged commit 02fbd62 into main Apr 23, 2026
53 of 58 checks passed
@Jarred-Sumner Jarred-Sumner deleted the jarred/zig-lto branch April 23, 2026 01:18
Jarred-Sumner added a commit that referenced this pull request Apr 23, 2026
Since 02fbd62 (#29618, enable LTO for bun-zig.o), `linux aarch64 -
build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39
GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o),
peak memory crosses the 31.5 GiB available (no swap) about 18 min in,
and the kernel OOM-kills lld — sometimes taking the BuildKite agent with
it (`exit:-1`).

This was already failing on the PR's own CI (build 47390) before merge.

| target | instance | bitcode in | result |
|---|---|---|---|
| linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min |
| linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min |
| **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed,
18.75 min** |

x64 has slightly *more* bitcode and survives, so the tipping factor is
LLVM's AArch64 backend holding more state during codegen than X86 — not
a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding
commit away from the same fate.

Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra
cores also help lld's parallel LTO codegen phase (the single-threaded
merge/opt half won't benefit, so expect ~14–15 min rather than half of
19).

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>
structwafel pushed a commit to structwafel/bun that referenced this pull request Apr 25, 2026
Emit `bun-zig.o` as LLVM bitcode (`obj.lto = .full`) so it participates
in the same LTO link as the C/C++ side, instead of being a native ELF
object that lld can only link, not optimize across.

## Why this didn't work before

The only documented Zig→bitcode path was `-Dobj_format=bc`, which routes
through `getEmittedLlvmBc()` → `-femit-llvm-bc`. That writes Zig's
**self-hosted** unoptimized bitcode (Producer `"zig 0.15.2"`) before
libLLVM ever touches it, and the self-hosted writer has encoding bugs
for large modules → `Invalid record` in lld. The proper `-flto` path was
blocked by `use_lld = false` on Linux causing `LtoRequiresLld` or silent
fallback.

This PR uses `obj.lto = .full` + `obj.use_lld = true`, which routes
through libLLVM's `WriteBitcodeToFile` after the LTO pre-link O3
pipeline. The output (Producer `"LLVM20.1.2"`) is forward-compatible
with lld 21/22.

Bumps `ZIG_COMMIT_PARALLEL` to pick up oven-sh/zig@04e7f6ac1e, which
adds `EnableSplitLTOUnit=1` + a module summary to Zig's LTO bitcode —
required for `-fwhole-program-vtables` to accept the link (otherwise:
`inconsistent LTO Unit splitting`).

## What gets inlined

| Boundary | Declared | Eliminated | % |
|---|---|---|---|
| Zig `export fn` → C++ | 336 | 142 | 42% |
| C `us_*` (usockets) ← Zig | 115 | 79 | 69% |
| C++ `uws_*` (uWebSockets wrappers) ← Zig | 108 | 76 | 70% |
| `mi_free` | — | all | 100% (0 call insns) |

Verified by disassembly: e.g. `Bun__readOriginTimer` body (optional
check, `clock_gettime`, ns math) appears directly inside C++
`Process_functionHRTime`; symbol is gone.

## Measured impact (linux-x64, vs latest canary)

| | LTO | no Zig LTO | Δ |
|---|---|---|---|
| `Bun.escapeHTML` | 171.3 ns | 183.2 ns | **6.5%** |
| `TextDecoder.decode` | 104.0 ns | 106.8 ns | **2.6%** |
| `oha -n 1M -c 50` | ~200,600 req/s | ~193,800 req/s | **3.5%** |

Caveat: canary is a slightly different revision; same-revision A/B would
be cleaner.

## Notes

- `zigLto(cfg)` gates on `usingParallelCompiler` — the older Windows
`ZIG_COMMIT` predates the summary fix.
- `codegenThreads()` returns 1 when LTO is on (zig_llvm.cpp gates
SplitModule on `!lto`).

---------

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>
structwafel pushed a commit to structwafel/bun that referenced this pull request Apr 25, 2026
Since 02fbd62 (oven-sh#29618, enable LTO for bun-zig.o), `linux aarch64 -
build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39
GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o),
peak memory crosses the 31.5 GiB available (no swap) about 18 min in,
and the kernel OOM-kills lld — sometimes taking the BuildKite agent with
it (`exit:-1`).

This was already failing on the PR's own CI (build 47390) before merge.

| target | instance | bitcode in | result |
|---|---|---|---|
| linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min |
| linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min |
| **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed,
18.75 min** |

x64 has slightly *more* bitcode and survives, so the tipping factor is
LLVM's AArch64 backend holding more state during codegen than X86 — not
a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding
commit away from the same fate.

Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra
cores also help lld's parallel LTO codegen phase (the single-threaded
merge/opt half won't benefit, so expect ~14–15 min rather than half of
19).

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
Emit `bun-zig.o` as LLVM bitcode (`obj.lto = .full`) so it participates
in the same LTO link as the C/C++ side, instead of being a native ELF
object that lld can only link, not optimize across.

## Why this didn't work before

The only documented Zig→bitcode path was `-Dobj_format=bc`, which routes
through `getEmittedLlvmBc()` → `-femit-llvm-bc`. That writes Zig's
**self-hosted** unoptimized bitcode (Producer `"zig 0.15.2"`) before
libLLVM ever touches it, and the self-hosted writer has encoding bugs
for large modules → `Invalid record` in lld. The proper `-flto` path was
blocked by `use_lld = false` on Linux causing `LtoRequiresLld` or silent
fallback.

This PR uses `obj.lto = .full` + `obj.use_lld = true`, which routes
through libLLVM's `WriteBitcodeToFile` after the LTO pre-link O3
pipeline. The output (Producer `"LLVM20.1.2"`) is forward-compatible
with lld 21/22.

Bumps `ZIG_COMMIT_PARALLEL` to pick up oven-sh/zig@04e7f6ac1e, which
adds `EnableSplitLTOUnit=1` + a module summary to Zig's LTO bitcode —
required for `-fwhole-program-vtables` to accept the link (otherwise:
`inconsistent LTO Unit splitting`).

## What gets inlined

| Boundary | Declared | Eliminated | % |
|---|---|---|---|
| Zig `export fn` → C++ | 336 | 142 | 42% |
| C `us_*` (usockets) ← Zig | 115 | 79 | 69% |
| C++ `uws_*` (uWebSockets wrappers) ← Zig | 108 | 76 | 70% |
| `mi_free` | — | all | 100% (0 call insns) |

Verified by disassembly: e.g. `Bun__readOriginTimer` body (optional
check, `clock_gettime`, ns math) appears directly inside C++
`Process_functionHRTime`; symbol is gone.

## Measured impact (linux-x64, vs latest canary)

| | LTO | no Zig LTO | Δ |
|---|---|---|---|
| `Bun.escapeHTML` | 171.3 ns | 183.2 ns | **6.5%** |
| `TextDecoder.decode` | 104.0 ns | 106.8 ns | **2.6%** |
| `oha -n 1M -c 50` | ~200,600 req/s | ~193,800 req/s | **3.5%** |

Caveat: canary is a slightly different revision; same-revision A/B would
be cleaner.

## Notes

- `zigLto(cfg)` gates on `usingParallelCompiler` — the older Windows
`ZIG_COMMIT` predates the summary fix.
- `codegenThreads()` returns 1 when LTO is on (zig_llvm.cpp gates
SplitModule on `!lto`).

---------

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
Since 02fbd62 (oven-sh#29618, enable LTO for bun-zig.o), `linux aarch64 -
build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39
GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o),
peak memory crosses the 31.5 GiB available (no swap) about 18 min in,
and the kernel OOM-kills lld — sometimes taking the BuildKite agent with
it (`exit:-1`).

This was already failing on the PR's own CI (build 47390) before merge.

| target | instance | bitcode in | result |
|---|---|---|---|
| linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min |
| linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min |
| **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed,
18.75 min** |

x64 has slightly *more* bitcode and survives, so the tipping factor is
LLVM's AArch64 backend holding more state during codegen than X86 — not
a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding
commit away from the same fate.

Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra
cores also help lld's parallel LTO codegen phase (the single-threaded
merge/opt half won't benefit, so expect ~14–15 min rather than half of
19).

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants