build: enable LTO for bun-zig.o #29618
Updated 5:44 PM PT - Apr 22nd, 2026
❌ Your commit
🧪 To try this PR locally: `bunx bun-pr 29618`
That installs a local version of the PR, which you can run as `bun-29618 --bun`
Emit bun-zig.o as LLVM bitcode (obj.lto=.full) so it participates in the same LTO link as the C/C++ side. Previously the only bitcode path was -Dobj_format=bc, which writes zig's self-hosted (unoptimized) bitcode and is rejected by lld with "Invalid record". Bumps ZIG_COMMIT_PARALLEL to pick up the EnableSplitLTOUnit/summary fix in zig_llvm.cpp, required for -fwhole-program-vtables interop.
2a376e2 to d963aa4
Additional findings (outside current diff — PR may have been updated during review):
- 🟡 `.buildkite/Dockerfile-bootstrap.sh:108-115`: This script only installs `docker` via dnf (line 30), but the new block calls `git clone`. Amazon Linux 2023 AMIs don't ship git by default, so the clone fails with "command not found" and the if-body is silently skipped (`set -e` doesn't apply to if-conditions). The test docker images are never pre-pulled on the amazonlinux+docker hosts; fix by changing line 30 to `dnf install -y docker git`.
Extended reasoning...
What the bug is
`.buildkite/Dockerfile-bootstrap.sh` runs on Amazon Linux 2023 hosts (the only `dnf`-based platforms in `ci.mjs` `buildPlatforms` with `features: ['docker']`). Its only package install is line 30: `dnf install -y docker`. This PR adds a new block at lines 108–115 that runs `git clone --depth=1 ...` to fetch the repo so it can run `test/docker/prepare-ci.sh`. But `git` is never installed, and the `al2023-ami-*` AMIs selected by `aws.getBaseImage()` for `distro=amazonlinux release=2023` are intentionally minimal and do not include git in the default package set.
Code path that triggers it
`machine.mjs` uploads `Dockerfile-bootstrap.sh` to a fresh AL2023 instance and runs it as root. The script runs `set -euo pipefail`, then `dnf install -y docker`, builds the buildkite docker image, and reaches the new block:
if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
    https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
  ...
fi
With git absent, bash prints `bash: git: command not found` and the command exits 127. Because the command appears as the condition of an `if`, `set -e` does not apply (POSIX: "the -e setting shall be ignored when executing the compound-list following the if"). The `if` evaluates false, the body is skipped, and the script continues to `docker container create` and exits 0.
Why existing code doesn't prevent it
Unlike `bootstrap.sh` (which calls `git_path="$(require git)"` and explicitly installs git via `install_common_software`), `Dockerfile-bootstrap.sh` is a minimal host-side script that has never needed git before: it only orchestrates docker. Nothing in this file installs git, and nothing checks for it before the new clone.
Step-by-step proof
1. `machine.mjs` calls `aws.getBaseImage({distro: 'amazonlinux', release: '2023', ...})` → matches `al2023-ami-*-x86_64/arm64` (the official AL2023 AMI, which ships without git).
2. Script runs as root, executes `dnf install -y docker` (line 30), which installs docker only.
3. `docker buildx build ... --build-arg BUN_REPO_REF=...` succeeds (git inside the container image is installed via apt at Dockerfile line ~25, but that's inside the container, not on the host).
4. Line 108: `if git clone ...` → `command not found`, exit 127 → if-condition false.
5. Body (prepare-ci.sh invocation) skipped; `rm -rf /tmp/bun-test-docker` also skipped (it's inside the if).
6. Script proceeds, prints "Bootstrap complete", exits 0. AMI is snapshotted without postgres/mysql/redis/minio images in `/var/lib/docker`.
Impact
The bake doesn't fail: the block is explicitly documented as best-effort. The only consequence is that the new "pre-pull test docker images" optimization is silently a no-op on the `linux-{aarch64,x64}-amazonlinux-2023-with-docker` images. Per `ci.mjs`, those are build platforms (cpp/zig/link agents), not `testPlatforms` (which use debian/ubuntu/alpine via `bootstrap.sh`, where this PR's separate `prefetch_build_deps` path does have git). So the practical value of pre-pulling test images on these specific hosts is marginal anyway, but the author clearly intended the block to execute, and it silently won't.
Fix
Change line 30 to:
dnf install -y docker git
- 🟡 `scripts/build/download.ts:161-163`: The new `permanent = res.status >= 400 && res.status < 500` check treats all 4xx as non-retryable, but HTTP 429 (Too Many Requests) and 408 (Request Timeout) are transient by definition and were previously retried with backoff. Since `prefetch-deps.ts` fires concurrent downloads at GitHub and only treats `HTTP 404` as an expected miss, a 429 would now hard-fail instead of backing off. Consider `&& res.status !== 408 && res.status !== 429`.
Extended reasoning...
What changed
Before this PR, `downloadWithRetry` retried on any non-ok response: the `if (!res.ok || res.body === null)` branch set `lastError` and `continue`d into the next backoff iteration regardless of status code. After this PR, the new line at scripts/build/download.ts:163 sets `permanent = res.status >= 400 && res.status < 500`, which causes the loop condition `attempt <= maxAttempts && !permanent` to terminate after a single attempt for every 4xx code. The inline comment says "4xx is deterministic", but that is not true for 429 (Too Many Requests) or 408 (Request Timeout), which are explicitly defined as retryable in HTTP semantics (and 429 frequently carries a `Retry-After` header for exactly this reason).
Concrete trigger path
1. `scripts/prefetch-deps.ts` enumerates the cross-product of `{asan, lto, baseline, musl}` variants and pushes dozens of GitHub release-asset / archive-tarball URLs into a queue.
2. Four parallel workers call `downloadWithRetry(item.url, path, ...)` for each item.
3. GitHub's CDN (codeload.github.com / objects.githubusercontent.com) returns 429 under abuse detection or load.
4. `res.ok` is false, `res.status` is 429 → `permanent = true` → loop exits after attempt 1 → `throw lastError` (the raw `HTTP 429 Too Many Requests for ...` `BuildError`).
5. Back in `prefetch-deps.ts`'s `fetchOne`, the catch block tests `/\bHTTP 404\b/.test(err.message)`; this does not match 429, so the error is re-thrown.
6. The worker's promise rejects, `Promise.all(workers)` rejects, and `prefetch-deps.ts` exits non-zero.
In `.buildkite/Dockerfile` (lines 145–151), the prefetch step runs inside `RUN set -e; ... (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch)` with no `|| true` guard, so a non-zero exit fails the entire `docker buildx build`. (bootstrap.sh and bootstrap.ps1 wrap prefetch-deps in best-effort handling, so on those paths the bake survives, but the Dockerfile path does not.) Regular CI builds calling `downloadWithRetry` via `fetchPrebuilt`/`fetchZig` would also fail immediately on 429 instead of retrying.
Why existing code doesn't prevent it
The only special-casing of non-ok statuses in callers is `prefetch-deps.ts`'s `/\bHTTP 404\b/` regex, which intentionally matches 404 only. Nothing in the call chain catches or retries 429. Before this PR the retry loop itself absorbed it; now nothing does.
Impact
This is a minor robustness regression. GitHub release-asset and codeload endpoints rarely return 429 for anonymous downloads, image bakes are infrequent manual operations, and on CI most downloads now hit the prefetch cache and never reach `fetch()`. So the practical likelihood is low, but when it does happen the failure mode is worse than before (hard fail vs. transparent retry), and the comment asserting "4xx is deterministic" is factually wrong for these codes.
Fix
Exclude the two transient 4xx codes from the permanent check:
permanent = res.status >= 400 && res.status < 500 && res.status !== 408 && res.status !== 429;
This preserves the intended fast-fail on 404/403/401 (genuinely deterministic) while restoring backoff for rate-limiting and request-timeout responses.
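The classification described above can be sketched in isolation. This is a minimal illustration, not the code in `download.ts`: the helper names (`isPermanentFailure`, `retryOnTransient`), the injected `attemptOnce` callback, and the backoff schedule are all assumptions made for the example.

```typescript
// Sketch only: these names are illustrative and not taken from download.ts.
const TRANSIENT_4XX: ReadonlySet<number> = new Set([408, 429]);

/** True when retrying cannot help: a 4xx status other than 408 or 429. */
function isPermanentFailure(status: number): boolean {
  const isClientError = status >= 400 && status < 500;
  return isClientError && !TRANSIENT_4XX.has(status);
}

type StatusResponse = { ok: boolean; status: number };

/**
 * Calls `attemptOnce` up to `maxAttempts` times, stopping early on success
 * or on a permanent failure. The backoff schedule here is an assumption.
 */
async function retryOnTransient(
  attemptOnce: () => Promise<StatusResponse>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<StatusResponse> {
  let last: StatusResponse = { ok: false, status: 0 };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    last = await attemptOnce();
    if (last.ok || isPermanentFailure(last.status)) break;
    if (attempt < maxAttempts) {
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  return last;
}
```

Keeping the status check in its own pure function makes the 408/429 exception easy to unit-test without touching the network.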
This PR may be a duplicate of:
🤖 Generated with Claude Code
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (2)
Walkthrough
Adds a boolean LTO build option and propagates it through build tooling; LTO is disabled for multi-platform "check" builds and, when enabled, configures full LTO with LLD. Zig/tooling commit selection is simplified to a single pinned Zig commit, and sharding/generation now respects the LTO setting.
Changes
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
test/docker/index.ts (1)
386-393: 🧹 Nitpick | 🔵 Trivial
Deduplicate the build-required service list.
`buildServices()` now carries a second copy of the same `"postgres_tls" | "mysql_tls" | "redis_unified"` membership logic that `DockerComposeHelper.up()` already hardcodes at Line 113. This PR had to touch both places, and `servicesToBuild` is only inferred as `string[]`, so a typo here would slip past type-checking and fail at runtime. Please move this to one shared, typed constant/`Set<ServiceName>` and reuse it in both paths.
♻️ Proposed refactor
+const servicesNeedingBuild: readonly ServiceName[] = ["postgres_tls", "mysql_tls", "redis_unified"];
+
class DockerComposeHelper {
@@
- if (service === "mysql_tls" || service === "redis_unified" || service === "postgres_tls") {
+ if (servicesNeedingBuild.includes(service)) {
const buildResult = await this.exec(["build", service]);
@@
- const servicesToBuild = ["postgres_tls", "mysql_tls", "redis_unified"];
+ const servicesToBuild = servicesNeedingBuild;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/docker/index.ts` around lines 386 - 393, The services list is duplicated in buildServices() and DockerComposeHelper.up(); create a single exported, typed constant (e.g., a readonly array or Set typed as Set<ServiceName> or ReadonlyArray<ServiceName>) named something like BUILT_SERVICES or SERVICES_REQUIRING_BUILD and replace the local servicesToBuild in buildServices() and the hardcoded list in DockerComposeHelper.up() to reference this constant so the membership is centralized and type-checked; update any imports/uses to reference the new symbol and adjust types so the compiler enforces ServiceName membership.
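As a rough illustration of the `Set<ServiceName>` variant mentioned in this comment, the sketch below uses a stand-in `ServiceName` union. The real union in `test/docker/index.ts` is not shown here, so `example_prebuilt` is a hypothetical member added only to demonstrate a service that does not need a build, and `SERVICES_REQUIRING_BUILD` / `requiresBuild` are names picked for the example.

```typescript
// Sketch only: `ServiceName` is a stand-in union. "example_prebuilt" is a
// hypothetical member, not a service defined in test/docker/index.ts.
type ServiceName = "postgres_tls" | "mysql_tls" | "redis_unified" | "example_prebuilt";

// Single source of truth for services that need `docker compose build`.
// A typo in any entry is a compile-time error because of the type argument.
const SERVICES_REQUIRING_BUILD: ReadonlySet<ServiceName> = new Set<ServiceName>([
  "postgres_tls",
  "mysql_tls",
  "redis_unified",
]);

/** Membership check that both call sites could share. */
function requiresBuild(service: ServiceName): boolean {
  return SERVICES_REQUIRING_BUILD.has(service);
}

// A typed array view, for code that wants to iterate over the list.
const servicesToBuild: ServiceName[] = Array.from(SERVICES_REQUIRING_BUILD);
```

Compared with the `readonly ServiceName[]` plus `.includes()` form in the proposed refactor, a set reads slightly more directly as a membership test; with three entries either choice is fine.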
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.buildkite/Dockerfile:
- Around line 145-153: The RUN block cloning bun should treat the prefetch step
as best-effort and then harden the output directory before exporting: change the
`if ... bun scripts/prefetch-deps.ts /opt/bun-prefetch` invocation so any
non-zero exit does not fail the build (e.g., run it in a subshell and swallow
errors or append `|| true`), and after the step (only if /opt/bun-prefetch
exists) run `chown -R root:root /opt/bun-prefetch || true` and `chmod -R a-w
/opt/bun-prefetch || true` (and add read/execute bits if needed) so
`/opt/bun-prefetch` is immutable and owned by root; reference the existing RUN
block, `${BUN_REPO_REF}`, `bun scripts/prefetch-deps.ts`, and
`/opt/bun-prefetch` when making the edits.
In @.buildkite/Dockerfile-bootstrap.sh:
- Around line 108-115: The git clone block that attempts to pull the bun repo
for Docker warmup currently swallows clone failures; add an else branch after
the if git clone ...; then print a clear warning (to stderr) including the
attempted ref variable ${BUN_BOOTSTRAP_REPO_REF:-main} and the destination
(/tmp/bun-test-docker) so clone failures are visible; update the block
surrounding the existing prepare-ci.sh checks and rm -rf to include this else
warning so bad refs or transient GitHub failures are diagnosable.
In `@build.zig`:
- Around line 756-760: The build currently allows opts.no_llvm=true with
opts.lto=true leading to obj.use_llvm = false while obj.lto = .full (an
unsupported Zig state); add a guard early after evaluating opts to detect when
opts.no_llvm and opts.lto are both true and fail fast with a clear error
message. Modify the logic around obj.use_llvm / obj.lto (the block setting
obj.use_llvm = !opts.no_llvm; obj.lto = .full; obj.use_lld = true) to check the
conflicting flags first and call a build-failure routine (or std.debug.panic
with a descriptive message) if both are set, so we never set obj.lto when
obj.use_llvm is false.
In `@scripts/bootstrap.ps1`:
- Around line 720-735: The Prefetch-Build-Deps flow still throws when "bun
scripts\prefetch-deps.ts" returns a non-zero exit ($LASTEXITCODE), which
contradicts the intended best-effort behavior; modify the block in the
Prefetch-Build-Deps function so that after running "& bun
scripts\prefetch-deps.ts $prefetchDir" you check $LASTEXITCODE and, if non-zero,
emit a warning/log (do not throw) and continue cleanup and setting of
BUN_BUILD_PREFETCH_DIR; keep the existing try/finally with Pop-Location and
ensure Remove-Item and attrib/Set-Env still run even when prefetch-deps.ts fails
so downstream steps proceed.
In `@scripts/machine.mjs`:
- Around line 1253-1271: The spawn can emit an "error" event (e.g., ENOENT)
which is not handled; update the logic around nodeSpawn/child so you attach an
"error" listener on child in addition to "close", and ensure the Promise waits
for either event: on "error" reject or resolve with a non-zero code and the
error signal/info, and on "close" resolve as before. Also remove/cleanup the
signal handlers and both listeners inside the resolution path (references:
nodeSpawn, child, forward, the process.on("SIGINT"/"SIGTERM") handlers and the
Promise that currently listens only for "close") so you don't leak handlers
after spawn failure or normal exit.
In `@test/docker/prepare-ci.sh`:
- Around line 81-86: The docker cache-save step is missing the newly designated
warmup images; update the docker save manifest (the block that runs docker save
when BUN_DOCKER_LOAD_CACHE=1) to include the same images you added with
pull_if_missing — specifically "ubuntu/squid:5.2-22.04_beta", "postgres:15.13",
and "redis:8-alpine" — so the cache archive contains the full warm set and later
jobs won’t re-pull those images.
In `@test/integration/build-prefetch/prefetch.test.ts`:
- Around line 37-48: The tests assert exact empty stderr
(expect(stderr).toBe("")) which flakily fails on ASAN shards because bun
subprocesses started via bunExe() with bunEnv may emit known ASAN warnings;
remove or relax those exact-empty-stderr assertions in the prefetch test blocks
(the Bun.spawn usage that defines proc, stdout, stderr, exitCode and checks
stdout, file contents and exitCode) and instead rely on the stable assertions
(stdout contains "using prefetch cache", filesystem check Bun.file(...).text()
=== "hi\n", and exitCode === 0); apply the same change to the other test blocks
noted (the other Bun.spawn sections around lines 59-79, 99-116, 145-160).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0c97bbcc-70b7-495d-ad34-456b426f6a30
📥 Commits
Reviewing files that changed from the base of the PR and between 890ef5a and 2a376e285751bc6dd140341d8c0ef1e4dc95a236.
📒 Files selected for processing (16)
- .buildkite/Dockerfile
- .buildkite/Dockerfile-bootstrap.sh
- .buildkite/ci.mjs
- build.zig
- scripts/bootstrap.ps1
- scripts/bootstrap.sh
- scripts/build/download.ts
- scripts/build/zig.ts
- scripts/machine.mjs
- scripts/packer/variables.pkr.hcl
- scripts/packer/windows-arm64.pkr.hcl
- scripts/packer/windows-x64.pkr.hcl
- scripts/prefetch-deps.ts
- test/docker/index.ts
- test/docker/prepare-ci.sh
- test/integration/build-prefetch/prefetch.test.ts
RUN set -e; \
    if git clone --depth=1 --branch ${BUN_REPO_REF} https://github.com/oven-sh/bun.git /tmp/bun-clone \
      && [ -f /tmp/bun-clone/scripts/prefetch-deps.ts ]; then \
      (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); \
    else \
      echo "warning: prefetch-deps.ts unavailable at ${BUN_REPO_REF}; skipping warm cache"; \
    fi; \
    rm -rf /tmp/bun-clone
ENV BUN_BUILD_PREFETCH_DIR=/opt/bun-prefetch
Mirror the bootstrap scripts here: make warm-cache best-effort and harden it before exporting it.
This Docker-only path currently fails the whole image build if bun scripts/prefetch-deps.ts exits non-zero, and it exposes /opt/bun-prefetch to later jobs without stripping write bits. That makes the image bake flakier than scripts/bootstrap.sh / scripts/bootstrap.ps1 and leaves a shared cache mutable inside a long-lived agent container.
⚙️ Suggested fix
RUN set -e; \
if git clone --depth=1 --branch ${BUN_REPO_REF} https://github.com/oven-sh/bun.git /tmp/bun-clone \
&& [ -f /tmp/bun-clone/scripts/prefetch-deps.ts ]; then \
- (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); \
+ if ! (cd /tmp/bun-clone && bun scripts/prefetch-deps.ts /opt/bun-prefetch); then \
+ echo "warning: prefetch-deps.ts failed; continuing without warm cache"; \
+ rm -rf /opt/bun-prefetch; \
+ else \
+ chmod -R a-w /opt/bun-prefetch; \
+ fi; \
else \
echo "warning: prefetch-deps.ts unavailable at ${BUN_REPO_REF}; skipping warm cache"; \
fi; \
rm -rf /tmp/bun-clone
🧰 Tools
🪛 Hadolint (2.14.0)
[warning] 145-145: Use WORKDIR to switch to a directory
(DL3003)
if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
    https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
  if [ -f /tmp/bun-test-docker/test/docker/prepare-ci.sh ]; then
    (cd /tmp/bun-test-docker/test/docker && sh prepare-ci.sh) || \
      echo "warning: prepare-ci.sh failed; test docker images not pre-pulled"
  fi
  rm -rf /tmp/bun-test-docker
fi
Log clone failures in this best-effort warmup path.
If the clone itself fails, this block currently does nothing and the bake looks successful even though the Docker warm cache was never populated. Please add an else warning here so bad refs/transient GitHub failures are diagnosable.
🪵 Suggested tweak
if git clone --depth=1 --branch "${BUN_BOOTSTRAP_REPO_REF:-main}" \
https://github.com/oven-sh/bun.git /tmp/bun-test-docker; then
if [ -f /tmp/bun-test-docker/test/docker/prepare-ci.sh ]; then
(cd /tmp/bun-test-docker/test/docker && sh prepare-ci.sh) || \
echo "warning: prepare-ci.sh failed; test docker images not pre-pulled"
fi
rm -rf /tmp/bun-test-docker
+else
+ echo "warning: failed to clone oven-sh/bun at ${BUN_BOOTSTRAP_REPO_REF:-main}; test docker images not pre-pulled"
fi
obj.use_llvm = !opts.no_llvm;
obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
if (opts.lto) {
    obj.lto = .full;
    obj.use_lld = true;
🧩 Analysis chain
🌐 Web query:
Does Zig support `obj.lto = .full` when `use_llvm = false` / the self-hosted backend is enabled?
💡 Result:
No, Zig does not support obj.lto = .full when use_llvm = false / the self-hosted backend is enabled.
Citations:
- 1: Disable LTO by default + some LTO fixes ziglang/zig#22230
- 2: Compilation: Enable LTO for libraries by default ziglang/zig#22228
- 3: Error (link warning) building minimal Zig executable with LTO enabled for ReleaseSafe mode targeting Windows ziglang/zig#24274
- 4: zig ld: support LTO ziglang/zig#8680
- 5: https://mintlify.com/ziglang/zig/compiler/optimization
🏁 Script executed:
# Find the build.zig file and check the relevant sections
fd -t f "build.zig" | head -1 | xargs wc -l
Repository: oven-sh/bun
Length of output: 69
🏁 Script executed:
# Check the BunBuildOptions struct and flag parsing
grep -n "no_llvm\|\.lto" build.zig | head -30
Repository: oven-sh/bun
Length of output: 869
🏁 Script executed:
# Check how opts is initialized and where defaults are set
sed -n '50,70p' build.zig
sed -n '200,220p' build.zig
sed -n '640,660p' build.zig
Repository: oven-sh/bun
Length of output: 2849
🏁 Script executed:
# Check if there are existing validation/guard patterns elsewhere in build.zig
rg "const fail_step|addFail|@panic" build.zig -A 2 -B 2 | head -50
Repository: oven-sh/bun
Length of output: 1783
Reject -Dno_llvm=true together with -Dlto=true.
Full LTO is an LLVM-only path, but the code still allows both flags to be set independently. When both are true, obj.use_llvm becomes false while obj.lto = .full is set, which Zig does not support. Add a guard to fail fast instead of producing confusing Zig errors.
Suggested fix
obj.use_llvm = !opts.no_llvm;
obj.use_lld = if (opts.os == .mac or opts.os == .linux) false else !opts.no_llvm;
if (opts.lto) {
+ if (opts.no_llvm) {
+ const fail_step = b.addFail("LTO requires the LLVM backend");
+ obj.step.dependOn(&fail_step.step);
+ return;
+ }
obj.lto = .full;
obj.use_lld = true;
}
  Push-Location $cloneDir
  try {
    & bun scripts\prefetch-deps.ts $prefetchDir
    if ($LASTEXITCODE -ne 0) { throw "prefetch-deps.ts failed" }
  } finally {
    Pop-Location
  }
  Remove-Item -Recurse -Force $cloneDir

  # Read-only: download.ts only ever copies FROM here, and a writable baked
  # input is something a misbehaving job could corrupt for later jobs on the
  # same runner.
  & attrib +R "$prefetchDir\*" /S /D

  Set-Env "BUN_BUILD_PREFETCH_DIR" $prefetchDir
}
Prefetch-Build-Deps is still load-bearing on prefetch-deps.ts failure.
Clone failures are downgraded to warnings, but a non-zero exit from bun scripts\prefetch-deps.ts still throws out of this function and aborts the bootstrap. That contradicts the “best-effort” behavior described above and the shell implementation.
🛠️ Suggested fix
Push-Location $cloneDir
try {
& bun scripts\prefetch-deps.ts $prefetchDir
- if ($LASTEXITCODE -ne 0) { throw "prefetch-deps.ts failed" }
+ if ($LASTEXITCODE -ne 0) {
+ Write-Output "warning: prefetch-deps.ts failed; skipping warm cache"
+ Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $prefetchDir
+ return
+ }
+ } catch {
+ Write-Output "warning: prefetch-deps.ts failed; skipping warm cache"
+ Remove-Item -Recurse -Force -ErrorAction SilentlyContinue $prefetchDir
+ return
} finally {
Pop-Location
}
🧰 Tools
🪛 PSScriptAnalyzer (1.25.0)
[warning] Missing BOM encoding for non-ASCII encoded file 'bootstrap.ps1'
(PSUseBOMForUnicodeEncodedFile)
const child = nodeSpawn(packerArgs[0], packerArgs.slice(1), {
  stdio: "inherit",
  env: {
    ...process.env,
    // Packer also reads these env vars
    ARM_CLIENT_ID: clientId,
    ARM_CLIENT_SECRET: clientSecret,
    ARM_SUBSCRIPTION_ID: subscriptionId,
    ARM_TENANT_ID: tenantId,
  },
});
let cancelled = false;
const forward = signal => {
  cancelled = true;
  console.log(`[packer] received ${signal}, forwarding to packer for Azure cleanup...`);
  child.kill(signal);
};
process.on("SIGINT", forward);
process.on("SIGTERM", forward);
const [code, signal] = await new Promise(done => child.on("close", (c, s) => done([c, s])));
🧩 Analysis chain
🏁 Script executed:
# First, check the file exists and read the relevant section
wc -l scripts/machine.mjs
Repository: oven-sh/bun
Length of output: 79
🏁 Script executed:
# Read the exact lines mentioned in the review
sed -n '1253,1271p' scripts/machine.mjs
Repository: oven-sh/bun
Length of output: 538
🏁 Script executed:
# Get more context around this code block
sed -n '1240,1280p' scripts/machine.mjs
Repository: oven-sh/bun
Length of output: 1206
🏁 Script executed:
# Search for other uses of nodeSpawn or spawn in this file to understand patterns
rg "nodeSpawn|\.spawn\(" scripts/machine.mjs -n -A 5 -B 2
Repository: oven-sh/bun
Length of output: 4997
🏁 Script executed:
# Check if there are error handlers attached to other spawned processes in the file
rg '\.on\("error"' scripts/machine.mjs -n -B 3 -A 1
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# Search for the spawnSafe function definition
rg "function spawnSafe|const spawnSafe|export.*spawnSafe" scripts/machine.mjs -A 10
Repository: oven-sh/bun
Length of output: 488
🏁 Script executed:
# Get more context around line 1253 - read a larger block to see if nodeSpawn is actually there
sed -n '1200,1300p' scripts/machine.mjs | head -150
Repository: oven-sh/bun
Length of output: 2894
🏁 Script executed:
# Check imports at the top of the file
head -50 scripts/machine.mjs
Repository: oven-sh/bun
Length of output: 975
🏁 Script executed:
# Search for nodeSpawn usage
rg "nodeSpawn" scripts/machine.mjs -n -B 2 -A 2
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# Search for the spawn and spawnSafe definitions in utils.mjs
rg "function spawn|export.*spawn|const spawn" scripts/utils.mjs -A 15 | head -100
Repository: oven-sh/bun
Length of output: 2770
🏁 Script executed:
# Look specifically at what spawnSafe does
rg "spawnSafe" scripts/utils.mjs -A 20 | head -80
Repository: oven-sh/bun
Length of output: 1961
🏁 Script executed:
# Check if nodeSpawn is defined anywhere in the codebase
rg "nodeSpawn" . --type js --type mjs -n
Repository: oven-sh/bun
Length of output: 82
🏁 Script executed:
# Look for the spawn export to understand what nodeSpawn refers to
grep -n "^export.*spawn\|^const spawn\|^function spawn" scripts/utils.mjs
Repository: oven-sh/bun
Length of output: 422
🏁 Script executed:
# Look for nodeSpawn import in utils.mjs
head -100 scripts/utils.mjs | grep -n "nodeSpawn\|import.*spawn"
Repository: oven-sh/bun
Length of output: 141
🏁 Script executed:
# Search for nodeSpawn definition in utils.mjs
rg "nodeSpawn|import.*spawn" scripts/utils.mjs | head -20
Repository: oven-sh/bun
Length of output: 434
🏁 Script executed:
# Read the full spawn function to see how it handles errors
sed -n '237,340p' scripts/utils.mjs
Repository: oven-sh/bun
Length of output: 2626
🏁 Script executed:
# Check git status and see if there are pending changes
cd scripts && git status
Repository: oven-sh/bun
Length of output: 275
🏁 Script executed:
# Check if there's a git diff that shows the actual changes
git diff scripts/machine.mjs | head -200
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# See the git log to understand recent changes
git log --oneline scripts/machine.mjs | head -5
Repository: oven-sh/bun
Length of output: 95
🏁 Script executed:
# Search for the exact pattern from the review comment
rg "const cancelled = false" scripts/machine.mjs -B 5 -A 10
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# Search for signal forwarding pattern with SIGINT and SIGTERM
rg "forward = signal|received.*signal.*forwarding" scripts/machine.mjs -B 5 -A 10
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# Search more broadly for the packer signal handling
rg "packer.*received.*forwarding|child\.kill\(signal\)" scripts/machine.mjs -B 10 -A 10
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# Check if there's a different branch or stashed changes
git stash list
git branch -a
Repository: oven-sh/bun
Length of output: 152
🏁 Script executed:
# Check if there's an uncommitted draft or stage area
git diff --cached scripts/machine.mjs | head -100
Repository: oven-sh/bun
Length of output: 37
🏁 Script executed:
# Look for any pattern matching the review comment more carefully
# Search for "packer.*init\|packer.*build" to understand Packer invocation
rg "packer.*init|packer.*build" scripts/machine.mjs -B 10 -A 20Repository: oven-sh/bun
Length of output: 2438
🏁 Script executed:
# Check what the review comment context implies - this looks like a PR review
# The snippet shows code that would replace the current spawnSafe pattern
# Let's verify the current state is still using spawnSafe
grep -n "await spawnSafe.*packer" scripts/machine.mjsRepository: oven-sh/bun
Length of output: 171
🏁 Script executed:
# Verify Node.js spawn error behavior - get the Node.js documentation context
# The key question: does subprocess "close" fire if spawn() itself fails?
# According to Node.js docs, if spawn fails to exec, it emits "error", not "close"
python3 << 'EOF'
# Document the Node.js child_process.spawn() behavior
behavior = {
    "error_event": "Emitted when: spawn() cannot exec the command (ENOENT, EACCES, etc.)",
    "error_examples": ["ENOENT (file not found)", "EACCES (permission denied)", "EISDIR (is a directory)"],
    "close_event": "Emitted when: child process has exited AND stdio streams closed",
    "close_vs_exit": "close fires after exit; multiple processes can share stdio",
    "if_spawn_fails": "error event fires, close event does NOT fire",
    "conclusion": "Listening only to 'close' will not catch spawn exec failures"
}
for key, value in behavior.items():
    print(f"{key}: {value}")
EOF
Repository: oven-sh/bun
Length of output: 522
Add an error handler to catch spawn failures before waiting for process exit.
The code switches from spawnSafe() to raw nodeSpawn() to enable signal forwarding. However, if nodeSpawn() cannot exec Packer (e.g., file not found, permission denied), Node reports the failure through an "error" event on the child rather than by throwing from the call. With only a "close" listener attached, that "error" event has no handler, so the spawn failure is never turned into a result this code can act on.
Suggested fix
- const [code, signal] = await new Promise(done => child.on("close", (c, s) => done([c, s])));
+ const [code, signal] = await new Promise((resolve, reject) => {
+ child.once("error", reject);
+ child.once("close", (c, s) => resolve([c, s]));
+ });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/machine.mjs` around lines 1253 - 1271, The spawn can emit an "error"
event (e.g., ENOENT) which is not handled; update the logic around
nodeSpawn/child so you attach an "error" listener on child in addition to
"close", and ensure the Promise waits for either event: on "error" reject or
resolve with a non-zero code and the error signal/info, and on "close" resolve
as before. Also remove/cleanup the signal handlers and both listeners inside the
resolution path (references: nodeSpawn, child, forward, the
process.on("SIGINT"/"SIGTERM") handlers and the Promise that currently listens
only for "close") so you don't leak handlers after spawn failure or normal exit.
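The prompt above asks for three things at once: settle on either "error" or "close", forward signals while the child runs, and remove every listener afterwards. A minimal TypeScript sketch of that shape follows. The helper name `waitForChild` and the structural types `EmitterLike` and `ChildLike` are hypothetical, introduced here only so the sketch does not depend on Node typings; they are not taken from `scripts/machine.mjs`.

```typescript
// Hypothetical structural types: just enough of an event emitter and a
// child process for this sketch to type-check on its own.
type Listener = (...args: any[]) => void;

interface EmitterLike {
  on(event: string, listener: Listener): unknown;
  off(event: string, listener: Listener): unknown;
}

interface ChildLike extends EmitterLike {
  kill(signal?: string): unknown;
}

type ExitResult = [number | null, string | null];

// Resolves with [code, signal] on "close", rejects on "error".
// While pending, each named signal received on `signalSource` is forwarded
// to the child. All listeners are removed as soon as the promise settles.
function waitForChild(
  child: ChildLike,
  signalSource: EmitterLike,
  signals: string[] = ["SIGINT", "SIGTERM"],
): Promise<ExitResult> {
  return new Promise<ExitResult>((resolve, reject) => {
    const forwarders = signals.map(name => {
      const forward = () => {
        child.kill(name);
      };
      signalSource.on(name, forward);
      return { name, forward };
    });

    const cleanup = () => {
      for (const { name, forward } of forwarders) signalSource.off(name, forward);
      child.off("error", onError);
      child.off("close", onClose);
    };

    const onError = (err: unknown) => {
      cleanup();
      reject(err);
    };
    const onClose = (code: number | null, signal: string | null) => {
      cleanup();
      resolve([code, signal]);
    };

    child.on("error", onError);
    child.on("close", onClose);
  });
}
```

In the real script the call would presumably look like `await waitForChild(child, process)`, with `child` coming from `nodeSpawn(...)`; that wiring is an assumption. Rejecting on "error" is one of the two options the prompt allows; resolving with a non-zero code would work equally well if the caller prefers a single code path.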
pull_if_missing "ubuntu/squid:5.2-22.04_beta"
# Base images of the locally-built services above — pulled implicitly by
# `compose build`, listed here so a missing-layer pull surfaces in this step
# rather than as a slow first test.
pull_if_missing "postgres:15.13"
pull_if_missing "redis:8-alpine"
The cache-save manifest is now missing part of the warm set.
These images are now treated as required warmups here, but the docker save block at Lines 103-110 still archives only the old set. BUN_DOCKER_LOAD_CACHE=1 will therefore keep missing ubuntu/squid:5.2-22.04_beta, postgres:15.13, and redis:8-alpine, so later jobs still hit the network for exactly the images this change added.
💾 Minimal fix
docker save \
postgres:15 \
+ postgres:15.13 \
mysql:8.4 \
mysql:8.0 \
redis:7-alpine \
+ redis:8-alpine \
minio/minio:latest \
crossbario/autobahn-testsuite \
+ ubuntu/squid:5.2-22.04_beta \
-o "$CACHE_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/docker/prepare-ci.sh` around lines 81 - 86, The docker cache-save step
is missing the newly designated warmup images; update the docker save manifest
(the block that runs docker save when BUN_DOCKER_LOAD_CACHE=1) to include the
same images you added with pull_if_missing — specifically
"ubuntu/squid:5.2-22.04_beta", "postgres:15.13", and "redis:8-alpine" — so the
cache archive contains the full warm set and later jobs won’t re-pull those
images.
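The mismatch described here is a set difference between what gets pulled and what gets saved. The small TypeScript sketch below makes that explicit. The function name `missingFromCache` is hypothetical, `savedImages` is copied from the unchanged lines of the diff above, and `addedWarmups` lists only the three images this comment names, since the full warm list in `prepare-ci.sh` is not shown here.

```typescript
// Returns every image that is warmed up but absent from the `docker save` list.
function missingFromCache(warmSet: string[], savedSet: string[]): string[] {
  const saved = new Set(savedSet);
  return warmSet.filter(image => !saved.has(image));
}

// Images archived by the existing `docker save` block (from the diff context).
const savedImages = [
  "postgres:15",
  "mysql:8.4",
  "mysql:8.0",
  "redis:7-alpine",
  "minio/minio:latest",
  "crossbario/autobahn-testsuite",
];

// Images newly passed to pull_if_missing, per the review comment.
const addedWarmups = ["ubuntu/squid:5.2-22.04_beta", "postgres:15.13", "redis:8-alpine"];

const notCached = missingFromCache(addedWarmups, savedImages);
```

Note that `postgres:15` and `postgres:15.13` are compared as distinct strings, matching how `docker save` treats tags, so all three added images are reported as missing until the save list is extended.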
await using proc = Bun.spawn({
  cmd: [bunExe(), join(repoRoot, "scripts", "build", "fetch-cli.ts"), "dep", "fake", repo, commit, dest, cache],
  env: { ...bunEnv, BUN_BUILD_PREFETCH_DIR: prefetch },
  stdout: "pipe",
  stderr: "pipe",
});
const [stdout, stderr, exitCode] = await Promise.all([proc.stdout.text(), proc.stderr.text(), proc.exited]);

expect(stderr).toBe("");
expect(stdout).toContain("using prefetch cache");
expect(await Bun.file(join(dest, "hello.txt")).text()).toBe("hi\n");
expect(exitCode).toBe(0);
These exact stderr === "" assertions will fail on ASAN shards.
All four tests spawn bunExe() with bunEnv, so debug ASAN builds can still emit the known WARNING: ASAN interferes... line on stderr. That makes these assertions fail even when the prefetch behavior is correct; here the stable regression guards are stdout, filesystem state, and exit code.
Based on learnings, bunExe() subprocesses in debug ASAN builds can emit a known warning on stderr even with bunEnv, so empty-stderr assertions are intentionally avoided in this pattern.
Also applies to: 59-79, 99-116, 145-160
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/integration/build-prefetch/prefetch.test.ts` around lines 37 - 48, The
tests assert exact empty stderr (expect(stderr).toBe("")) which flakily fails on
ASAN shards because bun subprocesses started via bunExe() with bunEnv may emit
known ASAN warnings; remove or relax those exact-empty-stderr assertions in the
prefetch test blocks (the Bun.spawn usage that defines proc, stdout, stderr,
exitCode and checks stdout, file contents and exitCode) and instead rely on the
stable assertions (stdout contains "using prefetch cache", filesystem check
Bun.file(...).text() === "hi\n", and exitCode === 0); apply the same change to
the other test blocks noted (the other Bun.spawn sections around lines 59-79,
99-116, 145-160).
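The comment recommends dropping the empty-stderr assertions. If some stderr check is still wanted, one alternative (not what the comment proposes) is to filter known-benign lines before asserting. The TypeScript sketch below shows that idea. The helper name `stripKnownWarnings` is hypothetical, and the prefix passed in the usage note is only the visible start of the warning quoted above; the rest of that line is not reproduced in this thread.

```typescript
// Drops every stderr line that starts with one of the known-benign prefixes,
// then trims surrounding whitespace so an all-warnings stderr becomes "".
function stripKnownWarnings(stderr: string, knownPrefixes: string[]): string {
  return stderr
    .split("\n")
    .filter(line => !knownPrefixes.some(prefix => line.startsWith(prefix)))
    .join("\n")
    .trim();
}
```

A test could then assert `expect(stripKnownWarnings(stderr, ["WARNING: ASAN interferes"])).toBe("")`, which still fails on unexpected output while tolerating the ASAN line. The trade-off is that the prefix list has to be maintained, which is why relying on stdout, filesystem state, and exit code alone may be the simpler choice.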
I didn't find any bugs, but this changes how release binaries are built (Zig→bitcode LTO across the C++ boundary) and bumps the pinned Zig compiler commit — worth a human sanity check before it ships.
Extended reasoning...
Overview
This PR enables full LTO for bun-zig.o by emitting it as LLVM bitcode (obj.lto = .full + obj.use_lld = true) so it participates in the same link-time optimization pass as the C++ side. Changes touch build.zig (new -Dlto option wired into BunBuildOptions and configureObj) and scripts/build/zig.ts (new zigLto() gate, -Dlto= build arg, codegenThreads() forced to 1 under LTO, and a ZIG_COMMIT_PARALLEL bump to pick up the EnableSplitLTOUnit/module-summary fix).
Security risks
None identified. This is build-system configuration; no runtime code paths, auth, crypto, or user-input handling are touched. The bumped Zig commit is from oven-sh's own fork.
Level of scrutiny
High. Although the diff is small (~30 lines) and mechanically clean, it changes the object format emitted for every shipped release binary and enables cross-language inlining that eliminates 42–100% of boundary symbols per the PR description. LTO across a Zig/C++ FFI boundary is the kind of thing that can surface latent ODR/visibility/UB issues only at link or runtime, and it also bumps the pinned compiler — both are production-critical build-pipeline decisions a human owner should sign off on.
Other factors
The logic itself looks correct: zigLto is properly gated on usingParallelCompiler (so the older Windows-host compiler is unaffected), addMultiCheck hardcodes lto = false for check-only steps, and codegenThreads() returning 1 under LTO keeps zigObjectPaths() and the no_merge_shards path consistent. CI status is mostly infrastructure failures ("Failed to create agent") unrelated to this change, so a green run hasn't yet validated the new bitcode path across all targets.
♻️ Duplicate comments (1)
build.zig (1)
204-216: ⚠️ Potential issue | 🟠 Major
Still reject -Dno_llvm=true together with -Dlto=true.
These flags are still accepted independently, so the build can reach configureObj() with obj.use_llvm = false and obj.lto = .full. Fail fast here instead of constructing that contradictory configuration.
Suggested fix
  const no_llvm = b.option(bool, "no_llvm", "Experiment with Zig self hosted backends. No stability guaranteed") orelse false;
  const lto = b.option(bool, "lto", "Emit LLVM bitcode for full LTO instead of a native object") orelse false;
+ if (no_llvm and lto) {
+     std.debug.panic("`-Dlto=true` requires the LLVM backend", .{});
+ }
Does Zig support setting `obj.lto = .full` while `obj.use_llvm = false` / `-Dno_llvm=true` in `build.zig`?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@build.zig` around lines 204 - 216, The build accepts a contradictory option combination (no_llvm true with lto true) and can reach configureObj() with BunBuildOptions where obj.use_llvm = false and obj.lto = .full; add an early validation after reading the options (no_llvm and lto) and before constructing/using build_options to fail fast: detect if no_llvm && lto then emit a clear error message and stop configuration (e.g., process exit or b.addError/return) so the contradictory BunBuildOptions state is never created; reference the no_llvm and lto flags, the BunBuildOptions construction, and configureObj()/obj.use_llvm/obj.lto when locating where to add the check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 433e7a3c-4d72-477c-a323-d65c655bb9ee
📥 Commits
Reviewing files that changed from the base of the PR and between 2a376e285751bc6dd140341d8c0ef1e4dc95a236 and d963aa4.
📒 Files selected for processing (2)
build.zig
scripts/build/zig.ts
The split existed because COFF shard emission isn't implemented, but codegenThreads() already forces 1 on Windows so sharding never happens. Parallel sema is deterministic and works on all hosts.
`-Denable_fuzzilli=${bool(cfg.fuzzilli)}`,
`-Denable_valgrind=${bool(cfg.valgrind)}`,
`-Denable_tinycc=${bool(cfg.tinycc)}`,
`-Dlto=${bool(cfg.lto)}`,
🟡 The comment at lines 446-448 still says "MUST be 0 on the stable compiler — see codegenThreads()", but this PR removed the stable/parallel compiler split and codegenThreads() no longer ever returns 0 (its minimum is now 1). Consider updating this comment to drop the stale "stable compiler" / "0" reference.
Extended reasoning...
What the issue is
The comment immediately following the new -Dlto=${bool(cfg.lto)} line in zigBuildArgs() reads:
// Sharded LLVM codegen — one shard per host core on the parallel
// compiler. Zig has no "auto" value (0 = single-threaded). MUST be 0
// on the stable compiler — see codegenThreads().
`-Dllvm_codegen_threads=${codegenThreads(cfg)}`,
This comment is now stale documentation that this PR's own changes orphaned.
Why it's stale
This PR collapsed the ZIG_COMMIT / ZIG_COMMIT_PARALLEL split into a single ZIG_COMMIT constant and deleted:
- usingParallelCompiler()
- defaultZigCommit()
- the if (!usingParallelCompiler(cfg)) return 0; branch at the top of codegenThreads()
After these removals, codegenThreads() (lines 54-66) returns at minimum 1 — never 0. There is no longer a "stable compiler" vs "parallel compiler" distinction anywhere in the codebase, and the function the comment tells the reader to consult no longer contains any logic about 0 or a stable compiler.
Step-by-step proof
- Before this PR, codegenThreads() started with if (!usingParallelCompiler(cfg)) return 0; and the comment's "MUST be 0 on the stable compiler" pointed at that line.
- The diff removes that line (- if (!usingParallelCompiler(cfg)) return 0;) and replaces it with nothing; the function now starts with if (cfg.windows) return 1;.
- The diff also removes the entire "TEMPORARY SPLIT" doc block, ZIG_COMMIT_PARALLEL, defaultZigCommit(), and usingParallelCompiler(), so "the stable compiler" no longer refers to anything that exists.
- The comment at 446-448 was not touched and still tells the reader "MUST be 0 on the stable compiler — see codegenThreads()". A reader following that pointer will find no such logic.
Impact
No behavioral impact — this is purely misleading documentation. A future reader trying to understand why -Dllvm_codegen_threads is set the way it is will be sent looking for a "stable compiler" branch that no longer exists.
How to fix
Update the comment to reflect the current state, e.g.:
// Sharded LLVM codegen — see codegenThreads() for when sharding is
// gated off (Windows, LTO, non-ASAN CI). Zig has no "auto" value.
Or simply drop the second sentence entirely, since codegenThreads()'s own doc comment now fully explains the gating.
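For readers without `scripts/build/zig.ts` open, the gating this comment describes can be sketched roughly as below. This is an illustration under stated assumptions, not the actual function: the name `codegenThreadsSketch`, the `ci` and `asan` fields, and the explicit `hostCores` parameter are hypothetical, and the order of checks after the Windows one is a guess. Only `cfg.windows`, `cfg.lto`, the minimum of 1, and the three gating conditions come from this thread.

```typescript
// Hypothetical config shape; only `windows` and `lto` are named in the thread.
interface ZigCfgSketch {
  windows: boolean; // COFF shard emission isn't implemented, so no sharding
  lto: boolean; // zig_llvm.cpp gates SplitModule on !lto
  ci: boolean; // assumed field name
  asan: boolean; // assumed field name
}

// Returns the value passed as -Dllvm_codegen_threads. Never returns 0:
// the minimum is 1, meaning a single unsharded codegen unit.
function codegenThreadsSketch(cfg: ZigCfgSketch, hostCores: number): number {
  if (cfg.windows) return 1;
  if (cfg.lto) return 1;
  if (cfg.ci && !cfg.asan) return 1; // "non-ASAN CI" is gated off per the comment
  return Math.max(1, Math.floor(hostCores)); // otherwise one shard per host core
}

// How the result would feed the build argument quoted above.
const exampleArg = `-Dllvm_codegen_threads=${codegenThreadsSketch(
  { windows: false, lto: true, ci: false, asan: false },
  8,
)}`;
```

With LTO enabled the argument is always `-Dllvm_codegen_threads=1`, which is why the old "MUST be 0" wording no longer matches anything the function can return.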
Since 02fbd62 (#29618, enable LTO for bun-zig.o), `linux aarch64 - build-bun` OOMs on `r8g.xlarge`: lld's full-LTO link now ingests ~1.39 GB of bitcode (794 MB JSC + 347 MB libbun-profile + 247 MB bun-zig.o), peak memory crosses the 31.5 GiB available (no swap) about 18 min in, and the kernel OOM-kills lld, sometimes taking the BuildKite agent with it (`exit:-1`). This was already failing on the PR's own CI (build 47390) before merge.

| target | instance | bitcode in | result |
|---|---|---|---|
| linux x64-glibc | r7i.xlarge (4c/32G) | 1392 MB | pass, 18.98 min |
| linux aarch64-musl | r8g.xlarge (4c/32G) | 1310 MB | pass, 18.15 min |
| **linux aarch64-glibc** | r8g.xlarge (4c/32G) | 1388 MB | **Killed, 18.75 min** |

x64 has slightly *more* bitcode and survives, so the tipping factor is LLVM's AArch64 backend holding more state during codegen than X86: not a clang bug, just expected full-LTO scaling. x64 is one bitcode-adding commit away from the same fate.

Bump both linux link boxes to `2xlarge` (8 vCPU / 64 GiB). The extra cores also help lld's parallel LTO codegen phase (the single-threaded merge/opt half won't benefit, so expect ~14–15 min rather than half of 19).

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>
Emit `bun-zig.o` as LLVM bitcode (`obj.lto = .full`) so it participates in the same LTO link as the C/C++ side, instead of being a native ELF object that lld can only link, not optimize across.

## Why this didn't work before

The only documented Zig→bitcode path was `-Dobj_format=bc`, which routes through `getEmittedLlvmBc()` → `-femit-llvm-bc`. That writes Zig's **self-hosted** unoptimized bitcode (Producer `"zig 0.15.2"`) before libLLVM ever touches it, and the self-hosted writer has encoding bugs for large modules → `Invalid record` in lld. The proper `-flto` path was blocked by `use_lld = false` on Linux causing `LtoRequiresLld` or silent fallback.

This PR uses `obj.lto = .full` + `obj.use_lld = true`, which routes through libLLVM's `WriteBitcodeToFile` after the LTO pre-link O3 pipeline. The output (Producer `"LLVM20.1.2"`) is forward-compatible with lld 21/22.

Bumps `ZIG_COMMIT_PARALLEL` to pick up oven-sh/zig@04e7f6ac1e, which adds `EnableSplitLTOUnit=1` + a module summary to Zig's LTO bitcode, required for `-fwhole-program-vtables` to accept the link (otherwise: `inconsistent LTO Unit splitting`).

## What gets inlined

| Boundary | Declared | Eliminated | % |
|---|---|---|---|
| Zig `export fn` → C++ | 336 | 142 | 42% |
| C `us_*` (usockets) ← Zig | 115 | 79 | 69% |
| C++ `uws_*` (uWebSockets wrappers) ← Zig | 108 | 76 | 70% |
| `mi_free` | — | all | 100% (0 call insns) |

Verified by disassembly: e.g. `Bun__readOriginTimer` body (optional check, `clock_gettime`, ns math) appears directly inside C++ `Process_functionHRTime`; symbol is gone.

## Measured impact (linux-x64, vs latest canary)

| | LTO | no Zig LTO | Δ |
|---|---|---|---|
| `Bun.escapeHTML` | 171.3 ns | 183.2 ns | **6.5%** |
| `TextDecoder.decode` | 104.0 ns | 106.8 ns | **2.6%** |
| `oha -n 1M -c 50` | ~200,600 req/s | ~193,800 req/s | **3.5%** |

Caveat: canary is a slightly different revision; same-revision A/B would be cleaner.

## Notes

- `zigLto(cfg)` gates on `usingParallelCompiler`: the older Windows `ZIG_COMMIT` predates the summary fix.
- `codegenThreads()` returns 1 when LTO is on (zig_llvm.cpp gates SplitModule on `!lto`).

---------

Co-authored-by: root <root@ip-10-0-2-234.us-west-2.compute.internal>