
Fix crash when starting a compiled executable in a deleted cwd #31496

Merged
Jarred-Sumner merged 5 commits into main from farm/44564ba3/fix-deleted-cwd-standalone-crash on May 29, 2026

robobun commented May 28, 2026 (Collaborator)

What

Starting a bun build --compile standalone executable from a directory that has been deleted crashes with a segfault instead of a clean error:

❯ claude --dangerously-skip-permissions --continue
Transpiler: init failed: error.ENOENT
...
panic(main thread): Segmentation fault at address 0x641

Reproduction

echo 'console.log("hi")' > app.js
bun build --compile app.js --outfile app
app="$PWD/app"   # absolute path: ./app is no longer reachable after the cd
mkdir /tmp/gone && cd /tmp/gone && rmdir /tmp/gone && exec "$app"

A process can keep a deleted directory as its cwd: the kernel still holds a reference to the directory, but it is no longer reachable by any path, so getcwd() fails with ENOENT. (POSIX-only: Windows refuses to remove a directory that is a process's cwd.)
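A minimal Rust sketch of that OS behavior, assuming Linux. The helper name `getcwd_after_deleting_cwd` and the temp-directory name are illustrative and not from the PR. It makes a fresh directory the cwd, removes it, and then asks `std::env::current_dir()` (a wrapper over `getcwd(3)` on Unix) for the cwd again.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical helper (Linux): enter a directory, delete it, then query the cwd.
/// Returns the result of that second query and restores the original cwd first.
fn getcwd_after_deleting_cwd() -> io::Result<io::Result<PathBuf>> {
    let original = env::current_dir()?;
    let gone = env::temp_dir().join(format!("gone-{}", std::process::id()));
    fs::create_dir(&gone)?;
    env::set_current_dir(&gone)?;

    // rmdir succeeds even though this process is still "inside" the directory.
    let removed = fs::remove_dir(&gone);
    // current_dir() calls getcwd(3); with the directory unlinked it is expected to fail.
    let observed = env::current_dir();

    // Move back to a directory that still exists before reporting anything.
    env::set_current_dir(&original)?;
    removed?;
    Ok(observed)
}
```

On Linux the second call is expected to come back as `Err` with `ErrorKind::NotFound`, i.e. the same ENOENT that surfaced as `Transpiler: init failed: error.ENOENT` in the log above.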

Cause

A standalone compiled binary bypasses Arguments::parse (no global flags, no baked exec-argv), so absolute_working_dir stays unset and the first getcwd of the entire startup happens inside Transpiler::init → FileSystem::init (src/resolver/lib.rs:316). When the cwd is gone, that getcwd returns ENOENT.

The per-VM init hook init_runtime_state (src/runtime/jsc_hooks.rs) had no error channel, so it swallowed the error, logged Transpiler: init failed, and left vm.transpiler as the zeroed bytes the allocator handed out. The next read of that field (configure_defines → run_env_loader → Transpiler::env_mut) dereferenced a null pointer, which crashed the process.

The plain bun <file> / bun -e / bun run paths don't hit this: Arguments::parse calls getcwd first and surfaces the ENOENT cleanly, masking the bug — so only compiled executables crash.

Fix

Widen init_runtime_state to return Result<RuntimeState, bun_core::Error>. On Transpiler::init failure it unwinds the per-VM state it set up (the RuntimeState box, its thread-local slot, and the AST stores — mirroring deinit_runtime_state) and returns the error, which VirtualMachine::init propagates with ?. The CLI then turns it into a clean ENOENT message and exits 1 — the path the plain bun CLI already used, and what the reference (try Transpiler.init(...), VirtualMachine.zig:1241) intended. This also resolves the in-tree TODO(b2): widen init_runtime_state return to Result.
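The error-path shape described above can be sketched in Rust with stand-in types. Everything here is hypothetical: `RuntimeState`, `Transpiler`, the `RUNTIME_STATE` slot, the `AST_STORES_LIVE` counter, `vm_init`, and `cli_exit_code` only model the pieces the PR names, and `Transpiler::init` is reduced to taking the result of a getcwd call. What the sketch is meant to show is the control flow: publish per-VM state, attempt the fallible init, undo the setup on `Err`, and let `?` carry the error up to a CLI layer that exits 1.

```rust
use std::cell::Cell;
use std::io;
use std::ptr;

/// Stand-in for the real transpiler; it only remembers the cwd it resolved.
#[allow(dead_code)]
struct Transpiler {
    cwd: String,
}

impl Transpiler {
    /// Stand-in for `Transpiler::init` -> `FileSystem::init`, which needs the cwd.
    fn init(getcwd_result: io::Result<String>) -> io::Result<Transpiler> {
        Ok(Transpiler { cwd: getcwd_result? })
    }
}

/// Stand-in for the per-VM `RuntimeState` box.
#[allow(dead_code)]
struct RuntimeState {
    transpiler: Option<Transpiler>,
}

thread_local! {
    // Models the thread-local slot pointing at the current VM's state.
    static RUNTIME_STATE: Cell<*mut RuntimeState> = Cell::new(ptr::null_mut());
    // Models the AST stores as a live count, so a leak would be observable.
    static AST_STORES_LIVE: Cell<u32> = Cell::new(0);
}

/// Fallible hook: on `Err`, everything set up here has already been undone.
fn init_runtime_state(getcwd_result: io::Result<String>) -> io::Result<*mut RuntimeState> {
    let state = Box::into_raw(Box::new(RuntimeState { transpiler: None }));
    RUNTIME_STATE.with(|slot| slot.set(state));
    AST_STORES_LIVE.with(|n| n.set(n.get() + 1));

    match Transpiler::init(getcwd_result) {
        Ok(t) => {
            // SAFETY: `state` came from Box::into_raw above and has not been freed.
            unsafe { (*state).transpiler = Some(t) };
            Ok(state)
        }
        Err(e) => {
            // Undo the setup in the order the PR lists: TLS slot, box, AST stores.
            RUNTIME_STATE.with(|slot| slot.set(ptr::null_mut()));
            // SAFETY: same pointer; the slot no longer refers to it.
            drop(unsafe { Box::from_raw(state) });
            AST_STORES_LIVE.with(|n| n.set(n.get() - 1));
            Err(e)
        }
    }
}

/// Stand-in for `VirtualMachine::init`: a failed hook aborts VM construction.
fn vm_init(getcwd_result: io::Result<String>) -> io::Result<*mut RuntimeState> {
    let state = init_runtime_state(getcwd_result)?;
    Ok(state)
}

/// Stand-in for the CLI: print the error and choose an exit code instead of crashing.
fn cli_exit_code(getcwd_result: io::Result<String>) -> i32 {
    match vm_init(getcwd_result) {
        Ok(_state) => 0, // the sketch keeps the state alive, like a process-lifetime VM
        Err(e) => {
            eprintln!("error: {e}");
            1
        }
    }
}
```

In this model a failing getcwd leaves the slot null and the store count at zero, and the CLI path returns 1; a successful one keeps the state installed.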

Verification

New test in test/bundler/bun-build-compile.test.ts compiles a standalone binary and runs it from a deleted cwd.

  • Before: the binary crashes: a panic through boot_standalone → Transpiler::env_mut, terminating via SIGILL (exit 132).
  • After: stdout empty, stderr contains ENOENT, exit code 1.

A standalone `bun build --compile` binary bypasses `Arguments::parse`
(no global flags, no baked exec-argv), so `absolute_working_dir` stays
unset and the first `getcwd` of the entire startup happens inside
`Transpiler::init` (via `FileSystem::init`). When the cwd has been deleted, that
`getcwd` returns ENOENT.

The per-VM init hook `init_runtime_state` had no error channel, so it
swallowed the error, logged "Transpiler: init failed", and left
`vm.transpiler` as the zeroed bytes the allocator handed out. The next
read of that field (`configure_defines` -> `run_env_loader` ->
`env_mut`) dereferenced a null pointer and the process crashed — the
segfault seen when launching a compiled CLI from a directory that had
been removed.

Widen `init_runtime_state` to return `Result<_, Error>`. On
`Transpiler::init` failure it now unwinds the per-VM state it set up
(the RuntimeState box, its thread-local slot, and the AST stores) and
returns the error, which `VirtualMachine::init` propagates with `?`.
The CLI then turns it into a clean "ENOENT" message and exits 1 — the
same path the plain `bun` CLI already took, and what the reference
(`try Transpiler.init(...)`) intended.

Test compiles a standalone binary and runs it from a deleted cwd,
asserting a clean ENOENT + exit 1 rather than a crash.
coderabbitai Bot commented May 28, 2026 (Contributor)

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e6c7ca4b-fdbc-4c8a-9494-0eb1ba75803a

📥 Commits

Reviewing files that changed from the base of the PR and between b97cd70 and 0dc9ca6.

📒 Files selected for processing (1)
  • src/runtime/jsc_hooks.rs

Walkthrough

This PR refactors runtime state initialization to propagate errors instead of silently failing. RuntimeHooks::init_runtime_state now returns Result, with VirtualMachine::init propagating hook errors via ?. The hook implementation explicitly unwinds TLS and allocated state on error. A test validates that compiled binaries exit cleanly when the working directory is deleted before execution.

Changes

VM initialization error propagation

| Layer | File(s) | Summary |
| --- | --- | --- |
| Hook signature and VM integration | src/jsc/VirtualMachine.rs | RuntimeHooks::init_runtime_state changes from returning opaque RuntimeState to Result<RuntimeState, bun_core::Error>. Documentation specifies that Err means allocations are already unwound and the hook's caller must propagate. VirtualMachine::init now uses the ? operator on the hook call to abort initialization on error. |
| Error unwinding in init_runtime_state | src/runtime/jsc_hooks.rs | init_runtime_state implementation now returns Result and handles Transpiler::init failure by clearing RUNTIME_STATE TLS, dropping the RuntimeState allocation, deinitializing AST stores, and returning the error. Success path wraps the pointer in Ok(...). |
| Regression test for deleted working directory | test/bundler/bun-build-compile.test.ts | POSIX-only test verifies that a compiled binary exits cleanly with ENOENT in stderr when exec occurs after the working directory has been deleted, confirming error propagation prevents further VM code execution. |
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly describes the main fix: preventing a crash when a compiled executable starts in a deleted cwd, which is the primary objective of this PR. |
| Description check | ✅ Passed | The description includes both required template sections ('What' and 'How did you verify') with comprehensive details on the issue, cause, fix, and verification approach. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

robobun commented May 28, 2026 (Collaborator Author)

Status

Root cause: standalone --compile binaries bypass Arguments::parse, so the first getcwd of startup lands inside Transpiler::init. With a deleted cwd that returns ENOENT; the init_runtime_state hook swallowed it and left vm.transpiler zeroed → null deref at Transpiler::env_mut. Plain bun CLI masked this (it calls getcwd during arg-parsing first).

Fix: hook now returns Result, unwinds its own per-VM state on failure, and VirtualMachine::init propagates via ?. Clean ENOENT + exit 1 instead of a crash.

Gate: verified test fails on the unfixed build (SIGILL, exit 132) and passes with the fix (exit 1) under bun bd (ASAN).

Waiting on CI.

robobun commented May 28, 2026 (Collaborator Author)
Updated 12:15 AM PT - May 28th, 2026

@robobun, your commit 0dc9ca6 has 1 failure in Build #58597 (All Failures):

  • 📦 Binary size — 15 over 0.50 MB
    | target | this build | canary: main #58579 | size Δ |
    | --- | --- | --- | --- |
    | bun-darwin-aarch64 | 55.92 MB | 55.12 MB | +822.3 KB |
    | bun-darwin-x64 | 61.14 MB | 60.36 MB | +799.8 KB |
    | bun-linux-aarch64 | 69.36 MB | 68.55 MB | +832.0 KB |
    | bun-linux-x64 | 71.30 MB | 70.48 MB | +832.0 KB |
    | bun-linux-x64-baseline | 70.36 MB | 69.55 MB | +832.0 KB |
    | bun-linux-aarch64-musl | 62.73 MB | 61.98 MB | +768.0 KB |
    | bun-linux-x64-musl | 65.62 MB | 64.83 MB | +816.0 KB |
    | bun-linux-x64-musl-baseline | 64.97 MB | 64.17 MB | +816.0 KB |
    | bun-linux-aarch64-android | 77.76 MB | 76.95 MB | +832.0 KB |
    | bun-linux-x64-android | 79.74 MB | 78.94 MB | +816.0 KB |
    | bun-freebsd-x64 | 82.26 MB | 81.46 MB | +816.0 KB |
    | bun-freebsd-aarch64 | 84.24 MB | 83.45 MB | +816.0 KB |
    | bun-windows-x64 | 74.38 MB | 73.59 MB | +804.0 KB |
    | bun-windows-x64-baseline | 73.40 MB | 72.60 MB | +810.0 KB |
    | bun-windows-aarch64 | 69.74 MB | 68.94 MB | +814.0 KB |

    Add [skip size check] to the commit message if this increase is intentional.
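As a quick arithmetic check on the size deltas listed above: every one of the 15 targets exceeds the 0.50 MB limit, which is where the "15 over" count comes from. The sketch assumes 0.50 MB means 512 KB (the bot does not say whether it counts in 1000s or 1024s); since the smallest delta is 768.0 KB, the count is the same under either reading. The helper `targets_over_limit` is hypothetical and is not the CI lane's actual code.

```rust
/// `size Δ` column from the bot's report, in KB, one entry per target (same order).
const DELTAS_KB: [f64; 15] = [
    822.3, 799.8, 832.0, 832.0, 832.0, 768.0, 816.0, 816.0,
    832.0, 816.0, 816.0, 816.0, 804.0, 810.0, 814.0,
];

/// Assumption: "0.50 MB" interpreted as 0.5 * 1024 KB.
const LIMIT_KB: f64 = 0.5 * 1024.0;

/// Hypothetical helper: how many targets grew by more than `limit_kb`.
fn targets_over_limit(deltas_kb: &[f64], limit_kb: f64) -> usize {
    deltas_kb.iter().filter(|&&delta| delta > limit_kb).count()
}
```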


🧪   To try this PR locally:

bunx bun-pr 31496

That installs a local version of the PR into your bun-31496 executable, so you can run:

bun-31496 --bun

coderabbitai Bot left a comment (Contributor)

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/jsc/VirtualMachine.rs`:
- Around line 1656-1663: VirtualMachine::init currently installs
VM/MAIN_THREAD_VM and allocates pre-hook resources (log, console) before calling
the unsafe init_runtime_state hook, and on Err it returns without undoing those
installs; introduce an RAII guard struct (e.g., InitGuard) created immediately
after VM/MAIN_THREAD_VM and pre-hook allocations that on Drop clears the
thread-local/global VM pointers (MAIN_THREAD_VM/VM) and frees/releases the
log/console allocations, then convert the call to init_runtime_state to use
map_err or let-binding so that on Err the guard runs and cleanup happens before
propagating the error; apply the same pattern for the second failing path
referenced (the other init() section around the other hook).

In `@test/bundler/bun-build-compile.test.ts`:
- Around line 563-575: Replace the shell wrapper that runs "/bin/sh -c" (the
Bun.spawn call creating proc with cmd ["/bin/sh", "-c", ...]) with an inline
Bun-based helper: use bunExe() with "-e" to run a short script that does
process.chdir(gone) then removes the directory, then spawn the compiled binary
(outfile) directly (e.g. Bun.spawnSync([outfile]) so the child inherits the
deleted cwd) instead of invoking a shell; update the variables currently used
(proc, bunEnv, outfile, gone) accordingly and remove reliance on
shell-quoting-sensitive command strings.
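For comparison, the RAII-guard pattern suggested in the first inline comment looks roughly like this in Rust. `InitGuard` is the name from the suggestion; the `VM_INSTALLED` flag, `vm_init`, and `vm_installed` are hypothetical stand-ins for the VM/MAIN_THREAD_VM globals and the log/console allocations, not Bun's code.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the globals VirtualMachine::init installs
    // before it calls the init_runtime_state hook.
    static VM_INSTALLED: Cell<bool> = Cell::new(false);
}

/// Undoes the pre-hook installs on drop unless init reaches the end.
struct InitGuard {
    armed: bool,
}

impl InitGuard {
    fn install() -> Self {
        VM_INSTALLED.with(|v| v.set(true));
        InitGuard { armed: true }
    }

    /// Called once init has fully succeeded; the installs are then kept.
    fn disarm(mut self) {
        self.armed = false;
    }
}

impl Drop for InitGuard {
    fn drop(&mut self) {
        if self.armed {
            VM_INSTALLED.with(|v| v.set(false));
        }
    }
}

/// Sketch of an init that runs a fallible hook after installing the globals.
fn vm_init<T>(hook: impl FnOnce() -> Result<T, String>) -> Result<T, String> {
    let guard = InitGuard::install();
    let state = hook()?; // on Err, `guard` is dropped here and clears the install
    guard.disarm();
    Ok(state)
}

fn vm_installed() -> bool {
    VM_INSTALLED.with(|v| v.get())
}
```

The sketch only shows the mechanics; whether this cleanup is worth having, given that init failure ends the process, is the point the author contests later in the thread.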
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bbbff726-114b-48e8-a82b-c4751b3d629d

📥 Commits

Reviewing files that changed from the base of the PR and between 3ec6669 and b97cd70.

📒 Files selected for processing (3)
  • src/jsc/VirtualMachine.rs
  • src/runtime/jsc_hooks.rs
  • test/bundler/bun-build-compile.test.ts

Comment thread src/jsc/VirtualMachine.rs
Comment thread test/bundler/bun-build-compile.test.ts
robobun commented May 28, 2026 (Collaborator Author)

CI status

Build #58582: 72 lanes green (including debian-13-x64-asan-test-bun, which runs the new test under ASAN). The only red lane is package-binary-size.

The diff is ~30 lines across two source files (a Result return type on init_runtime_state + a small error-cleanup block) — it cannot account for the 0.5 MB growth threshold that lane trips on. The size delta is baseline drift / link-time jitter against the moving main canary, not this change. autofix.ci's commit only reformatted the test file.

Pushed one ci: retrigger to re-measure against the current canary baseline. If it comes back green, this is ready; if package-binary-size is still the only red lane, it's flake unrelated to the diff.

Comment thread src/runtime/jsc_hooks.rs
The widened `init_runtime_state` makes `VirtualMachine::init` abort via
`?` when `Transpiler::init` fails, so a VM with a zeroed `transpiler`
never gets returned. `load_preloads` only runs via `reload_entry_point*`
on an already-constructed VM, so the `vm.transpiler.fs.is_null()` branch
(a temporary workaround for the old swallow-the-error behavior, which
referenced the now-removed `TODO(b2)`) is unreachable. Remove it and fix
the stale comment.
robobun commented May 28, 2026 (Collaborator Author)

Review round addressed

Pushed 81bc75f:

  • claude[bot] nit (resolved): the PR deleted TODO(b2), but a now-dead vm.transpiler.fs.is_null() guard in load_preloads still referenced it. Traced all call sites — load_preloads only runs via reload_entry_point* on an already-Ok VM, so after the ?-propagation the guard is unreachable. Removed it and the stale comment.
  • coderabbit — RAII guard on the init error path (declined, explained inline): the VM/MAIN_THREAD_VM globals are set before the first ? in both the Rust and the Zig reference (which has no errdefer); init failure is process-fatal (Global::exit(1)), so nothing observes the partial VM; the VM/log/console are intentional process-lifetime leaks. A guard would diverge from the reference and guard an impossible re-entry.
  • coderabbit — replace /bin/sh wrapper with inline Bun (declined, explained inline): tried it — once the driver deletes its own cwd, Bun.spawnSync throws ENOENT: posix_spawn before the child starts, so it never exercises the bug. The repro fundamentally needs delete-then-exec (replace image), which is exactly the shell wrapper and exactly how a user hits it.

All three review threads resolved. Test still passes under ASAN (bun bd).

Comment thread src/runtime/jsc_hooks.rs
The comment above the `match Transpiler::init(...)` still claimed it
"returns `Err(Error::TODO)` until those surface, so the `Err` arm below
is the live path for now" — stale on two counts: `Transpiler::init` no
longer returns `Error::TODO` (it succeeds on the happy path), and this
change rewrote that `Err` arm to abort init, so the sentence now misreads
as "every VM init aborts". Drop the stale sentence; keep the still-accurate
`ptr::write` load-bearing note. Comment-only.

claude Bot left a comment (Contributor)

Both of my earlier nits are addressed and I didn't find any further issues, but since this touches VirtualMachine::init with unsafe pointer/TLS unwind logic on a hot startup path, I'd like a maintainer to sign off on the error-path cleanup ordering and the (well-argued) decision to skip an RAII guard around the pre-hook VM globals.

Extended reasoning...

Overview

This PR widens the RuntimeHooks::init_runtime_state hook signature from -> RuntimeState to -> Result<RuntimeState, bun_core::Error>, rewrites the Transpiler::init Err arm in src/runtime/jsc_hooks.rs to unwind per-VM state (clear RUNTIME_STATE TLS, heap::take the RuntimeState box, Store::deinit() the AST stores) and return Err, and has VirtualMachine::init propagate via ?. It also removes a now-dead fs.is_null() guard in load_preloads, refreshes two stale comments, and adds a POSIX-gated regression test in test/bundler/bun-build-compile.test.ts that compiles a standalone binary and execs it from a deleted cwd.

Security risks

None identified. The change converts a segfault into a clean ENOENT + exit 1. No new inputs, no auth/crypto/permissions surface, no user-controlled data flows into the new code path. The shell wrapper in the test interpolates only tempDir-generated paths, which the harness controls.

Level of scrutiny

High. VirtualMachine::init is on the startup path of every Bun process and every compiled binary. The new Err arm is hand-written unwind logic in unsafe Rust (raw-pointer heap::take, TLS slot manipulation, AST store teardown) that must mirror deinit_runtime_state correctly. The author also made a judgment call — declining CodeRabbit's RAII-guard suggestion for pre-hook VM/MAIN_THREAD_VM/log/console state — on the grounds of Zig-reference parity and the process-fatal nature of init failure. The reasoning is sound and well-cited, but it's the kind of architectural call a maintainer should confirm.

Other factors

  • Both of my prior inline nits (stale load_preloads guard referencing the deleted TODO(b2), and the stale "Err arm is the live path" comment above the rewritten match) were addressed in 81bc75f and 0dc9ca6; all four review threads on the PR are resolved.
  • The bug-hunting system found no issues.
  • The change resolves an explicit in-tree TODO(b2): widen init_runtime_state return to Result and matches the Zig reference's try Transpiler.init(...) semantics.
  • Regression test was verified to fail (SIGILL/exit 132) on the unfixed build and pass (exit 1, ENOENT in stderr) with the fix, including under ASAN.
  • CI: 72 lanes green; the musl-build red lanes are an unrelated LTO/data-layout toolchain failure ("Linking two modules of different data layouts"), not caused by this ~30-line diff.

The change looks correct to me, but "unsafe manual unwind in VM init" is squarely in the "critical code path" category where I should defer rather than approve unilaterally.

robobun commented May 28, 2026 (Collaborator Author)

Status

All four review threads are resolved; both claude[bot] nits fixed (81bc75f, 0dc9ca6), both coderabbit suggestions declined with rationale (and withdrawn by the bot). claude[bot]'s last note requests a maintainer sign-off on the init_runtime_state error-path unwind — deferring to a maintainer there, no code change requested.

CI build #58597 (latest push) is in progress with no failures so far — the ASAN/musl Rust+C++ compile lanes are green; the final build-bun link and the *-test-bun / package-binary-size lanes haven't reported yet.

The diff is a ~30-line source change (a Result return + comment refreshes) plus the regression test. A musl "different data layouts" LTO error, if it appears, is a toolchain/link-layout issue unrelated to this change — a source edit doesn't alter target data layouts. Will re-check once the build settles.

robobun commented May 28, 2026 (Collaborator Author)

CI is green except a flaky package-binary-size — ready for a maintainer to merge

Build #58597: every functional lane passed, including:

  • debian-13-x64-asan-test-bun (ASAN) — runs the new regression test ✅
  • debian-13-x64-test-bun
  • linux-aarch64-musl-build-bun / linux-x64-musl-build-bun ✅ (the earlier transient "different data layouts" LTO error did not recur)
  • all other *-build-bun / *-build-rust / *-build-cpp lanes ✅

The only red lane is package-binary-size, which fails if any platform binary grew > 0.5 MB vs the latest main build.

This is not caused by the diff:

  • The functional diff is ~30 lines (a Result return type + the error-path unwind) plus the test.
  • package-binary-size failed identically on the previous build and on my last commit (0dc9ca6), which is a pure comment edit — zero non-comment source lines changed, i.e. byte-identical machine code. A comment cannot move binary size by 0.5 MB.
  • The delta is baseline drift / link-time non-determinism against the moving main canary, not this PR.

I already spent my one automated re-roll on this same flake (the ci: retrigger commit) and it recurred, so I'm not going to keep re-rolling. The diff is green; this just needs a maintainer to merge (or to confirm the size check against a fresh canary). claude[bot] also asked for a maintainer to sign off on the init_runtime_state error-path unwind — both asks point to the same human-review step.

Jarred-Sumner merged commit 677fe1c into main on May 29, 2026
78 of 80 checks passed
Jarred-Sumner deleted the farm/44564ba3/fix-deleted-cwd-standalone-crash branch on May 29, 2026 02:27