Skip to content

Dedupe extracted embedded native modules in compiled binaries#29587

Open
robobun wants to merge 1 commit into
mainfrom
farm/17bdd035/dedupe-embedded-native-extracts
Open

Dedupe extracted embedded native modules in compiled binaries#29587
robobun wants to merge 1 commit into
mainfrom
farm/17bdd035/dedupe-embedded-native-extracts

Conversation

@robobun

@robobun robobun commented Apr 22, 2026

Copy link
Copy Markdown
Collaborator

Fixes #29585.

Repro

A bun build --compile binary that dlopen()s an embedded .so/.dylib (imported with { with: { type: "file" } }) leaked a fresh copy to /tmp per call, never cleaned up.

# libhello.so: 15 KB
./app   # dlopens libhello.so 5×
ls /tmp/.*.so
.5cbdbdcffd73f447-00000002.so
.5cbdbdcffd75f56f-00000004.so
.5cbdbdcffd79fddf-00000000.so
.5cbdbdcffd7bf55f-00000001.so
.5cbdbdcffd7df6ff-00000003.so

Workers amplify it because each Worker VM starts with its own tmpname_id_number counter at 0, so every Worker extracts on its first dlopen. Reporter's server saw ~14 GB/hour on Linux.

Cause

ModuleLoader.resolveEmbeddedFile picked a per-call unique tmpfile name:

// src/bun.js/ModuleLoader.zig (before)
const tmpfilename = bun.fs.FileSystem.tmpname(extname, buf, bun.hash(file.name));

tmpname mixes nanoTimestamp() with a per-VM counter, so each call produced a fresh path. Nothing tracked or unlinked the file.

Fix

  • Deterministic name from content hash. .bun-{bun.hash(file.contents)}.{ext}. Same content → same path → idempotent across dlopens, Workers, and process restarts.
  • Reuse existing on disk. First check fstatat; if a file exists with the expected size, use it.
  • Atomic write when extracting. Write to a scratch name, then rename(2) into the content-hashed destination. Concurrent workers writing at the same time still end up with one valid file — same content, so whichever rename wins is correct.
  • Per-File cache. StandaloneModuleGraph.File now holds extracted_path: ?[:0]const u8. Shared across Workers because standalone_module_graph is shared by pointer (see VirtualMachine.initWorker). A process-wide mutex guards the lazy init.

Verification

5 dlopens + 5 Workers (each dlopen once) × 2 runs:

case /tmp copies
before fix 20
after fix 1

Test at test/regression/issue/29585.test.ts fails pre-fix with Expected: 1 / Received: 10 and passes post-fix.

@robobun

robobun commented Apr 22, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 4:05 PM PT - Jun 16th, 2026

@robobun, your commit f54d3e1 has 1 failures in Build #62970 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29587

That installs a local version of the PR into your bun-29587 executable, so you can run:

bun-29587 --bun

@coderabbitai

coderabbitai Bot commented Apr 22, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds a nullable per-file extracted path field and implements mutex-serialized lazy extraction into a private per-user extract directory with content-hash deduplication and atomic placement; adds a Linux regression test verifying single-copy extraction across workers and restarts; minor Win32 API call reference change.

Changes

Cohort / File(s) Summary
Standalone module file metadata
src/StandaloneModuleGraph.zig
Added public field extracted_path: ?[:0]const u8 = null to StandaloneModuleGraph.File to cache an on-disk extracted absolute path (nullable, lazy).
Extraction and caching logic
src/bun.js/ModuleLoader.zig
Introduced extracted_path_lock, ExtractDir and helpers (openExtractDir, tryOpenExtractDir, joinExtractedPath). resolveEmbeddedFile now mutex-serializes cache init, validates file.extracted_path via stat, computes content-hash deterministic filenames in a per-user extract dir, reuses size-matching files, writes via scratch+atomic rename, and stores duplicated absolute path into file.extracted_path.
Regression test for native extraction
test/regression/issue/29585.test.ts
Added Linux-only regression test that builds a libhello.so, compiles a Bun native executable embedding it, runs the binary multiple times under a dedicated TMP dir to assert a single content-matching extracted .so is reused, and verifies deletion triggers correct re-extraction; skips if not Linux or compiler missing.
Win32 mkdir call reference
src/sys.zig
Replaced kernel32.CreateDirectoryW(...) with bun.windows.CreateDirectoryW(...) in mkdirA Windows path; behavior and arguments unchanged.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Dedupe extracted embedded native modules in compiled binaries' clearly summarizes the main change: deduplication of extracted native modules in compiled Bun binaries.
Description check ✅ Passed The PR description provides detailed explanation of the problem, cause, fix, and verification, exceeding the template requirements with comprehensive context.
Linked Issues check ✅ Passed The PR implementation addresses all coding requirements from #29585: deterministic naming, content-hash deduplication, atomic extraction, per-file caching, concurrent safety, and regression testing.
Out of Scope Changes check ✅ Passed All changes directly address issue #29585: StandaloneModuleGraph adds caching field, ModuleLoader implements deduplication logic, test validates the fix, and sys.zig change supports Windows compatibility.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread test/regression/issue/29585.test.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/29585.test.ts`:
- Around line 29-84: Both countExtractedCopies and removeExtractedCopies
duplicate the byte-by-byte matching logic; extract that into a small helper
(e.g., matchesBytes or fileMatchesBytes) that takes a file path (or Bun.File)
and the target Uint8Array and returns a boolean, and then call that helper from
both countExtractedCopies and removeExtractedCopies; implement the helper to
first check size/equality and then use a single buffer comparison
(Buffer.from(...).equals(...) or an equivalent fast comparison) to replace the
manual loop and reduce duplication.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8e7815d7-60ea-4fd9-a129-37c8a1c1de98

📥 Commits

Reviewing files that changed from the base of the PR and between a13252d and e243b95.

📒 Files selected for processing (1)
  • test/regression/issue/29585.test.ts

Comment thread test/regression/issue/29585.test.ts Outdated
Comment thread test/regression/issue/29585.test.ts Outdated
Comment thread test/regression/issue/29585.test.ts Outdated
robobun pushed a commit that referenced this pull request Apr 22, 2026
…test

Addresses review feedback on #29587:

- Security (CWE-377): the content-hashed tmpfile lived directly in /tmp,
  where any local user could pre-create `.bun-{hash}.{ext}` at the known
  name and plant a malicious shared library — the size-only reuse check
  would then hand that file to dlopen(). Extraction now goes into
  `{tmpdir}/bun-{uid}/`, which we create with mode 0700 (POSIX) and
  verify on every use (owner == uid, (mode & 0077) == 0, is-a-directory).
  Refuses to extract if the subdir can't be trusted. Opened with
  O_NOFOLLOW so a racing symlink can't redirect us.

- Stale cache: the fast path returned the cached path without checking
  it still existed. After systemd-tmpfiles swept the extracted file,
  every subsequent dlopen() in that process would fail with ENOENT for
  the rest of its lifetime. The fast path now stat()s the cached path
  and, on ENOENT, clears the cache and falls through to re-extract.

- Test hermeticity: the test read and rm'd files in the shared /tmp,
  which could race with concurrent runs or unrelated processes that
  happened to have a byte-identical .so there. It now points the
  spawned binary at a per-test `BUN_TMPDIR` so extraction stays
  sandboxed, and walks one level into that dir to find the
  `bun-{uid}/` subdir.

- Test: drop the brittle `expect(cc.stderr).toBe("")` — gcc/clang/ld
  can emit benign notes on success. Match the existing precedent in
  addr32.test.ts / garbage-env.test.ts (exit code only).

- Test: add a third run that deletes the extracted file mid-test to
  prove the cache self-heals.
Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread src/bun.js/ModuleLoader.zig Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/ModuleLoader.zig`:
- Around line 167-174: The openExtractDir() implementation in ModuleLoader.zig
uses bun.sys.open(...) which on Windows loses O.NOFOLLOW semantics; change the
subdir_fd acquisition to branch on Environment.os and call
bun.sys.openDirAtWindowsA(bun.FD.cwd(), abs_z, .{ .no_follow = true, .read_only
= true }) when Environment.os == .windows, otherwise keep the existing
bun.sys.open(...) call, and preserve the current switch that maps .result to fd
and .err to return null so the error handling for subdir_fd remains identical.

In `@test/regression/issue/29585.test.ts`:
- Around line 55-62: The test currently serializes worker startup by awaiting
each worker's ready promise inside the creation loop; change it to first create
all Worker instances (using new Worker(import.meta.url) and pushing to workers
array), attach each worker's ready resolver via Promise.withResolvers and
w.addEventListener("message", () => resolve(), { once: true }) but do NOT await
inside the loop — instead collect all the per-worker promises and after the loop
call await Promise.all(collectedPromises) so all dlopen() calls run concurrently
and the test exercises real concurrent worker extraction and atomic write+rename
convergence.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 03f648b5-ff32-4c02-bebc-44b569cf40cd

📥 Commits

Reviewing files that changed from the base of the PR and between 9a66cbb and dc12e50.

📒 Files selected for processing (3)
  • src/bun.js/ModuleLoader.zig
  • src/sys.zig
  • test/regression/issue/29585.test.ts

Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread test/regression/issue/29585.test.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/ModuleLoader.zig`:
- Around line 174-182: The mkdir call using bun.sys.mkdirA(abs_z, 0o700) is
vulnerable to the caller's umask; after creating the directory you must
explicitly enforce owner rwx (0700) rather than relying on the mode argument.
Update the code around the mkdir switch to, on POSIX success (and also when
EXIST path is accepted), call the appropriate API to set the mode to 0o700
(e.g., bun.sys.chmodA(abs_z, 0o700) or open+bun.sys.fchmod on the created
directory) and re-check permissions/ownership; apply the same change to the
other occurrence mentioned (lines ~203-214) so the directory always ends up with
owner read/write/execute regardless of umask.
- Around line 49-61: The fast-path that returns file.extracted_path must also
validate the file size like the fstatat() branch: after bun.sys.stat(cached)
succeeds, perform the same size check (compare the stat result's st_size to the
expected size stored on the File/Module struct — e.g. file.expected_size or
file.extracted_size used by the fstatat branch) and only memcpy/return path_buf
if sizes match; otherwise free(cached) and set file.extracted_path = null to
fall through to the recreate path.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8aea0577-83a7-412c-812b-0fc79d1e274e

📥 Commits

Reviewing files that changed from the base of the PR and between dc12e50 and 69b53db.

📒 Files selected for processing (2)
  • src/bun.js/ModuleLoader.zig
  • test/regression/issue/29585.test.ts

Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread test/regression/issue/29585.test.ts
Comment thread src/jsc/ModuleLoader.zig Outdated
Comment thread src/bun.js/ModuleLoader.zig Outdated
Comment thread src/bun.js/ModuleLoader.zig Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All my prior feedback has been addressed and I have no further findings, but this rewrites security-sensitive tempfile/dlopen path resolution (CWE-377 threat model on shared hosts) and warrants a human reviewer's sign-off. Also note build-zig is currently red on f795e58.

Extended reasoning...

Overview

This PR rewrites resolveEmbeddedFile in src/bun.js/ModuleLoader.zig to deduplicate extracted embedded native modules from bun build --compile binaries. It introduces a per-user 0700 extract directory under tmpdir, content-hashed filenames, an in-process mutex-guarded cache on StandaloneModuleGraph.File, atomic scratch-write+rename, fast-path revalidation via lstat with uid/size/type checks, and a randomized fallback directory when the canonical path is squatted. It also adds a Linux regression test and a one-line sys.zig change routing Windows mkdirA through bun.windows.CreateDirectoryW.

Security risks

The core of this change is a CWE-377 (predictable tempfile) defense for code that hands a filesystem path directly to dlopen(). The threat model — another local user pre-creating predictable paths in shared /tmp, symlink races, sticky-bit DoS via squatting, systemd-tmpfiles aging out the extract dir, setuid euid/ruid mismatch — was iterated on across several review rounds and the current revision addresses each: per-user 0700 subdir opened with O_NOFOLLOW, ownership/mode verified via fstat, geteuid() used for ownership comparison, fast-path revalidation checks st.uid/IFREG/size via lstat, and a randomized fallback dir avoids the squatting DoS. I have no remaining findings, but the consequence of a miss here is local-attacker RCE via dlopen, so the threat-model reasoning deserves a second pair of human eyes.

Level of scrutiny

High. This is production-critical native-module loading that runs in every compiled binary using embedded .so/.node files, and it directly governs which on-disk path is passed to dlopen(). The logic is ~180 new lines with platform branches (POSIX vs Windows) and subtle ordering requirements (mkdir → open NOFOLLOW → fchmod → fstat verify).

Other factors

  • All 10+ prior inline comments (from me and CodeRabbit) are marked resolved and the diff reflects the fixes.
  • CI (build-zig) is failing across all platforms on commit f795e58; the only commit since (df142df) is a doc-comment move, so the build is likely still red and needs investigation before merge.
  • The regression test is Linux-only and hermetic (isolated BUN_TMPDIR), but Windows/macOS paths are untested.

Comment thread src/jsc/ModuleLoader.zig Outdated
robobun pushed a commit that referenced this pull request Apr 22, 2026
The `readdirSync(tmpdir).toBeEmpty()` check was verifying that
BunProcess.cpp's `deleteAfter` successfully unlinked the extracted
.node. #29587 drops `deleteAfter` on POSIX because the file is now
content-hashed and cached — so an empty tmpdir isn't the right assertion
anymore. My replacement (uniqueness of content hashes per extracted
file) was redundant with test/regression/issue/29585.test.ts, which
already exercises the dedup logic end-to-end.

Just drop the FS-state check here. The existing `stdout === "hello
world!"` assertion is enough — it proves the compiled binary loaded
its embedded addons successfully. The dedup-correctness is covered by
29585.test.ts.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further issues from my side — all the earlier CWE-377/tempfile concerns look addressed. Given this is security-sensitive dlopen/tempfile handling that's been through several hardening iterations, I'd like a human to give it a final look (and confirm the two x64-asan CI failures on 0ac4c97 are unrelated).

Extended reasoning...

Overview

This PR rewrites resolveEmbeddedFile in src/bun.js/ModuleLoader.zig to deduplicate extracted embedded native modules (.so/.node) from bun build --compile binaries. It introduces a per-user 0700 {tmpdir}/bun-{euid}/ subdirectory, content-hashed filenames, atomic scratch-then-rename extraction, an in-process File.extracted_path cache guarded by a global mutex, and a random-named fallback directory if the canonical one is squatted. It also drops the POSIX deleteAfter unlink in BunProcess.cpp, adds a one-line sys.zig change, updates test/napi/napi.test.ts, and adds a Linux regression test.

Security risks

This is squarely security-sensitive: the output path is handed straight to dlopen(), so any weakness in the trust check is local RCE. Across the review iterations the design moved from a predictable /tmp/.bun-{hash}.so (CWE-377) to a per-euid 0700 subdir opened with O_NOFOLLOW, fstat-verified for ISDIR && st.uid == geteuid() && (mode & 0o777) == 0o700, with fchmod after mkdir to defeat umask, a random-named fallback to avoid DoS-via-squatting, and a fast-path lstat that re-checks owner/type/size before reuse. The Windows branch uses openDirAtWindowsA(.no_follow=true) and relies on per-user %TEMP% ACLs. All the specific holes I flagged earlier are now closed in the diff, but the threat surface (multi-user shared /tmp, systemd-tmpfiles, setuid, symlink races) is subtle enough that a maintainer should sign off on the final shape.

Level of scrutiny

High. ~180 lines of new Zig in the module loader, cross-platform filesystem/permission logic, behavioural change in BunProcess.cpp (POSIX no longer unlinks the extracted .node after dlopen), and an existing test's assertion was relaxed. This is not a mechanical change.

Other factors

  • All prior claude[bot] and coderabbitai inline comments are marked resolved, and the bug hunter found nothing new on this revision.
  • CI on 0ac4c97 reports two x64-asan failures (test-worker-nested-uncaught.js crash, load-file-loader-a-lot.test.ts code 1). They aren't files this PR edits, but the PR does touch worker-adjacent loading; worth confirming they're pre-existing flakes.
  • The new worker regression test is gated to debug builds with a comment about a pre-existing release shutdown race — a maintainer should be aware that's being skipped.
  • I'm not approving per the guideline against auto-approving security-sensitive permission/tempfile code.

Comment thread src/jsc/bindings/BunProcess.cpp
Comment thread src/bun.js/ModuleLoader.zig Outdated
@robobun robobun requested a review from Jarred-Sumner as a code owner April 22, 2026 16:06

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further issues from my side — all prior feedback has been addressed. Deferring to a human for final sign-off since this is security-sensitive tempfile/dlopen handling (CWE-377) with cross-platform branches and a behavior change (dropping the POSIX post-dlopen unlink).

Extended reasoning...

Overview

This PR rewrites resolveEmbeddedFile in src/bun.js/ModuleLoader.zig to dedupe extraction of embedded native modules (.so/.node) from bun build --compile binaries. It introduces a per-user 0700 {tmpdir}/bun-{euid}/ subdir, content-hashed filenames, an in-process cache on StandaloneModuleGraph.File, atomic write+rename, a random-subdir fallback when the canonical name is squatted, and owner/size validation on the fast path. It also drops the POSIX deleteAfter unlink in BunProcess.cpp (since the file is now intentionally persisted), updates napi.test.ts accordingly, adds a regression test, and tweaks mkdirA on Windows.

Security risks

This is squarely CWE-377 (insecure temporary file) territory feeding into dlopen(). The PR went through several security-hardening iterations during review: per-user 0700 subdir + O_NOFOLLOW open + ownership/mode verification, geteuid() for setuid correctness, lstat+uid+size checks on the cached fast path, fchmod-after-mkdir to defeat restrictive umasks, and a random-name fallback to prevent squat DoS. The current revision looks correct to me, but the threat model (multi-user shared hosts, systemd-tmpfiles sweeping, sticky-bit /tmp) is subtle enough that a human should confirm the final state.

Level of scrutiny

High. This is production-critical native-code-loading logic with a documented attack surface, platform-specific branches (POSIX vs Windows openDirAtWindowsA), and a deliberate behavior change (extracted .node files now persist on POSIX rather than being unlinked post-dlopen). Not a candidate for bot-only approval.

Other factors

All 12+ prior inline comments (mine and CodeRabbit's) are resolved, and the bug-hunting pass on the current revision found nothing. The new regression test is Linux-only and isolates its tmpdir via BUN_TMPDIR. The test/expectations.txt addition is unrelated (an ASAN timeout marker).

Jarred-Sumner pushed a commit that referenced this pull request May 4, 2026
…test

Addresses review feedback on #29587:

- Security (CWE-377): the content-hashed tmpfile lived directly in /tmp,
  where any local user could pre-create `.bun-{hash}.{ext}` at the known
  name and plant a malicious shared library — the size-only reuse check
  would then hand that file to dlopen(). Extraction now goes into
  `{tmpdir}/bun-{uid}/`, which we create with mode 0700 (POSIX) and
  verify on every use (owner == uid, (mode & 0077) == 0, is-a-directory).
  Refuses to extract if the subdir can't be trusted. Opened with
  O_NOFOLLOW so a racing symlink can't redirect us.

- Stale cache: the fast path returned the cached path without checking
  it still existed. After systemd-tmpfiles swept the extracted file,
  every subsequent dlopen() in that process would fail with ENOENT for
  the rest of its lifetime. The fast path now stat()s the cached path
  and, on ENOENT, clears the cache and falls through to re-extract.

- Test hermeticity: the test read and rm'd files in the shared /tmp,
  which could race with concurrent runs or unrelated processes that
  happened to have a byte-identical .so there. It now points the
  spawned binary at a per-test `BUN_TMPDIR` so extraction stays
  sandboxed, and walks one level into that dir to find the
  `bun-{uid}/` subdir.

- Test: drop the brittle `expect(cc.stderr).toBe("")` — gcc/clang/ld
  can emit benign notes on success. Match the existing precedent in
  addr32.test.ts / garbage-env.test.ts (exit code only).

- Test: add a third run that deletes the extracted file mid-test to
  prove the cache self-heals.
Jarred-Sumner pushed a commit that referenced this pull request May 4, 2026
The `readdirSync(tmpdir).toBeEmpty()` check was verifying that
BunProcess.cpp's `deleteAfter` successfully unlinked the extracted
.node. #29587 drops `deleteAfter` on POSIX because the file is now
content-hashed and cached — so an empty tmpdir isn't the right assertion
anymore. My replacement (uniqueness of content hashes per extracted
file) was redundant with test/regression/issue/29585.test.ts, which
already exercises the dedup logic end-to-end.

Just drop the FS-state check here. The existing `stdout === "hello
world!"` assertion is enough — it proves the compiled binary loaded
its embedded addons successfully. The dedup-correctness is covered by
29585.test.ts.
@Jarred-Sumner Jarred-Sumner force-pushed the farm/17bdd035/dedupe-embedded-native-extracts branch from e24ffb9 to 1a5d561 Compare May 4, 2026 10:36
Comment thread test/napi/napi.test.ts Outdated
Comment thread test/expectations.txt Outdated
Comment thread src/jsc/ModuleLoader.zig Outdated
Comment thread src/jsc/ModuleLoader.zig Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All prior feedback has been addressed and I didn't find anything new — deferring to a human reviewer only because this rewrites the embedded-native-module extraction path with a security model (per-user 0700 subdir, ownership/mode verification, predictable filenames feeding dlopen()) that warrants a second pair of eyes.

Extended reasoning...

Overview

This PR rewrites resolveEmbeddedFile in src/jsc/ModuleLoader.zig (~200 new lines) to deduplicate extracted embedded native modules (.so/.node) in bun build --compile binaries via a content-hashed filename inside a per-user 0700 subdirectory of tmpdir, with an in-process cache on StandaloneModuleGraph.File. It also removes the POSIX post-dlopen unlink in BunProcess.cpp, adds a one-line Windows mkdirA namespace fix in sys.zig, updates test/napi/napi.test.ts to drop the now-invalid "tmpdir is empty" assertion, and adds a new Linux regression test.

Security risks

This is the part that warrants human review. The change moves from unpredictable-per-call tempfile names to deterministic content-hashed names — a classic CWE-377 surface. Over ~15 review iterations the PR added: a per-user 0700 subdir ({tmpdir}/bun-{euid}/) opened with O_NOFOLLOW; ownership + mode verification via fstat; a random-name fallback when the canonical dir is squatted; geteuid() (not getuid()) for setuid binaries; lstat + uid + size revalidation on the fast-path cache; and lax-mode handling for CIFS/vfat/DrvFs/Docker bind-mounts. The final mode check was relaxed from strict == 0o700 to (mode & 0o022) == 0 with st.uid == uid as the load-bearing CWE-377 defence on real POSIX filesystems. I believe the current state is sound, but the threat model is subtle enough (multi-user hosts, sticky-bit /tmp, systemd-tmpfiles sweeps, lax-mode mounts) that a maintainer should sign off on the final security/availability trade-off rather than a bot.

Level of scrutiny

High. The output of this function is handed directly to dlopen(), so a wrong answer is arbitrary code execution. The PR also changes long-standing behavior (POSIX no longer unlinks the extracted .node after dlopen) and removes a test assertion in napi.test.ts.

Other factors

All 15+ inline review comments (mine and CodeRabbit's) are resolved, including the most recent commit 112bbfc which addressed the lax-mode-FS cross-restart dedup gap and the leaked-empty-dir cleanup. The new regression test exercises the main-thread, cross-restart, and tmpfiles-sweep self-heal paths on Linux; the Worker test is gated to debug builds due to a pre-existing unrelated shutdown race. CI status on the latest commit wasn't visible in the timeline yet.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No remaining issues from my side — all prior findings are addressed and the bug hunter is clean on this revision. Deferring to a human for final sign-off given the CWE-377 / dlopen surface in resolveEmbeddedFile/tryOpenExtractDir and the bundled unrelated CI-flake suppressions.

Extended reasoning...

Overview

This PR rewrites resolveEmbeddedFile in src/jsc/ModuleLoader.zig (~200 net new lines) to dedupe extraction of embedded native modules in bun build --compile binaries: content-hashed filenames inside a per-user 0700 {tmpdir}/bun-{euid}/ subdir, mutex-guarded per-File cache on the shared StandaloneModuleGraph, atomic scratch+rename, and an 8-iteration random-name fallback when the canonical dir is squatted. It also drops the POSIX post-dlopen unlink in BunProcess.cpp, adds a Linux regression test, adjusts napi.test.ts to the new persistence semantics, and bundles several unrelated CI-flake suppressions (expectations.txt, jsc-stress.test.ts, bun-install-registry.test.ts, node-http-backpressure-max.test.ts).

Security risks

The core change is exactly a CWE-377 surface: deterministic tempfile names whose paths are handed to dlopen(). Over ~15 review rounds the implementation evolved to address predictable-tempfile RCE, squatted-subdir DoS, getuid vs geteuid, fast-path ownership re-validation, O_NOFOLLOW on both POSIX and Windows, umask masking, and lax-mode filesystems. The current revision looks correct to me — lstat + uid + size on the fast path, fchmod-then-verify with a (mode & 0o022) == 0 floor on the slow path, just_created cleanup on rejection — but the threat model is subtle enough (multi-user /tmp, sticky bit, tmpfiles sweep, CIFS/DrvFs mounts) that a human security-minded reviewer should confirm the final state rather than relying solely on the iterative bot review.

Level of scrutiny

High. This is production module-loading code that gates what gets passed to dlopen(); a mistake here is local code execution. It also changes user-visible disk-persistence semantics (extracted .node files now survive on POSIX) and removes an existing #19550 cleanup behavior.

Other factors

  • All 15 of my prior inline findings and all coderabbit findings are marked resolved; the bug-hunting system found nothing on this revision.
  • Commits since my last comment (aa1df74..a56a2e0) are purely CI-flake annotations / timeout bumps unrelated to the core fix — reasonable for landing, but a human should confirm they're acceptable to bundle here.
  • The new regression test is Linux-only and the Worker variant is gated to debug builds due to a noted pre-existing release-build shutdown race, so coverage is partial.
  • One CI failure remains (jsc-stress on Windows x64) which the latest commit marks todoIf as a pre-existing WebKit issue.

Comment thread test/cli/install/bun-install-registry.test.ts Outdated
robobun pushed a commit that referenced this pull request May 16, 2026
…test

Addresses review feedback on #29587:

- Security (CWE-377): the content-hashed tmpfile lived directly in /tmp,
  where any local user could pre-create `.bun-{hash}.{ext}` at the known
  name and plant a malicious shared library — the size-only reuse check
  would then hand that file to dlopen(). Extraction now goes into
  `{tmpdir}/bun-{uid}/`, which we create with mode 0700 (POSIX) and
  verify on every use (owner == uid, (mode & 0077) == 0, is-a-directory).
  Refuses to extract if the subdir can't be trusted. Opened with
  O_NOFOLLOW so a racing symlink can't redirect us.

- Stale cache: the fast path returned the cached path without checking
  it still existed. After systemd-tmpfiles swept the extracted file,
  every subsequent dlopen() in that process would fail with ENOENT for
  the rest of its lifetime. The fast path now stat()s the cached path
  and, on ENOENT, clears the cache and falls through to re-extract.

- Test hermeticity: the test read and rm'd files in the shared /tmp,
  which could race with concurrent runs or unrelated processes that
  happened to have a byte-identical .so there. It now points the
  spawned binary at a per-test `BUN_TMPDIR` so extraction stays
  sandboxed, and walks one level into that dir to find the
  `bun-{uid}/` subdir.

- Test: drop the brittle `expect(cc.stderr).toBe("")` — gcc/clang/ld
  can emit benign notes on success. Match the existing precedent in
  addr32.test.ts / garbage-env.test.ts (exit code only).

- Test: add a third run that deletes the extracted file mid-test to
  prove the cache self-heals.
robobun pushed a commit that referenced this pull request May 16, 2026
The `readdirSync(tmpdir).toBeEmpty()` check was verifying that
BunProcess.cpp's `deleteAfter` successfully unlinked the extracted
.node. #29587 drops `deleteAfter` on POSIX because the file is now
content-hashed and cached — so an empty tmpdir isn't the right assertion
anymore. My replacement (uniqueness of content hashes per extracted
file) was redundant with test/regression/issue/29585.test.ts, which
already exercises the dedup logic end-to-end.

Just drop the FS-state check here. The existing `stdout === "hello
world!"` assertion is enough — it proves the compiled binary loaded
its embedded addons successfully. The dedup-correctness is covered by
29585.test.ts.
@robobun robobun force-pushed the farm/17bdd035/dedupe-embedded-native-extracts branch from 4630301 to d74a22d Compare May 16, 2026 21:35
robobun added a commit that referenced this pull request May 16, 2026
…n-arm (unrelated pre-existing flake)

On Windows AArch64 this test times out for 20s × 4 runner-internal
retries because req.once('close') doesn't fire within the deadline
after controller.abort(). It's a genuine Bun bug on that one platform
but orthogonal to #29585 — confirmed failing on ≥4 independent PR head
shas (#29587 builds 51292, 51519; #30219 build 50805; #30224 build
50951) always on :windows: 11 aarch64 test-bun shard 7/8, all 4
retries timing out identically.

Per the WINDOWS-AARCH64 + FLAKY precedent already in expectations.txt
(line 6), add the test to the skip list scoped to WINDOWS-AARCH64 so
it stops gating unrelated PRs. The underlying abort→close race is
tracked separately; this entry can come out when the Windows IOCP
path learns to emit 'close' in time.
robobun added a commit that referenced this pull request May 16, 2026
1. test/js/bun/test/parallel/test-http-should-emit-close-when-connection-is-aborted.ts —
   times out on all three Windows variants (x64, x64-baseline, aarch64)
   waiting for req.once('close') after controller.abort(). Widen the
   expectations.txt skip from WINDOWS-AARCH64 to WINDOWS since #51535
   showed the flake hits non-arm shards too.

2. test/cli/install/bun-install-registry.test.ts peerTests loop —
   :3659 intermittently reads a cached a-dep@1.0.9 instead of the
   fresh-hoisted peer-resolved version under local-registry contention.
   Hit on unrelated PRs #30219 (build 50805, recovered on retry #4),
   #30224 (build 50951, hard-failed 4/4), and #29587 (build 51535,
   hard-failed 4/4). Matches the existing todoIf(isFlaky && isMacOS &&
   situation === 'peer ^1.0.2') on the same loop — widen to also include
   isFlaky && isWindows. Precedent at :4222 in the same file already
   does this for 'text lockfile is hoisted'.

Both are real pre-existing Bun bugs orthogonal to embedded-native-module
dedup — filed here only because they're blocking this PR's CI green.
Comment thread src/runtime/jsc_hooks.rs Outdated
Comment thread src/runtime/jsc_hooks.rs
Comment thread src/runtime/jsc_hooks.rs Outdated
Comment thread src/jsc/bindings/BunProcess.cpp Outdated
Comment thread src/jsc/bindings/BunProcess.cpp Outdated
Comment thread src/runtime/jsc_hooks.rs Outdated
@robobun

robobun commented May 18, 2026

Copy link
Copy Markdown
Collaborator Author

Linking #30962 — same root cause, minimal repro (no Workers required). The fix on this branch (content-hashed filename + per-File.extracted_path cache) covers that shape too.

@Jarred-Sumner Jarred-Sumner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

undo the .zig changes. just do the .rs changes. can you come up with a simpler solution? can we avoid this extract dir? can we just do 1 file?

@robobun robobun force-pushed the farm/17bdd035/dedupe-embedded-native-extracts branch from d25569f to 480a174 Compare May 18, 2026 05:51
@robobun

robobun commented May 18, 2026

Copy link
Copy Markdown
Collaborator Author

Done. Reset to a minimal -only diff on top of main and simplified the design:

  • No per-user subdir, no 8-iter random fallback, no open_extract_dir dance
  • Single content-hashed file directly in tmpdir: {tmpdir}/.bun-{uid}-{hash}.{ext}
  • uid in the filename so a multi-user /tmp doesn't collide between users
  • Fast path: lstat the cached path (uid + IFREG + size), reuse if valid
  • Warm path: fstatat the canonical name, reuse without rewriting
  • Cold path: scratch-write + rename(2) to canonical (atomic on POSIX & Windows)
  • If rename fails (adversarial: squatted destination on sticky /tmp), fall back to the scratch path for that one call — correct, no dedup until the squatter releases

Global Mutex serialises lazy init of File.extracted_path across Worker VMs. No Zig changes, no extracted_path_lock on the Zig side, no Zig helper functions — all the logic lives in src/runtime/jsc_hooks.rs (~160 lines).

Also drops the POSIX post-dlopen unlink in BunProcess.cpp (defeating the dedup) and a trivial <cassert> include that the unified-source bundler order trips on locally.

Latest: 480a174.

Comment thread src/runtime/jsc_hooks.rs Outdated
Comment thread src/runtime/jsc_hooks.rs Outdated
Comment thread test/regression/issue/29585.test.ts Outdated
Comment thread src/runtime/jsc_hooks.rs Outdated
Comment thread src/runtime/jsc_hooks.rs Outdated
@robobun robobun force-pushed the farm/17bdd035/dedupe-embedded-native-extracts branch from ddafb01 to a997aa2 Compare May 22, 2026 18:14
Comment thread src/runtime/jsc_hooks.rs Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No further findings from me — all prior issues are resolved and the final design looks coherent. Given this rewrites the tempfile/dlopen() extraction path with CWE-377 hardening (uid/ownership checks, lstat-based symlink defense, shared /tmp squatting fallback), it's worth a human pass on the security model before merge.

Extended reasoning...

Overview

This PR rewrites resolve_embedded_file_to_buf in src/runtime/jsc_hooks.rs (~200 lines net) to deduplicate extracted embedded native modules from bun build --compile binaries via content-hashed filenames in tmpdir, fixing a per-dlopen() /tmp leak (#29585). It also adds an extracted_path cache field on StandaloneModuleGraph::File, removes the now-counterproductive POSIX post-dlopen unlink in BunProcess.cpp, adjusts napi.test.ts accordingly, and adds a Linux regression test.

Security risks

The core change creates predictably-named files in shared /tmp and hands the resulting path to dlopen(). The review cycle surfaced and resolved four 🔴 findings in this area: a CWE-377 predictable-tempfile RCE (initial design), a stale-cache ENOENT regression, a squatted-subdir DoS, and a TOCTOU symlink-follow on the warm path. The final design mitigates these via uid-in-filename, lstat/lstatat (no-follow) ownership+ISREG+size validation on both fast and warm paths, atomic scratch-write+rename, and a scratch-path fallback when the canonical name is squatted. The residual threat model (shared sticky-bit /tmp, attacker with read access to the compiled binary) is non-trivial and the defenses are layered — exactly the kind of thing that benefits from human security review.

Level of scrutiny

High. This is production-critical native-module-loading code where a mistake means either RCE (load attacker's .so) or DoS (embedded modules fail to load). The PR went through ~25 review iterations including a full redesign (per-user 0700 subdir → flat uid-prefixed filename), which itself involved security tradeoffs that a human should validate.

Other factors

All my prior inline comments (and CodeRabbit's) are resolved; the bug-hunting system found nothing on the current revision. Test coverage is reasonable (Linux-only regression test exercising 10× dlopen, cross-restart reuse, and tmpfiles-sweep self-heal), though the Worker-concurrency test is gated to debug builds due to an unrelated pre-existing shutdown race. The unsafe { &mut *graph } aliasing pattern is acknowledged as a pre-existing codebase-wide convention, not new to this PR.

@robobun robobun force-pushed the farm/17bdd035/dedupe-embedded-native-extracts branch from 1da5c2c to 444d305 Compare June 7, 2026 03:43

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No remaining findings from my side — all prior rounds are addressed. Given this rewrites the embedded-library extraction path with CWE-377 hardening (predictable tempfile names, uid/symlink checks, dlopen target selection) and drops the POSIX post-dlopen unlink from #19550, it's worth a human pass on the threat model and the Windows behavior (which the regression test doesn't cover).

Extended reasoning...

Overview

This PR fixes #29585 by replacing per-call random tempfile extraction of embedded .so/.node/.dylib files in bun build --compile binaries with a content-hashed, deduplicated path. It touches:

  • src/runtime/jsc_hooks.rs (~230 lines net): rewrites resolve_embedded_file_to_buf with a three-tier fast/warm/cold path, a process-global mutex, lstat-based ownership/size validation, scratch-write + atomic rename, and a squatted-path fallback. Adds helpers format_canonical_name, write_absolute, path_to_nul_boxed, extract_owner_uid.
  • src/standalone_graph/StandaloneModuleGraph.rs: adds a mutable extracted_path: Option<Box<[u8]>> field to the shared File struct.
  • src/jsc/bindings/BunProcess.cpp: removes the POSIX post-dlopen unlink (the #19550 fix), gates deleteAfter/tryToDeleteIfNecessary to Windows-only, drops the Bun__unlink extern.
  • test/napi/napi.test.ts: removes the "tmpdir is empty after run" assertion (semantics changed — extracted files now intentionally persist).
  • test/regression/issue/29585.test.ts: new Linux-only regression test (the Worker variant is further gated to debug builds only due to a pre-existing release-build shutdown race).

Security risks

This is squarely security-sensitive code. The PR went through ~25 review iterations over six weeks, with multiple design resets, specifically because of CWE-377 (predictable tempfile) concerns:

  • The extracted path is handed directly to dlopen(), so a wrong reuse decision is arbitrary code execution.
  • The canonical filename .bun-{euid}-{wyhash(contents)}.{ext} is fully predictable to a local attacker who can read the compiled binary.
  • The defense rests on lstat/lstatat + st_uid == geteuid() + S_ISREG + size match, plus sticky-bit semantics for the rename-over-squatter fallback. Each of these was the subject of at least one resolved finding (symlink-following fstatat, real-vs-effective uid, fast-path bypass after tmpfiles sweep, etc.).
  • On Windows, the ours check is true unconditionally — the threat model there relies on %TEMP% being per-user, which is conventional but not enforced by this code.
  • The unsafe { &mut *graph } pattern writes through a shared singleton; the lock serialises only callers of this function, which the SAFETY comment now correctly scopes.

I'm not aware of a remaining concrete exploit path in the final design, but the reasoning is subtle enough that I shouldn't be the only reviewer.

Level of scrutiny

High. This is production runtime code on the dlopen() path of every compiled binary that embeds a native module, with a documented threat model involving multi-user /tmp. It also reverses a prior deliberate decision (#19550's immediate unlink) and weakens an existing test (napi.test.ts no longer asserts cleanup). The new regression test doesn't exercise macOS or Windows, and the Worker test is debug-only.

Other factors

  • All ~25 of my prior inline findings are resolved, and the current bug-hunting pass found nothing.
  • The PR went through a full design reset (480a174) midway, dropping the per-user-subdir approach for the simpler flat-file design — the diff is now reasonably compact.
  • CI is building on the latest commit; I haven't verified it's green across all targets.
  • No human reviewer appears to have weighed in at any point in the timeline.

@robobun

robobun commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator Author

CI note: the red on build 61227 is test/cli/install/bunx.test.ts > "should handle package that requires node 24", which runs bun x @angular/cli@latest against the live registry. @angular/cli@22.0.0 (published 2026-06-03) requires node ^22.22.3 || ^24.15.0 || >=26.0.0; Bun reports 24.3.0, so the Angular engine check exits 3. Every build on every branch since June 3 13:34 UTC fails this test identically (verified across 16 recent builds). Unrelated to this PR; needs a main-side fix (pin the Angular version in the test, or bump REPORTED_NODEJS_VERSION).

@robobun

robobun commented Jun 7, 2026

Copy link
Copy Markdown
Collaborator Author

CI note (build 61227, continued): besides the repo-wide bunx/Angular breakage, one shard (debian 13 aarch64, shard 0) had isolated-install.test.ts fail with 3 ConnectionRefused errors after the verdaccio mock-registry child process segfaulted mid-run (panic: Segmentation fault at address 0x400000000012, crash report: https://bun.report/1.4.0/L_2444d305gWukgoCmj/9iDuql3hD2/1p0Duz4s1D+tui2D+zn+yD+ts2gDuzvhB_A2gggBkB). Verdaccio is plain JS on a normal bun runtime where this PR's code is unreachable: resolve_embedded_file_to_buf only runs inside compiled standalone binaries (Graph::get() is None otherwise), the new File.extracted_path field is only constructed when a standalone graph loads, and the BunProcess.cpp change only affects /$bunfs/-prefixed dlopen. The same file passed 47/47 on darwin x64 in the same build, and no other recent build shows this annotation — so it looks like a pre-existing aarch64 flake, but flagging the crash link in case a maintainer wants to remap it.

bun build --compile binaries dlopen() embedded .so/.node files through
resolve_embedded_file_to_buf. The previous implementation called
FileSystem::tmpname() each time, which appends a nanoTimestamp and a
per-VM counter — so every dlopen() leaked a fresh /tmp file (#29585
reported ~14 GiB/hour on an opencode + ripgrep-workers deployment).

Dedupe onto a single content-hashed path {tmpdir}/.bun-{uid}-{hash}.{ext}:

  - Fast path: cached absolute path on StandaloneModuleGraph::File.
    lstat-verified each call (uid + IFREG + size) so a tmpfiles sweep
    or attacker replacement re-extracts instead of being handed off to
    dlopen.
  - Warm path: no cache yet, but the content-hashed canonical file from
    a prior run is still on disk — lstatat-verified (no symlink follow)
    and reused without rewriting.
  - Cold path: write to a unique scratch name, rename(2) to canonical.
    Atomic on POSIX and Windows (MoveFileEx); concurrent Workers
    converge on the same content-hashed name. If rename fails
    (adversarial: another user owns a squatted destination on sticky
    /tmp), fall back to the scratch path — correct, with within-process
    dedup via the cached scratch path but no cross-restart dedup until
    the squatter releases the canonical name.

A global Mutex serialises the lazy init of File.extracted_path across
Worker VMs that share the graph.

Drop the POSIX post-dlopen unlink in BunProcess.cpp (#19550) — the
content-hashed tmpfile is supposed to persist for the dedup to work.
Windows keeps delete-on-reboot for hygiene (NTFS can't unlink an
in-use file).
@robobun robobun force-pushed the farm/17bdd035/dedupe-embedded-native-extracts branch from 444d305 to f54d3e1 Compare June 16, 2026 23:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compiled binary: Workers leak extracted native .so/.dylib files in /tmp (never cleaned up)

2 participants