Dedupe extracted embedded native modules in compiled binaries#29587
Dedupe extracted embedded native modules in compiled binaries#29587robobun wants to merge 1 commit into
Conversation
|
Updated 4:05 PM PT - Jun 16th, 2026
❌ @robobun, your commit f54d3e1 has 1 failures in
🧪 To try this PR locally: bunx bun-pr 29587That installs a local version of the PR into your bun-29587 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a nullable per-file extracted path field and implements mutex-serialized lazy extraction into a private per-user extract directory with content-hash deduplication and atomic placement; adds a Linux regression test verifying single-copy extraction across workers and restarts; minor Win32 API call reference change. Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/regression/issue/29585.test.ts`:
- Around line 29-84: Both countExtractedCopies and removeExtractedCopies
duplicate the byte-by-byte matching logic; extract that into a small helper
(e.g., matchesBytes or fileMatchesBytes) that takes a file path (or Bun.File)
and the target Uint8Array and returns a boolean, and then call that helper from
both countExtractedCopies and removeExtractedCopies; implement the helper to
first check size/equality and then use a single buffer comparison
(Buffer.from(...).equals(...) or an equivalent fast comparison) to replace the
manual loop and reduce duplication.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8e7815d7-60ea-4fd9-a129-37c8a1c1de98
📒 Files selected for processing (1)
test/regression/issue/29585.test.ts
…test Addresses review feedback on #29587: - Security (CWE-377): the content-hashed tmpfile lived directly in /tmp, where any local user could pre-create `.bun-{hash}.{ext}` at the known name and plant a malicious shared library — the size-only reuse check would then hand that file to dlopen(). Extraction now goes into `{tmpdir}/bun-{uid}/`, which we create with mode 0700 (POSIX) and verify on every use (owner == uid, (mode & 0077) == 0, is-a-directory). Refuses to extract if the subdir can't be trusted. Opened with O_NOFOLLOW so a racing symlink can't redirect us. - Stale cache: the fast path returned the cached path without checking it still existed. After systemd-tmpfiles swept the extracted file, every subsequent dlopen() in that process would fail with ENOENT for the rest of its lifetime. The fast path now stat()s the cached path and, on ENOENT, clears the cache and falls through to re-extract. - Test hermeticity: the test read and rm'd files in the shared /tmp, which could race with concurrent runs or unrelated processes that happened to have a byte-identical .so there. It now points the spawned binary at a per-test `BUN_TMPDIR` so extraction stays sandboxed, and walks one level into that dir to find the `bun-{uid}/` subdir. - Test: drop the brittle `expect(cc.stderr).toBe("")` — gcc/clang/ld can emit benign notes on success. Match the existing precedent in addr32.test.ts / garbage-env.test.ts (exit code only). - Test: add a third run that deletes the extracted file mid-test to prove the cache self-heals.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/ModuleLoader.zig`:
- Around line 167-174: The openExtractDir() implementation in ModuleLoader.zig
uses bun.sys.open(...) which on Windows loses O.NOFOLLOW semantics; change the
subdir_fd acquisition to branch on Environment.os and call
bun.sys.openDirAtWindowsA(bun.FD.cwd(), abs_z, .{ .no_follow = true, .read_only
= true }) when Environment.os == .windows, otherwise keep the existing
bun.sys.open(...) call, and preserve the current switch that maps .result to fd
and .err to return null so the error handling for subdir_fd remains identical.
In `@test/regression/issue/29585.test.ts`:
- Around line 55-62: The test currently serializes worker startup by awaiting
each worker's ready promise inside the creation loop; change it to first create
all Worker instances (using new Worker(import.meta.url) and pushing to workers
array), attach each worker's ready resolver via Promise.withResolvers and
w.addEventListener("message", () => resolve(), { once: true }) but do NOT await
inside the loop — instead collect all the per-worker promises and after the loop
call await Promise.all(collectedPromises) so all dlopen() calls run concurrently
and the test exercises real concurrent worker extraction and atomic write+rename
convergence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 03f648b5-ff32-4c02-bebc-44b569cf40cd
📒 Files selected for processing (3)
src/bun.js/ModuleLoader.zigsrc/sys.zigtest/regression/issue/29585.test.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/ModuleLoader.zig`:
- Around line 174-182: The mkdir call using bun.sys.mkdirA(abs_z, 0o700) is
vulnerable to the caller's umask; after creating the directory you must
explicitly enforce owner rwx (0700) rather than relying on the mode argument.
Update the code around the mkdir switch to, on POSIX success (and also when
EXIST path is accepted), call the appropriate API to set the mode to 0o700
(e.g., bun.sys.chmodA(abs_z, 0o700) or open+bun.sys.fchmod on the created
directory) and re-check permissions/ownership; apply the same change to the
other occurrence mentioned (lines ~203-214) so the directory always ends up with
owner read/write/execute regardless of umask.
- Around line 49-61: The fast-path that returns file.extracted_path must also
validate the file size like the fstatat() branch: after bun.sys.stat(cached)
succeeds, perform the same size check (compare the stat result's st_size to the
expected size stored on the File/Module struct — e.g. file.expected_size or
file.extracted_size used by the fstatat branch) and only memcpy/return path_buf
if sizes match; otherwise free(cached) and set file.extracted_path = null to
fall through to the recreate path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8aea0577-83a7-412c-812b-0fc79d1e274e
📒 Files selected for processing (2)
src/bun.js/ModuleLoader.zigtest/regression/issue/29585.test.ts
There was a problem hiding this comment.
All my prior feedback has been addressed and I have no further findings, but this rewrites security-sensitive tempfile/dlopen path resolution (CWE-377 threat model on shared hosts) and warrants a human reviewer's sign-off. Also note build-zig is currently red on f795e58.
Extended reasoning...
Overview
This PR rewrites resolveEmbeddedFile in src/bun.js/ModuleLoader.zig to deduplicate extracted embedded native modules from bun build --compile binaries. It introduces a per-user 0700 extract directory under tmpdir, content-hashed filenames, an in-process mutex-guarded cache on StandaloneModuleGraph.File, atomic scratch-write+rename, fast-path revalidation via lstat with uid/size/type checks, and a randomized fallback directory when the canonical path is squatted. It also adds a Linux regression test and a one-line sys.zig change routing Windows mkdirA through bun.windows.CreateDirectoryW.
Security risks
The core of this change is a CWE-377 (predictable tempfile) defense for code that hands a filesystem path directly to dlopen(). The threat model — another local user pre-creating predictable paths in shared /tmp, symlink races, sticky-bit DoS via squatting, systemd-tmpfiles aging out the extract dir, setuid euid/ruid mismatch — was iterated on across several review rounds and the current revision addresses each: per-user 0700 subdir opened with O_NOFOLLOW, ownership/mode verified via fstat, geteuid() used for ownership comparison, fast-path revalidation checks st.uid/IFREG/size via lstat, and a randomized fallback dir avoids the squatting DoS. I have no remaining findings, but the consequence of a miss here is local-attacker RCE via dlopen, so the threat-model reasoning deserves a second pair of human eyes.
Level of scrutiny
High. This is production-critical native-module loading that runs in every compiled binary using embedded .so/.node files, and it directly governs which on-disk path is passed to dlopen(). The logic is ~180 new lines with platform branches (POSIX vs Windows) and subtle ordering requirements (mkdir → open NOFOLLOW → fchmod → fstat verify).
Other factors
- All 10+ prior inline comments (from me and CodeRabbit) are marked resolved and the diff reflects the fixes.
- CI (
build-zig) is failing across all platforms on commit f795e58; the only commit since (df142df) is a doc-comment move, so the build is likely still red and needs investigation before merge. - The regression test is Linux-only and hermetic (isolated
BUN_TMPDIR), but Windows/macOS paths are untested.
The `readdirSync(tmpdir).toBeEmpty()` check was verifying that BunProcess.cpp's `deleteAfter` successfully unlinked the extracted .node. #29587 drops `deleteAfter` on POSIX because the file is now content-hashed and cached — so an empty tmpdir isn't the right assertion anymore. My replacement (uniqueness of content hashes per extracted file) was redundant with test/regression/issue/29585.test.ts, which already exercises the dedup logic end-to-end. Just drop the FS-state check here. The existing `stdout === "hello world!"` assertion is enough — it proves the compiled binary loaded its embedded addons successfully. The dedup-correctness is covered by 29585.test.ts.
There was a problem hiding this comment.
No further issues from my side — all the earlier CWE-377/tempfile concerns look addressed. Given this is security-sensitive dlopen/tempfile handling that's been through several hardening iterations, I'd like a human to give it a final look (and confirm the two x64-asan CI failures on 0ac4c97 are unrelated).
Extended reasoning...
Overview
This PR rewrites resolveEmbeddedFile in src/bun.js/ModuleLoader.zig to deduplicate extracted embedded native modules (.so/.node) from bun build --compile binaries. It introduces a per-user 0700 {tmpdir}/bun-{euid}/ subdirectory, content-hashed filenames, atomic scratch-then-rename extraction, an in-process File.extracted_path cache guarded by a global mutex, and a random-named fallback directory if the canonical one is squatted. It also drops the POSIX deleteAfter unlink in BunProcess.cpp, adds a one-line sys.zig change, updates test/napi/napi.test.ts, and adds a Linux regression test.
Security risks
This is squarely security-sensitive: the output path is handed straight to dlopen(), so any weakness in the trust check is local RCE. Across the review iterations the design moved from a predictable /tmp/.bun-{hash}.so (CWE-377) to a per-euid 0700 subdir opened with O_NOFOLLOW, fstat-verified for ISDIR && st.uid == geteuid() && (mode & 0o777) == 0o700, with fchmod after mkdir to defeat umask, a random-named fallback to avoid DoS-via-squatting, and a fast-path lstat that re-checks owner/type/size before reuse. The Windows branch uses openDirAtWindowsA(.no_follow=true) and relies on per-user %TEMP% ACLs. All the specific holes I flagged earlier are now closed in the diff, but the threat surface (multi-user shared /tmp, systemd-tmpfiles, setuid, symlink races) is subtle enough that a maintainer should sign off on the final shape.
Level of scrutiny
High. ~180 lines of new Zig in the module loader, cross-platform filesystem/permission logic, behavioural change in BunProcess.cpp (POSIX no longer unlinks the extracted .node after dlopen), and an existing test's assertion was relaxed. This is not a mechanical change.
Other factors
- All prior
claude[bot]andcoderabbitaiinline comments are marked resolved, and the bug hunter found nothing new on this revision. - CI on 0ac4c97 reports two x64-asan failures (
test-worker-nested-uncaught.jscrash,load-file-loader-a-lot.test.tscode 1). They aren't files this PR edits, but the PR does touch worker-adjacent loading; worth confirming they're pre-existing flakes. - The new worker regression test is gated to debug builds with a comment about a pre-existing release shutdown race — a maintainer should be aware that's being skipped.
- I'm not approving per the guideline against auto-approving security-sensitive permission/tempfile code.
There was a problem hiding this comment.
No further issues from my side — all prior feedback has been addressed. Deferring to a human for final sign-off since this is security-sensitive tempfile/dlopen handling (CWE-377) with cross-platform branches and a behavior change (dropping the POSIX post-dlopen unlink).
Extended reasoning...
Overview
This PR rewrites resolveEmbeddedFile in src/bun.js/ModuleLoader.zig to dedupe extraction of embedded native modules (.so/.node) from bun build --compile binaries. It introduces a per-user 0700 {tmpdir}/bun-{euid}/ subdir, content-hashed filenames, an in-process cache on StandaloneModuleGraph.File, atomic write+rename, a random-subdir fallback when the canonical name is squatted, and owner/size validation on the fast path. It also drops the POSIX deleteAfter unlink in BunProcess.cpp (since the file is now intentionally persisted), updates napi.test.ts accordingly, adds a regression test, and tweaks mkdirA on Windows.
Security risks
This is squarely CWE-377 (insecure temporary file) territory feeding into dlopen(). The PR went through several security-hardening iterations during review: per-user 0700 subdir + O_NOFOLLOW open + ownership/mode verification, geteuid() for setuid correctness, lstat+uid+size checks on the cached fast path, fchmod-after-mkdir to defeat restrictive umasks, and a random-name fallback to prevent squat DoS. The current revision looks correct to me, but the threat model (multi-user shared hosts, systemd-tmpfiles sweeping, sticky-bit /tmp) is subtle enough that a human should confirm the final state.
Level of scrutiny
High. This is production-critical native-code-loading logic with a documented attack surface, platform-specific branches (POSIX vs Windows openDirAtWindowsA), and a deliberate behavior change (extracted .node files now persist on POSIX rather than being unlinked post-dlopen). Not a candidate for bot-only approval.
Other factors
All 12+ prior inline comments (mine and CodeRabbit's) are resolved, and the bug-hunting pass on the current revision found nothing. The new regression test is Linux-only and isolates its tmpdir via BUN_TMPDIR. The test/expectations.txt addition is unrelated (an ASAN timeout marker).
…test Addresses review feedback on #29587: - Security (CWE-377): the content-hashed tmpfile lived directly in /tmp, where any local user could pre-create `.bun-{hash}.{ext}` at the known name and plant a malicious shared library — the size-only reuse check would then hand that file to dlopen(). Extraction now goes into `{tmpdir}/bun-{uid}/`, which we create with mode 0700 (POSIX) and verify on every use (owner == uid, (mode & 0077) == 0, is-a-directory). Refuses to extract if the subdir can't be trusted. Opened with O_NOFOLLOW so a racing symlink can't redirect us. - Stale cache: the fast path returned the cached path without checking it still existed. After systemd-tmpfiles swept the extracted file, every subsequent dlopen() in that process would fail with ENOENT for the rest of its lifetime. The fast path now stat()s the cached path and, on ENOENT, clears the cache and falls through to re-extract. - Test hermeticity: the test read and rm'd files in the shared /tmp, which could race with concurrent runs or unrelated processes that happened to have a byte-identical .so there. It now points the spawned binary at a per-test `BUN_TMPDIR` so extraction stays sandboxed, and walks one level into that dir to find the `bun-{uid}/` subdir. - Test: drop the brittle `expect(cc.stderr).toBe("")` — gcc/clang/ld can emit benign notes on success. Match the existing precedent in addr32.test.ts / garbage-env.test.ts (exit code only). - Test: add a third run that deletes the extracted file mid-test to prove the cache self-heals.
The `readdirSync(tmpdir).toBeEmpty()` check was verifying that BunProcess.cpp's `deleteAfter` successfully unlinked the extracted .node. #29587 drops `deleteAfter` on POSIX because the file is now content-hashed and cached — so an empty tmpdir isn't the right assertion anymore. My replacement (uniqueness of content hashes per extracted file) was redundant with test/regression/issue/29585.test.ts, which already exercises the dedup logic end-to-end. Just drop the FS-state check here. The existing `stdout === "hello world!"` assertion is enough — it proves the compiled binary loaded its embedded addons successfully. The dedup-correctness is covered by 29585.test.ts.
e24ffb9 to
1a5d561
Compare
There was a problem hiding this comment.
All prior feedback has been addressed and I didn't find anything new — deferring to a human reviewer only because this rewrites the embedded-native-module extraction path with a security model (per-user 0700 subdir, ownership/mode verification, predictable filenames feeding dlopen()) that warrants a second pair of eyes.
Extended reasoning...
Overview
This PR rewrites resolveEmbeddedFile in src/jsc/ModuleLoader.zig (~200 new lines) to deduplicate extracted embedded native modules (.so/.node) in bun build --compile binaries via a content-hashed filename inside a per-user 0700 subdirectory of tmpdir, with an in-process cache on StandaloneModuleGraph.File. It also removes the POSIX post-dlopen unlink in BunProcess.cpp, adds a one-line Windows mkdirA namespace fix in sys.zig, updates test/napi/napi.test.ts to drop the now-invalid "tmpdir is empty" assertion, and adds a new Linux regression test.
Security risks
This is the part that warrants human review. The change moves from unpredictable-per-call tempfile names to deterministic content-hashed names — a classic CWE-377 surface. Over ~15 review iterations the PR added: a per-user 0700 subdir ({tmpdir}/bun-{euid}/) opened with O_NOFOLLOW; ownership + mode verification via fstat; a random-name fallback when the canonical dir is squatted; geteuid() (not getuid()) for setuid binaries; lstat + uid + size revalidation on the fast-path cache; and lax-mode handling for CIFS/vfat/DrvFs/Docker bind-mounts. The final mode check was relaxed from strict == 0o700 to (mode & 0o022) == 0 with st.uid == uid as the load-bearing CWE-377 defence on real POSIX filesystems. I believe the current state is sound, but the threat model is subtle enough (multi-user hosts, sticky-bit /tmp, systemd-tmpfiles sweeps, lax-mode mounts) that a maintainer should sign off on the final security/availability trade-off rather than a bot.
Level of scrutiny
High. The output of this function is handed directly to dlopen(), so a wrong answer is arbitrary code execution. The PR also changes long-standing behavior (POSIX no longer unlinks the extracted .node after dlopen) and removes a test assertion in napi.test.ts.
Other factors
All 15+ inline review comments (mine and CodeRabbit's) are resolved, including the most recent commit 112bbfc which addressed the lax-mode-FS cross-restart dedup gap and the leaked-empty-dir cleanup. The new regression test exercises the main-thread, cross-restart, and tmpfiles-sweep self-heal paths on Linux; the Worker test is gated to debug builds due to a pre-existing unrelated shutdown race. CI status on the latest commit wasn't visible in the timeline yet.
There was a problem hiding this comment.
No remaining issues from my side — all prior findings are addressed and the bug hunter is clean on this revision. Deferring to a human for final sign-off given the CWE-377 / dlopen surface in resolveEmbeddedFile/tryOpenExtractDir and the bundled unrelated CI-flake suppressions.
Extended reasoning...
Overview
This PR rewrites resolveEmbeddedFile in src/jsc/ModuleLoader.zig (~200 net new lines) to dedupe extraction of embedded native modules in bun build --compile binaries: content-hashed filenames inside a per-user 0700 {tmpdir}/bun-{euid}/ subdir, mutex-guarded per-File cache on the shared StandaloneModuleGraph, atomic scratch+rename, and an 8-iteration random-name fallback when the canonical dir is squatted. It also drops the POSIX post-dlopen unlink in BunProcess.cpp, adds a Linux regression test, adjusts napi.test.ts to the new persistence semantics, and bundles several unrelated CI-flake suppressions (expectations.txt, jsc-stress.test.ts, bun-install-registry.test.ts, node-http-backpressure-max.test.ts).
Security risks
The core change is exactly a CWE-377 surface: deterministic tempfile names whose paths are handed to dlopen(). Over ~15 review rounds the implementation evolved to address predictable-tempfile RCE, squatted-subdir DoS, getuid vs geteuid, fast-path ownership re-validation, O_NOFOLLOW on both POSIX and Windows, umask masking, and lax-mode filesystems. The current revision looks correct to me — lstat + uid + size on the fast path, fchmod-then-verify with a (mode & 0o022) == 0 floor on the slow path, just_created cleanup on rejection — but the threat model is subtle enough (multi-user /tmp, sticky bit, tmpfiles sweep, CIFS/DrvFs mounts) that a human security-minded reviewer should confirm the final state rather than relying solely on the iterative bot review.
Level of scrutiny
High. This is production module-loading code that gates what gets passed to dlopen(); a mistake here is local code execution. It also changes user-visible disk-persistence semantics (extracted .node files now survive on POSIX) and removes an existing #19550 cleanup behavior.
Other factors
- All 15 of my prior inline findings and all coderabbit findings are marked resolved; the bug-hunting system found nothing on this revision.
- Commits since my last comment (aa1df74..a56a2e0) are purely CI-flake annotations / timeout bumps unrelated to the core fix — reasonable for landing, but a human should confirm they're acceptable to bundle here.
- The new regression test is Linux-only and the Worker variant is gated to debug builds due to a noted pre-existing release-build shutdown race, so coverage is partial.
- One CI failure remains (
jsc-stresson Windows x64) which the latest commit markstodoIfas a pre-existing WebKit issue.
…test Addresses review feedback on #29587: - Security (CWE-377): the content-hashed tmpfile lived directly in /tmp, where any local user could pre-create `.bun-{hash}.{ext}` at the known name and plant a malicious shared library — the size-only reuse check would then hand that file to dlopen(). Extraction now goes into `{tmpdir}/bun-{uid}/`, which we create with mode 0700 (POSIX) and verify on every use (owner == uid, (mode & 0077) == 0, is-a-directory). Refuses to extract if the subdir can't be trusted. Opened with O_NOFOLLOW so a racing symlink can't redirect us. - Stale cache: the fast path returned the cached path without checking it still existed. After systemd-tmpfiles swept the extracted file, every subsequent dlopen() in that process would fail with ENOENT for the rest of its lifetime. The fast path now stat()s the cached path and, on ENOENT, clears the cache and falls through to re-extract. - Test hermeticity: the test read and rm'd files in the shared /tmp, which could race with concurrent runs or unrelated processes that happened to have a byte-identical .so there. It now points the spawned binary at a per-test `BUN_TMPDIR` so extraction stays sandboxed, and walks one level into that dir to find the `bun-{uid}/` subdir. - Test: drop the brittle `expect(cc.stderr).toBe("")` — gcc/clang/ld can emit benign notes on success. Match the existing precedent in addr32.test.ts / garbage-env.test.ts (exit code only). - Test: add a third run that deletes the extracted file mid-test to prove the cache self-heals.
The `readdirSync(tmpdir).toBeEmpty()` check was verifying that BunProcess.cpp's `deleteAfter` successfully unlinked the extracted .node. #29587 drops `deleteAfter` on POSIX because the file is now content-hashed and cached — so an empty tmpdir isn't the right assertion anymore. My replacement (uniqueness of content hashes per extracted file) was redundant with test/regression/issue/29585.test.ts, which already exercises the dedup logic end-to-end. Just drop the FS-state check here. The existing `stdout === "hello world!"` assertion is enough — it proves the compiled binary loaded its embedded addons successfully. The dedup-correctness is covered by 29585.test.ts.
4630301 to
d74a22d
Compare
…n-arm (unrelated pre-existing flake)
On Windows AArch64 this test times out for 20s × 4 runner-internal
retries because req.once('close') doesn't fire within the deadline
after controller.abort(). It's a genuine Bun bug on that one platform
but orthogonal to #29585 — confirmed failing on ≥4 independent PR head
shas (#29587 builds 51292, 51519; #30219 build 50805; #30224 build
50951) always on :windows: 11 aarch64 test-bun shard 7/8, all 4
retries timing out identically.
Per the WINDOWS-AARCH64 + FLAKY precedent already in expectations.txt
(line 6), add the test to the skip list scoped to WINDOWS-AARCH64 so
it stops gating unrelated PRs. The underlying abort→close race is
tracked separately; this entry can come out when the Windows IOCP
path learns to emit 'close' in time.
1. test/js/bun/test/parallel/test-http-should-emit-close-when-connection-is-aborted.ts —
times out on all three Windows variants (x64, x64-baseline, aarch64)
waiting for req.once('close') after controller.abort(). Widen the
expectations.txt skip from WINDOWS-AARCH64 to WINDOWS since #51535
showed the flake hits non-arm shards too.
2. test/cli/install/bun-install-registry.test.ts peerTests loop —
:3659 intermittently reads a cached a-dep@1.0.9 instead of the
fresh-hoisted peer-resolved version under local-registry contention.
Hit on unrelated PRs #30219 (build 50805, recovered on retry #4),
#30224 (build 50951, hard-failed 4/4), and #29587 (build 51535,
hard-failed 4/4). Matches the existing todoIf(isFlaky && isMacOS &&
situation === 'peer ^1.0.2') on the same loop — widen to also include
isFlaky && isWindows. Precedent at :4222 in the same file already
does this for 'text lockfile is hoisted'.
Both are real pre-existing Bun bugs orthogonal to embedded-native-module
dedup — filed here only because they're blocking this PR's CI green.
|
Linking #30962 — same root cause, minimal repro (no Workers required). The fix on this branch (content-hashed filename + per- |
Jarred-Sumner
left a comment
There was a problem hiding this comment.
undo the .zig changes. just do the .rs changes. can you come up with a simpler solution? can we avoid this extract dir? can we just do 1 file?
d25569f to
480a174
Compare
|
Done. Reset to a minimal -only diff on top of main and simplified the design:
Global Also drops the POSIX post-dlopen unlink in Latest: 480a174. |
ddafb01 to
a997aa2
Compare
There was a problem hiding this comment.
No further findings from me — all prior issues are resolved and the final design looks coherent. Given this rewrites the tempfile/dlopen() extraction path with CWE-377 hardening (uid/ownership checks, lstat-based symlink defense, shared /tmp squatting fallback), it's worth a human pass on the security model before merge.
Extended reasoning...
Overview
This PR rewrites resolve_embedded_file_to_buf in src/runtime/jsc_hooks.rs (~200 lines net) to deduplicate extracted embedded native modules from bun build --compile binaries via content-hashed filenames in tmpdir, fixing a per-dlopen() /tmp leak (#29585). It also adds an extracted_path cache field on StandaloneModuleGraph::File, removes the now-counterproductive POSIX post-dlopen unlink in BunProcess.cpp, adjusts napi.test.ts accordingly, and adds a Linux regression test.
Security risks
The core change creates predictably-named files in shared /tmp and hands the resulting path to dlopen(). The review cycle surfaced and resolved four 🔴 findings in this area: a CWE-377 predictable-tempfile RCE (initial design), a stale-cache ENOENT regression, a squatted-subdir DoS, and a TOCTOU symlink-follow on the warm path. The final design mitigates these via uid-in-filename, lstat/lstatat (no-follow) ownership+ISREG+size validation on both fast and warm paths, atomic scratch-write+rename, and a scratch-path fallback when the canonical name is squatted. The residual threat model (shared sticky-bit /tmp, attacker with read access to the compiled binary) is non-trivial and the defenses are layered — exactly the kind of thing that benefits from human security review.
Level of scrutiny
High. This is production-critical native-module-loading code where a mistake means either RCE (load attacker's .so) or DoS (embedded modules fail to load). The PR went through ~25 review iterations including a full redesign (per-user 0700 subdir → flat uid-prefixed filename), which itself involved security tradeoffs that a human should validate.
Other factors
All my prior inline comments (and CodeRabbit's) are resolved; the bug-hunting system found nothing on the current revision. Test coverage is reasonable (Linux-only regression test exercising 10× dlopen, cross-restart reuse, and tmpfiles-sweep self-heal), though the Worker-concurrency test is gated to debug builds due to an unrelated pre-existing shutdown race. The unsafe { &mut *graph } aliasing pattern is acknowledged as a pre-existing codebase-wide convention, not new to this PR.
1da5c2c to
444d305
Compare
There was a problem hiding this comment.
No remaining findings from my side — all prior rounds are addressed. Given this rewrites the embedded-library extraction path with CWE-377 hardening (predictable tempfile names, uid/symlink checks, dlopen target selection) and drops the POSIX post-dlopen unlink from #19550, it's worth a human pass on the threat model and the Windows behavior (which the regression test doesn't cover).
Extended reasoning...
Overview
This PR fixes #29585 by replacing per-call random tempfile extraction of embedded .so/.node/.dylib files in bun build --compile binaries with a content-hashed, deduplicated path. It touches:
src/runtime/jsc_hooks.rs(~230 lines net): rewritesresolve_embedded_file_to_bufwith a three-tier fast/warm/cold path, a process-global mutex, lstat-based ownership/size validation, scratch-write + atomic rename, and a squatted-path fallback. Adds helpersformat_canonical_name,write_absolute,path_to_nul_boxed,extract_owner_uid.src/standalone_graph/StandaloneModuleGraph.rs: adds a mutableextracted_path: Option<Box<[u8]>>field to the sharedFilestruct.src/jsc/bindings/BunProcess.cpp: removes the POSIX post-dlopenunlink (the #19550 fix), gatesdeleteAfter/tryToDeleteIfNecessaryto Windows-only, drops theBun__unlinkextern.test/napi/napi.test.ts: removes the "tmpdir is empty after run" assertion (semantics changed — extracted files now intentionally persist).test/regression/issue/29585.test.ts: new Linux-only regression test (the Worker variant is further gated to debug builds only due to a pre-existing release-build shutdown race).
Security risks
This is squarely security-sensitive code. The PR went through ~25 review iterations over six weeks, with multiple design resets, specifically because of CWE-377 (predictable tempfile) concerns:
- The extracted path is handed directly to
dlopen(), so a wrong reuse decision is arbitrary code execution. - The canonical filename
.bun-{euid}-{wyhash(contents)}.{ext}is fully predictable to a local attacker who can read the compiled binary. - The defense rests on
lstat/lstatat+st_uid == geteuid()+S_ISREG+ size match, plus sticky-bit semantics for the rename-over-squatter fallback. Each of these was the subject of at least one resolved finding (symlink-followingfstatat, real-vs-effective uid, fast-path bypass after tmpfiles sweep, etc.). - On Windows, the
ourscheck istrueunconditionally — the threat model there relies on%TEMP%being per-user, which is conventional but not enforced by this code. - The
unsafe { &mut *graph }pattern writes through a shared singleton; the lock serialises only callers of this function, which the SAFETY comment now correctly scopes.
I'm not aware of a remaining concrete exploit path in the final design, but the reasoning is subtle enough that I shouldn't be the only reviewer.
Level of scrutiny
High. This is production runtime code on the dlopen() path of every compiled binary that embeds a native module, with a documented threat model involving multi-user /tmp. It also reverses a prior deliberate decision (#19550's immediate unlink) and weakens an existing test (napi.test.ts no longer asserts cleanup). The new regression test doesn't exercise macOS or Windows, and the Worker test is debug-only.
Other factors
- All ~25 of my prior inline findings are resolved, and the current bug-hunting pass found nothing.
- The PR went through a full design reset (480a174) midway, dropping the per-user-subdir approach for the simpler flat-file design — the diff is now reasonably compact.
- CI is building on the latest commit; I haven't verified it's green across all targets.
- No human reviewer appears to have weighed in at any point in the timeline.
|
CI note: the red on build 61227 is |
|
CI note (build 61227, continued): besides the repo-wide bunx/Angular breakage, one shard (debian 13 aarch64, shard 0) had |
bun build --compile binaries dlopen() embedded .so/.node files through resolve_embedded_file_to_buf. The previous implementation called FileSystem::tmpname() each time, which appends a nanoTimestamp and a per-VM counter — so every dlopen() leaked a fresh /tmp file (#29585 reported ~14 GiB/hour on an opencode + ripgrep-workers deployment). Dedupe onto a single content-hashed path {tmpdir}/.bun-{uid}-{hash}.{ext}: - Fast path: cached absolute path on StandaloneModuleGraph::File. lstat-verified each call (uid + IFREG + size) so a tmpfiles sweep or attacker replacement re-extracts instead of being handed off to dlopen. - Warm path: no cache yet, but the content-hashed canonical file from a prior run is still on disk — lstatat-verified (no symlink follow) and reused without rewriting. - Cold path: write to a unique scratch name, rename(2) to canonical. Atomic on POSIX and Windows (MoveFileEx); concurrent Workers converge on the same content-hashed name. If rename fails (adversarial: another user owns a squatted destination on sticky /tmp), fall back to the scratch path — correct, with within-process dedup via the cached scratch path but no cross-restart dedup until the squatter releases the canonical name. A global Mutex serialises the lazy init of File.extracted_path across Worker VMs that share the graph. Drop the POSIX post-dlopen unlink in BunProcess.cpp (#19550) — the content-hashed tmpfile is supposed to persist for the dedup to work. Windows keeps delete-on-reboot for hygiene (NTFS can't unlink an in-use file).
444d305 to
f54d3e1
Compare
Fixes #29585.
Repro
A
bun build --compilebinary thatdlopen()s an embedded.so/.dylib(imported with{ with: { type: "file" } }) leaked a fresh copy to/tmpper call, never cleaned up.Workers amplify it because each Worker VM starts with its own
tmpname_id_numbercounter at 0, so every Worker extracts on its firstdlopen. Reporter's server saw ~14 GB/hour on Linux.Cause
ModuleLoader.resolveEmbeddedFilepicked a per-call unique tmpfile name:tmpnamemixesnanoTimestamp()with a per-VM counter, so each call produced a fresh path. Nothing tracked or unlinked the file.Fix
.bun-{bun.hash(file.contents)}.{ext}. Same content → same path → idempotent across dlopens, Workers, and process restarts.fstatat; if a file exists with the expected size, use it.rename(2)into the content-hashed destination. Concurrent workers writing at the same time still end up with one valid file — same content, so whichever rename wins is correct.StandaloneModuleGraph.Filenow holdsextracted_path: ?[:0]const u8. Shared across Workers becausestandalone_module_graphis shared by pointer (seeVirtualMachine.initWorker). A process-wide mutex guards the lazy init.Verification
5
dlopens + 5 Workers (eachdlopenonce) × 2 runs:Test at
test/regression/issue/29585.test.tsfails pre-fix withExpected: 1 / Received: 10and passes post-fix.