fix(ffi): defer JSBigInt allocation for threadsafe JSCallback i64/u64 args to the JS thread#30165
fix(ffi): defer JSBigInt allocation for threadsafe JSCallback i64/u64 args to the JS thread#30165robobun wants to merge 8 commits into
Conversation
WalkthroughThis PR extends Bun's FFI system to support thread-safe callbacks with proper JavaScript BigInt handling. Thread-safe callbacks now receive an additional ABI type descriptor array, allowing the JavaScript thread to decode 64-bit integer arguments as BigInt values rather than allocating them on the OS thread, and includes test coverage for mixed-type callback scenarios. ChangesThreadsafe FFI Callbacks with BigInt Support
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Status — ready for maintainer review/merge; remaining CI red is fleet-wide, not this PRThe diff is green. ✅ Gate: without the fix, off-thread Rebases: #30412 (Rust rewrite — fix ported to the Rust codegen), #31116 (clippy), #31332 (threadsafe wrapper refcounting — merged with this PR's CI, build 60254 (head 8363a37): all 15 red jobs are the same single test — I've already used my single CI re-roll; not pushing further retriggers. A maintainer can merge over the fleet-wide red or re-run once the bunx test is fixed on main. |
|
Found 2 issues this PR may fix:
🤖 Generated with Claude Code |
1b6dd85 to
d4121bd
Compare
8ad2195 to
493b57a
Compare
There was a problem hiding this comment.
All my earlier feedback has been addressed and I have nothing further to flag, but given this changes the threadsafe-callback ABI and touches off-thread → JS-thread GC-heap handoff (plus the user-visible JSCallback constructor now throwing on validation errors), it'd be good to have a human who owns bun:ffi sign off.
Extended reasoning...
Overview
This PR fixes a real crash: threadsafe JSCallback trampolines with int64_t/uint64_t/i64_fast/u64_fast args were calling JSBigInt::createFrom on whatever OS thread invoked the callback, corrupting the JSC GC heap. The fix defers the BigInt allocation to the posted task on the JS thread by passing raw 64-bit bits plus a per-arg ABIType table through a new 4-arg FFI_Callback_threadsafe_call signature. It spans FFI.h (TCC header), ffi.zig (codegen + symbol binding), JSFFIFunction.cpp (decode on JS thread), bindings/FFI.zig (translated-C mirror), ffi.ts (JSCallback ctor), two regenerated fixture snapshots, and a new pthread-driven integration test.
Along the way it also fixes two pre-existing bugs: the dead "Threadsafe functions must return void" guard (read function.threadsafe instead of the local) and JSCallback's constructor swallowing native validation errors. It also tightens UINT64_TO_JSVALUE to use <= MAX_INT52 so u64_fast matches i64_fast at the boundary.
Security risks
None identified. No new attack surface; inputs are the same FFI arg types as before, just marshalled differently. The argTypes array is generated by Bun's own codegen (a static const baked into the TCC trampoline), not user-controlled at the C++ boundary.
Level of scrutiny
High. This is cross-thread code interacting with the JSC GC heap, with an ABI change coordinated across Zig codegen, a TCC-compiled C header, a C++ extern, and a Zig extern decl — four places that must agree on the new 4-arg signature. The FFIABIType enum values in C++ are hand-mirrored from Zig's ABIType (verified: 7/8/15/16 match). The JSCallback constructor now throws where it previously returned { ptr: undefined }, which is a user-visible behavior change (correct, and matches dlopen/linkSymbols, but still a change). There's also a known textual conflict with #28115 in JSFFIFunction.cpp.
Other factors
I reviewed this across three prior passes; each round surfaced an issue (the dead guard → undeclared _FFI_Callback_call, the u64_fast </<= divergence, the void-vs-ZIG_REPR_TYPE decl mismatch, two test-convention nits) and the author addressed all of them. The new test compiles a real shared library and invokes callbacks from a pthread, which is exactly the right shape to gate this regression. The fixture-snapshot diffs are mostly unrelated drift from main per the description. Nothing left to flag from my side, but the combination of ABI change + cross-thread GC + behavior change puts this outside what I'd shadow-approve.
493b57a to
6e853fb
Compare
944a5db to
94f2c62
Compare
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
src/bun.js/bindings/GeneratedBindings.zig:1-6— Commit 944a5db accidentally adds two stale codegen artifacts —src/bun.js/bindings/GeneratedBindings.zigandsrc/bun.js/bindings/GeneratedJS2Native.zig(~780 lines) — at the pre-restructure path that.gitignoredoesn't cover. The build only imports these fromsrc/jsc/bindings/(where they're gitignored and regenerated), so these copies are dead code unrelated to the FFI fix; pleasegit rmboth files and optionally extend.gitignoreto also coversrc/bun.js/bindings/Generated*.zig.Extended reasoning...
What
Commit
944a5db("doc: update ABIType path reference after src/ restructure") was meant to be a one-line comment fix inJSFFIFunction.cpp, but it also adds two brand-new files:src/bun.js/bindings/GeneratedBindings.zig(391 lines)src/bun.js/bindings/GeneratedJS2Native.zig(388 lines)
Both diff hunks start with
--- /dev/null, i.e. these files did not exist onmainbefore this PR.Why they're stale and at the wrong path
These are codegen outputs. The codegen writes them under the post-restructure path
src/jsc/bindings/:src/codegen/bindgen.ts:1591writessrc/jsc/bindings/GeneratedBindings.zigsrc/codegen/bundle-modules.ts:484writes../jsc/bindings/GeneratedJS2Native.zig
…and the build imports them from there:
bun.zig:209,214does@import("./jsc/bindings/GeneratedBindings.zig")/@import("./jsc/bindings/GeneratedJS2Native.zig").Accordingly,
.gitignore:128-129ignores exactly:src/jsc/bindings/GeneratedJS2Native.zig src/jsc/bindings/GeneratedBindings.zigThe two files committed here live under the old
src/bun.js/bindings/path, which is not in.gitignore.git check-ignoreconfirms:src/jsc/bindings/GeneratedBindings.zigis ignored,src/bun.js/bindings/GeneratedBindings.zigis not. So when the author rangit add, the stray copies at the legacy path were picked up.Why nothing catches this
A repo-wide grep finds zero references to
src/bun.js/bindings/GeneratedBindings.zigorsrc/bun.js/bindings/GeneratedJS2Native.zig— nothing imports them, nothing reads them, the build never touches them. They are pure dead code that will silently rot as the canonical generated files (undersrc/jsc/bindings/) evolve, while bloating an FFI threadsafe-callback PR with ~780 lines of unrelated bindgen output.Step-by-step proof
.gitignore:128-129listssrc/jsc/bindings/Generated{JS2Native,Bindings}.zigonly.git check-ignore -v src/bun.js/bindings/GeneratedBindings.zig→ no match (not ignored).git log --diff-filter=A -- src/bun.js/bindings/GeneratedBindings.zig→ added in944a5db(this PR's HEAD).bun.zig:209→pub const GeneratedBindings = @import("./jsc/bindings/GeneratedBindings.zig");— the build looks atsrc/jsc/bindings/, notsrc/bun.js/bindings/.rg -l 'bun.js/bindings/Generated'→ no hits anywhere in the tree.- Therefore the two new files at
src/bun.js/bindings/are unreferenced codegen artifacts accidentally staged because the legacy path isn't gitignored.
Impact
No runtime effect — the files are never compiled or read. But they (a) add ~780 lines of noise to a focused FFI fix, (b) will diverge from the real generated bindings over time and confuse anyone who greps for the symbols they define, and (c) leave a trap for the next person whose checkout has stale codegen at the old path.
Fix
git rm src/bun.js/bindings/GeneratedBindings.zig src/bun.js/bindings/GeneratedJS2Native.zig
Optionally also append to
.gitignoreso this can't recur:src/bun.js/bindings/GeneratedJS2Native.zig src/bun.js/bindings/GeneratedBindings.zig
94f2c62 to
9409be4
Compare
4bc5ad0 to
26dd8b6
Compare
There was a problem hiding this comment.
All of my prior feedback has been addressed (including the INT64_TO_JSVALUE int32 bound in 1ab0231) and I have nothing further to flag — deferring final sign-off to a human given the cross-language ABI change to FFI_Callback_threadsafe_call and the thread-safety-critical codegen it touches.
Extended reasoning...
Overview
This PR defers JSBigInt allocation for int64_t/uint64_t/i64_fast/u64_fast arguments of threadsafe: true JSCallbacks from the TCC-generated trampoline (which may run on an arbitrary OS thread) to the task posted to the JS thread. It does so by changing the FFI_Callback_threadsafe_call ABI to accept a per-argument uint8_t argTypes[] table, emitting that table from the codegen, and decoding raw 64-bit bits in a new decodeThreadsafeCallbackArgument on the JS thread. The change is mirrored across four languages: the embedded C header (FFI.h), the Zig codegen (ffi.zig), two Rust codegen modules (ffi_body.rs, host_fns.rs, abi_type.rs), the C++ task runner (JSFFIFunction.cpp), and the translated-C Zig mirror (src/jsc/FFI.zig). It also fixes two pre-existing bugs (the dead threadsafe && return_type != Void guard, and JSCallback swallowing native-side errors) and tightens two off-by-one boundary comparisons in {U,}INT64_TO_JSVALUE. New tests compile a shared library that invokes callbacks from a real pthread.
Security risks
None identified. The change is internal to bun:ffi's callback marshalling; it does not touch auth, crypto, permissions, or untrusted-input parsing. The new codegen only runs for user-constructed JSCallbacks and the argTypes table is emitted from already-validated ABIType enum values.
Level of scrutiny
High. This is thread-safety-critical code at the JSC GC boundary, and it changes an extern "C" function signature that must agree across TCC-emitted C, the C++ definition, a Zig extern, and two Rust extern blocks. It also introduces a user-visible behavioral change (new JSCallback now throws on validation errors instead of silently producing ptr: undefined). The fix is well-reasoned and well-tested, but the blast radius of a mistake here is heap corruption, so a human should own the merge.
Other factors
This PR has been through six review iterations with the bug-hunting system; every finding (the _FFI_Callback_call regression, the u64_fast/MAX_INT52 and i64_fast/MAX_INT32 boundary divergences, the void return-type mismatch, the stale path comment, the clashing_extern_declarations lint, and two test-convention nits) has been addressed by the author, the most recent in commit 1ab0231. CI shows only unrelated package-manager flakes. The PR description notes an expected textual conflict with #28115 in JSFFIFunction.cpp, which is one more reason for a human to coordinate the merge.
6a30613 to
93896c5
Compare
… args to the JS thread
The TCC-generated trampoline for `new JSCallback(fn, { threadsafe: true })`
converted int64_t / uint64_t / i64_fast / u64_fast arguments to JSValues
via INT64_TO_JSVALUE_SLOW / UINT64_TO_JSVALUE_SLOW before handing them to
FFI_Callback_threadsafe_call. Those helpers call JSBigInt::createFrom,
which heap-allocates a JS cell. Since the trampoline runs on whatever OS
thread invoked the callback, this allocated on a non-JS thread without the
JS lock, corrupting the GC heap.
For threadsafe callbacks the trampoline now passes the raw 64-bit value
for these argument types along with a per-argument ABIType table, and
FFI_Callback_threadsafe_call performs the BigInt / number conversion
inside the task it posts to the JS thread.
The "Threadsafe functions must return void" validation in
generateSymbolForFunction read function.threadsafe (the out-param,
still at its default false) instead of the local threadsafe variable,
so the guard never fired. Combined with the previous commit moving
_FFI_Callback_call behind an #else branch, this would have turned
`new JSCallback(fn, { threadsafe: true, returns: 'int32_t' })` from
silently-broken into a TCC compile error.
Also make the JSCallback constructor surface validation errors from
nativeCallback by throwing them (matching dlopen/linkSymbols), instead
of destructuring `{ctx, ptr}` out of the returned Error instance and
leaving both undefined.
…e callback task JSBigInt::createFrom declares a throw scope and can throw on OOM. The posted task in FFI_Callback_threadsafe_call called it in a loop without an enclosing scope or exception check, tripping JSC's exception-scope validator on the asan lane before profiledCall.
…eadsafe decl UINT64_TO_JSVALUE in FFI.h used strict < MAX_INT52 where INT64_TO_JSVALUE used <=, so a u64_fast callback received 9007199254740991 as a BigInt while an i64_fast callback received it as a number. Since that value is Number.MAX_SAFE_INTEGER it should be a number in both. This also brings the non-threadsafe u64_fast path in line with the threadsafe decode path added in this PR. Also: - declare the IS_THREADSAFE FFI_Callback_call prototype as void to match FFI_Callback_threadsafe_call's actual signature (threadsafe callbacks always return void so the trampoline never reads a return value). - hoist the bun:ffi import in the test to module scope per test/CLAUDE.md.
host_fns.rs). The trampoline generator there was a faithful port of the original Zig, including the two bugs this PR fixes: - generate_symbol_for_function read function.threadsafe (the not-yet-assigned out-param, always false) instead of the local, so the 'Threadsafe functions must return void' guard never fired. - print_callback_source_code converted int64_t / uint64_t / i64_fast / u64_fast arguments via *_TO_JSVALUE_SLOW in the trampoline, allocating a JSBigInt off the JS thread. Apply the same fix as the Zig/C++ side: for threadsafe callbacks emit IS_THREADSAFE, pass raw 64-bit bits for those arg types plus a static argTypes[] table, and call the 4-arg FFI_Callback_call so FFI_Callback_threadsafe_call can do the BigInt conversion on the JS thread. Also add ABIType::may_allocate_bigint_when_converted_to_js and update the Rust extern decl to match the C++ signature.
MAX_INT32 is 2^31, so 'val <= MAX_INT32' let exactly 2147483648 through the (int32_t) cast and wrap to -2147483648. Use a strict < like UINT64_TO_JSVALUE already does, so the non-threadsafe i64_fast path agrees with the threadsafe decode added in this PR.
93896c5 to
8363a37
Compare
There was a problem hiding this comment.
All 8 prior findings are addressed and this pass found nothing new — looks correct to me, but given it changes the threadsafe-callback ABI and coordinates cross-thread JSC heap allocation across the C++/Zig/Rust codegens, I'd like a human FFI owner to give the final sign-off.
Extended reasoning...
Overview
This PR fixes off-thread JSBigInt allocation in bun:ffi threadsafe JSCallback trampolines by deferring the 64-bit-int → JSValue conversion to the JS thread. It touches 12 files across four language layers: the TCC-compiled C header (FFI.h), the C++ task-posting path (JSFFIFunction.cpp), and the parallel Zig (ffi.zig) and Rust (ffi_body.rs / host_fns.rs / abi_type.rs) callback codegens, plus the JS wrapper (ffi.ts) and a new pthread-based integration test. It also changes the FFI_Callback_threadsafe_call extern signature (3→4 args, EncodedJSValue→void), fixes a dead validation guard, makes JSCallback throw on native validation errors, and tightens two pre-existing boundary off-by-ones in {U,}INT64_TO_JSVALUE.
Security risks
None identified. The change is internal to the FFI marshalling path; no new untrusted input parsing, auth, or privilege surfaces. The user-facing behavior change (new JSCallback now throws on invalid options instead of silently yielding ptr: undefined) is a strict tightening.
Level of scrutiny
High. This is concurrency-correctness code at the JSC-GC / native-thread boundary, with an ABI change that must stay in lockstep across C++, Zig, two Rust codegens, and the TCC-emitted C. A mismatch in any one of those would corrupt memory or break callback dispatch. The PR has been rebased across three substantial upstream rewrites (#30412 Rust port, #31332 wrapper refcounting, #31116 clippy), so the merge surface is non-trivial. That's beyond what I'm comfortable shadow-approving without a human FFI/JSC owner confirming the cross-language wiring and the #31332 merge.
Other factors
All eight inline findings I raised over previous revisions (the _FFI_Callback_call regression, dead function.threadsafe guard, u64_fast/i64_fast boundary divergences, void return-type decl, stale path comment, mod.rs extern clash, test-convention nits) have been fixed and resolved; the current bug-hunting pass found nothing. The new test exercises real pthreads with all four 64-bit ABI types plus a mixed-arg case and forces a full GC, and CI is reported green. I have not previously posted a top-level review verdict on this PR — only inline findings — so this is my first summary.
Problem
new JSCallback(fn, { args: ['int64_t'], threadsafe: true })generates a TCC trampoline that converts each native argument to aJSValueand then callsFFI_Callback_threadsafe_call, which posts a task to the JS thread. Forint64_t/uint64_t/i64_fast/u64_fastarguments, the conversion goes throughINT64_TO_JSVALUE_SLOW/UINT64_TO_JSVALUE_SLOW, which callJSBigInt::createFrom— a heap allocation on the JSC GC heap.The trampoline runs on whatever OS thread invoked the callback. When that is not the JS thread,
JSBigInt::createFromallocates without holding the JS lock, corrupting MarkedBlock free lists / GC state.Repro
test/js/bun/ffi/ffi-threadsafe-callback-bigint.test.tscompiles a tiny shared library that invokes a threadsafeJSCallbackwith a 64-bit integer argument from a realpthread. On a debug build without this fix:— the off-thread allocation produced an invalid cell.
Fix
For
threadsafe: truecallbacks only, the generated trampoline now:int64_t/uint64_t/i64_fast/u64_fastarguments instead of calling*_TO_JSVALUE_SLOW, andstatic const uint8_t argTypes[N]table and passes it as a fourth argument toFFI_Callback_call.FFI_Callback_threadsafe_callcopies the raw values and the type table into the posted task and performs theJSBigInt/jsNumberconversion there, on the JS thread. All other argument types (whose*_TO_JSVALUEencodings are immediate and never heap-allocate) are unchanged. Non-threadsafe callbacks are unchanged.Example of the new trampoline for
{ args: ['int32_t','int64_t','uint64_t','double'], threadsafe: true }:Verification
bun bd test test/js/bun/ffi/ffi-threadsafe-callback-bigint.test.tsASSERTION FAILED: isSymbol() || isHeapBigInt()inJSCell::toStringSlowCaseint64_t/uint64_t/i64_fast/u64_fast(big & small) and a mixed-arg callback deliver the expected valuesExisting
run ffithreadsafe-callback and non-threadsafe-callback tests (all arg types, both normal and fast-int variants) pass unchanged.The
ffi.test.fixture.{callback,receiver}.csnapshot files are regenerated byffi.test.jsfrom the currentFFI.h; most of their diff is unrelated drift that had accumulated onmain.Rebase after #31332 (
ffi: avoid copying the threadsafe callback wrapper on the calling thread): that commit rewrote the sameFFI_Callback_threadsafe_callthis PR extends (refcounted wrapper, cachedScriptExecutionContextIdentifier,Ref{wrapper}capture). The merged version keeps main's wrapper-lifetime handling and adds this PR'sargTypesmarshalling, throw scope, and per-argument BigInt decode inside the posted task. The new foreign-threadcc.test.tscoverage added by #31332 passes together with this PR's tests.Related
#28115 fixes a separate off-thread access inside
FFI_Callback_threadsafe_callitself (copyingJSC::Strong<>). This PR fixes the step that happens in the TCC trampoline before that function is called. There will be a small textual conflict inJSFFIFunction.cppwhen both land.While here, also fixed two pre-existing bugs this change would have otherwise exposed:
"Threadsafe functions must return void"guard ingenerateSymbolForFunctionreadfunction.threadsafe(the not-yet-assigned out-param, alwaysfalse) instead of the localthreadsafe, so it never fired.JSCallback's constructor destructured{ ctx, ptr }directly out ofnativeCallback's result without checking for an error return, so every validation error from the native side was silently swallowed intoptr: undefined. It now throws, matchingdlopen/linkSymbols.Fixes #24529
Rebase after #30412 (Rewrite Bun in Rust): the FFI callback codegen was rewritten in Rust (
src/runtime/ffi/{ffi_body.rs,host_fns.rs,abi_type.rs}) as a faithful port of the original Zig — including both bugs this PR fixes. Commit 9409be4 applies the same fix to the Rust codegen: the deadthreadsafe && return_type != Voidguard, theIS_THREADSAFEdefine, raw-bit marshalling for 64-bit int args, theargTypes[]table, and the 4-argFFI_Callback_callsignature. The Zig changes are retained (the file is still compiled) for parity.