prove: SSLConfig intern/deref race observability (do NOT merge)#27863
Closed
cirospaciari wants to merge 8 commits into
Closed
prove: SSLConfig intern/deref race observability (do NOT merge)#27863cirospaciari wants to merge 8 commits into
cirospaciari wants to merge 8 commits into
Conversation
Add scoped debug logs (BUN_DEBUG_SSLConfig=1) to GlobalRegistry.intern() and destroy() to observe the race condition between intern() finding a dying config and ref()'s debugAssert catching a 0->1 resurrection. The test exercises the proxy+TLS path with keepalive:false and no ca (so requires_custom_request_ctx=false, meaning the SSL context cache does NOT hold an extra ref). This makes the refcount hit 0 on every wave completion. Logs show: - 'intern: found existing 0xABC, refcount=N before ref' — if N is ever 0, that's the 0->1 resurrection bug (debugAssert panics in debug builds, ASAN catches the UAF in release+ASAN builds) - 'destroy 0xABC: strong reached 0, freeing' — proves refcount hits 0 On macOS the race window is too narrow to trigger naturally. On Linux CI (especially linux-x64-asan with different thread scheduling), the race is more likely to manifest. See: #27838
Collaborator
|
Updated 5:06 PM PT - Mar 6th, 2026
❌ @cirospaciari, your commit f42fa72 has 2 failures in
🧪 To try this PR locally: bunx bun-pr 27863That installs a local version of the PR into your bun-27863 --bun |
Standalone C program that reproduces the exact same race pattern as SSLConfig.GlobalRegistry — shared map protected by mutex, atomic refcount, deref→destroy without checking if intern() just resurrected the entry. Build and run with ASAN: cc -fsanitize=address -pthread -O1 -o race_repro sslconfig-race-repro.c ./race_repro ASAN reports heap-use-after-free on EVERY run: Thread T1: READ of interned->data (like ProxyTunnel reading tls_props) freed by Thread T2: config_destroy (like SSLConfig.destroy) The race: thread A does fetchSub(1)→0→destroy()→free, while thread B does intern()→find existing→ref() 0→1→returns dangling pointer. This is the exact pattern causing the segfault at 0x0 in production.
Workers share the static SSLConfig.GlobalRegistry across threads. Multiple workers doing serial proxy+TLS fetches with keepalive:false creates real cross-thread contention between intern() (worker JS threads) and deref() (worker HTTP threads). On ASAN builds, if the 0->1 resurrection race triggers, ASAN catches the heap-use-after-free — same pattern proven by sslconfig-race-repro.c.
8 workers x 20 waves x 16 concurrent fetches per wave = 2560 total. Concurrent batches maximize overlap between deref() (HTTP threads) and intern() (JS threads) across workers sharing the GlobalRegistry.
cirospaciari
added a commit
that referenced
this pull request
Mar 6, 2026
The CI test never triggered the race (window too narrow, workers naturally synchronize). This debug-only script DOES reproduce it 3/3. Key insight: BUN_DEBUG_SSLConfig=1 enables scoped logging in deref() and destroy(), widening the race window from ~10 CPU cycles to 100μs+ via stderr writes. Driver worker cycles refcount through 0; probe workers spam intern via fetch+abort in setImmediate loops. See script header comment for full repro steps.
cirospaciari
added a commit
that referenced
this pull request
Mar 7, 2026
Non-deterministic by nature (race window is ~10 CPU cycles in release), but catches the race ~60% on debug+ASAN without any special env vars due to debug.deinit() in ref_count.zig widening the window. Driver worker cycles refcount through 0 via serial proxy+TLS fetches. Probe workers spam intern() via fetch+abort in setImmediate loops. For deterministic repro use BUN_DEBUG_SSLConfig=1 — see #27863.
Jarred-Sumner
pushed a commit
that referenced
this pull request
Mar 8, 2026
…up (#27838) ## Problem Segfault at address `0x0` in `create_ssl_context_from_bun_options` during proxy tunnel setup (Bun v1.3.11, linux x86_64_baseline). The crash is `strlen(NULL)` on a freed cert/key string. ## Root cause `SSLConfig.GlobalRegistry` is a dedup cache keyed by TLS option content. It stored raw pointers without contributing to refcount — a weak cache — but `intern()` called plain `ref()` on what it found, which blindly does `fetchAdd(1)` with only a `debugAssert(old > 0)` (no-op in release). | HTTP thread (`deref`, last holder) | JS thread (`intern`, new fetch same TLS opts) | |---|---| | `fetchSub` 1→0, enters `destroy()` | | | | locks mutex, `getOrPut` finds dying entry (still in map) | | | `ref()` → **0→1** (resurrection, no check in release) | | | unlocks, returns dangling pointer to new fetch | | locks mutex in `remove()`, evicts, unlocks | | | `deinit()` frees cert/key strings | | | `allocator.destroy()` | | | | proxy tunnel builds SSL ctx from freed config → `strlen(NULL)` | Proxy amplifies this: `ProxyTunnel.start()` creates a *second* SSL context from `tls_props` *after* the CONNECT round-trip, giving a multi-millisecond window for another request's teardown to win the race. ## Fix: Arc/Weak split refcounting This is the Rust `Arc<T>`/`Weak<T>` pattern (and C++ `shared_ptr`/`weak_ptr`): two atomic counters instead of one. ### New primitive: `ThreadSafeWeakableRefCount` Added to `src/ptr/ref_count.zig` alongside `ThreadSafeRefCount`. General-purpose, reusable for any type that needs weak references. ```zig strong: atomic u32 // live users; 1->0 calls drop_contents() weak: atomic u32 // weak holders + (1 if strong > 0); 1->0 calls free_memory() ``` The `+1` on weak is the "collective" weak ref held on behalf of all strong refs. It guarantees the struct allocation stays live across the `strong 1→0 → drop_contents()` window, so `upgrade()` is memory-safe as long as *any* weak ref exists. | Method | Does | |---|---| | `ref()` / `deref()` | bump/drop strong; at strong 1→0: `drop_contents()` then drop collective weak | | `weakRef()` / `weakDeref()` | bump/drop weak; at weak 1→0: `free_memory()` | | `upgrade()` | CAS-loop on strong, only increments if currently > 0 — **never revives a dead object** | ### SSLConfig wiring ```zig const RC = bun.ptr.ThreadSafeWeakableRefCount(@this(), "ref_count", dropContents, freeMemory, .{}); fn dropContents(this: *SSLConfig) void { // strong 1->0 GlobalRegistry.remove(this); // while content intact (map eql needs it) this.deinit(); // free strings } fn freeMemory(this: *SSLConfig) void { // weak 1->0 bun.default_allocator.destroy(this); } ``` ### Registry now holds weak refs **`intern()`** 1. `getOrPut` by content. 2. Found existing? `upgrade()` it: - **Success** (strong was > 0): got a real ref. Free `new_config`, return existing. - **Fail** (strong is 0, dying): registry `weakDeref`s the old entry, replaces the slot with `new_config`. 3. Registry `weakRef`s the winning entry. **`remove()`** (called from `dropContents`) - Look up by content hash, check **pointer identity**. If `intern()` replaced our slot while we were blocked on the mutex, the pointer won't match → no-op (intern already dropped our weak ref). - Otherwise evict and `weakDeref`. ### Why `upgrade()` on strong==0 is not a UAF The weak count holds the allocation. Registry holds a weak ref → `weak_count ≥ 1` → struct memory is live. The CAS reads a live atomic. Contents may be garbage (mid-`deinit()`), but `upgrade()` only touches `strong_count` — never the content. This is the **key difference** from a single-counter design: memory safety is compositional (refcounts alone guarantee it), not entangled with mutex ordering. The mutex here only protects map structure and the invariant that entry content is intact while in the map. ### Corrected race | HTTP thread | JS thread | Result | |---|---|---| | strong 1→0, blocked in `remove()` | `upgrade()` fails, `weakDeref` old (2→1), replace slot, `weakRef` new | old: `remove()` ptr-mismatch no-op → `deinit` → collective `weakDeref` (1→0) → free. JS thread has a fresh, valid config. | ## Also - `packages/bun-usockets/src/crypto/openssl.c`: NULL guards before `strlen(content)` in `us_ssl_ctx_use_privatekey_content` and `us_ssl_ctx_use_certificate_chain`. Defense-in-depth — turns a segfault into a clean SSL error if a NULL ever slips through. - `test/js/web/fetch/fetch-proxy-tls-intern-race.test.ts`: stress test firing overlapping waves of 8 concurrent proxy fetches with identical TLS options and `keepalive: false` (forces immediate deref on completion, no keepalive pool masking the race). --- ## Reproduction & verification See #27863 for the full reproduction recipe. With a debug+ASAN build and `BUN_DEBUG_SSLConfig=1` (which widens the race window via stderr logging), the repro script crashes **3/3 on main** and **passes 3/3 on this branch**. **Why it proves the same crash as production:** The production segfault at `openssl.c:1173` and our debug assertion failure at `ref_count.zig:476` are the **same root cause**, caught at different points: ``` [HTTP thread] [JS thread] deref() fetchSub 1→0 ↓ (race window) intern() finds dying config ref() → old_count = 0 ┌────────────────────────────┐ │ DEBUG: assertValid → panic │ ← repro catches here │ RELEASE: silently succeeds │ └────────────────────────────┘ destroy/deinit/free returns dangling *SSLConfig stored in client.tls_props ... proxy CONNECT ... tls_props.?.* → reads freed struct strdup(garbage) → segfault at 0x0 ← production crash ``` In release, `debugAssert(old_count > 0)` is a no-op. `ref()` silently does 0→1, `intern()` returns the dangling pointer, and the crash surfaces ~200μs later when `startProxyHandshake` dereferences the freed struct. Same race, same dangling pointer — debug catches it at the source, release at the symptom. **On this branch:** `upgrade()` uses a CAS loop that refuses the 0→1 bump. When it fails, `intern()` replaces the map slot with a fresh config instead of resurrecting the dying one. The repro logs show many clean `deref 1 - 0` to different addresses (fresh allocations) with zero `ref 0 - 1`. --------- Co-authored-by: Claude <noreply@anthropic.com>
structwafel
pushed a commit
to structwafel/bun
that referenced
this pull request
Apr 25, 2026
…up (oven-sh#27838) ## Problem Segfault at address `0x0` in `create_ssl_context_from_bun_options` during proxy tunnel setup (Bun v1.3.11, linux x86_64_baseline). The crash is `strlen(NULL)` on a freed cert/key string. ## Root cause `SSLConfig.GlobalRegistry` is a dedup cache keyed by TLS option content. It stored raw pointers without contributing to refcount — a weak cache — but `intern()` called plain `ref()` on what it found, which blindly does `fetchAdd(1)` with only a `debugAssert(old > 0)` (no-op in release). | HTTP thread (`deref`, last holder) | JS thread (`intern`, new fetch same TLS opts) | |---|---| | `fetchSub` 1→0, enters `destroy()` | | | | locks mutex, `getOrPut` finds dying entry (still in map) | | | `ref()` → **0→1** (resurrection, no check in release) | | | unlocks, returns dangling pointer to new fetch | | locks mutex in `remove()`, evicts, unlocks | | | `deinit()` frees cert/key strings | | | `allocator.destroy()` | | | | proxy tunnel builds SSL ctx from freed config → `strlen(NULL)` | Proxy amplifies this: `ProxyTunnel.start()` creates a *second* SSL context from `tls_props` *after* the CONNECT round-trip, giving a multi-millisecond window for another request's teardown to win the race. ## Fix: Arc/Weak split refcounting This is the Rust `Arc<T>`/`Weak<T>` pattern (and C++ `shared_ptr`/`weak_ptr`): two atomic counters instead of one. ### New primitive: `ThreadSafeWeakableRefCount` Added to `src/ptr/ref_count.zig` alongside `ThreadSafeRefCount`. General-purpose, reusable for any type that needs weak references. ```zig strong: atomic u32 // live users; 1->0 calls drop_contents() weak: atomic u32 // weak holders + (1 if strong > 0); 1->0 calls free_memory() ``` The `+1` on weak is the "collective" weak ref held on behalf of all strong refs. It guarantees the struct allocation stays live across the `strong 1→0 → drop_contents()` window, so `upgrade()` is memory-safe as long as *any* weak ref exists. | Method | Does | |---|---| | `ref()` / `deref()` | bump/drop strong; at strong 1→0: `drop_contents()` then drop collective weak | | `weakRef()` / `weakDeref()` | bump/drop weak; at weak 1→0: `free_memory()` | | `upgrade()` | CAS-loop on strong, only increments if currently > 0 — **never revives a dead object** | ### SSLConfig wiring ```zig const RC = bun.ptr.ThreadSafeWeakableRefCount(@this(), "ref_count", dropContents, freeMemory, .{}); fn dropContents(this: *SSLConfig) void { // strong 1->0 GlobalRegistry.remove(this); // while content intact (map eql needs it) this.deinit(); // free strings } fn freeMemory(this: *SSLConfig) void { // weak 1->0 bun.default_allocator.destroy(this); } ``` ### Registry now holds weak refs **`intern()`** 1. `getOrPut` by content. 2. Found existing? `upgrade()` it: - **Success** (strong was > 0): got a real ref. Free `new_config`, return existing. - **Fail** (strong is 0, dying): registry `weakDeref`s the old entry, replaces the slot with `new_config`. 3. Registry `weakRef`s the winning entry. **`remove()`** (called from `dropContents`) - Look up by content hash, check **pointer identity**. If `intern()` replaced our slot while we were blocked on the mutex, the pointer won't match → no-op (intern already dropped our weak ref). - Otherwise evict and `weakDeref`. ### Why `upgrade()` on strong==0 is not a UAF The weak count holds the allocation. Registry holds a weak ref → `weak_count ≥ 1` → struct memory is live. The CAS reads a live atomic. Contents may be garbage (mid-`deinit()`), but `upgrade()` only touches `strong_count` — never the content. This is the **key difference** from a single-counter design: memory safety is compositional (refcounts alone guarantee it), not entangled with mutex ordering. The mutex here only protects map structure and the invariant that entry content is intact while in the map. ### Corrected race | HTTP thread | JS thread | Result | |---|---|---| | strong 1→0, blocked in `remove()` | `upgrade()` fails, `weakDeref` old (2→1), replace slot, `weakRef` new | old: `remove()` ptr-mismatch no-op → `deinit` → collective `weakDeref` (1→0) → free. JS thread has a fresh, valid config. | ## Also - `packages/bun-usockets/src/crypto/openssl.c`: NULL guards before `strlen(content)` in `us_ssl_ctx_use_privatekey_content` and `us_ssl_ctx_use_certificate_chain`. Defense-in-depth — turns a segfault into a clean SSL error if a NULL ever slips through. - `test/js/web/fetch/fetch-proxy-tls-intern-race.test.ts`: stress test firing overlapping waves of 8 concurrent proxy fetches with identical TLS options and `keepalive: false` (forces immediate deref on completion, no keepalive pool masking the race). --- ## Reproduction & verification See oven-sh#27863 for the full reproduction recipe. With a debug+ASAN build and `BUN_DEBUG_SSLConfig=1` (which widens the race window via stderr logging), the repro script crashes **3/3 on main** and **passes 3/3 on this branch**. **Why it proves the same crash as production:** The production segfault at `openssl.c:1173` and our debug assertion failure at `ref_count.zig:476` are the **same root cause**, caught at different points: ``` [HTTP thread] [JS thread] deref() fetchSub 1→0 ↓ (race window) intern() finds dying config ref() → old_count = 0 ┌────────────────────────────┐ │ DEBUG: assertValid → panic │ ← repro catches here │ RELEASE: silently succeeds │ └────────────────────────────┘ destroy/deinit/free returns dangling *SSLConfig stored in client.tls_props ... proxy CONNECT ... tls_props.?.* → reads freed struct strdup(garbage) → segfault at 0x0 ← production crash ``` In release, `debugAssert(old_count > 0)` is a no-op. `ref()` silently does 0→1, `intern()` returns the dangling pointer, and the crash surfaces ~200μs later when `startProxyHandshake` dereferences the freed struct. Same race, same dangling pointer — debug catches it at the source, release at the symptom. **On this branch:** `upgrade()` uses a CAS loop that refuses the 0→1 bump. When it fails, `intern()` replaces the map slot with a fresh config instead of resurrecting the dying one. The repro logs show many clean `deref 1 - 0` to different addresses (fresh allocations) with zero `ref 0 - 1`. --------- Co-authored-by: Claude <noreply@anthropic.com>
xhjkl
pushed a commit
to xhjkl/bun
that referenced
this pull request
May 14, 2026
…up (oven-sh#27838) ## Problem Segfault at address `0x0` in `create_ssl_context_from_bun_options` during proxy tunnel setup (Bun v1.3.11, linux x86_64_baseline). The crash is `strlen(NULL)` on a freed cert/key string. ## Root cause `SSLConfig.GlobalRegistry` is a dedup cache keyed by TLS option content. It stored raw pointers without contributing to refcount — a weak cache — but `intern()` called plain `ref()` on what it found, which blindly does `fetchAdd(1)` with only a `debugAssert(old > 0)` (no-op in release). | HTTP thread (`deref`, last holder) | JS thread (`intern`, new fetch same TLS opts) | |---|---| | `fetchSub` 1→0, enters `destroy()` | | | | locks mutex, `getOrPut` finds dying entry (still in map) | | | `ref()` → **0→1** (resurrection, no check in release) | | | unlocks, returns dangling pointer to new fetch | | locks mutex in `remove()`, evicts, unlocks | | | `deinit()` frees cert/key strings | | | `allocator.destroy()` | | | | proxy tunnel builds SSL ctx from freed config → `strlen(NULL)` | Proxy amplifies this: `ProxyTunnel.start()` creates a *second* SSL context from `tls_props` *after* the CONNECT round-trip, giving a multi-millisecond window for another request's teardown to win the race. ## Fix: Arc/Weak split refcounting This is the Rust `Arc<T>`/`Weak<T>` pattern (and C++ `shared_ptr`/`weak_ptr`): two atomic counters instead of one. ### New primitive: `ThreadSafeWeakableRefCount` Added to `src/ptr/ref_count.zig` alongside `ThreadSafeRefCount`. General-purpose, reusable for any type that needs weak references. ```zig strong: atomic u32 // live users; 1->0 calls drop_contents() weak: atomic u32 // weak holders + (1 if strong > 0); 1->0 calls free_memory() ``` The `+1` on weak is the "collective" weak ref held on behalf of all strong refs. It guarantees the struct allocation stays live across the `strong 1→0 → drop_contents()` window, so `upgrade()` is memory-safe as long as *any* weak ref exists. | Method | Does | |---|---| | `ref()` / `deref()` | bump/drop strong; at strong 1→0: `drop_contents()` then drop collective weak | | `weakRef()` / `weakDeref()` | bump/drop weak; at weak 1→0: `free_memory()` | | `upgrade()` | CAS-loop on strong, only increments if currently > 0 — **never revives a dead object** | ### SSLConfig wiring ```zig const RC = bun.ptr.ThreadSafeWeakableRefCount(@this(), "ref_count", dropContents, freeMemory, .{}); fn dropContents(this: *SSLConfig) void { // strong 1->0 GlobalRegistry.remove(this); // while content intact (map eql needs it) this.deinit(); // free strings } fn freeMemory(this: *SSLConfig) void { // weak 1->0 bun.default_allocator.destroy(this); } ``` ### Registry now holds weak refs **`intern()`** 1. `getOrPut` by content. 2. Found existing? `upgrade()` it: - **Success** (strong was > 0): got a real ref. Free `new_config`, return existing. - **Fail** (strong is 0, dying): registry `weakDeref`s the old entry, replaces the slot with `new_config`. 3. Registry `weakRef`s the winning entry. **`remove()`** (called from `dropContents`) - Look up by content hash, check **pointer identity**. If `intern()` replaced our slot while we were blocked on the mutex, the pointer won't match → no-op (intern already dropped our weak ref). - Otherwise evict and `weakDeref`. ### Why `upgrade()` on strong==0 is not a UAF The weak count holds the allocation. Registry holds a weak ref → `weak_count ≥ 1` → struct memory is live. The CAS reads a live atomic. Contents may be garbage (mid-`deinit()`), but `upgrade()` only touches `strong_count` — never the content. This is the **key difference** from a single-counter design: memory safety is compositional (refcounts alone guarantee it), not entangled with mutex ordering. The mutex here only protects map structure and the invariant that entry content is intact while in the map. ### Corrected race | HTTP thread | JS thread | Result | |---|---|---| | strong 1→0, blocked in `remove()` | `upgrade()` fails, `weakDeref` old (2→1), replace slot, `weakRef` new | old: `remove()` ptr-mismatch no-op → `deinit` → collective `weakDeref` (1→0) → free. JS thread has a fresh, valid config. | ## Also - `packages/bun-usockets/src/crypto/openssl.c`: NULL guards before `strlen(content)` in `us_ssl_ctx_use_privatekey_content` and `us_ssl_ctx_use_certificate_chain`. Defense-in-depth — turns a segfault into a clean SSL error if a NULL ever slips through. - `test/js/web/fetch/fetch-proxy-tls-intern-race.test.ts`: stress test firing overlapping waves of 8 concurrent proxy fetches with identical TLS options and `keepalive: false` (forces immediate deref on completion, no keepalive pool masking the race). --- ## Reproduction & verification See oven-sh#27863 for the full reproduction recipe. With a debug+ASAN build and `BUN_DEBUG_SSLConfig=1` (which widens the race window via stderr logging), the repro script crashes **3/3 on main** and **passes 3/3 on this branch**. **Why it proves the same crash as production:** The production segfault at `openssl.c:1173` and our debug assertion failure at `ref_count.zig:476` are the **same root cause**, caught at different points: ``` [HTTP thread] [JS thread] deref() fetchSub 1→0 ↓ (race window) intern() finds dying config ref() → old_count = 0 ┌────────────────────────────┐ │ DEBUG: assertValid → panic │ ← repro catches here │ RELEASE: silently succeeds │ └────────────────────────────┘ destroy/deinit/free returns dangling *SSLConfig stored in client.tls_props ... proxy CONNECT ... tls_props.?.* → reads freed struct strdup(garbage) → segfault at 0x0 ← production crash ``` In release, `debugAssert(old_count > 0)` is a no-op. `ref()` silently does 0→1, `intern()` returns the dangling pointer, and the crash surfaces ~200μs later when `startProxyHandshake` dereferences the freed struct. Same race, same dangling pointer — debug catches it at the source, release at the symptom. **On this branch:** `upgrade()` uses a CAS loop that refuses the 0→1 bump. When it fails, `intern()` replaces the map slot with a fresh config instead of resurrecting the dying one. The repro logs show many clean `deref 1 - 0` to different addresses (fresh allocations) with zero `ref 0 - 1`. --------- Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Proof-of-concept branch demonstrating the SSLConfig
intern()/deref()race fixed in #27838. Reliably reproduces the crash 3/3 on debug builds.The race
SSLConfig.GlobalRegistryis a weak-dedup cache shared across threads.deref()(HTTP thread) andintern()(JS thread) can race:deref()→fetchSubmakes count 1→0destroy()(race window open)intern()→mutex.lock()(wins mutex)getOrPutfinds dying config in mapexisting.ref()→ count 0→1 (UAF)remove()→mutex.lock()(now wins)deinit()→ frees fields,destroy(this)this.tls_props.?.*→ segfault reading freed memorydebugAssert(old_count > 0)atref_count.zig:246is a no-op in release builds — the bump 0→1 succeeds silently.Crash stack from production (v1.3.11 linux x86_64_baseline):
Reproduction recipe (macOS/Linux)
1. Build bun with ASAN (debug)
This downloads the prebuilt ASAN WebKit (
bun-webkit-{os}-{arch}-debug-asan.tar.gz) — available for macOS arm64 and Linux x64/arm64.2. Run the repro script
Expected output (on this branch, unfixed code)
Exit code 133 (SIGTRAP from assert). Reproduced 3/3 runs.
Expected output on #27838 (fixed)
Exit code 0. No
ref 0 - 1anywhere in the log.upgrade()refuses the 0→1 CAS and intern replaces the slot with a fresh config instead.Why
BUN_DEBUG_SSLConfig=1is requiredIn release builds, the race window between
fetchSub(1→0)andmutex.lock()is ~5-10 CPU cycles — too narrow to hit deterministically.In debug builds with scoped logging enabled, the
deref()path does:fetchSub(1)→ count goes to 0scope.log("0x{x} deref 1 - 0")→ stderr write, ~100μs (ref_count.zig:253-258)count.debug.deinit()→ frees tracking hashmaps (ref_count.zig:262-264)destroy(self)→log("destroy: strong reached 0")→ another stderr write (SSLConfig.zig:275)GlobalRegistry.remove()→mutex.lock()Steps 2-4 widen the window to ~100-200μs. That's enough for a probe worker's
setImmediate-driven intern loop to slip in.Changes in this branch
src/bun.js/api/server/SSLConfig.zig— added debug logs (observability only, no behavior change)test/js/web/fetch/sslconfig-race-repro.ts— standalone reproduction scriptWhy the debug repro proves the same crash as production
Production crash (release v1.3.11, linux x86_64_baseline):
Caught on the HTTP thread, when the dangling
tls_propsis dereferenced.Our debug repro:
Caught on a worker JS thread, at the moment
ref()is called on a dying config.Same causal chain, different catch point:
In release builds,
debugAssert(old_count > 0)(ref_count.zig:246) is a no-op, and thedebug.magiccheck doesn't exist (enable_debug = false). Soref()silently bumps 0→1 andintern()returns the dangling pointer. The crash surfaces much later whenstartProxyHandshakedereferencesthis.tls_props.?.*— freed memory that mimalloc may have reused, giving garbage/NULL pointers tostrdup.In debug builds,
debug.deinit()(ref_count.zig:518) setsdebug.magic = undefinedright afterfetchSub(1→0). The nextref()'sassertValid()catches it. We crash at the source of the bug, not the symptom.The same dangling pointer, same config, same race. Debug has better instrumentation that catches it at the moment the invariant ("refs only on live objects") is violated, ~200μs before the production symptom would appear.