prove: SSLConfig intern/deref race observability (do NOT merge) by cirospaciari · Pull Request #27863 · oven-sh/bun

cirospaciari · 2026-03-06T20:14:43Z

Proof-of-concept branch demonstrating the SSLConfig intern()/deref() race fixed in #27838. Reliably reproduces the crash 3/3 on debug builds.

The race

SSLConfig.GlobalRegistry is a weak-dedup cache shared across threads. deref() (HTTP thread) and intern() (JS thread) can race:

HTTP thread (last ref holder)	JS thread (new fetch)
`deref()` → `fetchSub` makes count 1→0
enters `destroy()` (race window open)	`intern()` → `mutex.lock()` (wins mutex)
	`getOrPut` finds dying config in map
	`existing.ref()` → count 0→1 (UAF)
	returns dangling pointer
`remove()` → `mutex.lock()` (now wins)
`deinit()` → frees fields, `destroy(this)`
(memory freed)	...proxy CONNECT completes... `this.tls_props.?.*` → segfault reading freed memory

debugAssert(old_count > 0) at ref_count.zig:246 is a no-op in release builds — the bump 0→1 succeeds silently.

Crash stack from production (v1.3.11 linux x86_64_baseline):

openssl.c:1173 — create_ssl_context_from_bun_options → strlen(NULL)
SocketContext.zig:245 — SSLWrapper init
ProxyTunnel.zig:277 — startProxyHandshake
http.zig:1609 — handleOnDataHeaders

Reproduction recipe (macOS/Linux)

1. Build bun with ASAN (debug)

cmake -B build/asan -G Ninja -DCMAKE_BUILD_TYPE=Debug -DENABLE_ASAN=ON
ninja -C build/asan bun-debug

This downloads the prebuilt ASAN WebKit (bun-webkit-{os}-{arch}-debug-asan.tar.gz) — available for macOS arm64 and Linux x64/arm64.

2. Run the repro script

BUN_DEBUG_QUIET_LOGS=1 BUN_DEBUG_SSLConfig=1 ./build/asan/bun-debug run test/js/web/fetch/sslconfig-race-repro.ts

Expected output (on this branch, unfixed code)

[repro] backend=... proxy=...
[sslconfig] 0x304020380 deref 1 - 0
[sslconfig] destroy 0x304020380: strong reached 0, freeing
[sslconfig] 0x304020380   ref 0 - 1                          ← smoking gun
panic: reached unreachable code
ref_count.zig:476 — bun.assert(debug.magic == .valid) in assertValid
ref_count.zig:237 — count.debug.assertValid() in ref
SSLConfig.zig:313 — existing.ref() in intern
fetch.zig:471    — GlobalRegistry.intern

Exit code 133 (SIGTRAP from assert). Reproduced 3/3 runs.

Expected output on #27838 (fixed)

[repro] backend=... proxy=...
[repro] results: [ 2000, 500000+, 500000+ ] (if you see this, race didn't trigger)

Exit code 0. No ref 0 - 1 anywhere in the log. upgrade() refuses the 0→1 CAS and intern replaces the slot with a fresh config instead.

Why `BUN_DEBUG_SSLConfig=1` is required

In release builds, the race window between fetchSub(1→0) and mutex.lock() is ~5-10 CPU cycles — too narrow to hit deterministically.

In debug builds with scoped logging enabled, the deref() path does:

fetchSub(1) → count goes to 0
scope.log("0x{x} deref 1 - 0") → stderr write, ~100μs (ref_count.zig:253-258)
count.debug.deinit() → frees tracking hashmaps (ref_count.zig:262-264)
destroy(self) → log("destroy: strong reached 0") → another stderr write (SSLConfig.zig:275)
GlobalRegistry.remove() → mutex.lock()

Steps 2-4 widen the window to ~100-200μs. That's enough for a probe worker's setImmediate-driven intern loop to slip in.

Changes in this branch

src/bun.js/api/server/SSLConfig.zig — added debug logs (observability only, no behavior change)
test/js/web/fetch/sslconfig-race-repro.ts — standalone reproduction script

Why the debug repro proves the same crash as production

Production crash (release v1.3.11, linux x86_64_baseline):

Segmentation fault at address 0x00000000
openssl.c:1173       — create_ssl_context_from_bun_options → strdup(options.passphrase)
SocketContext.zig:245 — SSLWrapper init
ProxyTunnel.zig:277  — startProxyHandshake
http.zig:1609        — handleOnDataHeaders

Caught on the HTTP thread, when the dangling tls_props is dereferenced.

Our debug repro:

panic: reached unreachable code
ref_count.zig:476 — bun.assert(debug.magic == .valid) in assertValid
ref_count.zig:237 — count.debug.assertValid() in ref
SSLConfig.zig:313 — existing.ref() in intern
fetch.zig:471    — GlobalRegistry.intern

Caught on a worker JS thread, at the moment ref() is called on a dying config.

Same causal chain, different catch point:

  [HTTP thread]                   [JS thread]
  deref() fetchSub 1→0
    ↓ (race window open)
                                   intern() finds dying config in map
                                   existing.ref() → old_count = 0
                                   ┌────────────────────────────────────┐
                                   │ DEBUG: debugAssert + assertValid   │
                                   │        → panic  ← OUR REPRO        │
                                   ├────────────────────────────────────┤
                                   │ RELEASE: both are no-ops           │
                                   │        → ref() 0→1 silently succeeds│
                                   └────────────────────────────────────┘
  destroy() → deinit()             intern() returns dangling *SSLConfig
    frees passphrase/cert/key      stored in client.tls_props
  allocator.destroy(this)
    ↓                              ... proxy CONNECT round-trip ...
  (memory freed/reused)
                                   startProxyHandshake (http.zig:1457):
                                     ssl_options = this.tls_props.?.*
                                     ← READS FREED STRUCT
                                     ↓
                                   SSLWrapper.init → create_ssl_context_from_bun_options
                                     strdup(options.passphrase)
                                     ← passphrase slice is garbage/NULL
                                     → segfault at 0x0  ← PRODUCTION CRASH

In release builds, debugAssert(old_count > 0) (ref_count.zig:246) is a no-op, and the debug.magic check doesn't exist (enable_debug = false). So ref() silently bumps 0→1 and intern() returns the dangling pointer. The crash surfaces much later when startProxyHandshake dereferences this.tls_props.?.* — freed memory that mimalloc may have reused, giving garbage/NULL pointers to strdup.

In debug builds, debug.deinit() (ref_count.zig:518) sets debug.magic = undefined right after fetchSub(1→0). The next ref()'s assertValid() catches it. We crash at the source of the bug, not the symptom.

The same dangling pointer, same config, same race. Debug has better instrumentation that catches it at the moment the invariant ("refs only on live objects") is violated, ~200μs before the production symptom would appear.

Add scoped debug logs (BUN_DEBUG_SSLConfig=1) to GlobalRegistry.intern() and destroy() to observe the race condition between intern() finding a dying config and ref()'s debugAssert catching a 0->1 resurrection. The test exercises the proxy+TLS path with keepalive:false and no ca (so requires_custom_request_ctx=false, meaning the SSL context cache does NOT hold an extra ref). This makes the refcount hit 0 on every wave completion. Logs show: - 'intern: found existing 0xABC, refcount=N before ref' — if N is ever 0, that's the 0->1 resurrection bug (debugAssert panics in debug builds, ASAN catches the UAF in release+ASAN builds) - 'destroy 0xABC: strong reached 0, freeing' — proves refcount hits 0 On macOS the race window is too narrow to trigger naturally. On Linux CI (especially linux-x64-asan with different thread scheduling), the race is more likely to manifest. See: #27838

robobun · 2026-03-06T20:14:55Z

^{Updated 5:06 PM PT - Mar 6th, 2026}

❌ @cirospaciari, your commit f42fa72 has 2 failures in Build #38772 (All Failures):

test/integration/next-pages/test/dev-server.test.ts - code 1 on 🍎 14 aarch64
test/integration/next-pages/test/dev-server.test.ts - code 1 on 🍎 13 aarch64
test/js/bun/test/parallel/test-integration-rspack.ts - timeout on 🪟 2019 x64

🧪 To try this PR locally:

bunx bun-pr 27863

That installs a local version of the PR into your bun-27863 executable, so you can run:

bun-27863 --bun

Standalone C program that reproduces the exact same race pattern as SSLConfig.GlobalRegistry — shared map protected by mutex, atomic refcount, deref→destroy without checking if intern() just resurrected the entry. Build and run with ASAN: cc -fsanitize=address -pthread -O1 -o race_repro sslconfig-race-repro.c ./race_repro ASAN reports heap-use-after-free on EVERY run: Thread T1: READ of interned->data (like ProxyTunnel reading tls_props) freed by Thread T2: config_destroy (like SSLConfig.destroy) The race: thread A does fetchSub(1)→0→destroy()→free, while thread B does intern()→find existing→ref() 0→1→returns dangling pointer. This is the exact pattern causing the segfault at 0x0 in production.

Workers share the static SSLConfig.GlobalRegistry across threads. Multiple workers doing serial proxy+TLS fetches with keepalive:false creates real cross-thread contention between intern() (worker JS threads) and deref() (worker HTTP threads). On ASAN builds, if the 0->1 resurrection race triggers, ASAN catches the heap-use-after-free — same pattern proven by sslconfig-race-repro.c.

8 workers x 20 waves x 16 concurrent fetches per wave = 2560 total. Concurrent batches maximize overlap between deref() (HTTP threads) and intern() (JS threads) across workers sharing the GlobalRegistry.

The CI test never triggered the race (window too narrow, workers naturally synchronize). This debug-only script DOES reproduce it 3/3. Key insight: BUN_DEBUG_SSLConfig=1 enables scoped logging in deref() and destroy(), widening the race window from ~10 CPU cycles to 100μs+ via stderr writes. Driver worker cycles refcount through 0; probe workers spam intern via fetch+abort in setImmediate loops. See script header comment for full repro steps.

Non-deterministic by nature (race window is ~10 CPU cycles in release), but catches the race ~60% on debug+ASAN without any special env vars due to debug.deinit() in ref_count.zig widening the window. Driver worker cycles refcount through 0 via serial proxy+TLS fetches. Probe workers spam intern() via fetch+abort in setImmediate loops. For deterministic repro use BUN_DEBUG_SSLConfig=1 — see #27863.

@this

…up (#27838) ## Problem Segfault at address `0x0` in `create_ssl_context_from_bun_options` during proxy tunnel setup (Bun v1.3.11, linux x86_64_baseline). The crash is `strlen(NULL)` on a freed cert/key string. ## Root cause `SSLConfig.GlobalRegistry` is a dedup cache keyed by TLS option content. It stored raw pointers without contributing to refcount — a weak cache — but `intern()` called plain `ref()` on what it found, which blindly does `fetchAdd(1)` with only a `debugAssert(old > 0)` (no-op in release). | HTTP thread (`deref`, last holder) | JS thread (`intern`, new fetch same TLS opts) | |---|---| | `fetchSub` 1→0, enters `destroy()` | | | | locks mutex, `getOrPut` finds dying entry (still in map) | | | `ref()` → **0→1** (resurrection, no check in release) | | | unlocks, returns dangling pointer to new fetch | | locks mutex in `remove()`, evicts, unlocks | | | `deinit()` frees cert/key strings | | | `allocator.destroy()` | | | | proxy tunnel builds SSL ctx from freed config → `strlen(NULL)` | Proxy amplifies this: `ProxyTunnel.start()` creates a *second* SSL context from `tls_props` *after* the CONNECT round-trip, giving a multi-millisecond window for another request's teardown to win the race. ## Fix: Arc/Weak split refcounting This is the Rust `Arc<T>`/`Weak<T>` pattern (and C++ `shared_ptr`/`weak_ptr`): two atomic counters instead of one. ### New primitive: `ThreadSafeWeakableRefCount` Added to `src/ptr/ref_count.zig` alongside `ThreadSafeRefCount`. General-purpose, reusable for any type that needs weak references. ```zig strong: atomic u32 // live users; 1->0 calls drop_contents() weak: atomic u32 // weak holders + (1 if strong > 0); 1->0 calls free_memory() ``` The `+1` on weak is the "collective" weak ref held on behalf of all strong refs. It guarantees the struct allocation stays live across the `strong 1→0 → drop_contents()` window, so `upgrade()` is memory-safe as long as *any* weak ref exists. | Method | Does | |---|---| | `ref()` / `deref()` | bump/drop strong; at strong 1→0: `drop_contents()` then drop collective weak | | `weakRef()` / `weakDeref()` | bump/drop weak; at weak 1→0: `free_memory()` | | `upgrade()` | CAS-loop on strong, only increments if currently > 0 — **never revives a dead object** | ### SSLConfig wiring ```zig const RC = bun.ptr.ThreadSafeWeakableRefCount(@this(), "ref_count", dropContents, freeMemory, .{}); fn dropContents(this: *SSLConfig) void { // strong 1->0 GlobalRegistry.remove(this); // while content intact (map eql needs it) this.deinit(); // free strings } fn freeMemory(this: *SSLConfig) void { // weak 1->0 bun.default_allocator.destroy(this); } ``` ### Registry now holds weak refs **`intern()`** 1. `getOrPut` by content. 2. Found existing? `upgrade()` it: - **Success** (strong was > 0): got a real ref. Free `new_config`, return existing. - **Fail** (strong is 0, dying): registry `weakDeref`s the old entry, replaces the slot with `new_config`. 3. Registry `weakRef`s the winning entry. **`remove()`** (called from `dropContents`) - Look up by content hash, check **pointer identity**. If `intern()` replaced our slot while we were blocked on the mutex, the pointer won't match → no-op (intern already dropped our weak ref). - Otherwise evict and `weakDeref`. ### Why `upgrade()` on strong==0 is not a UAF The weak count holds the allocation. Registry holds a weak ref → `weak_count ≥ 1` → struct memory is live. The CAS reads a live atomic. Contents may be garbage (mid-`deinit()`), but `upgrade()` only touches `strong_count` — never the content. This is the **key difference** from a single-counter design: memory safety is compositional (refcounts alone guarantee it), not entangled with mutex ordering. The mutex here only protects map structure and the invariant that entry content is intact while in the map. ### Corrected race | HTTP thread | JS thread | Result | |---|---|---| | strong 1→0, blocked in `remove()` | `upgrade()` fails, `weakDeref` old (2→1), replace slot, `weakRef` new | old: `remove()` ptr-mismatch no-op → `deinit` → collective `weakDeref` (1→0) → free. JS thread has a fresh, valid config. | ## Also - `packages/bun-usockets/src/crypto/openssl.c`: NULL guards before `strlen(content)` in `us_ssl_ctx_use_privatekey_content` and `us_ssl_ctx_use_certificate_chain`. Defense-in-depth — turns a segfault into a clean SSL error if a NULL ever slips through. - `test/js/web/fetch/fetch-proxy-tls-intern-race.test.ts`: stress test firing overlapping waves of 8 concurrent proxy fetches with identical TLS options and `keepalive: false` (forces immediate deref on completion, no keepalive pool masking the race). --- ## Reproduction & verification See #27863 for the full reproduction recipe. With a debug+ASAN build and `BUN_DEBUG_SSLConfig=1` (which widens the race window via stderr logging), the repro script crashes **3/3 on main** and **passes 3/3 on this branch**. **Why it proves the same crash as production:** The production segfault at `openssl.c:1173` and our debug assertion failure at `ref_count.zig:476` are the **same root cause**, caught at different points: ``` [HTTP thread] [JS thread] deref() fetchSub 1→0 ↓ (race window) intern() finds dying config ref() → old_count = 0 ┌────────────────────────────┐ │ DEBUG: assertValid → panic │ ← repro catches here │ RELEASE: silently succeeds │ └────────────────────────────┘ destroy/deinit/free returns dangling *SSLConfig stored in client.tls_props ... proxy CONNECT ... tls_props.?.* → reads freed struct strdup(garbage) → segfault at 0x0 ← production crash ``` In release, `debugAssert(old_count > 0)` is a no-op. `ref()` silently does 0→1, `intern()` returns the dangling pointer, and the crash surfaces ~200μs later when `startProxyHandshake` dereferences the freed struct. Same race, same dangling pointer — debug catches it at the source, release at the symptom. **On this branch:** `upgrade()` uses a CAS loop that refuses the 0→1 bump. When it fails, `intern()` replaces the map slot with a fresh config instead of resurrecting the dying one. The repro logs show many clean `deref 1 - 0` to different addresses (fresh allocations) with zero `ref 0 - 1`. --------- Co-authored-by: Claude <noreply@anthropic.com>

@this

…up (oven-sh#27838) ## Problem Segfault at address `0x0` in `create_ssl_context_from_bun_options` during proxy tunnel setup (Bun v1.3.11, linux x86_64_baseline). The crash is `strlen(NULL)` on a freed cert/key string. ## Root cause `SSLConfig.GlobalRegistry` is a dedup cache keyed by TLS option content. It stored raw pointers without contributing to refcount — a weak cache — but `intern()` called plain `ref()` on what it found, which blindly does `fetchAdd(1)` with only a `debugAssert(old > 0)` (no-op in release). | HTTP thread (`deref`, last holder) | JS thread (`intern`, new fetch same TLS opts) | |---|---| | `fetchSub` 1→0, enters `destroy()` | | | | locks mutex, `getOrPut` finds dying entry (still in map) | | | `ref()` → **0→1** (resurrection, no check in release) | | | unlocks, returns dangling pointer to new fetch | | locks mutex in `remove()`, evicts, unlocks | | | `deinit()` frees cert/key strings | | | `allocator.destroy()` | | | | proxy tunnel builds SSL ctx from freed config → `strlen(NULL)` | Proxy amplifies this: `ProxyTunnel.start()` creates a *second* SSL context from `tls_props` *after* the CONNECT round-trip, giving a multi-millisecond window for another request's teardown to win the race. ## Fix: Arc/Weak split refcounting This is the Rust `Arc<T>`/`Weak<T>` pattern (and C++ `shared_ptr`/`weak_ptr`): two atomic counters instead of one. ### New primitive: `ThreadSafeWeakableRefCount` Added to `src/ptr/ref_count.zig` alongside `ThreadSafeRefCount`. General-purpose, reusable for any type that needs weak references. ```zig strong: atomic u32 // live users; 1->0 calls drop_contents() weak: atomic u32 // weak holders + (1 if strong > 0); 1->0 calls free_memory() ``` The `+1` on weak is the "collective" weak ref held on behalf of all strong refs. It guarantees the struct allocation stays live across the `strong 1→0 → drop_contents()` window, so `upgrade()` is memory-safe as long as *any* weak ref exists. | Method | Does | |---|---| | `ref()` / `deref()` | bump/drop strong; at strong 1→0: `drop_contents()` then drop collective weak | | `weakRef()` / `weakDeref()` | bump/drop weak; at weak 1→0: `free_memory()` | | `upgrade()` | CAS-loop on strong, only increments if currently > 0 — **never revives a dead object** | ### SSLConfig wiring ```zig const RC = bun.ptr.ThreadSafeWeakableRefCount(@this(), "ref_count", dropContents, freeMemory, .{}); fn dropContents(this: *SSLConfig) void { // strong 1->0 GlobalRegistry.remove(this); // while content intact (map eql needs it) this.deinit(); // free strings } fn freeMemory(this: *SSLConfig) void { // weak 1->0 bun.default_allocator.destroy(this); } ``` ### Registry now holds weak refs **`intern()`** 1. `getOrPut` by content. 2. Found existing? `upgrade()` it: - **Success** (strong was > 0): got a real ref. Free `new_config`, return existing. - **Fail** (strong is 0, dying): registry `weakDeref`s the old entry, replaces the slot with `new_config`. 3. Registry `weakRef`s the winning entry. **`remove()`** (called from `dropContents`) - Look up by content hash, check **pointer identity**. If `intern()` replaced our slot while we were blocked on the mutex, the pointer won't match → no-op (intern already dropped our weak ref). - Otherwise evict and `weakDeref`. ### Why `upgrade()` on strong==0 is not a UAF The weak count holds the allocation. Registry holds a weak ref → `weak_count ≥ 1` → struct memory is live. The CAS reads a live atomic. Contents may be garbage (mid-`deinit()`), but `upgrade()` only touches `strong_count` — never the content. This is the **key difference** from a single-counter design: memory safety is compositional (refcounts alone guarantee it), not entangled with mutex ordering. The mutex here only protects map structure and the invariant that entry content is intact while in the map. ### Corrected race | HTTP thread | JS thread | Result | |---|---|---| | strong 1→0, blocked in `remove()` | `upgrade()` fails, `weakDeref` old (2→1), replace slot, `weakRef` new | old: `remove()` ptr-mismatch no-op → `deinit` → collective `weakDeref` (1→0) → free. JS thread has a fresh, valid config. | ## Also - `packages/bun-usockets/src/crypto/openssl.c`: NULL guards before `strlen(content)` in `us_ssl_ctx_use_privatekey_content` and `us_ssl_ctx_use_certificate_chain`. Defense-in-depth — turns a segfault into a clean SSL error if a NULL ever slips through. - `test/js/web/fetch/fetch-proxy-tls-intern-race.test.ts`: stress test firing overlapping waves of 8 concurrent proxy fetches with identical TLS options and `keepalive: false` (forces immediate deref on completion, no keepalive pool masking the race). --- ## Reproduction & verification See oven-sh#27863 for the full reproduction recipe. With a debug+ASAN build and `BUN_DEBUG_SSLConfig=1` (which widens the race window via stderr logging), the repro script crashes **3/3 on main** and **passes 3/3 on this branch**. **Why it proves the same crash as production:** The production segfault at `openssl.c:1173` and our debug assertion failure at `ref_count.zig:476` are the **same root cause**, caught at different points: ``` [HTTP thread] [JS thread] deref() fetchSub 1→0 ↓ (race window) intern() finds dying config ref() → old_count = 0 ┌────────────────────────────┐ │ DEBUG: assertValid → panic │ ← repro catches here │ RELEASE: silently succeeds │ └────────────────────────────┘ destroy/deinit/free returns dangling *SSLConfig stored in client.tls_props ... proxy CONNECT ... tls_props.?.* → reads freed struct strdup(garbage) → segfault at 0x0 ← production crash ``` In release, `debugAssert(old_count > 0)` is a no-op. `ref()` silently does 0→1, `intern()` returns the dangling pointer, and the crash surfaces ~200μs later when `startProxyHandshake` dereferences the freed struct. Same race, same dangling pointer — debug catches it at the source, release at the symptom. **On this branch:** `upgrade()` uses a CAS loop that refuses the 0→1 bump. When it fails, `intern()` replaces the map slot with a fresh config instead of resurrecting the dying one. The repro logs show many clean `deref 1 - 0` to different addresses (fresh allocations) with zero `ref 0 - 1`. --------- Co-authored-by: Claude <noreply@anthropic.com>

@this

…up (oven-sh#27838) ## Problem Segfault at address `0x0` in `create_ssl_context_from_bun_options` during proxy tunnel setup (Bun v1.3.11, linux x86_64_baseline). The crash is `strlen(NULL)` on a freed cert/key string. ## Root cause `SSLConfig.GlobalRegistry` is a dedup cache keyed by TLS option content. It stored raw pointers without contributing to refcount — a weak cache — but `intern()` called plain `ref()` on what it found, which blindly does `fetchAdd(1)` with only a `debugAssert(old > 0)` (no-op in release). | HTTP thread (`deref`, last holder) | JS thread (`intern`, new fetch same TLS opts) | |---|---| | `fetchSub` 1→0, enters `destroy()` | | | | locks mutex, `getOrPut` finds dying entry (still in map) | | | `ref()` → **0→1** (resurrection, no check in release) | | | unlocks, returns dangling pointer to new fetch | | locks mutex in `remove()`, evicts, unlocks | | | `deinit()` frees cert/key strings | | | `allocator.destroy()` | | | | proxy tunnel builds SSL ctx from freed config → `strlen(NULL)` | Proxy amplifies this: `ProxyTunnel.start()` creates a *second* SSL context from `tls_props` *after* the CONNECT round-trip, giving a multi-millisecond window for another request's teardown to win the race. ## Fix: Arc/Weak split refcounting This is the Rust `Arc<T>`/`Weak<T>` pattern (and C++ `shared_ptr`/`weak_ptr`): two atomic counters instead of one. ### New primitive: `ThreadSafeWeakableRefCount` Added to `src/ptr/ref_count.zig` alongside `ThreadSafeRefCount`. General-purpose, reusable for any type that needs weak references. ```zig strong: atomic u32 // live users; 1->0 calls drop_contents() weak: atomic u32 // weak holders + (1 if strong > 0); 1->0 calls free_memory() ``` The `+1` on weak is the "collective" weak ref held on behalf of all strong refs. It guarantees the struct allocation stays live across the `strong 1→0 → drop_contents()` window, so `upgrade()` is memory-safe as long as *any* weak ref exists. | Method | Does | |---|---| | `ref()` / `deref()` | bump/drop strong; at strong 1→0: `drop_contents()` then drop collective weak | | `weakRef()` / `weakDeref()` | bump/drop weak; at weak 1→0: `free_memory()` | | `upgrade()` | CAS-loop on strong, only increments if currently > 0 — **never revives a dead object** | ### SSLConfig wiring ```zig const RC = bun.ptr.ThreadSafeWeakableRefCount(@this(), "ref_count", dropContents, freeMemory, .{}); fn dropContents(this: *SSLConfig) void { // strong 1->0 GlobalRegistry.remove(this); // while content intact (map eql needs it) this.deinit(); // free strings } fn freeMemory(this: *SSLConfig) void { // weak 1->0 bun.default_allocator.destroy(this); } ``` ### Registry now holds weak refs **`intern()`** 1. `getOrPut` by content. 2. Found existing? `upgrade()` it: - **Success** (strong was > 0): got a real ref. Free `new_config`, return existing. - **Fail** (strong is 0, dying): registry `weakDeref`s the old entry, replaces the slot with `new_config`. 3. Registry `weakRef`s the winning entry. **`remove()`** (called from `dropContents`) - Look up by content hash, check **pointer identity**. If `intern()` replaced our slot while we were blocked on the mutex, the pointer won't match → no-op (intern already dropped our weak ref). - Otherwise evict and `weakDeref`. ### Why `upgrade()` on strong==0 is not a UAF The weak count holds the allocation. Registry holds a weak ref → `weak_count ≥ 1` → struct memory is live. The CAS reads a live atomic. Contents may be garbage (mid-`deinit()`), but `upgrade()` only touches `strong_count` — never the content. This is the **key difference** from a single-counter design: memory safety is compositional (refcounts alone guarantee it), not entangled with mutex ordering. The mutex here only protects map structure and the invariant that entry content is intact while in the map. ### Corrected race | HTTP thread | JS thread | Result | |---|---|---| | strong 1→0, blocked in `remove()` | `upgrade()` fails, `weakDeref` old (2→1), replace slot, `weakRef` new | old: `remove()` ptr-mismatch no-op → `deinit` → collective `weakDeref` (1→0) → free. JS thread has a fresh, valid config. | ## Also - `packages/bun-usockets/src/crypto/openssl.c`: NULL guards before `strlen(content)` in `us_ssl_ctx_use_privatekey_content` and `us_ssl_ctx_use_certificate_chain`. Defense-in-depth — turns a segfault into a clean SSL error if a NULL ever slips through. - `test/js/web/fetch/fetch-proxy-tls-intern-race.test.ts`: stress test firing overlapping waves of 8 concurrent proxy fetches with identical TLS options and `keepalive: false` (forces immediate deref on completion, no keepalive pool masking the race). --- ## Reproduction & verification See oven-sh#27863 for the full reproduction recipe. With a debug+ASAN build and `BUN_DEBUG_SSLConfig=1` (which widens the race window via stderr logging), the repro script crashes **3/3 on main** and **passes 3/3 on this branch**. **Why it proves the same crash as production:** The production segfault at `openssl.c:1173` and our debug assertion failure at `ref_count.zig:476` are the **same root cause**, caught at different points: ``` [HTTP thread] [JS thread] deref() fetchSub 1→0 ↓ (race window) intern() finds dying config ref() → old_count = 0 ┌────────────────────────────┐ │ DEBUG: assertValid → panic │ ← repro catches here │ RELEASE: silently succeeds │ └────────────────────────────┘ destroy/deinit/free returns dangling *SSLConfig stored in client.tls_props ... proxy CONNECT ... tls_props.?.* → reads freed struct strdup(garbage) → segfault at 0x0 ← production crash ``` In release, `debugAssert(old_count > 0)` is a no-op. `ref()` silently does 0→1, `intern()` returns the dangling pointer, and the crash surfaces ~200μs later when `startProxyHandshake` dereferences the freed struct. Same race, same dangling pointer — debug catches it at the source, release at the symptom. **On this branch:** `upgrade()` uses a CAS loop that refuses the 0→1 bump. When it fails, `intern()` replaces the map slot with a fresh config instead of resurrecting the dying one. The repro logs show many clean `deref 1 - 0` to different addresses (fresh allocations) with zero `ref 0 - 1`. --------- Co-authored-by: Claude <noreply@anthropic.com>

autofix-ci Bot and others added 6 commits March 6, 2026 20:16

[autofix.ci] apply automated fixes

7320ace

[autofix.ci] apply automated fixes

3819b5b

remove standalone C repro (replaced by worker-based bun test)

8ed1e2f

test: increase concurrency for SSLConfig race reproduction

cf51fae

8 workers x 20 waves x 16 concurrent fetches per wave = 2560 total. Concurrent batches maximize overlap between deref() (HTTP threads) and intern() (JS threads) across workers sharing the GlobalRegistry.

cirospaciari added a commit that referenced this pull request Mar 6, 2026

remove race test (reliable repro is debug-only, see #27863 for recipe)

50cd844

cirospaciari mentioned this pull request Mar 6, 2026

fix: SSLConfig intern/deref race causing segfault in proxy tunnel setup #27838

Merged

cirospaciari closed this Mar 27, 2026

cirospaciari deleted the claude/prove-sslconfig-uaf-race branch March 27, 2026 23:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

prove: SSLConfig intern/deref race observability (do NOT merge)#27863

prove: SSLConfig intern/deref race observability (do NOT merge)#27863
cirospaciari wants to merge 8 commits into
mainfrom
claude/prove-sslconfig-uaf-race

cirospaciari commented Mar 6, 2026 •

edited

Loading

Uh oh!

robobun commented Mar 6, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

cirospaciari commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

The race

Reproduction recipe (macOS/Linux)

1. Build bun with ASAN (debug)

2. Run the repro script

Expected output (on this branch, unfixed code)

Expected output on #27838 (fixed)

Why BUN_DEBUG_SSLConfig=1 is required

Changes in this branch

Why the debug repro proves the same crash as production

Uh oh!

robobun commented Mar 6, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cirospaciari commented Mar 6, 2026 •

edited

Loading

Why `BUN_DEBUG_SSLConfig=1` is required

robobun commented Mar 6, 2026 •

edited

Loading