Fix PathWatcherManager deadlock and UAF in deferred deinit by chrislloyd · Pull Request #27469 · oven-sh/bun

chrislloyd · 2026-02-26T17:25:52Z

Three issues in the deferred-deinit pattern used by PathWatcherManager, all triggered by rapid fs.watch() churn.

`unregisterWatcher` self-deadlock (primary)

The deinit-on-last-watcher defer was registered after the mutex.lock()/defer mutex.unlock() pair:

this.mutex.lock();
defer this.mutex.unlock();           // fires last
var watchers = this.watchers.slice();
defer {                               // fires BEFORE unlock
    if (this.deinit_on_last_watcher and this.watcher_count == 0) {
        this.deinit();                // called holding this.mutex
    }
}

Zig defers fire LIFO, so deinit() ran while still holding this.mutex. deinit() then re-acquires this.mutex when hasPendingTasks() is true (a DirectoryRegisterTask running on the workpool). os_unfair_lock is non-recursive → the thread blocks in __ulock_wait2 forever.

Trigger: onError() called (sets deinit_on_last_watcher), then last watcher closed while a workpool task is mid-flight.

Also UAF: if deinit() completes it calls destroy(this), then defer this.mutex.unlock() fires on freed memory.

Fix: hoist to a should_deinit flag checked by a defer registered before the lock. LIFO ordering now yields unlock → deinit.

`unrefPendingTask` UAF

Same shape: this.deinit() called while holding this.mutex. In this path has_pending_tasks is stored false just before, so deinit() doesn't re-lock — but if it completes, destroy(this) means the defer this.mutex.unlock() is a UAF. Same fix pattern.

`processWatcher` AB/BA lock inversion

Worker thread holds watcher.mutex (processWatcher) → calls manager._decrementPathRef() on the OOM error path, which acquires manager.mutex. Main thread unregisterWatcher holds manager.mutex → wants watcher.mutex. Classic AB/BA. Only on allocation failure during file_paths.append, but real under memory pressure.

Fix: capture append result, unlock watcher.mutex before calling _decrementPathRef.

Field report

This was observed in the wild as all concurrent processes on a machine hanging at 0% CPU when one session's auto-updater touched watched directories, triggering a cascade of file watcher events across every running process. Stack trace:

FSWatcher.close → PathWatcher.deinit → PathWatcherManager.unregisterWatcher
  → _decrementPathRefNoLock → _os_unfair_lock_lock_slow → __ulock_wait2

No regression test included

The trigger (onError() firing) requires platform-level watch loop failure. rmSync of watched paths doesn't reproduce it — inotify/kqueue return deletion events, not errors. A stress-test fixture that tried this passed on unpatched Linux and would give false confidence. The defer-ordering comments are the regression guard.

coderabbitai · 2026-02-26T17:27:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e96d5b7-3c7b-4a5b-99a7-1422c96a13f6

📥 Commits

Reviewing files that changed from the base of the PR and between 5c57817 and d5d00aa.

📒 Files selected for processing (1)

src/bun.js/node/path_watcher.zig

Walkthrough

Modified path watcher implementation to add deferred deinitialization guards with a should_deinit flag. Prevents deinit() from running while locks are held by deferring cleanup operations until after mutex releases in task completion, watcher unregistration, and directory reference handling. Net change: +43/-12 lines.

Changes

Cohort / File(s)	Summary
Deferred Deinitialization Safety `src/bun.js/node/path_watcher.zig`	Adds `should_deinit` flag to defer deinitialization until after lock release in `unrefPendingTask`, `unregisterWatcher`, and `unrefPendingDirectory`. Restructures path registration to hold watcher mutex during append then unlock before error handling. Eliminates immediate `deinit()` calls that could cause deadlocks when locks interact with manager mutex.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description provides thorough context on three concurrency bugs, their root causes, fixes, and field evidence. However, it does not follow the required template structure with 'What does this PR do?' and 'How did you verify your code works?' sections.	Restructure the description to follow the template by adding explicit 'What does this PR do?' and 'How did you verify your code works?' sections while preserving the detailed technical content.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title directly and concisely describes the main fix: addressing a deadlock and use-after-free bug in PathWatcherManager's deferred deinitialization pattern.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 49-53: The code only clears has_pending_tasks when
deinit_on_last_task is true, leaving has_pending_tasks stuck if all tasks finish
before deinit starts; change the logic in the pending-tasks decrement block
(where pending_tasks, deinit_on_last_task, has_pending_tasks.store, and
should_deinit are used) to always clear has_pending_tasks when pending_tasks ==
0 (call has_pending_tasks.store(false, .release)), and then set should_deinit =
true only if deinit_on_last_task is also true.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 30e609e and 1fce28bdbe6a3f577dd58e19cdc98869825572df.

📒 Files selected for processing (2)

docs/bundler/executables.mdx
src/bun.js/node/path_watcher.zig

Three issues in the deferred-deinit pattern used by PathWatcherManager, all triggered by rapid fs.watch() churn. ## unregisterWatcher self-deadlock (primary) The deinit-on-last-watcher defer was registered AFTER the mutex.lock()/defer mutex.unlock() pair: this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } Zig defers fire LIFO, so deinit() ran while still holding this.mutex. deinit() then re-acquires this.mutex when hasPendingTasks() is true (a DirectoryRegisterTask running on the workpool). os_unfair_lock is non-recursive -> the thread blocks in __ulock_wait2 forever. Trigger: onError() called (sets deinit_on_last_watcher), then last watcher closed while a workpool task is mid-flight. Also UAF: if deinit() completes it calls destroy(this), then defer this.mutex.unlock() fires on freed memory. Fix: hoist to a should_deinit flag checked by a defer registered before the lock. LIFO ordering now yields unlock -> deinit. ## unrefPendingTask UAF Same shape: this.deinit() called while holding this.mutex. In this path has_pending_tasks is stored false just before, so deinit() doesn't re-lock -- but if it completes, destroy(this) means the defer this.mutex.unlock() is a UAF. Same fix pattern. ## processWatcher AB/BA lock inversion Worker thread holds watcher.mutex (processWatcher) -> calls manager._decrementPathRef() on the OOM error path, which acquires manager.mutex. Main thread unregisterWatcher holds manager.mutex -> wants watcher.mutex. Classic AB/BA. Only on allocation failure during file_paths.append, but real under memory pressure. Fix: capture append result, unlock watcher.mutex BEFORE calling _decrementPathRef. ## Test plan - zig fmt --check + zig ast-check (Bun's zig 0.15.2) - bun run zig:check (full semantic analysis, 5/5) - Full compile to bun-zig.o (asan + noasan), symbols verified - No JS regression test: the trigger (onError() firing) requires platform-level watch loop failure which isn't reproducible via rmSync of watched paths (inotify/kqueue return deletion events, not errors). Stress tests pass on unpatched Linux. :house: Remote-Dev: homespace

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

src/bun.js/node/path_watcher.zig (1)

49-52: ⚠️ Potential issue | 🟠 Major

Clear has_pending_tasks whenever pending_tasks reaches zero.

At Line 50, clearing the atomic flag is gated by deinit_on_last_task. If tasks drain before deinit starts, has_pending_tasks can remain true and deinit can be deferred indefinitely with zero real tasks.

🔧 Proposed fix

         this.pending_tasks -= 1;
-        if (this.deinit_on_last_task and this.pending_tasks == 0) {
+        if (this.pending_tasks == 0) {
             this.has_pending_tasks.store(false, .release);
-            should_deinit = true;
+            if (this.deinit_on_last_task) {
+                should_deinit = true;
+            }
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/node/path_watcher.zig` around lines 49 - 52, The atomic flag
has_pending_tasks should be cleared whenever pending_tasks drops to zero, not
only when deinit_on_last_task is true; modify the block in the code that
decrements this.pending_tasks (the area around pending_tasks -= 1,
deinit_on_last_task, has_pending_tasks.store, and should_deinit) so that after
decrementing you check if this.pending_tasks == 0 and always call
this.has_pending_tasks.store(false, .release) when zero, and only set
should_deinit = true when deinit_on_last_task is true (preserving the existing
deinit behavior).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 853-856: The atomic flag has_pending_directories is only cleared
when isClosed() is true, leaving hasPendingDirectories() stale when
pending_directories drains to zero before close; update the logic in the block
around pending_directories to clear has_pending_directories (call .store(false,
.release)) whenever pending_directories == 0 regardless of isClosed(), and
ensure deinit()'s early-return condition still works by setting should_deinit
when both isClosed() and pending_directories == 0 (or when appropriate per
current deinit logic); look for the symbols has_pending_directories,
pending_directories, isClosed(), and deinit() to make the change.

---

Duplicate comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 49-52: The atomic flag has_pending_tasks should be cleared
whenever pending_tasks drops to zero, not only when deinit_on_last_task is true;
modify the block in the code that decrements this.pending_tasks (the area around
pending_tasks -= 1, deinit_on_last_task, has_pending_tasks.store, and
should_deinit) so that after decrementing you check if this.pending_tasks == 0
and always call this.has_pending_tasks.store(false, .release) when zero, and
only set should_deinit = true when deinit_on_last_task is true (preserving the
existing deinit behavior).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1fce28bdbe6a3f577dd58e19cdc98869825572df and 59962f7.

📒 Files selected for processing (1)

src/bun.js/node/path_watcher.zig

Jarred-Sumner

Can we come up with a test that reproduces the issue with ASAN enabled? It sometimes showed up with the watch-many-dirs test? Maybe we can get a more isolated version of that test?

alii · 2026-03-02T22:45:40Z

I have a test locally I will push up soon. It fails in 1.3.10 and passes with this PR

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/node/watch/fs.watch.close-midscan.test.ts`:
- Around line 103-104: The test currently asserts cleanedStderr does not contain
"panic:" (using cleanedStderr), which violates the guideline; remove that
negative panic check and instead either (A) restrict the test to debug builds by
adding an explicit skip/guard so it only runs when DebugImpl is active (use the
existing DebugImpl presence check or the test's skip condition), or (B) replace
the panic absence assertion with a behavior-based assertion such as a
timeout-based check that the watcher process exits (or does not hang) within a
bounded time; update the test logic around cleanedStderr and the skip condition
accordingly so the test provides meaningful protection in CI.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 59962f7 and 5c57817.

📒 Files selected for processing (1)

test/js/node/watch/fs.watch.close-midscan.test.ts

This reverts commit 5c57817.

alii · 2026-03-09T18:50:36Z

@robobun get CI back to green - there is a zig build error.

robobun · 2026-03-09T18:51:39Z

Build fix — all Zig builds green ✅

Fix PR: #27954 — all 12 Zig builds pass across all platforms (were all 13 red before).

Error was: src/http.zig:49 accessed props.server_name but props is SSLConfig.SharedPtr after the merge with main (#27838). Fixed with props.get().server_name.

CI: 52 passed, 12 test failures — all are pre-existing infra flakes (same 12 failures on unrelated PRs #39081, #39080). Not caused by this PR.

Ready for @chrislloyd or @alii to merge #27954 into the branch.

robobun

Zig Build Fix

The build error across all platforms is:

src/http.zig:49:19: error: no field named 'server_name' in struct 'ptr.shared.WithOptions(*bun.js.api.server.SSLConfig, ...)'

Root cause: The merge with main pulled in #27838 (SSLConfig SharedPtr refactor) which changed tls_props from a raw pointer to ?SSLConfig.SharedPtr. The getTlsHostname function (from #27891) accesses props.server_name directly, but props is now a SharedPtr wrapper, not the inner SSLConfig.

Fix — one line, src/http.zig:49:

-        if (props.server_name) |sn| {
+        if (props.get().server_name) |sn| {

See line 1481 of the same file for the correct access pattern: tls.get().*.

alii · 2026-03-09T19:33:06Z

@robobun make a new pr with this fix

robobun · 2026-03-09T20:50:47Z

Created new PR #27957 with the combined fix targeting main. Waiting for CI...

alii · 2026-03-09T22:07:59Z

replacing with #27957

robobun · 2026-03-14T06:30:28Z

🔨 Building PR debug binary. Test proof confirmed: fs.watch.concurrency.test.ts FAILS with exit code 132 (SIGILL) on baked main debug binary, confirming the UAF bug. Waiting for PR binary build to verify fix (PASS expected). CI builds are running on BuildKite #39581.

@chrislloyd

Four fixes in the deferred-deinit pattern used by PathWatcherManager, all triggered by rapid fs.watch() churn: 1. unregisterWatcher self-deadlock: deinit() was called while holding this.mutex. deinit() re-acquires the same non-recursive mutex when hasPendingTasks() is true, causing self-deadlock. Fix: hoist to a should_deinit flag checked by a defer registered before the lock. 2. unrefPendingTask UAF: same shape — deinit() called while holding mutex. If deinit() completes and calls destroy(this), the deferred mutex.unlock() writes to freed memory. Same fix pattern. 3. processWatcher AB/BA lock inversion: worker thread holds watcher.mutex then calls _decrementPathRef which acquires manager.mutex. Main thread unregisterWatcher holds manager.mutex then wants watcher.mutex. Fix: unlock watcher.mutex before calling _decrementPathRef. 4. unrefPendingDirectory self-deadlock: deinit() calls setClosed() which re-locks this.mutex. Same should_deinit pattern. 5. onError lock ordering: acquired this.mutex then default_manager_mutex, but deinit() acquires them in reverse order. Fix: release this.mutex before acquiring default_manager_mutex. Based on #27469 by @chrislloyd.

Root cause: AB/BA lock ordering between manager.mutex and main_watcher.mutex. The watcher thread holds main_watcher.mutex then acquires manager.mutex (via onFileUpdate), while _decrementPathRefNoLock calls main_watcher.remove() under manager.mutex — classic lock inversion deadlock. Fix: _decrementPathRefNoLock now returns the hash for deferred removal. All callers call main_watcher.remove() AFTER releasing manager.mutex, eliminating the inversion. Additional fixes: - unregisterWatcher: defer deinit() to after mutex release to prevent self-deadlock on non-recursive mutex - unrefPendingTask/unrefPendingDirectory: same deferred-deinit pattern to prevent UAF (destroy while mutex held) - PathWatcher.deinit: combine setClosed + hasPendingDirectories check under single mutex hold to prevent double-deinit race - onError: release manager.mutex before acquiring default_manager_mutex to maintain consistent lock ordering - deinit: check watcher_count and pending_tasks under mutex - processWatcher: release watcher.mutex before _decrementPathRef - Clone file paths in addFile/addDirectory to prevent UAF - Skip evicted entries in onFileUpdate Closes #27469

@chrislloyd

Four call sites in path_watcher.zig called deinit() while still holding this.mutex. Since deinit() re-acquires the same non-recursive mutex (and may destroy(this)), this either self-deadlocked or UAF'd on the deferred unlock(). Fix: hoist to a should_deinit flag set under the lock, checked by a defer registered *before* the lock/unlock pair. Zig defers fire LIFO, so the deinit defer fires last (after unlock). Also fixes AB/BA lock inversion in processWatcher: worker thread held watcher.mutex while calling _decrementPathRef (acquires manager.mutex), while unregisterWatcher acquires them in the opposite order. Port of #27469 by @chrislloyd onto current main.

@chrislloyd

## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of #27469 (by @chrislloyd) onto current main — previous attempts (#27957, #28088, #28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in #28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes #27469, #27957, #28088, #28104. --- **Re: duplicate-bot flagging #26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging #18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.

@chrislloyd

…9391) ## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in oven-sh#28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104. --- **Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.

@chrislloyd

…9391) ## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in oven-sh#28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104. --- **Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.

- fs.watch deadlock no Bun: fallback para chokidar com usePolling quando detectado runtime Bun (oven-sh/bun#27469) - Keychain macOS sem timeout: adiciona timeout 1.5s em todas as chamadas ao `security find-generic-password` - Detecção de IDE travando: substitui N chamadas síncronas por getAncestorCommandsAsync único com timeout 3s - Adiciona documentação detalhada em docs/fix-hang-v0.17.0.md Co-Authored-By: OpenClaude (mimo-v2.5-pro) <openclaude@gitlawb.com>

Mirror claude-code's skillChangeDetector: force chokidar into stat() polling under Bun to avoid the fs.watch PathWatcherManager deadlock (oven-sh/bun#27469), add awaitWriteFinish, ignore .git/node_modules + special files, and tolerate permission errors. React to SKILL.md add/change/unlink under the watched root.

coderabbitai Bot reviewed Feb 26, 2026

View reviewed changes

Comment thread src/bun.js/node/path_watcher.zig

chrislloyd and others added 2 commits March 2, 2026 10:59

add one more deinit case (unrefPendingDirectory)

59962f7

alii force-pushed the path-watcher-deadlock-fix branch from a0d750f to 59962f7 Compare March 2, 2026 19:00

coderabbitai Bot reviewed Mar 2, 2026

View reviewed changes

Comment thread src/bun.js/node/path_watcher.zig Outdated

Jarred-Sumner requested changes Mar 2, 2026

View reviewed changes

regression test

5c57817

coderabbitai Bot reviewed Mar 4, 2026

View reviewed changes

Comment thread test/js/node/watch/fs.watch.close-midscan.test.ts Outdated

alii and others added 4 commits March 3, 2026 17:20

Add some bounds

72ba748

Revert "regression test"

19a2f4c

This reverts commit 5c57817.

Merge branch 'main' into path-watcher-deadlock-fix

d5d00aa

Merge branch 'main' into path-watcher-deadlock-fix

4c12c77

alii requested a review from Jarred-Sumner March 9, 2026 18:08

robobun reviewed Mar 9, 2026

View reviewed changes

robobun mentioned this pull request Mar 9, 2026

Fix PathWatcherManager deadlock and UAF in deferred deinit #27954

Closed

robobun mentioned this pull request Mar 9, 2026

fix: PathWatcherManager deadlock and UAF in deferred deinit #27957

Closed

3 tasks

alii closed this Mar 9, 2026

robobun mentioned this pull request Mar 14, 2026

Fix deadlocks and UAF in PathWatcherManager #28086

Closed

3 tasks

robobun mentioned this pull request Mar 14, 2026

Fix PathWatcherManager deadlock and UAF in deferred deinit #28088

Closed

robobun mentioned this pull request Mar 14, 2026

Fix deadlock, UAF, and lock inversion in PathWatcher #28104

Closed

3 tasks

robobun mentioned this pull request Apr 16, 2026

ai slop #29382

Closed

robobun mentioned this pull request Apr 17, 2026

Fix PathWatcherManager deadlock and UAF in deferred deinit #29391

Merged

claude Bot mentioned this pull request Apr 29, 2026

fs.watch: fix PathWatcher double-free race between close() and DirectoryRegisterTask #29936

Closed

This was referenced May 26, 2026

fix: corrige 3 travamentos no startup e runtime (v0.17.0) mat-dgruber/openclaude#2

Merged

fix: corrige 3 travamentos no startup e runtime (v0.17.0) Edwardmaster7/openclaude#8

Merged

Conversation

chrislloyd commented Feb 26, 2026

unregisterWatcher self-deadlock (primary)

unrefPendingTask UAF

processWatcher AB/BA lock inversion

Field report

No regression test included

Uh oh!

coderabbitai Bot commented Feb 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Jarred-Sumner left a comment

Choose a reason for hiding this comment

Uh oh!

alii commented Mar 2, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alii commented Mar 9, 2026

Uh oh!

robobun commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Build fix — all Zig builds green ✅

Uh oh!

robobun left a comment

Choose a reason for hiding this comment

Zig Build Fix

Uh oh!

alii commented Mar 9, 2026

Uh oh!

robobun commented Mar 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alii commented Mar 9, 2026

Uh oh!

robobun commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

`unregisterWatcher` self-deadlock (primary)

`unrefPendingTask` UAF

`processWatcher` AB/BA lock inversion

coderabbitai Bot commented Feb 26, 2026 •

edited

Loading

robobun commented Mar 9, 2026 •

edited

Loading

robobun commented Mar 9, 2026 •

edited

Loading

robobun commented Mar 14, 2026 •

edited

Loading