Skip to content

Fix PathWatcherManager deadlock and UAF in deferred deinit#27469

Closed
chrislloyd wants to merge 7 commits into
oven-sh:mainfrom
chrislloyd:path-watcher-deadlock-fix
Closed

Fix PathWatcherManager deadlock and UAF in deferred deinit#27469
chrislloyd wants to merge 7 commits into
oven-sh:mainfrom
chrislloyd:path-watcher-deadlock-fix

Conversation

@chrislloyd

Copy link
Copy Markdown
Collaborator

Three issues in the deferred-deinit pattern used by PathWatcherManager, all triggered by rapid fs.watch() churn.

unregisterWatcher self-deadlock (primary)

The deinit-on-last-watcher defer was registered after the mutex.lock()/defer mutex.unlock() pair:

this.mutex.lock();
defer this.mutex.unlock();           // fires last
var watchers = this.watchers.slice();
defer {                               // fires BEFORE unlock
    if (this.deinit_on_last_watcher and this.watcher_count == 0) {
        this.deinit();                // called holding this.mutex
    }
}

Zig defers fire LIFO, so deinit() ran while still holding this.mutex. deinit() then re-acquires this.mutex when hasPendingTasks() is true (a DirectoryRegisterTask running on the workpool). os_unfair_lock is non-recursive → the thread blocks in __ulock_wait2 forever.

Trigger: onError() called (sets deinit_on_last_watcher), then last watcher closed while a workpool task is mid-flight.

Also UAF: if deinit() completes it calls destroy(this), then defer this.mutex.unlock() fires on freed memory.

Fix: hoist to a should_deinit flag checked by a defer registered before the lock. LIFO ordering now yields unlock → deinit.

unrefPendingTask UAF

Same shape: this.deinit() called while holding this.mutex. In this path has_pending_tasks is stored false just before, so deinit() doesn't re-lock — but if it completes, destroy(this) means the defer this.mutex.unlock() is a UAF. Same fix pattern.

processWatcher AB/BA lock inversion

Worker thread holds watcher.mutex (processWatcher) → calls manager._decrementPathRef() on the OOM error path, which acquires manager.mutex. Main thread unregisterWatcher holds manager.mutex → wants watcher.mutex. Classic AB/BA. Only on allocation failure during file_paths.append, but real under memory pressure.

Fix: capture append result, unlock watcher.mutex before calling _decrementPathRef.

Field report

This was observed in the wild as all concurrent processes on a machine hanging at 0% CPU when one session's auto-updater touched watched directories, triggering a cascade of file watcher events across every running process. Stack trace:

FSWatcher.close → PathWatcher.deinit → PathWatcherManager.unregisterWatcher
  → _decrementPathRefNoLock → _os_unfair_lock_lock_slow → __ulock_wait2

No regression test included

The trigger (onError() firing) requires platform-level watch loop failure. rmSync of watched paths doesn't reproduce it — inotify/kqueue return deletion events, not errors. A stress-test fixture that tried this passed on unpatched Linux and would give false confidence. The defer-ordering comments are the regression guard.

@coderabbitai

coderabbitai Bot commented Feb 26, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5e96d5b7-3c7b-4a5b-99a7-1422c96a13f6

📥 Commits

Reviewing files that changed from the base of the PR and between 5c57817 and d5d00aa.

📒 Files selected for processing (1)
  • src/bun.js/node/path_watcher.zig

Walkthrough

Modified path watcher implementation to add deferred deinitialization guards with a should_deinit flag. Prevents deinit() from running while locks are held by deferring cleanup operations until after mutex releases in task completion, watcher unregistration, and directory reference handling. Net change: +43/-12 lines.

Changes

Cohort / File(s) Summary
Deferred Deinitialization Safety
src/bun.js/node/path_watcher.zig
Adds should_deinit flag to defer deinitialization until after lock release in unrefPendingTask, unregisterWatcher, and unrefPendingDirectory. Restructures path registration to hold watcher mutex during append then unlock before error handling. Eliminates immediate deinit() calls that could cause deadlocks when locks interact with manager mutex.
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The description provides thorough context on three concurrency bugs, their root causes, fixes, and field evidence. However, it does not follow the required template structure with 'What does this PR do?' and 'How did you verify your code works?' sections. Restructure the description to follow the template by adding explicit 'What does this PR do?' and 'How did you verify your code works?' sections while preserving the detailed technical content.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and concisely describes the main fix: addressing a deadlock and use-after-free bug in PathWatcherManager's deferred deinitialization pattern.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 49-53: The code only clears has_pending_tasks when
deinit_on_last_task is true, leaving has_pending_tasks stuck if all tasks finish
before deinit starts; change the logic in the pending-tasks decrement block
(where pending_tasks, deinit_on_last_task, has_pending_tasks.store, and
should_deinit are used) to always clear has_pending_tasks when pending_tasks ==
0 (call has_pending_tasks.store(false, .release)), and then set should_deinit =
true only if deinit_on_last_task is also true.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 30e609e and 1fce28bdbe6a3f577dd58e19cdc98869825572df.

📒 Files selected for processing (2)
  • docs/bundler/executables.mdx
  • src/bun.js/node/path_watcher.zig

Comment thread src/bun.js/node/path_watcher.zig
chrislloyd and others added 2 commits March 2, 2026 10:59
Three issues in the deferred-deinit pattern used by
PathWatcherManager, all triggered by rapid fs.watch() churn.

## unregisterWatcher self-deadlock (primary)

The deinit-on-last-watcher defer was registered AFTER the
mutex.lock()/defer mutex.unlock() pair:

    this.mutex.lock();
    defer this.mutex.unlock();           // fires last
    var watchers = this.watchers.slice();
    defer {                               // fires BEFORE unlock
        if (this.deinit_on_last_watcher and this.watcher_count == 0) {
            this.deinit();                // called holding this.mutex
        }
    }

Zig defers fire LIFO, so deinit() ran while still holding
this.mutex. deinit() then re-acquires this.mutex when
hasPendingTasks() is true (a DirectoryRegisterTask running on
the workpool). os_unfair_lock is non-recursive -> the thread
blocks in __ulock_wait2 forever.

Trigger: onError() called (sets deinit_on_last_watcher), then
last watcher closed while a workpool task is mid-flight.

Also UAF: if deinit() completes it calls destroy(this), then
defer this.mutex.unlock() fires on freed memory.

Fix: hoist to a should_deinit flag checked by a defer
registered before the lock. LIFO ordering now yields
unlock -> deinit.

## unrefPendingTask UAF

Same shape: this.deinit() called while holding this.mutex. In
this path has_pending_tasks is stored false just before, so
deinit() doesn't re-lock -- but if it completes, destroy(this)
means the defer this.mutex.unlock() is a UAF. Same fix pattern.

## processWatcher AB/BA lock inversion

Worker thread holds watcher.mutex (processWatcher) -> calls
manager._decrementPathRef() on the OOM error path, which
acquires manager.mutex. Main thread unregisterWatcher holds
manager.mutex -> wants watcher.mutex. Classic AB/BA. Only on
allocation failure during file_paths.append, but real under
memory pressure.

Fix: capture append result, unlock watcher.mutex BEFORE
calling _decrementPathRef.

## Test plan

- zig fmt --check + zig ast-check (Bun's zig 0.15.2)
- bun run zig:check (full semantic analysis, 5/5)
- Full compile to bun-zig.o (asan + noasan), symbols verified
- No JS regression test: the trigger (onError() firing) requires
  platform-level watch loop failure which isn't reproducible
  via rmSync of watched paths (inotify/kqueue return deletion
  events, not errors). Stress tests pass on unpatched Linux.

:house: Remote-Dev: homespace
@alii alii force-pushed the path-watcher-deadlock-fix branch from a0d750f to 59962f7 Compare March 2, 2026 19:00

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/bun.js/node/path_watcher.zig (1)

49-52: ⚠️ Potential issue | 🟠 Major

Clear has_pending_tasks whenever pending_tasks reaches zero.

At Line 50, clearing the atomic flag is gated by deinit_on_last_task. If tasks drain before deinit starts, has_pending_tasks can remain true and deinit can be deferred indefinitely with zero real tasks.

🔧 Proposed fix
         this.pending_tasks -= 1;
-        if (this.deinit_on_last_task and this.pending_tasks == 0) {
+        if (this.pending_tasks == 0) {
             this.has_pending_tasks.store(false, .release);
-            should_deinit = true;
+            if (this.deinit_on_last_task) {
+                should_deinit = true;
+            }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/node/path_watcher.zig` around lines 49 - 52, The atomic flag
has_pending_tasks should be cleared whenever pending_tasks drops to zero, not
only when deinit_on_last_task is true; modify the block in the code that
decrements this.pending_tasks (the area around pending_tasks -= 1,
deinit_on_last_task, has_pending_tasks.store, and should_deinit) so that after
decrementing you check if this.pending_tasks == 0 and always call
this.has_pending_tasks.store(false, .release) when zero, and only set
should_deinit = true when deinit_on_last_task is true (preserving the existing
deinit behavior).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 853-856: The atomic flag has_pending_directories is only cleared
when isClosed() is true, leaving hasPendingDirectories() stale when
pending_directories drains to zero before close; update the logic in the block
around pending_directories to clear has_pending_directories (call .store(false,
.release)) whenever pending_directories == 0 regardless of isClosed(), and
ensure deinit()'s early-return condition still works by setting should_deinit
when both isClosed() and pending_directories == 0 (or when appropriate per
current deinit logic); look for the symbols has_pending_directories,
pending_directories, isClosed(), and deinit() to make the change.

---

Duplicate comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 49-52: The atomic flag has_pending_tasks should be cleared
whenever pending_tasks drops to zero, not only when deinit_on_last_task is true;
modify the block in the code that decrements this.pending_tasks (the area around
pending_tasks -= 1, deinit_on_last_task, has_pending_tasks.store, and
should_deinit) so that after decrementing you check if this.pending_tasks == 0
and always call this.has_pending_tasks.store(false, .release) when zero, and
only set should_deinit = true when deinit_on_last_task is true (preserving the
existing deinit behavior).

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1fce28bdbe6a3f577dd58e19cdc98869825572df and 59962f7.

📒 Files selected for processing (1)
  • src/bun.js/node/path_watcher.zig

Comment thread src/bun.js/node/path_watcher.zig Outdated

@Jarred-Sumner Jarred-Sumner left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we come up with a test that reproduces the issue with ASAN enabled? It sometimes showed up with the watch-many-dirs test? Maybe we can get a more isolated version of that test?

@alii

alii commented Mar 2, 2026

Copy link
Copy Markdown
Member

I have a test locally I will push up soon. It fails in 1.3.10 and passes with this PR

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/js/node/watch/fs.watch.close-midscan.test.ts`:
- Around line 103-104: The test currently asserts cleanedStderr does not contain
"panic:" (using cleanedStderr), which violates the guideline; remove that
negative panic check and instead either (A) restrict the test to debug builds by
adding an explicit skip/guard so it only runs when DebugImpl is active (use the
existing DebugImpl presence check or the test's skip condition), or (B) replace
the panic absence assertion with a behavior-based assertion such as a
timeout-based check that the watcher process exits (or does not hang) within a
bounded time; update the test logic around cleanedStderr and the skip condition
accordingly so the test provides meaningful protection in CI.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 59962f7 and 5c57817.

📒 Files selected for processing (1)
  • test/js/node/watch/fs.watch.close-midscan.test.ts

Comment thread test/js/node/watch/fs.watch.close-midscan.test.ts Outdated
@alii alii requested a review from Jarred-Sumner March 9, 2026 18:08
@alii

alii commented Mar 9, 2026

Copy link
Copy Markdown
Member

@robobun get CI back to green - there is a zig build error.

@robobun

robobun commented Mar 9, 2026

Copy link
Copy Markdown
Collaborator

Build fix — all Zig builds green ✅

Fix PR: #27954 — all 12 Zig builds pass across all platforms (were all 13 red before).

Error was: src/http.zig:49 accessed props.server_name but props is SSLConfig.SharedPtr after the merge with main (#27838). Fixed with props.get().server_name.

CI: 52 passed, 12 test failures — all are pre-existing infra flakes (same 12 failures on unrelated PRs #39081, #39080). Not caused by this PR.

Ready for @chrislloyd or @alii to merge #27954 into the branch.

@robobun robobun left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zig Build Fix

The build error across all platforms is:

src/http.zig:49:19: error: no field named 'server_name' in struct 'ptr.shared.WithOptions(*bun.js.api.server.SSLConfig, ...)'

Root cause: The merge with main pulled in #27838 (SSLConfig SharedPtr refactor) which changed tls_props from a raw pointer to ?SSLConfig.SharedPtr. The getTlsHostname function (from #27891) accesses props.server_name directly, but props is now a SharedPtr wrapper, not the inner SSLConfig.

Fix — one line, src/http.zig:49:

-        if (props.server_name) |sn| {
+        if (props.get().server_name) |sn| {

See line 1481 of the same file for the correct access pattern: tls.get().*.

@alii

alii commented Mar 9, 2026

Copy link
Copy Markdown
Member

@robobun make a new pr with this fix

@robobun

robobun commented Mar 9, 2026

Copy link
Copy Markdown
Collaborator

Created new PR #27957 with the combined fix targeting main. Waiting for CI...

@alii

alii commented Mar 9, 2026

Copy link
Copy Markdown
Member

replacing with #27957

@robobun

robobun commented Mar 14, 2026

Copy link
Copy Markdown
Collaborator

🔨 Building PR debug binary. Test proof confirmed: fs.watch.concurrency.test.ts FAILS with exit code 132 (SIGILL) on baked main debug binary, confirming the UAF bug. Waiting for PR binary build to verify fix (PASS expected). CI builds are running on BuildKite #39581.

robobun pushed a commit that referenced this pull request Mar 14, 2026
Four fixes in the deferred-deinit pattern used by PathWatcherManager,
all triggered by rapid fs.watch() churn:

1. unregisterWatcher self-deadlock: deinit() was called while holding
   this.mutex. deinit() re-acquires the same non-recursive mutex when
   hasPendingTasks() is true, causing self-deadlock. Fix: hoist to a
   should_deinit flag checked by a defer registered before the lock.

2. unrefPendingTask UAF: same shape — deinit() called while holding
   mutex. If deinit() completes and calls destroy(this), the deferred
   mutex.unlock() writes to freed memory. Same fix pattern.

3. processWatcher AB/BA lock inversion: worker thread holds
   watcher.mutex then calls _decrementPathRef which acquires
   manager.mutex. Main thread unregisterWatcher holds manager.mutex
   then wants watcher.mutex. Fix: unlock watcher.mutex before calling
   _decrementPathRef.

4. unrefPendingDirectory self-deadlock: deinit() calls setClosed()
   which re-locks this.mutex. Same should_deinit pattern.

5. onError lock ordering: acquired this.mutex then default_manager_mutex,
   but deinit() acquires them in reverse order. Fix: release this.mutex
   before acquiring default_manager_mutex.

Based on #27469 by @chrislloyd.
robobun pushed a commit that referenced this pull request Mar 14, 2026
Root cause: AB/BA lock ordering between manager.mutex and
main_watcher.mutex. The watcher thread holds main_watcher.mutex
then acquires manager.mutex (via onFileUpdate), while
_decrementPathRefNoLock calls main_watcher.remove() under
manager.mutex — classic lock inversion deadlock.

Fix: _decrementPathRefNoLock now returns the hash for deferred
removal. All callers call main_watcher.remove() AFTER releasing
manager.mutex, eliminating the inversion.

Additional fixes:
- unregisterWatcher: defer deinit() to after mutex release to
  prevent self-deadlock on non-recursive mutex
- unrefPendingTask/unrefPendingDirectory: same deferred-deinit
  pattern to prevent UAF (destroy while mutex held)
- PathWatcher.deinit: combine setClosed + hasPendingDirectories
  check under single mutex hold to prevent double-deinit race
- onError: release manager.mutex before acquiring
  default_manager_mutex to maintain consistent lock ordering
- deinit: check watcher_count and pending_tasks under mutex
- processWatcher: release watcher.mutex before _decrementPathRef
- Clone file paths in addFile/addDirectory to prevent UAF
- Skip evicted entries in onFileUpdate

Closes #27469
@robobun robobun mentioned this pull request Apr 16, 2026
robobun added a commit that referenced this pull request Apr 17, 2026
Four call sites in path_watcher.zig called deinit() while still holding
this.mutex. Since deinit() re-acquires the same non-recursive mutex (and
may destroy(this)), this either self-deadlocked or UAF'd on the deferred
unlock().

Fix: hoist to a should_deinit flag set under the lock, checked by a defer
registered *before* the lock/unlock pair. Zig defers fire LIFO, so the
deinit defer fires last (after unlock).

Also fixes AB/BA lock inversion in processWatcher: worker thread held
watcher.mutex while calling _decrementPathRef (acquires manager.mutex),
while unregisterWatcher acquires them in the opposite order.

Port of #27469 by @chrislloyd onto current main.
Jarred-Sumner pushed a commit that referenced this pull request Apr 17, 2026
## What does this PR do?

Targeted fix for four concurrency bugs in
`src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock
or crash. This is the minimal port of #27469 (by @chrislloyd) onto
current main — previous attempts (#27957, #28088, #28104) added scope
and introduced ASAN regressions. This PR applies only the defer-ordering
fix.

## Root causes

### `unregisterWatcher` self-deadlock (primary)

The deinit-on-last-watcher `defer` was registered *after* the
`mutex.lock()`/`defer mutex.unlock()` pair:

```zig
this.mutex.lock();
defer this.mutex.unlock();           // fires last
var watchers = this.watchers.slice();
defer {                               // fires BEFORE unlock
    if (this.deinit_on_last_watcher and this.watcher_count == 0) {
        this.deinit();                // called holding this.mutex
    }
}
```

Zig defers fire LIFO, so `deinit()` ran while still holding
`this.mutex`. `deinit()` re-acquires `this.mutex` when
`hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock
(observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock
detected` on Linux).

Also UAF: if `deinit()` completes it calls `destroy(this)`, then the
deferred `mutex.unlock()` writes to freed memory.

### `unrefPendingTask` UAF

Same shape: `deinit()` called while holding `this.mutex`. If it
completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF.

### `unrefPendingDirectory` self-deadlock

`deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while
holding → self-deadlock.

### `processWatcher` AB/BA lock inversion

Worker thread holds `watcher.mutex` → calls
`manager._decrementPathRef()` on the OOM error path, which acquires
`manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` →
wants `watcher.mutex`. Classic AB/BA deadlock.

## Fix

All four use the same pattern: set a `should_deinit` flag under the
lock, checked by a `defer` registered *before* the lock/unlock pair.
LIFO ordering then yields unlock → deinit. For `processWatcher`, capture
the append result and release `watcher.mutex` before calling
`_decrementPathRef`.

No semantic changes to when the `has_pending_*` atomics are cleared —
the conditions remain exactly as before, only `deinit()` is moved
outside the lock.

## How did you verify your code works?

- `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes
- Without the fix (src/ stashed), the same test fails with `panic:
Deadlock detected` from the debug-build mutex checker
- All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under
ASAN (these regressed in #28104)
- `test/regression/issue/3657.test.ts` passes under ASAN
- `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing
root-permission failures (same on main)

Supersedes #27469, #27957, #28088, #28104.

---

**Re: duplicate-bot flagging #26385** — that PR fixes a different lock
inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around
`main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex`
self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex`
inversion. They're complementary, not duplicates.

**Re: find-issues-bot flagging #18919** — tested the repro; it hits a
separate pre-existing use-after-poison in the File Watcher thread that
also reproduces on main without this change. Not claiming that fix here.
structwafel pushed a commit to structwafel/bun that referenced this pull request Apr 25, 2026
…9391)

## What does this PR do?

Targeted fix for four concurrency bugs in
`src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock
or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto
current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope
and introduced ASAN regressions. This PR applies only the defer-ordering
fix.

## Root causes

### `unregisterWatcher` self-deadlock (primary)

The deinit-on-last-watcher `defer` was registered *after* the
`mutex.lock()`/`defer mutex.unlock()` pair:

```zig
this.mutex.lock();
defer this.mutex.unlock();           // fires last
var watchers = this.watchers.slice();
defer {                               // fires BEFORE unlock
    if (this.deinit_on_last_watcher and this.watcher_count == 0) {
        this.deinit();                // called holding this.mutex
    }
}
```

Zig defers fire LIFO, so `deinit()` ran while still holding
`this.mutex`. `deinit()` re-acquires `this.mutex` when
`hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock
(observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock
detected` on Linux).

Also UAF: if `deinit()` completes it calls `destroy(this)`, then the
deferred `mutex.unlock()` writes to freed memory.

### `unrefPendingTask` UAF

Same shape: `deinit()` called while holding `this.mutex`. If it
completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF.

### `unrefPendingDirectory` self-deadlock

`deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while
holding → self-deadlock.

### `processWatcher` AB/BA lock inversion

Worker thread holds `watcher.mutex` → calls
`manager._decrementPathRef()` on the OOM error path, which acquires
`manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` →
wants `watcher.mutex`. Classic AB/BA deadlock.

## Fix

All four use the same pattern: set a `should_deinit` flag under the
lock, checked by a `defer` registered *before* the lock/unlock pair.
LIFO ordering then yields unlock → deinit. For `processWatcher`, capture
the append result and release `watcher.mutex` before calling
`_decrementPathRef`.

No semantic changes to when the `has_pending_*` atomics are cleared —
the conditions remain exactly as before, only `deinit()` is moved
outside the lock.

## How did you verify your code works?

- `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes
- Without the fix (src/ stashed), the same test fails with `panic:
Deadlock detected` from the debug-build mutex checker
- All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under
ASAN (these regressed in oven-sh#28104)
- `test/regression/issue/3657.test.ts` passes under ASAN
- `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing
root-permission failures (same on main)

Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104.

---

**Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock
inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around
`main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex`
self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex`
inversion. They're complementary, not duplicates.

**Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a
separate pre-existing use-after-poison in the File Watcher thread that
also reproduces on main without this change. Not claiming that fix here.
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…9391)

## What does this PR do?

Targeted fix for four concurrency bugs in
`src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock
or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto
current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope
and introduced ASAN regressions. This PR applies only the defer-ordering
fix.

## Root causes

### `unregisterWatcher` self-deadlock (primary)

The deinit-on-last-watcher `defer` was registered *after* the
`mutex.lock()`/`defer mutex.unlock()` pair:

```zig
this.mutex.lock();
defer this.mutex.unlock();           // fires last
var watchers = this.watchers.slice();
defer {                               // fires BEFORE unlock
    if (this.deinit_on_last_watcher and this.watcher_count == 0) {
        this.deinit();                // called holding this.mutex
    }
}
```

Zig defers fire LIFO, so `deinit()` ran while still holding
`this.mutex`. `deinit()` re-acquires `this.mutex` when
`hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock
(observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock
detected` on Linux).

Also UAF: if `deinit()` completes it calls `destroy(this)`, then the
deferred `mutex.unlock()` writes to freed memory.

### `unrefPendingTask` UAF

Same shape: `deinit()` called while holding `this.mutex`. If it
completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF.

### `unrefPendingDirectory` self-deadlock

`deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while
holding → self-deadlock.

### `processWatcher` AB/BA lock inversion

Worker thread holds `watcher.mutex` → calls
`manager._decrementPathRef()` on the OOM error path, which acquires
`manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` →
wants `watcher.mutex`. Classic AB/BA deadlock.

## Fix

All four use the same pattern: set a `should_deinit` flag under the
lock, checked by a `defer` registered *before* the lock/unlock pair.
LIFO ordering then yields unlock → deinit. For `processWatcher`, capture
the append result and release `watcher.mutex` before calling
`_decrementPathRef`.

No semantic changes to when the `has_pending_*` atomics are cleared —
the conditions remain exactly as before, only `deinit()` is moved
outside the lock.

## How did you verify your code works?

- `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes
- Without the fix (src/ stashed), the same test fails with `panic:
Deadlock detected` from the debug-build mutex checker
- All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under
ASAN (these regressed in oven-sh#28104)
- `test/regression/issue/3657.test.ts` passes under ASAN
- `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing
root-permission failures (same on main)

Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104.

---

**Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock
inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around
`main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex`
self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex`
inversion. They're complementary, not duplicates.

**Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a
separate pre-existing use-after-poison in the File Watcher thread that
also reproduces on main without this change. Not claiming that fix here.
Edwardmaster7 pushed a commit to Edwardmaster7/openclaude that referenced this pull request May 27, 2026
- fs.watch deadlock no Bun: fallback para chokidar com usePolling
  quando detectado runtime Bun (oven-sh/bun#27469)
- Keychain macOS sem timeout: adiciona timeout 1.5s em todas as
  chamadas ao `security find-generic-password`
- Detecção de IDE travando: substitui N chamadas síncronas por
  getAncestorCommandsAsync único com timeout 3s
- Adiciona documentação detalhada em docs/fix-hang-v0.17.0.md

Co-Authored-By: OpenClaude (mimo-v2.5-pro) <openclaude@gitlawb.com>
Raghav1428 added a commit to KnightCodeAI/knightcode that referenced this pull request Jun 18, 2026
Mirror claude-code's skillChangeDetector: force chokidar into stat() polling
under Bun to avoid the fs.watch PathWatcherManager deadlock (oven-sh/bun#27469),
add awaitWriteFinish, ignore .git/node_modules + special files, and tolerate
permission errors. React to SKILL.md add/change/unlink under the watched root.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants