Fix PathWatcherManager deadlock and UAF in deferred deinit#27469
Fix PathWatcherManager deadlock and UAF in deferred deinit#27469chrislloyd wants to merge 7 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughModified path watcher implementation to add deferred deinitialization guards with a Changes
🚥 Pre-merge checks | ✅ 1 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 49-53: The code only clears has_pending_tasks when
deinit_on_last_task is true, leaving has_pending_tasks stuck if all tasks finish
before deinit starts; change the logic in the pending-tasks decrement block
(where pending_tasks, deinit_on_last_task, has_pending_tasks.store, and
should_deinit are used) to always clear has_pending_tasks when pending_tasks ==
0 (call has_pending_tasks.store(false, .release)), and then set should_deinit =
true only if deinit_on_last_task is also true.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📥 Commits
Reviewing files that changed from the base of the PR and between 30e609e and 1fce28bdbe6a3f577dd58e19cdc98869825572df.
📒 Files selected for processing (2)
docs/bundler/executables.mdxsrc/bun.js/node/path_watcher.zig
Three issues in the deferred-deinit pattern used by
PathWatcherManager, all triggered by rapid fs.watch() churn.
## unregisterWatcher self-deadlock (primary)
The deinit-on-last-watcher defer was registered AFTER the
mutex.lock()/defer mutex.unlock() pair:
this.mutex.lock();
defer this.mutex.unlock(); // fires last
var watchers = this.watchers.slice();
defer { // fires BEFORE unlock
if (this.deinit_on_last_watcher and this.watcher_count == 0) {
this.deinit(); // called holding this.mutex
}
}
Zig defers fire LIFO, so deinit() ran while still holding
this.mutex. deinit() then re-acquires this.mutex when
hasPendingTasks() is true (a DirectoryRegisterTask running on
the workpool). os_unfair_lock is non-recursive -> the thread
blocks in __ulock_wait2 forever.
Trigger: onError() called (sets deinit_on_last_watcher), then
last watcher closed while a workpool task is mid-flight.
Also UAF: if deinit() completes it calls destroy(this), then
defer this.mutex.unlock() fires on freed memory.
Fix: hoist to a should_deinit flag checked by a defer
registered before the lock. LIFO ordering now yields
unlock -> deinit.
## unrefPendingTask UAF
Same shape: this.deinit() called while holding this.mutex. In
this path has_pending_tasks is stored false just before, so
deinit() doesn't re-lock -- but if it completes, destroy(this)
means the defer this.mutex.unlock() is a UAF. Same fix pattern.
## processWatcher AB/BA lock inversion
Worker thread holds watcher.mutex (processWatcher) -> calls
manager._decrementPathRef() on the OOM error path, which
acquires manager.mutex. Main thread unregisterWatcher holds
manager.mutex -> wants watcher.mutex. Classic AB/BA. Only on
allocation failure during file_paths.append, but real under
memory pressure.
Fix: capture append result, unlock watcher.mutex BEFORE
calling _decrementPathRef.
## Test plan
- zig fmt --check + zig ast-check (Bun's zig 0.15.2)
- bun run zig:check (full semantic analysis, 5/5)
- Full compile to bun-zig.o (asan + noasan), symbols verified
- No JS regression test: the trigger (onError() firing) requires
platform-level watch loop failure which isn't reproducible
via rmSync of watched paths (inotify/kqueue return deletion
events, not errors). Stress tests pass on unpatched Linux.
:house: Remote-Dev: homespace
a0d750f to
59962f7
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/bun.js/node/path_watcher.zig (1)
49-52:⚠️ Potential issue | 🟠 MajorClear
has_pending_taskswheneverpending_tasksreaches zero.At Line 50, clearing the atomic flag is gated by
deinit_on_last_task. If tasks drain before deinit starts,has_pending_taskscan remain true and deinit can be deferred indefinitely with zero real tasks.🔧 Proposed fix
this.pending_tasks -= 1; - if (this.deinit_on_last_task and this.pending_tasks == 0) { + if (this.pending_tasks == 0) { this.has_pending_tasks.store(false, .release); - should_deinit = true; + if (this.deinit_on_last_task) { + should_deinit = true; + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/bun.js/node/path_watcher.zig` around lines 49 - 52, The atomic flag has_pending_tasks should be cleared whenever pending_tasks drops to zero, not only when deinit_on_last_task is true; modify the block in the code that decrements this.pending_tasks (the area around pending_tasks -= 1, deinit_on_last_task, has_pending_tasks.store, and should_deinit) so that after decrementing you check if this.pending_tasks == 0 and always call this.has_pending_tasks.store(false, .release) when zero, and only set should_deinit = true when deinit_on_last_task is true (preserving the existing deinit behavior).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 853-856: The atomic flag has_pending_directories is only cleared
when isClosed() is true, leaving hasPendingDirectories() stale when
pending_directories drains to zero before close; update the logic in the block
around pending_directories to clear has_pending_directories (call .store(false,
.release)) whenever pending_directories == 0 regardless of isClosed(), and
ensure deinit()'s early-return condition still works by setting should_deinit
when both isClosed() and pending_directories == 0 (or when appropriate per
current deinit logic); look for the symbols has_pending_directories,
pending_directories, isClosed(), and deinit() to make the change.
---
Duplicate comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 49-52: The atomic flag has_pending_tasks should be cleared
whenever pending_tasks drops to zero, not only when deinit_on_last_task is true;
modify the block in the code that decrements this.pending_tasks (the area around
pending_tasks -= 1, deinit_on_last_task, has_pending_tasks.store, and
should_deinit) so that after decrementing you check if this.pending_tasks == 0
and always call this.has_pending_tasks.store(false, .release) when zero, and
only set should_deinit = true when deinit_on_last_task is true (preserving the
existing deinit behavior).
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📥 Commits
Reviewing files that changed from the base of the PR and between 1fce28bdbe6a3f577dd58e19cdc98869825572df and 59962f7.
📒 Files selected for processing (1)
src/bun.js/node/path_watcher.zig
Jarred-Sumner
left a comment
There was a problem hiding this comment.
Can we come up with a test that reproduces the issue with ASAN enabled? It sometimes showed up with the watch-many-dirs test? Maybe we can get a more isolated version of that test?
|
I have a test locally I will push up soon. It fails in 1.3.10 and passes with this PR |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/js/node/watch/fs.watch.close-midscan.test.ts`:
- Around line 103-104: The test currently asserts cleanedStderr does not contain
"panic:" (using cleanedStderr), which violates the guideline; remove that
negative panic check and instead either (A) restrict the test to debug builds by
adding an explicit skip/guard so it only runs when DebugImpl is active (use the
existing DebugImpl presence check or the test's skip condition), or (B) replace
the panic absence assertion with a behavior-based assertion such as a
timeout-based check that the watcher process exits (or does not hang) within a
bounded time; update the test logic around cleanedStderr and the skip condition
accordingly so the test provides meaningful protection in CI.
ℹ️ Review info
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
test/js/node/watch/fs.watch.close-midscan.test.ts
|
@robobun get CI back to green - there is a zig build error. |
Build fix — all Zig builds green ✅Fix PR: #27954 — all 12 Zig builds pass across all platforms (were all 13 red before). Error was: CI: 52 passed, 12 test failures — all are pre-existing infra flakes (same 12 failures on unrelated PRs #39081, #39080). Not caused by this PR. Ready for @chrislloyd or @alii to merge #27954 into the branch. |
robobun
left a comment
There was a problem hiding this comment.
Zig Build Fix
The build error across all platforms is:
src/http.zig:49:19: error: no field named 'server_name' in struct 'ptr.shared.WithOptions(*bun.js.api.server.SSLConfig, ...)'
Root cause: The merge with main pulled in #27838 (SSLConfig SharedPtr refactor) which changed tls_props from a raw pointer to ?SSLConfig.SharedPtr. The getTlsHostname function (from #27891) accesses props.server_name directly, but props is now a SharedPtr wrapper, not the inner SSLConfig.
Fix — one line, src/http.zig:49:
- if (props.server_name) |sn| {
+ if (props.get().server_name) |sn| {See line 1481 of the same file for the correct access pattern: tls.get().*.
|
@robobun make a new pr with this fix |
|
Created new PR #27957 with the combined fix targeting main. Waiting for CI... |
|
replacing with #27957 |
|
🔨 Building PR debug binary. Test proof confirmed: |
Four fixes in the deferred-deinit pattern used by PathWatcherManager, all triggered by rapid fs.watch() churn: 1. unregisterWatcher self-deadlock: deinit() was called while holding this.mutex. deinit() re-acquires the same non-recursive mutex when hasPendingTasks() is true, causing self-deadlock. Fix: hoist to a should_deinit flag checked by a defer registered before the lock. 2. unrefPendingTask UAF: same shape — deinit() called while holding mutex. If deinit() completes and calls destroy(this), the deferred mutex.unlock() writes to freed memory. Same fix pattern. 3. processWatcher AB/BA lock inversion: worker thread holds watcher.mutex then calls _decrementPathRef which acquires manager.mutex. Main thread unregisterWatcher holds manager.mutex then wants watcher.mutex. Fix: unlock watcher.mutex before calling _decrementPathRef. 4. unrefPendingDirectory self-deadlock: deinit() calls setClosed() which re-locks this.mutex. Same should_deinit pattern. 5. onError lock ordering: acquired this.mutex then default_manager_mutex, but deinit() acquires them in reverse order. Fix: release this.mutex before acquiring default_manager_mutex. Based on #27469 by @chrislloyd.
Root cause: AB/BA lock ordering between manager.mutex and main_watcher.mutex. The watcher thread holds main_watcher.mutex then acquires manager.mutex (via onFileUpdate), while _decrementPathRefNoLock calls main_watcher.remove() under manager.mutex — classic lock inversion deadlock. Fix: _decrementPathRefNoLock now returns the hash for deferred removal. All callers call main_watcher.remove() AFTER releasing manager.mutex, eliminating the inversion. Additional fixes: - unregisterWatcher: defer deinit() to after mutex release to prevent self-deadlock on non-recursive mutex - unrefPendingTask/unrefPendingDirectory: same deferred-deinit pattern to prevent UAF (destroy while mutex held) - PathWatcher.deinit: combine setClosed + hasPendingDirectories check under single mutex hold to prevent double-deinit race - onError: release manager.mutex before acquiring default_manager_mutex to maintain consistent lock ordering - deinit: check watcher_count and pending_tasks under mutex - processWatcher: release watcher.mutex before _decrementPathRef - Clone file paths in addFile/addDirectory to prevent UAF - Skip evicted entries in onFileUpdate Closes #27469
Four call sites in path_watcher.zig called deinit() while still holding this.mutex. Since deinit() re-acquires the same non-recursive mutex (and may destroy(this)), this either self-deadlocked or UAF'd on the deferred unlock(). Fix: hoist to a should_deinit flag set under the lock, checked by a defer registered *before* the lock/unlock pair. Zig defers fire LIFO, so the deinit defer fires last (after unlock). Also fixes AB/BA lock inversion in processWatcher: worker thread held watcher.mutex while calling _decrementPathRef (acquires manager.mutex), while unregisterWatcher acquires them in the opposite order. Port of #27469 by @chrislloyd onto current main.
## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of #27469 (by @chrislloyd) onto current main — previous attempts (#27957, #28088, #28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in #28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes #27469, #27957, #28088, #28104. --- **Re: duplicate-bot flagging #26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging #18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.
…9391) ## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in oven-sh#28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104. --- **Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.
…9391) ## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in oven-sh#28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104. --- **Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.
- fs.watch deadlock no Bun: fallback para chokidar com usePolling quando detectado runtime Bun (oven-sh/bun#27469) - Keychain macOS sem timeout: adiciona timeout 1.5s em todas as chamadas ao `security find-generic-password` - Detecção de IDE travando: substitui N chamadas síncronas por getAncestorCommandsAsync único com timeout 3s - Adiciona documentação detalhada em docs/fix-hang-v0.17.0.md Co-Authored-By: OpenClaude (mimo-v2.5-pro) <openclaude@gitlawb.com>
Mirror claude-code's skillChangeDetector: force chokidar into stat() polling under Bun to avoid the fs.watch PathWatcherManager deadlock (oven-sh/bun#27469), add awaitWriteFinish, ignore .git/node_modules + special files, and tolerate permission errors. React to SKILL.md add/change/unlink under the watched root.
Three issues in the deferred-deinit pattern used by
PathWatcherManager, all triggered by rapidfs.watch()churn.unregisterWatcherself-deadlock (primary)The deinit-on-last-watcher
deferwas registered after themutex.lock()/defer mutex.unlock()pair:Zig defers fire LIFO, so
deinit()ran while still holdingthis.mutex.deinit()then re-acquiresthis.mutexwhenhasPendingTasks()is true (aDirectoryRegisterTaskrunning on the workpool).os_unfair_lockis non-recursive → the thread blocks in__ulock_wait2forever.Trigger:
onError()called (setsdeinit_on_last_watcher), then last watcher closed while a workpool task is mid-flight.Also UAF: if
deinit()completes it callsdestroy(this), thendefer this.mutex.unlock()fires on freed memory.Fix: hoist to a
should_deinitflag checked by a defer registered before the lock. LIFO ordering now yields unlock → deinit.unrefPendingTaskUAFSame shape:
this.deinit()called while holdingthis.mutex. In this pathhas_pending_tasksis storedfalsejust before, sodeinit()doesn't re-lock — but if it completes,destroy(this)means thedefer this.mutex.unlock()is a UAF. Same fix pattern.processWatcherAB/BA lock inversionWorker thread holds
watcher.mutex(processWatcher) → callsmanager._decrementPathRef()on the OOM error path, which acquiresmanager.mutex. Main threadunregisterWatcherholdsmanager.mutex→ wantswatcher.mutex. Classic AB/BA. Only on allocation failure duringfile_paths.append, but real under memory pressure.Fix: capture append result, unlock
watcher.mutexbefore calling_decrementPathRef.Field report
This was observed in the wild as all concurrent processes on a machine hanging at 0% CPU when one session's auto-updater touched watched directories, triggering a cascade of file watcher events across every running process. Stack trace:
No regression test included
The trigger (
onError()firing) requires platform-level watch loop failure.rmSyncof watched paths doesn't reproduce it — inotify/kqueue return deletion events, not errors. A stress-test fixture that tried this passed on unpatched Linux and would give false confidence. The defer-ordering comments are the regression guard.