fs.watch: fix pending_activity_count underflow leaking non-persistent FSWatcher#29907
Conversation
… FSWatcher
FSWatcher.close() set closed=true before calling refTask(), so refTask()
returned false without incrementing and the paired unrefTask() still ran,
followed by the final unrefTask(). Net effect of close() on the counter was
-2.
pending_activity_count starts at 1; initJS() only added +1 when persistent.
For {persistent:false} watchers, close() took 1 -> -1 which wraps to u32 MAX,
so hasPendingActivity() returned true forever and the native FSWatcher plus
its cached listener closure (and the JS FSWatcher EventEmitter it closes
over) were pinned as GC roots — a permanent leak per watcher instance.
Persistent watchers landed at 0 only by accident (2 - 2).
- close(): increment the counter directly instead of calling refTask() so
the pair around emitJS(close) is actually balanced.
- initJS(): drop the persistent-gated +1. The initial count of 1 already
keeps hasPendingActivity() true until close(); persistence only affects
the event-loop keep-alive (poll_ref), not GC reachability. This also
fixes the count for watchers that call ref()/unref() before close().
- unrefTask(): add a debug assertion to catch future underflow.
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
Review rate limit: 0/5 reviews remaining, refill in 2 minutes and 18 seconds. Comment |
|
Updated 7:25 PM PT - Apr 28th, 2026
❌ @robobun, your commit 2fed38c has 4 failures in
🧪 To try this PR locally: bunx bun-pr 29907That installs a local version of the PR into your bun-29907 --bun |
…ance run()'s unrefTask()
There was a problem hiding this comment.
Thanks for the Windows appendAbort fix — the counter now balances on every path I traced; deferring to a human only because this touches atomic hasPendingActivity refcounting across both platform task paths.
Extended reasoning...
Overview
Fixes a pending_activity_count underflow in FSWatcher.close() that permanently leaked fs.watch(path, {persistent:false}) instances. Touches src/bun.js/node/node_fs_watcher.zig (~15 LOC: direct fetchAdd in close() instead of the no-op refTask(), drop the persistent-gated +1 in initJS(), add refTask() guard in FSWatchTaskWindows.appendAbort, add debugAssert(prev > 0) in unrefTask()) and adds a FinalizationRegistry-based regression test.
Prior feedback
My previous review flagged that FSWatchTaskWindows.appendAbort enqueued without bumping the counter, which would underflow (and trip the new assert) on Windows for pre-aborted signals. Commit 929f918 adds if (!ctx.refTask()) return; there, matching onPathUpdateWindows and the posix enqueue() path. I re-traced the abort-before-creation flow on Windows and it now lands at exactly 0.
Verification
I walked the counter through: plain close(); abort-before-creation (posix + windows); abort-after-creation via signal.listen; emitError → close; re-entrant close() from the close listener; and close() racing an already-enqueued task. All end at 0 with no underflow. doRef/doUnref only touch poll_ref, so user ref()/unref() no longer skews the count. The new debug assert is release-no-op.
Security risks
None — internal refcount bookkeeping for GC reachability; no user-controlled input reaches new code paths.
Level of scrutiny
Moderate. The diff is small and follows the existing emitAbort() pattern, but it changes the invariant governing hasPendingActivity() (which gates GC finalization) under a mutex with a concurrent watcher thread, across two divergent platform task implementations. Getting this wrong means either a leak or a use-after-free, so a maintainer familiar with the FSWatcher lifecycle should sign off.
Other factors
No CODEOWNERS entry for this file. Regression test covers both persistent values via subprocess + FinalizationRegistry. Existing abort-signal and "immediately closing" tests exercise the other paths.
… FSWatcher (oven-sh#29907) ## What does this PR do? Fixes a permanent per-instance leak of `fs.watch(path, { persistent: false })` watchers after `.close()`. ## Reproduction ```js const fs = require("fs"); let collected = 0; const registry = new FinalizationRegistry(() => collected++); for (let i = 0; i < 64; i++) { const w = fs.watch("/tmp", { persistent: false }, () => {}); registry.register(w); w.close(); } for (let i = 0; i < 30; i++) { Bun.gc(true); await Bun.sleep(10); } console.log(collected); // before: 0, after: 64 ``` Before this change, **0** of 64 are collected. With `{persistent: true}`, all 64 are collected (by accident, see below). ## Root cause `pending_activity_count` starts at `1`. `initJS()` adds `+1` only when `persistent` is true (so non-persistent stays at `1`, persistent becomes `2`). In `close()`: 1. `closed` is set to `true` under the mutex. 2. `_ = this.refTask()` is called — but `refTask()` checks `if (this.closed) return false` and does **not** increment. The return value is discarded. 3. `this.unrefTask()` runs anyway → `-1`. 4. Final `this.unrefTask()` → `-1`. Net effect of `close()` is `-2`. For persistent (start = 2) this lands at `0` by coincidence. For non-persistent (start = 1) it wraps the `u32` to `4294967295`, so `hasPendingActivity()` returns `true` forever. The native `FSWatcher`, its cached listener closure, and the JS `FSWatcher` EventEmitter it closes over are all pinned as GC roots — a permanent leak per watcher. ## Fix - **`close()`**: increment the counter directly via `fetchAdd` instead of calling `refTask()`, so the ref/unref pair around `emitJS(close)` is actually balanced (net 0). Matches the existing pattern in `emitAbort()`. - **`initJS()`**: drop the persistent-gated `+1`. The initial count of `1` already keeps `hasPendingActivity()` true until `close()`; persistence is about the event-loop keep-alive (`poll_ref`), not GC reachability. This extra `+1` was only ever balanced by the broken `refTask()`/`unrefTask()` pair in `close()`, and it produced wrong counts when the user called `ref()`/`unref()` before `close()`. - **`unrefTask()`**: add a `bun.debugAssert(prev > 0)` to catch future underflow in debug builds. With the fix, the count is `1` until `close()` releases it (plus `+1` per in-flight `FSWatchTask`), regardless of persistence or `ref()`/`unref()`. ## Verification - New test `closed FSWatcher is collectable > persistent: {false,true}` in `test/js/node/watch/fs.watch.test.ts` creates 64 watchers, closes them, and uses a `FinalizationRegistry` to assert ≥32 are collected. Without the fix, `persistent: false` collects 0. - Full `fs.watch.test.ts` suite passes (including abort-signal tests, `immediately closing` with 100× persistent + 100× non-persistent, and `calling close from close event`). - `bun run zig:check-all` passes on all targets. --------- Co-authored-by: robobun <robobun@users.noreply.github.com>
What does this PR do?
Fixes a permanent per-instance leak of
fs.watch(path, { persistent: false })watchers after.close().Reproduction
Before this change, 0 of 64 are collected. With
{persistent: true}, all 64 are collected (by accident, see below).Root cause
pending_activity_countstarts at1.initJS()adds+1only whenpersistentis true (so non-persistent stays at1, persistent becomes2).In
close():closedis set totrueunder the mutex._ = this.refTask()is called — butrefTask()checksif (this.closed) return falseand does not increment. The return value is discarded.this.unrefTask()runs anyway →-1.this.unrefTask()→-1.Net effect of
close()is-2. For persistent (start = 2) this lands at0by coincidence. For non-persistent (start = 1) it wraps theu32to4294967295, sohasPendingActivity()returnstrueforever. The nativeFSWatcher, its cached listener closure, and the JSFSWatcherEventEmitter it closes over are all pinned as GC roots — a permanent leak per watcher.Fix
close(): increment the counter directly viafetchAddinstead of callingrefTask(), so the ref/unref pair aroundemitJS(close)is actually balanced (net 0). Matches the existing pattern inemitAbort().initJS(): drop the persistent-gated+1. The initial count of1already keepshasPendingActivity()true untilclose(); persistence is about the event-loop keep-alive (poll_ref), not GC reachability. This extra+1was only ever balanced by the brokenrefTask()/unrefTask()pair inclose(), and it produced wrong counts when the user calledref()/unref()beforeclose().unrefTask(): add abun.debugAssert(prev > 0)to catch future underflow in debug builds.With the fix, the count is
1untilclose()releases it (plus+1per in-flightFSWatchTask), regardless of persistence orref()/unref().Verification
closed FSWatcher is collectable > persistent: {false,true}intest/js/node/watch/fs.watch.test.tscreates 64 watchers, closes them, and uses aFinalizationRegistryto assert ≥32 are collected. Without the fix,persistent: falsecollects 0.fs.watch.test.tssuite passes (including abort-signal tests,immediately closingwith 100× persistent + 100× non-persistent, andcalling close from close event).bun run zig:check-allpasses on all targets.