fs.watch: fix pending_activity_count underflow leaking non-persistent FSWatcher by robobun · Pull Request #29907 · oven-sh/bun

robobun · 2026-04-29T00:41:01Z

What does this PR do?

Fixes a permanent per-instance leak of fs.watch(path, { persistent: false }) watchers after .close().

Reproduction

const fs = require("fs");
let collected = 0;
const registry = new FinalizationRegistry(() => collected++);
for (let i = 0; i < 64; i++) {
  const w = fs.watch("/tmp", { persistent: false }, () => {});
  registry.register(w);
  w.close();
}
for (let i = 0; i < 30; i++) { Bun.gc(true); await Bun.sleep(10); }
console.log(collected); // before: 0, after: 64

Before this change, 0 of 64 are collected. With {persistent: true}, all 64 are collected (by accident, see below).

Root cause

pending_activity_count starts at 1. initJS() adds +1 only when persistent is true (so non-persistent stays at 1, persistent becomes 2).

In close():

closed is set to true under the mutex.
_ = this.refTask() is called — but refTask() checks if (this.closed) return false and does not increment. The return value is discarded.
this.unrefTask() runs anyway → -1.
Final this.unrefTask() → -1.

Net effect of close() is -2. For persistent (start = 2) this lands at 0 by coincidence. For non-persistent (start = 1) it wraps the u32 to 4294967295, so hasPendingActivity() returns true forever. The native FSWatcher, its cached listener closure, and the JS FSWatcher EventEmitter it closes over are all pinned as GC roots — a permanent leak per watcher.

Fix

close(): increment the counter directly via fetchAdd instead of calling refTask(), so the ref/unref pair around emitJS(close) is actually balanced (net 0). Matches the existing pattern in emitAbort().
initJS(): drop the persistent-gated +1. The initial count of 1 already keeps hasPendingActivity() true until close(); persistence is about the event-loop keep-alive (poll_ref), not GC reachability. This extra +1 was only ever balanced by the broken refTask()/unrefTask() pair in close(), and it produced wrong counts when the user called ref()/unref() before close().
unrefTask(): add a bun.debugAssert(prev > 0) to catch future underflow in debug builds.

With the fix, the count is 1 until close() releases it (plus +1 per in-flight FSWatchTask), regardless of persistence or ref()/unref().

Verification

New test closed FSWatcher is collectable > persistent: {false,true} in test/js/node/watch/fs.watch.test.ts creates 64 watchers, closes them, and uses a FinalizationRegistry to assert ≥32 are collected. Without the fix, persistent: false collects 0.
Full fs.watch.test.ts suite passes (including abort-signal tests, immediately closing with 100× persistent + 100× non-persistent, and calling close from close event).
bun run zig:check-all passes on all targets.

… FSWatcher FSWatcher.close() set closed=true before calling refTask(), so refTask() returned false without incrementing and the paired unrefTask() still ran, followed by the final unrefTask(). Net effect of close() on the counter was -2. pending_activity_count starts at 1; initJS() only added +1 when persistent. For {persistent:false} watchers, close() took 1 -> -1 which wraps to u32 MAX, so hasPendingActivity() returned true forever and the native FSWatcher plus its cached listener closure (and the JS FSWatcher EventEmitter it closes over) were pinned as GC roots — a permanent leak per watcher instance. Persistent watchers landed at 0 only by accident (2 - 2). - close(): increment the counter directly instead of calling refTask() so the pair around emitJS(close) is actually balanced. - initJS(): drop the persistent-gated +1. The initial count of 1 already keeps hasPendingActivity() true until close(); persistence only affects the event-loop keep-alive (poll_ref), not GC reachability. This also fixes the count for watchers that call ref()/unref() before close(). - unrefTask(): add a debug assertion to catch future underflow.

coderabbitai · 2026-04-29T00:41:07Z

Warning

Rate limit exceeded

@robobun has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 18 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 498e74ce-3746-4ca7-a876-a2dc1ff7f295

📥 Commits

Reviewing files that changed from the base of the PR and between c027bf3 and 929f918.

📒 Files selected for processing (2)

src/bun.js/node/node_fs_watcher.zig
test/js/node/watch/fs.watch.test.ts

_{Review rate limit: 0/5 reviews remaining, refill in 2 minutes and 18 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

robobun · 2026-04-29T00:41:12Z

^{Updated 7:25 PM PT - Apr 28th, 2026}

❌ @robobun, your commit 2fed38c has 4 failures in Build #48808 (All Failures):

test/js/bun/http/serve-stream-reject-flush-leak.test.ts - pid 2412 internal assertion failure: allocators do not match�[0m on 🪟 2019 x64
test/js/bun/http/serve-stream-reject-flush-leak.test.ts - pid 2700 internal assertion failure: allocators do not match�[0m on 🪟 11 aarch64
test/js/bun/http/serve-stream-reject-flush-leak.test.ts - pid 5592 internal assertion failure: allocators do not match�[0m on 🪟 2019 x64-baseline
test/js/bun/test/parallel/test-integration-rspack.ts - pid 9756 segmentation fault at address 0x7fff1f1db9c8 on 🪟 11 aarch64
test/js/bun/websocket/websocket-server.test.ts - code 1 on 🪟 2019 x64
test/js/bun/websocket/websocket-server.test.ts - code 1 on 🪟 2019 x64-baseline
test/js/bun/websocket/websocket-server.test.ts - code 1 on 🪟 11 aarch64
test/bake/dev/stress.test.ts - code 1 on 🪟 2019 x64-baseline

🧪 To try this PR locally:

bunx bun-pr 29907

That installs a local version of the PR into your bun-29907 executable, so you can run:

bun-29907 --bun

…ance run()'s unrefTask()

claude

Thanks for the Windows appendAbort fix — the counter now balances on every path I traced; deferring to a human only because this touches atomic hasPendingActivity refcounting across both platform task paths.

Extended reasoning...

Overview

Fixes a pending_activity_count underflow in FSWatcher.close() that permanently leaked fs.watch(path, {persistent:false}) instances. Touches src/bun.js/node/node_fs_watcher.zig (~15 LOC: direct fetchAdd in close() instead of the no-op refTask(), drop the persistent-gated +1 in initJS(), add refTask() guard in FSWatchTaskWindows.appendAbort, add debugAssert(prev > 0) in unrefTask()) and adds a FinalizationRegistry-based regression test.

Prior feedback

My previous review flagged that FSWatchTaskWindows.appendAbort enqueued without bumping the counter, which would underflow (and trip the new assert) on Windows for pre-aborted signals. Commit 929f918 adds if (!ctx.refTask()) return; there, matching onPathUpdateWindows and the posix enqueue() path. I re-traced the abort-before-creation flow on Windows and it now lands at exactly 0.

Verification

I walked the counter through: plain close(); abort-before-creation (posix + windows); abort-after-creation via signal.listen; emitError → close; re-entrant close() from the close listener; and close() racing an already-enqueued task. All end at 0 with no underflow. doRef/doUnref only touch poll_ref, so user ref()/unref() no longer skews the count. The new debug assert is release-no-op.

Security risks

None — internal refcount bookkeeping for GC reachability; no user-controlled input reaches new code paths.

Level of scrutiny

Moderate. The diff is small and follows the existing emitAbort() pattern, but it changes the invariant governing hasPendingActivity() (which gates GC finalization) under a mutex with a concurrent watcher thread, across two divergent platform task implementations. Getting this wrong means either a leak or a use-after-free, so a maintainer familiar with the FSWatcher lifecycle should sign off.

Other factors

No CODEOWNERS entry for this file. Regression test covers both persistent values via subprocess + FinalizationRegistry. Existing abort-signal and "immediately closing" tests exercise the other paths.

… FSWatcher (oven-sh#29907) ## What does this PR do? Fixes a permanent per-instance leak of `fs.watch(path, { persistent: false })` watchers after `.close()`. ## Reproduction ```js const fs = require("fs"); let collected = 0; const registry = new FinalizationRegistry(() => collected++); for (let i = 0; i < 64; i++) { const w = fs.watch("/tmp", { persistent: false }, () => {}); registry.register(w); w.close(); } for (let i = 0; i < 30; i++) { Bun.gc(true); await Bun.sleep(10); } console.log(collected); // before: 0, after: 64 ``` Before this change, **0** of 64 are collected. With `{persistent: true}`, all 64 are collected (by accident, see below). ## Root cause `pending_activity_count` starts at `1`. `initJS()` adds `+1` only when `persistent` is true (so non-persistent stays at `1`, persistent becomes `2`). In `close()`: 1. `closed` is set to `true` under the mutex. 2. `_ = this.refTask()` is called — but `refTask()` checks `if (this.closed) return false` and does **not** increment. The return value is discarded. 3. `this.unrefTask()` runs anyway → `-1`. 4. Final `this.unrefTask()` → `-1`. Net effect of `close()` is `-2`. For persistent (start = 2) this lands at `0` by coincidence. For non-persistent (start = 1) it wraps the `u32` to `4294967295`, so `hasPendingActivity()` returns `true` forever. The native `FSWatcher`, its cached listener closure, and the JS `FSWatcher` EventEmitter it closes over are all pinned as GC roots — a permanent leak per watcher. ## Fix - **`close()`**: increment the counter directly via `fetchAdd` instead of calling `refTask()`, so the ref/unref pair around `emitJS(close)` is actually balanced (net 0). Matches the existing pattern in `emitAbort()`. - **`initJS()`**: drop the persistent-gated `+1`. The initial count of `1` already keeps `hasPendingActivity()` true until `close()`; persistence is about the event-loop keep-alive (`poll_ref`), not GC reachability. This extra `+1` was only ever balanced by the broken `refTask()`/`unrefTask()` pair in `close()`, and it produced wrong counts when the user called `ref()`/`unref()` before `close()`. - **`unrefTask()`**: add a `bun.debugAssert(prev > 0)` to catch future underflow in debug builds. With the fix, the count is `1` until `close()` releases it (plus `+1` per in-flight `FSWatchTask`), regardless of persistence or `ref()`/`unref()`. ## Verification - New test `closed FSWatcher is collectable > persistent: {false,true}` in `test/js/node/watch/fs.watch.test.ts` creates 64 watchers, closes them, and uses a `FinalizationRegistry` to assert ≥32 are collected. Without the fix, `persistent: false` collects 0. - Full `fs.watch.test.ts` suite passes (including abort-signal tests, `immediately closing` with 100× persistent + 100× non-persistent, and `calling close from close event`). - `bun run zig:check-all` passes on all targets. --------- Co-authored-by: robobun <robobun@users.noreply.github.com>

github-actions Bot added the claude label Apr 29, 2026

claude Bot reviewed Apr 29, 2026

View reviewed changes

Comment thread src/bun.js/node/node_fs_watcher.zig

fs.watch(windows): refTask() in FSWatchTaskWindows.appendAbort to bal…

929f918

…ance run()'s unrefTask()

claude Bot reviewed Apr 29, 2026

View reviewed changes

Jarred-Sumner enabled auto-merge (squash) April 29, 2026 03:29

Jarred-Sumner disabled auto-merge April 29, 2026 03:29

Jarred-Sumner merged commit 702949b into main Apr 29, 2026
65 of 69 checks passed

Jarred-Sumner deleted the farm/b6eaea1d/fswatcher-close-refcount branch April 29, 2026 03:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fs.watch: fix pending_activity_count underflow leaking non-persistent FSWatcher#29907

fs.watch: fix pending_activity_count underflow leaking non-persistent FSWatcher#29907
Jarred-Sumner merged 2 commits into
mainfrom
farm/b6eaea1d/fswatcher-close-refcount

robobun commented Apr 29, 2026

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Rate limit exceeded

Uh oh!

robobun commented Apr 29, 2026 •

edited

Loading

Uh oh!

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robobun commented Apr 29, 2026

What does this PR do?

Reproduction

Root cause

Fix

Verification

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Uh oh!

robobun commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Prior feedback

Verification

Security risks

Level of scrutiny

Other factors

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

robobun commented Apr 29, 2026 •

edited

Loading