Skip to content

fs.watch: fix pending_activity_count underflow leaking non-persistent FSWatcher#29907

Merged
Jarred-Sumner merged 2 commits into
mainfrom
farm/b6eaea1d/fswatcher-close-refcount
Apr 29, 2026
Merged

fs.watch: fix pending_activity_count underflow leaking non-persistent FSWatcher#29907
Jarred-Sumner merged 2 commits into
mainfrom
farm/b6eaea1d/fswatcher-close-refcount

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

What does this PR do?

Fixes a permanent per-instance leak of fs.watch(path, { persistent: false }) watchers after .close().

Reproduction

const fs = require("fs");
let collected = 0;
const registry = new FinalizationRegistry(() => collected++);
for (let i = 0; i < 64; i++) {
  const w = fs.watch("/tmp", { persistent: false }, () => {});
  registry.register(w);
  w.close();
}
for (let i = 0; i < 30; i++) { Bun.gc(true); await Bun.sleep(10); }
console.log(collected); // before: 0, after: 64

Before this change, 0 of 64 are collected. With {persistent: true}, all 64 are collected (by accident, see below).

Root cause

pending_activity_count starts at 1. initJS() adds +1 only when persistent is true (so non-persistent stays at 1, persistent becomes 2).

In close():

  1. closed is set to true under the mutex.
  2. _ = this.refTask() is called — but refTask() checks if (this.closed) return false and does not increment. The return value is discarded.
  3. this.unrefTask() runs anyway → -1.
  4. Final this.unrefTask()-1.

Net effect of close() is -2. For persistent (start = 2) this lands at 0 by coincidence. For non-persistent (start = 1) it wraps the u32 to 4294967295, so hasPendingActivity() returns true forever. The native FSWatcher, its cached listener closure, and the JS FSWatcher EventEmitter it closes over are all pinned as GC roots — a permanent leak per watcher.

Fix

  • close(): increment the counter directly via fetchAdd instead of calling refTask(), so the ref/unref pair around emitJS(close) is actually balanced (net 0). Matches the existing pattern in emitAbort().
  • initJS(): drop the persistent-gated +1. The initial count of 1 already keeps hasPendingActivity() true until close(); persistence is about the event-loop keep-alive (poll_ref), not GC reachability. This extra +1 was only ever balanced by the broken refTask()/unrefTask() pair in close(), and it produced wrong counts when the user called ref()/unref() before close().
  • unrefTask(): add a bun.debugAssert(prev > 0) to catch future underflow in debug builds.

With the fix, the count is 1 until close() releases it (plus +1 per in-flight FSWatchTask), regardless of persistence or ref()/unref().

Verification

  • New test closed FSWatcher is collectable > persistent: {false,true} in test/js/node/watch/fs.watch.test.ts creates 64 watchers, closes them, and uses a FinalizationRegistry to assert ≥32 are collected. Without the fix, persistent: false collects 0.
  • Full fs.watch.test.ts suite passes (including abort-signal tests, immediately closing with 100× persistent + 100× non-persistent, and calling close from close event).
  • bun run zig:check-all passes on all targets.

… FSWatcher

FSWatcher.close() set closed=true before calling refTask(), so refTask()
returned false without incrementing and the paired unrefTask() still ran,
followed by the final unrefTask(). Net effect of close() on the counter was
-2.

pending_activity_count starts at 1; initJS() only added +1 when persistent.
For {persistent:false} watchers, close() took 1 -> -1 which wraps to u32 MAX,
so hasPendingActivity() returned true forever and the native FSWatcher plus
its cached listener closure (and the JS FSWatcher EventEmitter it closes
over) were pinned as GC roots — a permanent leak per watcher instance.
Persistent watchers landed at 0 only by accident (2 - 2).

- close(): increment the counter directly instead of calling refTask() so
  the pair around emitJS(close) is actually balanced.
- initJS(): drop the persistent-gated +1. The initial count of 1 already
  keeps hasPendingActivity() true until close(); persistence only affects
  the event-loop keep-alive (poll_ref), not GC reachability. This also
  fixes the count for watchers that call ref()/unref() before close().
- unrefTask(): add a debug assertion to catch future underflow.
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Warning

Rate limit exceeded

@robobun has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 2 minutes and 18 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 498e74ce-3746-4ca7-a876-a2dc1ff7f295

📥 Commits

Reviewing files that changed from the base of the PR and between c027bf3 and 929f918.

📒 Files selected for processing (2)
  • src/bun.js/node/node_fs_watcher.zig
  • test/js/node/watch/fs.watch.test.ts

Review rate limit: 0/5 reviews remaining, refill in 2 minutes and 18 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 7:25 PM PT - Apr 28th, 2026

@robobun, your commit 2fed38c has 4 failures in Build #48808 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29907

That installs a local version of the PR into your bun-29907 executable, so you can run:

bun-29907 --bun

Comment thread src/bun.js/node/node_fs_watcher.zig

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the Windows appendAbort fix — the counter now balances on every path I traced; deferring to a human only because this touches atomic hasPendingActivity refcounting across both platform task paths.

Extended reasoning...

Overview

Fixes a pending_activity_count underflow in FSWatcher.close() that permanently leaked fs.watch(path, {persistent:false}) instances. Touches src/bun.js/node/node_fs_watcher.zig (~15 LOC: direct fetchAdd in close() instead of the no-op refTask(), drop the persistent-gated +1 in initJS(), add refTask() guard in FSWatchTaskWindows.appendAbort, add debugAssert(prev > 0) in unrefTask()) and adds a FinalizationRegistry-based regression test.

Prior feedback

My previous review flagged that FSWatchTaskWindows.appendAbort enqueued without bumping the counter, which would underflow (and trip the new assert) on Windows for pre-aborted signals. Commit 929f918 adds if (!ctx.refTask()) return; there, matching onPathUpdateWindows and the posix enqueue() path. I re-traced the abort-before-creation flow on Windows and it now lands at exactly 0.

Verification

I walked the counter through: plain close(); abort-before-creation (posix + windows); abort-after-creation via signal.listen; emitErrorclose; re-entrant close() from the close listener; and close() racing an already-enqueued task. All end at 0 with no underflow. doRef/doUnref only touch poll_ref, so user ref()/unref() no longer skews the count. The new debug assert is release-no-op.

Security risks

None — internal refcount bookkeeping for GC reachability; no user-controlled input reaches new code paths.

Level of scrutiny

Moderate. The diff is small and follows the existing emitAbort() pattern, but it changes the invariant governing hasPendingActivity() (which gates GC finalization) under a mutex with a concurrent watcher thread, across two divergent platform task implementations. Getting this wrong means either a leak or a use-after-free, so a maintainer familiar with the FSWatcher lifecycle should sign off.

Other factors

No CODEOWNERS entry for this file. Regression test covers both persistent values via subprocess + FinalizationRegistry. Existing abort-signal and "immediately closing" tests exercise the other paths.

@Jarred-Sumner Jarred-Sumner enabled auto-merge (squash) April 29, 2026 03:29
@Jarred-Sumner Jarred-Sumner disabled auto-merge April 29, 2026 03:29
@Jarred-Sumner Jarred-Sumner merged commit 702949b into main Apr 29, 2026
65 of 69 checks passed
@Jarred-Sumner Jarred-Sumner deleted the farm/b6eaea1d/fswatcher-close-refcount branch April 29, 2026 03:29
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
… FSWatcher (oven-sh#29907)

## What does this PR do?

Fixes a permanent per-instance leak of `fs.watch(path, { persistent:
false })` watchers after `.close()`.

## Reproduction

```js
const fs = require("fs");
let collected = 0;
const registry = new FinalizationRegistry(() => collected++);
for (let i = 0; i < 64; i++) {
  const w = fs.watch("/tmp", { persistent: false }, () => {});
  registry.register(w);
  w.close();
}
for (let i = 0; i < 30; i++) { Bun.gc(true); await Bun.sleep(10); }
console.log(collected); // before: 0, after: 64
```

Before this change, **0** of 64 are collected. With `{persistent:
true}`, all 64 are collected (by accident, see below).

## Root cause

`pending_activity_count` starts at `1`. `initJS()` adds `+1` only when
`persistent` is true (so non-persistent stays at `1`, persistent becomes
`2`).

In `close()`:
1. `closed` is set to `true` under the mutex.
2. `_ = this.refTask()` is called — but `refTask()` checks `if
(this.closed) return false` and does **not** increment. The return value
is discarded.
3. `this.unrefTask()` runs anyway → `-1`.
4. Final `this.unrefTask()` → `-1`.

Net effect of `close()` is `-2`. For persistent (start = 2) this lands
at `0` by coincidence. For non-persistent (start = 1) it wraps the `u32`
to `4294967295`, so `hasPendingActivity()` returns `true` forever. The
native `FSWatcher`, its cached listener closure, and the JS `FSWatcher`
EventEmitter it closes over are all pinned as GC roots — a permanent
leak per watcher.

## Fix

- **`close()`**: increment the counter directly via `fetchAdd` instead
of calling `refTask()`, so the ref/unref pair around `emitJS(close)` is
actually balanced (net 0). Matches the existing pattern in
`emitAbort()`.
- **`initJS()`**: drop the persistent-gated `+1`. The initial count of
`1` already keeps `hasPendingActivity()` true until `close()`;
persistence is about the event-loop keep-alive (`poll_ref`), not GC
reachability. This extra `+1` was only ever balanced by the broken
`refTask()`/`unrefTask()` pair in `close()`, and it produced wrong
counts when the user called `ref()`/`unref()` before `close()`.
- **`unrefTask()`**: add a `bun.debugAssert(prev > 0)` to catch future
underflow in debug builds.

With the fix, the count is `1` until `close()` releases it (plus `+1`
per in-flight `FSWatchTask`), regardless of persistence or
`ref()`/`unref()`.

## Verification

- New test `closed FSWatcher is collectable > persistent: {false,true}`
in `test/js/node/watch/fs.watch.test.ts` creates 64 watchers, closes
them, and uses a `FinalizationRegistry` to assert ≥32 are collected.
Without the fix, `persistent: false` collects 0.
- Full `fs.watch.test.ts` suite passes (including abort-signal tests,
`immediately closing` with 100× persistent + 100× non-persistent, and
`calling close from close event`).
- `bun run zig:check-all` passes on all targets.

---------

Co-authored-by: robobun <robobun@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants