Watcher: flush evict_list in remove() when full to prevent overflow#29939

Closed
robobun wants to merge 3 commits into
mainfrom
farm/e59010f8/fix-watcher-evict-list-overflow

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Collaborator

Follow-up to #29936 (second item in that PR's "Not fixed here" section).

What

Watcher.remove() appends the watchlist index to a fixed-size evict_list[max_eviction_count = 8096] buffer; the real removal is deferred to flushEvictions(), which was only driven by onFileUpdate() on the watcher thread — i.e. only when a filesystem event actually arrives.

fs.watch(path).close() reaches PathWatcherManager._decrementPathRefNoLock() → main_watcher.remove(hash). Repeating that in a tight loop without ever modifying the filesystem means flushEvictions() never runs, and once the cumulative remove count passes 8096, removeAtIndex() writes past the end of evict_list:

panic(main thread): index out of bounds: index 8096, len 8096
Watcher.removeAtIndex  src/Watcher.zig
Watcher.remove
PathWatcherManager._decrementPathRefNoLock
PathWatcherManager.unregisterWatcher
PathWatcher.deinit
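
For readers who want to see the failure mode in isolation, here is a minimal TypeScript sketch of the deferred-eviction scheme described above. It is a toy model, not Bun's Zig code: only the capacity 8096 and the names remove / flushEvictions come from this PR, and everything else (ToyWatcher, firstFailingCall) is hypothetical.

```typescript
// Toy model of the deferred-eviction queue (NOT Bun's Zig implementation).
// MAX_EVICTION_COUNT mirrors max_eviction_count = 8096 from the description;
// all other names are made up for illustration.
const MAX_EVICTION_COUNT = 8096;

class ToyWatcher {
  // Fixed-size buffer of watchlist indices waiting to be removed.
  private evictList = new Uint32Array(MAX_EVICTION_COUNT);
  private evictListLen = 0;

  // Queue an index; the real removal is deferred to flushEvictions().
  remove(index: number): void {
    if (this.evictListLen >= MAX_EVICTION_COUNT) {
      // Stands in for the bounds-check panic shown in the trace above.
      throw new RangeError(
        `index out of bounds: index ${this.evictListLen}, len ${MAX_EVICTION_COUNT}`,
      );
    }
    this.evictList[this.evictListLen++] = index;
  }

  // In this model nothing calls flushEvictions(), which corresponds to
  // "no filesystem event ever arrives".
  flushEvictions(): void {
    this.evictListLen = 0;
  }
}

// Returns the 1-based number of the first remove() call that fails,
// or null if every call succeeds.
function firstFailingCall(totalCalls: number): number | null {
  const watcher = new ToyWatcher();
  for (let call = 1; call <= totalCalls; call++) {
    try {
      watcher.remove(0);
    } catch {
      return call;
    }
  }
  return null;
}
```

Under these assumptions, firstFailingCall(8200) reports call 8097, since the first 8096 calls fill the buffer exactly.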

Reproduction

const fs = require("fs");
for (let i = 0; i < 8200; i++) {
  const w = fs.watch("./some-file.txt", { persistent: false }, () => {});
  w.close();
}

Panics deterministically on Linux at iteration 8097.

Fix

Drain evict_list inside remove() when it's full. remove() already holds this.mutex, matching how flushEvictions() is invoked from the platform watch loops (INotifyWatcher / KEventWatcher both lock Watcher.mutex around onFileUpdate() → flushEvictions()).
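
The shape of the fix can be sketched in the same toy style. This is an illustrative TypeScript model under the assumption that flushing simply empties the queue; DrainingWatcher and runCloseLoop are hypothetical names, and locking is only described in a comment because the model is single-threaded.

```typescript
// Toy model of the fix (NOT Bun's Zig code): drain the queue inside remove()
// once it is full, instead of waiting for a filesystem event.
const MAX_EVICTIONS = 8096; // mirrors max_eviction_count from the description

class DrainingWatcher {
  private evictList: number[] = [];
  flushCount = 0;

  remove(index: number): void {
    // In the real code the caller already holds the watcher mutex here,
    // so flushing at this point follows the same locking discipline as
    // the platform watch loops.
    if (this.evictList.length >= MAX_EVICTIONS) {
      this.flushEvictions();
    }
    this.evictList.push(index);
  }

  flushEvictions(): void {
    this.evictList.length = 0;
    this.flushCount++;
  }

  get pending(): number {
    return this.evictList.length;
  }
}

// Simulates `calls` consecutive remove() calls with no filesystem events.
function runCloseLoop(calls: number): { pending: number; flushCount: number } {
  const watcher = new DrainingWatcher();
  for (let i = 0; i < calls; i++) watcher.remove(0);
  return { pending: watcher.pending, flushCount: watcher.flushCount };
}
```

In this model, runCloseLoop(8200) never exceeds the capacity: the queue is flushed once on call 8097 and ends with 104 pending entries.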

Verification

  • New test test/js/node/watch/fs.watch.evict-list-overflow.test.ts: 8200× fs.watch(file).close() in a subprocess. Without the fix: fails 5/5 with the out-of-bounds panic. With the fix: passes 5/5.
  • Full fs.watch.test.ts suite: passes (3/3).
  • bun run zig:check-all: passes on all targets.

File watches are used (not directory watches) to keep the test deterministic — directory watches on Linux go through the work-pool DirectoryRegisterTask whose completion races with close() in a way that makes the number of remove() calls per cycle timing-dependent.

Watcher.remove() appends the watchlist index to a fixed-size
evict_list[8096]; the real removal is deferred to flushEvictions(),
which was only driven by onFileUpdate() on the watcher thread — i.e.
only when a filesystem event arrives.

fs.watch(path).close() reaches _decrementPathRefNoLock() →
main_watcher.remove(hash). Repeating that in a tight loop without ever
modifying the filesystem means flushEvictions() never runs, and once
the cumulative remove count passes 8096, removeAtIndex() writes past
the end of evict_list:

  panic(main thread): index out of bounds: index 8096, len 8096
  Watcher.removeAtIndex  src/Watcher.zig
  Watcher.remove
  PathWatcherManager._decrementPathRefNoLock
  PathWatcherManager.unregisterWatcher
  PathWatcher.deinit

Drain evict_list inside remove() when it's full — this.mutex is
already held there, matching how flushEvictions() is invoked from the
platform watch loops.
@robobun

robobun commented Apr 29, 2026

Collaborator Author
Updated 1:02 PM PT - Apr 29th, 2026

@robobun, your commit ca9c48a has 4 failures in Build #49173 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29939

That installs a local version of the PR into your bun-29939 executable, so you can run:

bun-29939 --bun

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
📥 Commits

Reviewing files that changed from the base of the PR and between 05c8a86 and ca9c48a.

📒 Files selected for processing (2)
  • src/Watcher.zig
  • test/js/node/watch/fs.watch.evict-list-overflow.test.ts

Walkthrough

remove() now drains the fixed-size eviction queue by calling flushEvictions() when evict_list_i reaches max_eviction_count, performing the drain under the same mutex. A new regression test spawns a Bun process that repeatedly creates and closes an fs.watch to exercise the eviction behavior.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Eviction Queue Overflow Fix (src/Watcher.zig) | remove() now checks if evict_list_i >= max_eviction_count and invokes flushEvictions() while holding the existing mutex, ensuring the eviction queue is drained before enqueuing further indices and before subsequent indexOf() computations. |
| Regression Test (test/js/node/watch/fs.watch.evict-list-overflow.test.ts) | Adds a Bun-based test that spawns a child process running a fixture which performs 8200 iterations of creating and immediately closing an fs.watch on one file; the test captures stdout/stderr/exit code, filters ASAN warning lines, and asserts expected outputs (skipped on Windows, 30s timeout). |

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding proactive evict_list flushing in remove() to prevent buffer overflow. |
| Description check | ✅ Passed | The description fully addresses both template sections: explains what the PR does (the root cause, the fix) and how verification was performed (new regression test, existing test suite, build checks). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Comment thread src/Watcher.zig
Comment thread test/js/node/watch/fs.watch.evict-list-overflow.test.ts Outdated
…ex isn't stale; enable test on macOS (file watches use KEventWatcher, not FSEvents)

@claude claude Bot left a comment

Contributor


Thanks — both prior comments are addressed (flush now precedes indexOf(), and the test runs on macOS). Deferring final sign-off to a human since this adds a main-thread flushEvictions() call site (fd closes + kqueue re-registration) in core watcher code.

Extended reasoning...

Overview

This PR fixes an out-of-bounds write in Watcher.evict_list by draining it inside remove() when full, and adds a regression test. Two files touched: src/Watcher.zig (~3 lines of logic + a comment) and a new test file. Since my last review, commit ca9c48a moved the flush to before indexOf() (fixing the stale-index issue I flagged) and enabled the test on macOS with a corrected comment about KEventWatcher vs FSEvents.
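
To illustrate why the flush has to precede indexOf(), here is a small TypeScript sketch. It assumes that flushing compacts the watchlist (modeled as a swap-remove, which is an assumption about the real implementation) and shrinks the queue capacity to 1 so the effect shows up with three entries. ToyWatchlist, queuedEntry, and the flushFirst flag are hypothetical.

```typescript
// Toy model (NOT Bun's Zig code). Capacity is 1 so the second remove() flushes.
const CAPACITY = 1;

class ToyWatchlist {
  entries: string[];
  private evictList: number[] = [];

  constructor(entries: string[]) {
    this.entries = [...entries];
  }

  // Compacts the watchlist by swap-removing each queued index (assumption).
  flushEvictions(): void {
    // Highest index first, so a swap cannot move an entry that is still queued.
    const queued = [...this.evictList].sort((a, b) => b - a);
    for (const index of queued) {
      if (index >= this.entries.length) continue; // stale index, nothing to remove
      const last = this.entries.pop() as string;
      if (index < this.entries.length) this.entries[index] = last;
    }
    this.evictList.length = 0;
  }

  // flushFirst = true models the revised ordering; false models the earlier one.
  remove(name: string, flushFirst: boolean): number {
    let index = -1;
    if (!flushFirst) index = this.entries.indexOf(name); // looked up before compaction
    if (this.evictList.length >= CAPACITY) this.flushEvictions();
    if (flushFirst) index = this.entries.indexOf(name); // looked up after compaction
    this.evictList.push(index);
    return index;
  }
}

// Returns the entry that the queued index refers to after the second remove().
function queuedEntry(flushFirst: boolean): string | undefined {
  const list = new ToyWatchlist(["a", "b", "c"]);
  list.remove("a", flushFirst); // fills the one-slot queue with index 0
  const index = list.remove("c", flushFirst); // queue is full, so this call flushes
  return list.entries[index];
}
```

With the flush first, the queued index still points at "c". With the lookup first, the index computed for "c" (2) no longer exists once the flush has moved "c" into slot 0, so queuedEntry(false) yields undefined in this model.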

Security risks

None identified. No untrusted input parsing, auth, or crypto. The change is internal bookkeeping in the file-watcher under an existing mutex.

Level of scrutiny

Moderate-to-high. The diff is tiny and the logic is sound, but Watcher.zig is core runtime infrastructure shared by fs.watch, hot reload, and the dev server across Linux/macOS/FreeBSD. The fix newly invokes flushEvictions() from the main thread (previously only the watcher thread), which closes fds and — on kqueue — calls kevent() to re-register entries while the watcher thread may be blocked in kevent() on the same fd. That's a normal kqueue usage pattern and the mutex discipline matches existing call sites, but cross-thread fd lifecycle in the watcher has historically been subtle, so a human glance is warranted.

Other factors

  • Both of my earlier inline comments were fully addressed.
  • The bug-hunting system found no issues on the current revision.
  • The new test is deterministic, runs in a subprocess, and now covers the kqueue path on macOS in addition to inotify on Linux.
  • robobun reported CI failures on the earlier commit 05c8a86; status of the latest commit ca9c48a wasn't visible in the timeline.

@robobun

robobun commented Apr 29, 2026

Collaborator Author

CI build #49173: the alpine-3.23-aarch64 hard fail in fs.watch.test.ts (closed FSWatcher is collectable > persistent: true, segfault at 0x75622F706D742F in mimalloc) is the pre-existing PathWatcher double-free race that #29936 fixes — this branch is based on main which doesn't include that fix yet. Unrelated to the Watcher.remove() change here (that code path isn't reached in a 64-iteration test; it takes 8096+). Other failures are bun-create.test.ts GitHub 403 rate-limit and dev-and-prod.test.ts bake/HMR flakes on Windows.

@robobun

robobun commented May 1, 2026

Collaborator Author

Closing — obsoleted by #29952, which rewrote path_watcher.zig to stop using bun.Watcher entirely.

This PR fixed Watcher.remove() overflowing its fixed-size evict_list[8096] when called repeatedly without fs events. The only call site was PathWatcherManager._decrementPathRefNoLock() → main_watcher.remove(hash), reached via fs.watch().close(). After #29952, fs.watch() no longer routes through bun.Watcher at all, so Watcher.remove() has no remaining callers on main, and the regression test here (8200× fs.watch(file).close()) no longer reaches evict_list.

The defensive flush-before-indexOf() change itself is harmless, but it targets now-dead code and the test can't gate it anymore.

@robobun robobun closed this May 1, 2026