Watcher: flush evict_list in remove() when full to prevent overflow#29939
Watcher: flush evict_list in remove() when full to prevent overflow#29939robobun wants to merge 3 commits into
Conversation
Watcher.remove() appends the watchlist index to a fixed-size evict_list[8096]; the real removal is deferred to flushEvictions(), which was only driven by onFileUpdate() on the watcher thread — i.e. only when a filesystem event arrives. fs.watch(path).close() reaches _decrementPathRefNoLock() → main_watcher.remove(hash). Repeating that in a tight loop without ever modifying the filesystem means flushEvictions() never runs, and once the cumulative remove count passes 8096, removeAtIndex() writes past the end of evict_list: panic(main thread): index out of bounds: index 8096, len 8096 Watcher.removeAtIndex src/Watcher.zig Watcher.remove PathWatcherManager._decrementPathRefNoLock PathWatcherManager.unregisterWatcher PathWatcher.deinit Drain evict_list inside remove() when it's full — this.mutex is already held there, matching how flushEvictions() is invoked from the platform watch loops.
|
Updated 1:02 PM PT - Apr 29th, 2026
❌ @robobun, your commit ca9c48a has 4 failures in
🧪 To try this PR locally: bunx bun-pr 29939That installs a local version of the PR into your bun-29939 --bun |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
Walkthrough
Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
…ex isn't stale; enable test on macOS (file watches use KEventWatcher, not FSEvents)
There was a problem hiding this comment.
Thanks — both prior comments are addressed (flush now precedes indexOf(), and the test runs on macOS). Deferring final sign-off to a human since this adds a main-thread flushEvictions() call site (fd closes + kqueue re-registration) in core watcher code.
Extended reasoning...
Overview
This PR fixes an out-of-bounds write in Watcher.evict_list by draining it inside remove() when full, and adds a regression test. Two files touched: src/Watcher.zig (~3 lines of logic + a comment) and a new test file. Since my last review, commit ca9c48a moved the flush to before indexOf() (fixing the stale-index issue I flagged) and enabled the test on macOS with a corrected comment about KEventWatcher vs FSEvents.
Security risks
None identified. No untrusted input parsing, auth, or crypto. The change is internal bookkeeping in the file-watcher under an existing mutex.
Level of scrutiny
Moderate-to-high. The diff is tiny and the logic is sound, but Watcher.zig is core runtime infrastructure shared by fs.watch, hot reload, and the dev server across Linux/macOS/FreeBSD. The fix newly invokes flushEvictions() from the main thread (previously only the watcher thread), which closes fds and — on kqueue — calls kevent() to re-register entries while the watcher thread may be blocked in kevent() on the same fd. That's a normal kqueue usage pattern and the mutex discipline matches existing call sites, but cross-thread fd lifecycle in the watcher has historically been subtle, so a human glance is warranted.
Other factors
- Both of my earlier inline comments were fully addressed.
- The bug-hunting system found no issues on the current revision.
- The new test is deterministic, runs in a subprocess, and now covers the kqueue path on macOS in addition to inotify on Linux.
- robobun reported CI failures on the earlier commit 05c8a86; status of the latest commit ca9c48a wasn't visible in the timeline.
|
CI build #49173: the |
|
Closing — obsoleted by #29952, which rewrote This PR fixed The defensive flush-before- |
Follow-up to #29936 (second item in that PR's "Not fixed here" section).
What
Watcher.remove()appends the watchlist index to a fixed-sizeevict_list[max_eviction_count = 8096]buffer; the real removal is deferred toflushEvictions(), which was only driven byonFileUpdate()on the watcher thread — i.e. only when a filesystem event actually arrives.fs.watch(path).close()reachesPathWatcherManager._decrementPathRefNoLock()→main_watcher.remove(hash). Repeating that in a tight loop without ever modifying the filesystem meansflushEvictions()never runs, and once the cumulative remove count passes 8096,removeAtIndex()writes past the end ofevict_list:Reproduction
Panics deterministically on Linux at iteration 8097.
Fix
Drain
evict_listinsideremove()when it's full.remove()already holdsthis.mutex, matching howflushEvictions()is invoked from the platform watch loops (INotifyWatcher/KEventWatcherboth lockWatcher.mutexaroundonFileUpdate()→flushEvictions()).Verification
test/js/node/watch/fs.watch.evict-list-overflow.test.ts: 8200×fs.watch(file).close()in a subprocess. Without the fix: fails 5/5 with the out-of-bounds panic. With the fix: passes 5/5.fs.watch.test.tssuite: passes (3/3).bun run zig:check-all: passes on all targets.File watches are used (not directory watches) to keep the test deterministic — directory watches on Linux go through the work-pool
DirectoryRegisterTaskwhose completion races withclose()in a way that makes the number ofremove()calls per cycle timing-dependent.