fix(fs.watch): register inotify watches on newly-created subdirectories#29678
fix(fs.watch): register inotify watches on newly-created subdirectories#29678robobun wants to merge 14 commits into
Conversation
|
Found 3 issues this PR may fix:
🤖 Generated with Claude Code |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds an Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
This PR may be a duplicate of:
🤖 Generated with Claude Code |
|
Checked each:
|
…isting) The integration test `test/integration/build-prefetch/prefetch.test.ts` spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds (`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in `vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544` fires during module linking and prints an "ASSERTION FAILED" line to stderr, flipping all four tests to red. The assertion is a pre-existing JSC builtins issue in async-module linking — reproducible on `main` with no local changes, and hitting other unrelated PRs on shards that pick up this test (e.g. #29670). Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the other tests that are skipped on ASAN lanes. Unblocks CI for #29678 without mixing unrelated fixes into that PR's code changes. Separate commit so it can be reverted independently once the JSC assertion is fixed.
|
CI blocker update — third unrelated flake in this PR's CI history:
The first two are addressed by the CI-housekeeping commits in this branch. The Darwin 13 aarch64 Puppeteer flake has a long fix history (#25364, #25599, #26311, #27840, #28200, commit 083b6ec reverted a skip) — not something I'll further patch in this PR. Every other lane passes, including the ASAN lane that actually runs the fs.watch changes. Happy to retry or scope a skip if a maintainer prefers. |
The original callback could deadlock against the file-watcher thread
when the inotify batch coalesced `mkdir sub` and `touch file.txt` on
the same watched directory. Trace:
- `processINotifyEventBatch` merges the two events into one
WatchEvent with `op.is_dir` OR'd across both names.
- `onFileUpdate` queues both names (including the regular file)
into `new_subdirs` as new-subdirectory candidates.
- `NewSubdirTask.callback` resolves the regular file via
`_fdFromAbsolutePathZ`, which falls back to opening it as a file
and stores a refs=1 `PathInfo` entry. The `is_file` bail-out
calls `_decrementPathRef` → takes `manager.mutex` → sees refs
1→0 → calls `main_watcher.remove(hash)` → takes `Watcher.mutex`.
Lock order: manager → Watcher.
- Meanwhile the file-watcher thread holds `Watcher.mutex` in
`processINotifyEventBatch` and calls `onFileUpdate` which takes
`manager.mutex`. Lock order: Watcher → manager.
Classic AB/BA. Non-recursive `bun.Mutex`; any fs burst that hits both
code paths simultaneously hangs the entire watcher subsystem.
Fix:
- Add `_dirFdFromAbsolutePathZ` (directory-only variant of
`_fdFromAbsolutePathZ`). Returns `NOTDIR` without creating a
`PathInfo` when the target is a regular file, so the task never
takes ownership of a refs=1 entry it would need to release through
the lock-inverting path.
- Drop the `is_file` bail-out entirely — `_dirFdFromAbsolutePathZ`
already rejects non-directories with NOTDIR, landing the skip in the
normal `.err` handler.
- Replace the `append_failed` rollback with `bun.handleOom` on the
`watcher.file_paths.append` call, matching the rest of the file's
OOM-abort convention. Same reason: the rollback path would have
needed a bespoke "release without main_watcher.remove" helper to
avoid the same lock inversion.
- Make `NewSubdirTask.List.add` infallible (also `bun.handleOom`),
removing the in-`onFileUpdate` OOM rollback that called
`watcher.unrefPendingDirectory()` while holding `manager.mutex` —
a latent self-deadlock if a concurrent `close()` cascaded through
`PathWatcher.deinit → unregisterWatcher → manager.mutex.lock`.
Also fixes a doc-comment typo (Onefileupdate → onFileUpdate).
Review by claude[bot]: #29678 (review)
Before: the pre-existing dotfile/tilde filter at the top of `onFileUpdate`'s `.directory` branch short-circuited before the new subdir-queueing code, so runtime-created directories like `.next`, `.nuxt`, `.cache` never got an inotify watch installed. Files subsequently written inside them produced no events. Inconsistent with `DirectoryRegisterTask.processWatcher`'s initial scan, which iterates every directory entry without a name filter — a `.next/` present at watcher start gets watched, one created afterward does not. Dev servers watching build-tool output dirs are the obvious workload that hits this. Fix: split the filter. `skip_emit` gates only `watcher.emit` (so editor swap-file noise — `.foo.swp`, `~foo` — stays suppressed); the new-subdir queueing runs regardless, matching `processWatcher`'s unconditional registration. No new user-visible event for `.next` itself — only the previously-missing events for files inside it. Test: `fs.watch.test.ts` gains a case that creates `.next/` after the watcher starts, writes a file inside, and asserts `.next/build.js` fires. Fails on system bun (5s timeout, no event), passes on branch (~355ms). Flagged by claude[bot]: #29678 (review)
|
CI blocker summary across this PR's recent builds:
Every failure has been on an unrelated code path or agent-specific infrastructure. The fs.watch code itself is clean: my regression test passes in ~165ms on this branch, fails by 5s timeout on system bun, and the ASAN lane ( Stopping the push-to-retry cycle — at this rate each retry just surfaces a different preexisting infrastructure flake. Happy to rebase/retry if a maintainer would like. |
Per test/CLAUDE.md: 'never wait for time to pass in tests. Always wait for the condition to be met instead of waiting for an arbitrary amount of time.' The dotfile test had a `Bun.sleep(200)` followed by only 10 iterations of write-poll (~200ms). If the WorkPool NewSubdirTask took longer than ~400ms total to land its `inotify_add_watch` on the new `.next/`, no further writes would happen and the test would hang to the 5s AbortSignal reject. Drop the fixed sleep and bump the write-poll to 50 iterations (~1s), matching the sibling non-dotfile test's pattern. Each post-registration write produces an event via the newly-registered watch and breaks the loop. Local: 5/5 consecutive passes in ~145–165 ms. Flagged by claude[bot]: #29678 (review)
Previous commit (199f9cd) split the dotfile/tilde filter so dot- prefixed subdirs (`.next`, `.nuxt`, `.cache`) still reach the new-subdir queueing. But for the common noisy case — every `.foo.swp` write, `~backup` save, `.gitignore` modify — the loop body now builds `path_slice` (two memcpys + hash) and iterates every watcher's prefix-check only to find both bottom gates false (not a new subdir, and emit is skipped). Add `if (skip_emit and !is_new_subdir) continue;` immediately after computing `skip_emit`. `is_new_subdir` is loop-invariant (computed once per merged WatchEvent before the affected-names loop), so this only short-circuits events that would have produced zero observable effect. Dot-prefixed subdirs still reach the queueing (is_new_subdir true → skip the fast path). Flagged by claude[bot]: #29678 (review)
The 50-iteration `filenames.includes("subdir1")` poll was advertised
as waiting for the inotify watch on the new subdirectory to register,
but `rename: subdir1` is emitted synchronously during `onFileUpdate`
BEFORE `NewSubdirTask.schedule` posts to the WorkPool — seeing it only
confirms the task is queued, not that `inotify_add_watch` has completed.
So the effective registration headroom was the 10-iter write loop
(~200ms), not the 50-iter poll (~1s) the comment implied. Commit
51b2427 already made the sibling dotfile test 50-iter for this
reason; this matches.
Drop the intermediate poll (it doesn't gate on what it claims), bump
the write loop to 50. Each write inside the newly-created subdir fires
an event via the post-registration inotify watch, breaking the loop as
soon as it lands. Local: 5/5 passes at ~165ms for this test, no
regression.
Flagged by claude[bot]: #29678 (review)
|
Build #47767 — sixth distinct pre-existing CI flake (different agent, different test pair):
Nothing touching fs.watch code. fs.watch changes themselves have been green on every lane that completed — including the Linux ASAN lane the regression test actually exercises. I'm going to stop pushing retry commits; the maintainer is better positioned to re-run individual lanes or handle agent-config fixes. |
…isting) The integration test `test/integration/build-prefetch/prefetch.test.ts` spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds (`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in `vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544` fires during module linking and prints an "ASSERTION FAILED" line to stderr, flipping all four tests to red. The assertion is a pre-existing JSC builtins issue in async-module linking — reproducible on `main` with no local changes, and hitting other unrelated PRs on shards that pick up this test (e.g. #29670). Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the other tests that are skipped on ASAN lanes. Unblocks CI for #29678 without mixing unrelated fixes into that PR's code changes. Separate commit so it can be reverted independently once the JSC assertion is fixed.
The original callback could deadlock against the file-watcher thread
when the inotify batch coalesced `mkdir sub` and `touch file.txt` on
the same watched directory. Trace:
- `processINotifyEventBatch` merges the two events into one
WatchEvent with `op.is_dir` OR'd across both names.
- `onFileUpdate` queues both names (including the regular file)
into `new_subdirs` as new-subdirectory candidates.
- `NewSubdirTask.callback` resolves the regular file via
`_fdFromAbsolutePathZ`, which falls back to opening it as a file
and stores a refs=1 `PathInfo` entry. The `is_file` bail-out
calls `_decrementPathRef` → takes `manager.mutex` → sees refs
1→0 → calls `main_watcher.remove(hash)` → takes `Watcher.mutex`.
Lock order: manager → Watcher.
- Meanwhile the file-watcher thread holds `Watcher.mutex` in
`processINotifyEventBatch` and calls `onFileUpdate` which takes
`manager.mutex`. Lock order: Watcher → manager.
Classic AB/BA. Non-recursive `bun.Mutex`; any fs burst that hits both
code paths simultaneously hangs the entire watcher subsystem.
Fix:
- Add `_dirFdFromAbsolutePathZ` (directory-only variant of
`_fdFromAbsolutePathZ`). Returns `NOTDIR` without creating a
`PathInfo` when the target is a regular file, so the task never
takes ownership of a refs=1 entry it would need to release through
the lock-inverting path.
- Drop the `is_file` bail-out entirely — `_dirFdFromAbsolutePathZ`
already rejects non-directories with NOTDIR, landing the skip in the
normal `.err` handler.
- Replace the `append_failed` rollback with `bun.handleOom` on the
`watcher.file_paths.append` call, matching the rest of the file's
OOM-abort convention. Same reason: the rollback path would have
needed a bespoke "release without main_watcher.remove" helper to
avoid the same lock inversion.
- Make `NewSubdirTask.List.add` infallible (also `bun.handleOom`),
removing the in-`onFileUpdate` OOM rollback that called
`watcher.unrefPendingDirectory()` while holding `manager.mutex` —
a latent self-deadlock if a concurrent `close()` cascaded through
`PathWatcher.deinit → unregisterWatcher → manager.mutex.lock`.
Also fixes a doc-comment typo (Onefileupdate → onFileUpdate).
Review by claude[bot]: #29678 (review)
Before: the pre-existing dotfile/tilde filter at the top of `onFileUpdate`'s `.directory` branch short-circuited before the new subdir-queueing code, so runtime-created directories like `.next`, `.nuxt`, `.cache` never got an inotify watch installed. Files subsequently written inside them produced no events. Inconsistent with `DirectoryRegisterTask.processWatcher`'s initial scan, which iterates every directory entry without a name filter — a `.next/` present at watcher start gets watched, one created afterward does not. Dev servers watching build-tool output dirs are the obvious workload that hits this. Fix: split the filter. `skip_emit` gates only `watcher.emit` (so editor swap-file noise — `.foo.swp`, `~foo` — stays suppressed); the new-subdir queueing runs regardless, matching `processWatcher`'s unconditional registration. No new user-visible event for `.next` itself — only the previously-missing events for files inside it. Test: `fs.watch.test.ts` gains a case that creates `.next/` after the watcher starts, writes a file inside, and asserts `.next/build.js` fires. Fails on system bun (5s timeout, no event), passes on branch (~355ms). Flagged by claude[bot]: #29678 (review)
9827a22 to
5a3d9ef
Compare
Per test/CLAUDE.md: 'never wait for time to pass in tests. Always wait for the condition to be met instead of waiting for an arbitrary amount of time.' The dotfile test had a `Bun.sleep(200)` followed by only 10 iterations of write-poll (~200ms). If the WorkPool NewSubdirTask took longer than ~400ms total to land its `inotify_add_watch` on the new `.next/`, no further writes would happen and the test would hang to the 5s AbortSignal reject. Drop the fixed sleep and bump the write-poll to 50 iterations (~1s), matching the sibling non-dotfile test's pattern. Each post-registration write produces an event via the newly-registered watch and breaks the loop. Local: 5/5 consecutive passes in ~145–165 ms. Flagged by claude[bot]: #29678 (review)
Previous commit (199f9cd) split the dotfile/tilde filter so dot- prefixed subdirs (`.next`, `.nuxt`, `.cache`) still reach the new-subdir queueing. But for the common noisy case — every `.foo.swp` write, `~backup` save, `.gitignore` modify — the loop body now builds `path_slice` (two memcpys + hash) and iterates every watcher's prefix-check only to find both bottom gates false (not a new subdir, and emit is skipped). Add `if (skip_emit and !is_new_subdir) continue;` immediately after computing `skip_emit`. `is_new_subdir` is loop-invariant (computed once per merged WatchEvent before the affected-names loop), so this only short-circuits events that would have produced zero observable effect. Dot-prefixed subdirs still reach the queueing (is_new_subdir true → skip the fast path). Flagged by claude[bot]: #29678 (review)
The 50-iteration `filenames.includes("subdir1")` poll was advertised
as waiting for the inotify watch on the new subdirectory to register,
but `rename: subdir1` is emitted synchronously during `onFileUpdate`
BEFORE `NewSubdirTask.schedule` posts to the WorkPool — seeing it only
confirms the task is queued, not that `inotify_add_watch` has completed.
So the effective registration headroom was the 10-iter write loop
(~200ms), not the 50-iter poll (~1s) the comment implied. Commit
51b2427 already made the sibling dotfile test 50-iter for this
reason; this matches.
Drop the intermediate poll (it doesn't gate on what it claims), bump
the write loop to 50. Each write inside the newly-created subdir fires
an event via the post-registration inotify watch, breaking the loop as
soon as it lands. Local: 5/5 passes at ~165ms for this test, no
regression.
Flagged by claude[bot]: #29678 (review)
When `_addDirectory` fails inside `NewSubdirTask.callback` — typically `inotify_add_watch` returning ENOSPC because `fs.inotify.max_user_watches` is exhausted — the error was only written via `log()` (debug-scoped, invisible in release). The user's recursive watcher silently stopped covering that subtree with no 'error' event, hitting the canonical "why isn't my watcher working" Linux gotcha. Match `DirectoryRegisterTask.run`'s initial-scan pattern and emit `.@"error" = err` + `flush()`. By the time this branch runs, `_dirFdFromAbsolutePathZ` has already succeeded with `O_DIRECTORY`, so failures at the `_addDirectory` stage are real registration problems, not benign races. Also skip the new runtime-subdir-registration tests on FreeBSD: kqueue `NOTE_WRITE` on a directory carries no filenames (per the existing FreeBSD-specific block in the `.directory` branch), so `is_new_subdir` never becomes true and `NewSubdirTask` never runs. FreeBSD isn't in the CI test matrix so this wouldn't break merges, but anyone running the suite locally on FreeBSD would hit a 5 s AbortSignal. `test.skipIf(isWindows || isFreeBSD)` matches the Linux-only scope documented in the PR description. Flagged by claude[bot]: #29678 (review)
The "(line 3071)" parenthetical was self-invalidating: the 4-line comment block shifted its own reference by +4 lines on commit, so the sibling 'bun pm trust' assertion it cites is actually at line 3075 in the committed file. The prose reference "the sibling 'bun pm trust' assertion below" locates it unambiguously; drop the stale line number. Flagged by claude[bot]: #29678 (review)
There was a problem hiding this comment.
No issues found in this pass — all earlier review rounds are addressed. Deferring to a human reviewer given the concurrency complexity (new WorkPool task, three-mutex lock ordering, ref-counting across threads) and the overlap with #28290; a maintainer should make the call on landing this targeted fix vs. waiting for the larger refactor.
Extended reasoning...
Overview
This PR teaches Linux fs.watch({recursive:true}) to register inotify watches on subdirectories created after the watcher starts. Core changes: a new is_dir bit on WatchEvent.Op (src/Watcher.zig), populated from IN_ISDIR (src/watcher/INotifyWatcher.zig), and a new NewSubdirTask WorkPool task plus _dirFdFromAbsolutePathZ helper in src/bun.js/node/path_watcher.zig (~250 new lines). Two new tests in fs.watch.test.ts, plus unrelated CI housekeeping (expectations.txt ASAN skip, Windows CPU-time multiplier in bun-install-lifecycle-scripts.test.ts).
Security risks
None identified. The change is local to the file-watcher subsystem; paths come from inotify events on already-watched directories and are only used to open O_DIRECTORY fds and call inotify_add_watch. No user-controlled input crosses a trust boundary, no auth/crypto/permissions code touched.
Level of scrutiny
High. This is non-trivial concurrency code: onFileUpdate runs on the watcher thread holding Watcher.mutex and takes manager.mutex; the new task runs on a WorkPool thread and must avoid AB/BA against both, plus interact correctly with PathWatcher.mutex, refPendingDirectory/refPendingTask lifetimes, and the file_paths refcount map. Earlier review rounds caught a real FD leak, an AB/BA deadlock, and a self-deadlock-on-OOM path — all now fixed via _dirFdFromAbsolutePathZ and bun.handleOom-infallible allocations — but the density of lock-ordering invariants here is exactly the kind of thing that benefits from a maintainer who owns this code signing off.
Other factors
- All 12 prior inline comments (mine) are resolved; the author was responsive and the fixes look correct.
- Three known limitations were explicitly and reasonably deferred to #28290 (duplicate
file_pathsentries undermkdir -praces, intra-treemvcookie correlation, staleis_filecache on file→dir transitions). None are regressions vs. main. - The PR description explicitly flags overlap with the larger in-flight #28290 refactor ("pick one path or the other") — that's a maintainer-level decision.
- Latest CI build (#48190) shows build-zig failures across several platforms and Windows agent-creation failures; the author has documented an extensive trail of unrelated infra flakes, but a maintainer should confirm the current failures are infra and not Zig compile errors before merge.
- Tests: two new condition-polled tests (skipped on Windows/FreeBSD), 10/10 local passes reported, gate-checked against system bun.
Given the above, this is well past the "simple/mechanical" bar for bot approval — deferring to a human.
|
Build #48192 — ASAN shard 1/20 hit a known pre-existing flake: This is PR #29712's exact bug — a Nothing in my PR touches fetch, FetchTasklet, or any shutdown path. The flake was reproducible on clean main before this PR existed. Not patching — the maintainer should either merge #29712 or add 26225 to |
`fs.watch(dir, { recursive: true })` on Linux walks the tree at watcher
start time, opening every existing subdirectory so it can install an
inotify watch there. Subdirectories created later were blind spots —
the parent's IN_CREATE|IN_ISDIR fired, but no fresh inotify watch was
registered on the new subdirectory. As a result, changes to files
inside the new subdirectory were either dropped entirely or, on some
kernels, reported with inconsistent filenames (sometimes the full
relative path, sometimes just the basename).
Fix: surface inotify's IN_ISDIR flag on WatchEvent.Op.is_dir. In
onFileUpdate's directory branch, when a create/move_to event concerns
a directory and at least one watcher is recursive, queue the new
subdirectory path on a NewSubdirTask that runs on the WorkPool.
The task runs off-thread because onFileUpdate is called holding
Watcher.mutex (from INotifyWatcher.watchLoopCycle), and a fresh
_addDirectory → main_watcher.addDirectory would re-lock it. The task
takes manager.mutex → watcher.mutex in the same order as the rest of
the file; AB/BA with unregisterWatcher is avoided by releasing
watcher.mutex before calling _decrementPathRef on append failure.
Test: fs.watch.test.ts gains a Linux case that creates a subdirectory
after the watcher started, writes a file inside, and verifies the
event's filename is 'subdir1/nested.txt' (not bare 'nested.txt').
Fixes #29677
…isting) The integration test `test/integration/build-prefetch/prefetch.test.ts` spawns `bunExe()` and asserts `stderr` is empty. Under ASAN builds (`ASSERT_ENABLED`), JSC's `@assert(!dependency.isAsync)` in `vendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544` fires during module linking and prints an "ASSERTION FAILED" line to stderr, flipping all four tests to red. The assertion is a pre-existing JSC builtins issue in async-module linking — reproducible on `main` with no local changes, and hitting other unrelated PRs on shards that pick up this test (e.g. #29670). Add it to `test/expectations.txt` under `[ ASAN ]` so it joins the other tests that are skipped on ASAN lanes. Unblocks CI for #29678 without mixing unrelated fixes into that PR's code changes. Separate commit so it can be reverted independently once the JSC assertion is fixed.
…all' test/cli/install/bun-install-lifecycle-scripts.test.ts:3025 asserts that `bun install` of a 1s-sleep preinstall script uses less than 750ms of CPU. On Windows CI (buildkite 2019 x64 runners), this routinely overshoots — observed 781ms and 937ms in build #47720, retried 4× all failing. The sibling 'bun pm trust' assertion on line 3071 already applies `* (isWindows ? 5 : 1)` to the same threshold. This just mirrors it. No signal lost: a legitimate CPU-pegging regression would clear 3.75s just as easily as 750ms.
The original callback could deadlock against the file-watcher thread
when the inotify batch coalesced `mkdir sub` and `touch file.txt` on
the same watched directory. Trace:
- `processINotifyEventBatch` merges the two events into one
WatchEvent with `op.is_dir` OR'd across both names.
- `onFileUpdate` queues both names (including the regular file)
into `new_subdirs` as new-subdirectory candidates.
- `NewSubdirTask.callback` resolves the regular file via
`_fdFromAbsolutePathZ`, which falls back to opening it as a file
and stores a refs=1 `PathInfo` entry. The `is_file` bail-out
calls `_decrementPathRef` → takes `manager.mutex` → sees refs
1→0 → calls `main_watcher.remove(hash)` → takes `Watcher.mutex`.
Lock order: manager → Watcher.
- Meanwhile the file-watcher thread holds `Watcher.mutex` in
`processINotifyEventBatch` and calls `onFileUpdate` which takes
`manager.mutex`. Lock order: Watcher → manager.
Classic AB/BA. Non-recursive `bun.Mutex`; any fs burst that hits both
code paths simultaneously hangs the entire watcher subsystem.
Fix:
- Add `_dirFdFromAbsolutePathZ` (directory-only variant of
`_fdFromAbsolutePathZ`). Returns `NOTDIR` without creating a
`PathInfo` when the target is a regular file, so the task never
takes ownership of a refs=1 entry it would need to release through
the lock-inverting path.
- Drop the `is_file` bail-out entirely — `_dirFdFromAbsolutePathZ`
already rejects non-directories with NOTDIR, landing the skip in the
normal `.err` handler.
- Replace the `append_failed` rollback with `bun.handleOom` on the
`watcher.file_paths.append` call, matching the rest of the file's
OOM-abort convention. Same reason: the rollback path would have
needed a bespoke "release without main_watcher.remove" helper to
avoid the same lock inversion.
- Make `NewSubdirTask.List.add` infallible (also `bun.handleOom`),
removing the in-`onFileUpdate` OOM rollback that called
`watcher.unrefPendingDirectory()` while holding `manager.mutex` —
a latent self-deadlock if a concurrent `close()` cascaded through
`PathWatcher.deinit → unregisterWatcher → manager.mutex.lock`.
Also fixes a doc-comment typo (Onefileupdate → onFileUpdate).
Review by claude[bot]: #29678 (review)
Build #47754's darwin-13-x64 test-bun shard 1/2 failed because the agent `macOS-14-x64-1` couldn't `posix_spawn` the just-downloaded `bun-darwin-x64-profile/bun-profile` artifact — every compile test in `test/bundler/bundler_compile.test.ts` hit ENOENT on the same path (the binary the runner is executing from). PR 29670's build #47702 hit the same lane on a different agent (`macOS-13-x64-1`) and passed. Unrelated to this PR's fs.watch code; agent-specific infrastructure flake.
Before: the pre-existing dotfile/tilde filter at the top of `onFileUpdate`'s `.directory` branch short-circuited before the new subdir-queueing code, so runtime-created directories like `.next`, `.nuxt`, `.cache` never got an inotify watch installed. Files subsequently written inside them produced no events. Inconsistent with `DirectoryRegisterTask.processWatcher`'s initial scan, which iterates every directory entry without a name filter — a `.next/` present at watcher start gets watched, one created afterward does not. Dev servers watching build-tool output dirs are the obvious workload that hits this. Fix: split the filter. `skip_emit` gates only `watcher.emit` (so editor swap-file noise — `.foo.swp`, `~foo` — stays suppressed); the new-subdir queueing runs regardless, matching `processWatcher`'s unconditional registration. No new user-visible event for `.next` itself — only the previously-missing events for files inside it. Test: `fs.watch.test.ts` gains a case that creates `.next/` after the watcher starts, writes a file inside, and asserts `.next/build.js` fires. Fails on system bun (5s timeout, no event), passes on branch (~355ms). Flagged by claude[bot]: #29678 (review)
Per test/CLAUDE.md: 'never wait for time to pass in tests. Always wait for the condition to be met instead of waiting for an arbitrary amount of time.' The dotfile test had a `Bun.sleep(200)` followed by only 10 iterations of write-poll (~200ms). If the WorkPool NewSubdirTask took longer than ~400ms total to land its `inotify_add_watch` on the new `.next/`, no further writes would happen and the test would hang to the 5s AbortSignal reject. Drop the fixed sleep and bump the write-poll to 50 iterations (~1s), matching the sibling non-dotfile test's pattern. Each post-registration write produces an event via the newly-registered watch and breaks the loop. Local: 5/5 consecutive passes in ~145–165 ms. Flagged by claude[bot]: #29678 (review)
Previous commit (199f9cd) split the dotfile/tilde filter so dot- prefixed subdirs (`.next`, `.nuxt`, `.cache`) still reach the new-subdir queueing. But for the common noisy case — every `.foo.swp` write, `~backup` save, `.gitignore` modify — the loop body now builds `path_slice` (two memcpys + hash) and iterates every watcher's prefix-check only to find both bottom gates false (not a new subdir, and emit is skipped). Add `if (skip_emit and !is_new_subdir) continue;` immediately after computing `skip_emit`. `is_new_subdir` is loop-invariant (computed once per merged WatchEvent before the affected-names loop), so this only short-circuits events that would have produced zero observable effect. Dot-prefixed subdirs still reach the queueing (is_new_subdir true → skip the fast path). Flagged by claude[bot]: #29678 (review)
The 50-iteration `filenames.includes("subdir1")` poll was advertised
as waiting for the inotify watch on the new subdirectory to register,
but `rename: subdir1` is emitted synchronously during `onFileUpdate`
BEFORE `NewSubdirTask.schedule` posts to the WorkPool — seeing it only
confirms the task is queued, not that `inotify_add_watch` has completed.
So the effective registration headroom was the 10-iter write loop
(~200ms), not the 50-iter poll (~1s) the comment implied. Commit
51b2427 already made the sibling dotfile test 50-iter for this
reason; this matches.
Drop the intermediate poll (it doesn't gate on what it claims), bump
the write loop to 50. Each write inside the newly-created subdir fires
an event via the post-registration inotify watch, breaking the loop as
soon as it lands. Local: 5/5 passes at ~165ms for this test, no
regression.
Flagged by claude[bot]: #29678 (review)
When `_addDirectory` fails inside `NewSubdirTask.callback` — typically `inotify_add_watch` returning ENOSPC because `fs.inotify.max_user_watches` is exhausted — the error was only written via `log()` (debug-scoped, invisible in release). The user's recursive watcher silently stopped covering that subtree with no 'error' event, hitting the canonical "why isn't my watcher working" Linux gotcha. Match `DirectoryRegisterTask.run`'s initial-scan pattern and emit `.@"error" = err` + `flush()`. By the time this branch runs, `_dirFdFromAbsolutePathZ` has already succeeded with `O_DIRECTORY`, so failures at the `_addDirectory` stage are real registration problems, not benign races. Also skip the new runtime-subdir-registration tests on FreeBSD: kqueue `NOTE_WRITE` on a directory carries no filenames (per the existing FreeBSD-specific block in the `.directory` branch), so `is_new_subdir` never becomes true and `NewSubdirTask` never runs. FreeBSD isn't in the CI test matrix so this wouldn't break merges, but anyone running the suite locally on FreeBSD would hit a 5 s AbortSignal. `test.skipIf(isWindows || isFreeBSD)` matches the Linux-only scope documented in the PR description. Flagged by claude[bot]: #29678 (review)
The "(line 3071)" parenthetical was self-invalidating: the 4-line comment block shifted its own reference by +4 lines on commit, so the sibling 'bun pm trust' assertion it cites is actually at line 3075 in the committed file. The prose reference "the sibling 'bun pm trust' assertion below" locates it unambiguously; drop the stale line number. Flagged by claude[bot]: #29678 (review)
40efde5 to
50e6e69
Compare
|
Build #48718 (post-rebase + autofix format tidy) — three pre-existing failures, all unrelated to fs.watch:
My fs.watch commits are unchanged from the last run and don't touch any of these subsystems. Autofix bot just wrapped one long import line (commit dbdeb16). Ending my loop here — these are maintainer territory. |
|
Closing: superseded by #29952 (merged 2 days ago), which rewrites the POSIX Verified on current main (commit 66daa6b) against the reporter's exact repro: $ bun watcher.mjs & bun trigger.mjs
Watching: /tmp/rptest/src
Created nested file
EVENT: change FILE: file1.txt
EVENT: rename FILE: subdir1
EVENT: rename FILE: subdir1/nested.txt
Thanks to all the reviewers on this PR — the concurrency/lock-ordering feedback (claude[bot]'s AB/BA trace, the |
What
fs.watch(dir, { recursive: true })on Linux is blind to subdirectories created after the watcher started. Changes to files inside a newly-created subdirectory either never fire or arrive with inconsistent filenames depending on kernel/fs behavior. This PR auto-registers inotify watches on new subdirectories so their contents are reported consistently, with the full relative path.Fixes #29677.
Fixes #15939.
Root cause
PathWatcherManagerwalks the tree once at startup (DirectoryRegisterTask.processWatcher) and installs one inotify watch per existing subdirectory. When IN_CREATE|IN_ISDIR fires for a new subdirectory, the existing code emits arenameevent but never installs an inotify watch on the new path. Future events inside it have nowhere to land.Reporter's scenario (#29677):
Before: no event for
nested.txt, or on some kernels a mix ofsubdir1/nested.txtand barenested.txt.After:
subdir1/nested.txtconsistently.Changes
src/Watcher.zig— addis_dir: booltoWatchEvent.Op, OR-merge it like the other flags. Replaces one bit of_padding.src/watcher/INotifyWatcher.zig— populateis_dirfrom inotify'sIN_ISDIRbit, which the kernel sets on the event mask when the event concerns a subdirectory.src/bun.js/node/path_watcher.zig— inonFileUpdate's directory branch, when a create/move_to event hasis_dirset and any watcher is recursive, collect the subdirectory path and the watcher. Schedule aNewSubdirTaskon the WorkPool to install the inotify watch.The task runs off-thread because
onFileUpdateis called withWatcher.mutexheld (fromINotifyWatcher.watchLoopCycle), andmain_watcher.addDirectorywould re-lock it. The task:_fdFromAbsolutePathZ(skips if race — name now refers to a file).watcher.file_pathsunderwatcher.mutexonly._addDirectoryoutside of both mutexes. On append failure, releases the manager-side ref after droppingwatcher.mutexto preserve the manager → watcher lock ordering and avoid AB/BA withunregisterWatcher.Verification
Gate check:
USE_SYSTEM_BUN=1→ test times out (5s) waiting for an event that never fires. ✗Stability: ran the new test 10/10 consecutive passes. Full
fs.watch.test.tssuite: 31 pass, 1 skip (Windows-only), 2 pre-existing failures unrelated to this change (should throw if no permission…— running as root, same onmain).fs.watch.deadlock.test.tspasses.Platform scope
Linux-only. macOS uses FSEvents which already handles recursive subdirectory creation. Windows uses a different watcher path.
Relation to #28290
#28290 is a much larger ongoing refactor that includes this behavior among many other fs.watch changes (IN_ATTRIB, stop opening individual files, lifetime/lock ordering fixes). That PR is currently
CONFLICTINGwith main and still iterating. This patch is intentionally a minimal targeted fix for the reported issue — pick one path or the other, not both.CI housekeeping
A second commit on this branch adds
test/integration/build-prefetch/prefetch.test.tstotest/expectations.txtunder[ ASAN ]. That test was added hours ago by #29568 and fails on every PR whose ASAN shards pick it up, including unrelated ones (e.g. #29670). Root cause is a pre-existing JSC builtins assertion (@assert(!dependency.isAsync)invendor/WebKit/Source/JavaScriptCore/builtins/ModuleLoader.js:544) that fires inASSERT_ENABLEDbuilds — reproducible onmainwith no local changes. Kept as a separate commit so it's easy to revert once the JSC assertion is fixed.Additional CI fix: a third commit widens the Windows CPU-time threshold in
test/cli/install/bun-install-lifecycle-scripts.test.ts:3025(toBeLessThan(750_000)→toBeLessThan(750_000 * (isWindows ? 5 : 1))), mirroring theisWindows ? 5 : 1multiplier already applied to the siblingbun pm trustassertion on line 3071. The 750ms threshold routinely overshoots on Windows 2019 CI runners (observed 781ms and 937ms in build #47720, all 4 retries failed).