fs.watch: fully initialize ConcurrentTask in FSWatchTask.enqueue#29950
fs.watch: fully initialize ConcurrentTask in FSWatchTask.enqueue#29950robobun wants to merge 7 commits into
Conversation
|
Updated 11:55 AM PT - May 4th, 2026
❌ @robobun, your commit daa04db has 3 failures in
🧪 To try this PR locally: bunx bun-pr 29950That installs a local version of the PR into your bun-29950 --bun |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds an assertion (guarded by Changes
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
CI failures across builds #49196 / #49229 / #49235 are pre-existing flakes unrelated to this change:
The fs.watch.test.ts crash (GDB trace from 49189 shows The new test Review feedback addressed: |
There was a problem hiding this comment.
LGTM — the one-line switch to ConcurrentTask.from(that, .manual_deinit) matches every other enqueue site, and the debug-only assertion is a nice guardrail. Two nits inline (test comment wording + local test timeout), neither blocking.
Extended reasoning...
Overview
This PR fixes latent UB in FSWatchTaskPosix.enqueue() (src/bun.js/node/node_fs_watcher.zig) where the embedded concurrent_task.next field was left undefined, replacing a partial field write with the canonical ConcurrentTask.from(that, .manual_deinit) initializer used by every other concurrent-enqueue call site (FetchTasklet, WorkTask, napi, s3, etc.). It also adds a debug-only assertion in EventLoop.enqueueTaskConcurrent (src/bun.js/event_loop.zig) that the pointer bits of .next are zero before push, plus a new POSIX stress test.
Security risks
None. No user input handling, auth, crypto, or permissions are touched. The runtime change is a struct-initialization correctness fix and a debug-build-only assertion; release behavior is identical except that .next is now deterministically .none instead of undefined.
Level of scrutiny
Moderate-low. While event_loop.zig is core infrastructure, the addition there is purely additive and gated by Environment.allow_assert (no release-build impact). The node_fs_watcher.zig change is a 2→1 line mechanical refactor to the established pattern — I verified ConcurrentTask.from (ConcurrentTask.zig:99-107) writes both .task and .next, which is exactly what's needed. The PR description's analysis of PackedNextPtr.setPtr preserving the low bit and the interior-pointer-free hazard is sound.
Other factors
All three of my prior inline comments (macOS FSEvents flakiness, unused path import, getPtr() alignment panic before assertf) were addressed in commits 0672f5a, cbd4571, and e9bd128 respectively and are resolved. CI failures were triaged by the author as pre-existing flakes; the new test and assertion passed on all platforms. The two remaining inline nits from this run — a stale comment referencing getPtr() and the 10s phase-1 fallback exceeding the 5s local default test timeout — are cosmetic / local-dev-only and do not affect correctness or CI.
There was a problem hiding this comment.
LGTM — all prior feedback addressed; the one-line .from(that, .manual_deinit) fix matches the established pattern across the codebase and the debug-only assertion is correctly gated.
Extended reasoning...
Overview
This PR fixes latent UB in FSWatchTaskPosix.enqueue() where concurrent_task.next (a PackedNextPtr whose low bit encodes auto_delete) was left undefined. Three files touched:
src/bun.js/node/node_fs_watcher.zig: 1-line production fix — replace partial field assignment withthat.concurrent_task.from(that, .manual_deinit), identical to the pattern used in napi.zig, FetchTasklet.zig, s3/, WorkTask.zig, ConcurrentPromiseTask.zig, etc. I verifiedConcurrentTask.from(ConcurrentTask.zig:99) writes both.taskand.next.src/bun.js/event_loop.zig: debug-only assertion inside the existingEnvironment.allow_assertblock inenqueueTaskConcurrent, checking(@intFromEnum(task.next) & ~1) == 0. No release-build code path is affected.test/js/node/watch/fs.watch.concurrent-task.test.ts: new POSIX-only stress test with explicit 30s timeout.
Security risks
None. No user-input handling, auth, crypto, or network surface is touched. The change tightens initialization of an internal queue node and adds a debug invariant check.
Level of scrutiny
The production change is mechanical: it swaps a partial struct write for the canonical helper used at every other enqueueTaskConcurrent call site in the tree. The event-loop assertion is gated behind Environment.allow_assert and sits alongside an existing has_terminated panic in the same block, so it cannot affect release builds. The reasoning in the PR description (interior-pointer free via preserved low bit in setPtr) is sound and matches the PackedNextPtr/tickConcurrentWithCount code.
Other factors
I left three rounds of inline feedback on earlier revisions (macOS FSEvents latency flake, getPtr() alignment-panic vs. raw-bits check, stale test comment, missing explicit test timeout). All were addressed in 0672f5a, e9bd128, and ca83cc9; the final 12264f8 is a prettier-only autofix of the test file. The bug-hunting system found no issues in the current revision.
The robobun CI comment for ca83cc9 lists widespread build-zig/build-cpp failures across all platforms; since ca83cc9 only edited a TypeScript test file (and build-cpp does not compile Zig or TS tests), those are infrastructure failures unrelated to this change — consistent with the earlier robobun analysis on build #49196.
FSWatchTaskPosix.enqueue() was only setting .task on concurrent_task, leaving .next (PackedNextPtr) undefined. PackedNextPtr.setPtr preserves the low auto_delete bit from the existing value, so if the undefined memory had the low bit set, tickConcurrentWithCount would call bun.destroy on the embedded ConcurrentTask — an interior pointer inside the heap-allocated FSWatchTask — corrupting the mimalloc heap. Use ConcurrentTask.from(that, .manual_deinit) to fully initialize both .task and .next, matching every other enqueueTaskConcurrent call site. Add a debug assertion in enqueueTaskConcurrent that .next.getPtr() is null before push, to catch this class of partial-initialization bug at any call site going forward.
…tency) On macOS, directory fs.watch routes through FSEvents which has ~50ms coalescing latency and async stream registration via the CFRunLoop thread. A fixed 50-round setImmediate loop can complete before any events are delivered. Wait for received > 0 before starting the stress phase, and drop the received === 0 failure check — the regression signal is the debug assertion / no heap corruption in FSWatchTask.enqueue(), not event delivery count.
getPtr() does @ptrFromInt(addr); for misaligned garbage (e.g. 0xAA..AA which is 2 mod 8), Zig's alignment safety check panics with 'incorrect alignment' before the assertf diagnostic can print. Test the raw bits directly so the intended message is delivered, and include the actual value in the output.
The enqueueTaskConcurrent assertion now checks raw bits rather than calling getPtr(); update the test header comment to match. Pass an explicit 30s test timeout so the 10s phase-1 fallback (for environments where fs events never arrive) is reachable under the 5s local default — keeps local behavior consistent with CI.
12264f8 to
ebfa3ff
Compare
close() runs on the JS thread and emits 'close' via emitJS() directly; it does not route through FSWatchTask.enqueue()/enqueueTaskConcurrent(). Reword the two fixture comments that implied otherwise.
a82b09e to
daa04db
Compare
There was a problem hiding this comment.
Additional findings (outside current diff — PR may have been updated during review):
-
🔴
src/bun.js/bindings/GeneratedBindings.zig:1-3— Commit a82b09e accidentally added two stale codegen artifacts (src/bun.js/bindings/GeneratedBindings.zigandGeneratedJS2Native.zig, 779 lines) under a phantomsrc/bun.js/directory that no longer exists post-reorg — they're unrelated to the comment fix or this PR's fs.watch change..gitignorecovers these names atsrc/jsc/bindings/andsrc/bun.zigimports them from./jsc/bindings/, so the committed copies are unreferenced dead code; please drop both files from the PR.Extended reasoning...
What was committed
Commit a82b09e ("test: correct comments — FSWatcher.close() emits synchronously on POSIX") added two new files:
src/bun.js/bindings/GeneratedBindings.zig(391 lines)src/bun.js/bindings/GeneratedJS2Native.zig(388 lines)
The commit's stated purpose was rewording two test comments. These 779 lines of generated Zig have nothing to do with that, nor with this PR's actual change (the
FSWatchTask.enqueue()ConcurrentTask init fix).Why these are stale dead code, not real source
-
.gitignorealready excludes them — at the correct path. Lines 128-129 of.gitignoreignoresrc/jsc/bindings/GeneratedJS2Native.zigandsrc/jsc/bindings/GeneratedBindings.zig. The committed copies landed atsrc/bun.js/bindings/instead, which the ignore pattern does not cover, so git picked them up. -
Nothing references them.
src/bun.zig:209,214importsGeneratedBindings.zigandGeneratedJS2Native.zigfrom./jsc/bindings/, not./bun.js/bindings/. No file in the tree imports fromsrc/bun.js/bindings/. -
The directory itself is a phantom. After the source-tree reorg (
src/bun.js/→src/jsc/+src/runtime/),src/bun.js/no longer exists.ls src/bun.js/shows it now contains onlybindings/with these two files — the commit recreated a removed directory tree solely to hold them. -
They self-identify as generated.
GeneratedJS2Native.zig:1reads "This file is generated by src/codegen/generate-js2native.ts". And the relative@importpaths inside (e.g.../node/node_fs_binding.zig,../api/ffi.zig,../event_loop.zig) resolve relative tosrc/bun.js/, where those subdirectories no longer exist — so the files would not even compile if anything did reference them.
Step-by-step proof
git log --diff-filter=A -- src/bun.js/bindings/GeneratedBindings.zig→ first appears in a82b09e.git show --stat a82b09e→ commit message is "test: correct comments…", but the stat shows +779 lines undersrc/bun.js/bindings/plus the two-line test-comment edit.grep -rn 'bun.js/bindings/Generated' src/→ no hits;grep -rn 'jsc/bindings/GeneratedBindings' src/bun.zig→ matches at lines 209, 214. The build usessrc/jsc/bindings/, not these copies.ls src/bun.js/→bindings/and nothing else;ls src/bun.js/bindings/→ exactly the two generated files..gitignore:128-129→src/jsc/bindings/GeneratedJS2Native.zigandsrc/jsc/bindings/GeneratedBindings.zigare ignored;src/bun.js/bindings/*is not.
The most likely cause is a stale local codegen run (from before the
src/bun.js/→src/jsc/rename) that wrote to the old output path, then got swept up bygit add -Awhen staging the comment fix.Impact
No build breakage — the files are unreferenced — but this commits 779 lines of stale generated code and resurrects a directory tree that the codebase deliberately removed. It will confuse anyone reading the source layout and risks future stale-codegen drift if the path ever gets wired up again. Worth removing before merge.
Fix
Delete both files (and the now-empty
src/bun.js/bindings//src/bun.js/directories) from the PR:git rm -r src/bun.js/
Optionally, broaden the
.gitignoreentry to**/GeneratedBindings.zig/**/GeneratedJS2Native.zigso a stale codegen output at the old path can't slip in again.
|
Re the stale No |
There was a problem hiding this comment.
LGTM — all review feedback addressed; the fix matches the established ConcurrentTask.from pattern and the new assertion is debug-only.
Extended reasoning...
Overview
This PR fixes latent UB in FSWatchTaskPosix.enqueue() where concurrent_task.next was left undefined before being pushed onto the concurrent task queue. Three files touched:
src/runtime/node/node_fs_watcher.zig(-2/+1): replaces partial field assignment withthat.concurrent_task.from(that, .manual_deinit). I verifiedConcurrentTask.from(src/event_loop/ConcurrentTask.zig:99) fully initializes both.taskand.next, and this exact call shape is used at 8+ other sites (s3, fetch, napi, WorkTask, ConcurrentPromiseTask, etc.).src/jsc/event_loop.zig(+10): adds a debug-only assertion inside the existingif (comptime Environment.allow_assert)block inenqueueTaskConcurrent, checking the raw pointer bits of.nextare zero. No release-build impact.test/js/node/watch/fs.watch.concurrent-task.test.ts(new, +89): POSIX-only stress test spawning a child with 64 watchers × 50 write rounds.
Security risks
None. No user-controlled input reaches the changed code paths; the fix tightens initialization of an internal struct and adds a defensive debug assertion. No auth/crypto/permissions involved.
Level of scrutiny
Low-to-moderate. While enqueueTaskConcurrent is on the hot event-loop path, the only addition there is inside a comptime-gated debug block — release builds are byte-identical. The watcher change is a mechanical one-liner conforming to the codebase's universal pattern for embedded ConcurrentTask enqueue. The PR description correctly identifies this as the only call site that partially initialized a jsc.ConcurrentTask.
Other factors
- This PR has been through five review rounds. Every inline comment I raised (macOS FSEvents flakiness, unused import,
getPtr()alignment-panic in the assertion, test timeout vs. local default, two comment-accuracy nits) was addressed in follow-up commits and all threads are marked resolved. The latest commit (daa04db) fixed my final doc-only nit. - robobun confirmed the new test passes on all platforms across multiple builds and the new assertion did not fire anywhere in the existing suite; remaining CI failures are documented pre-existing flakes unrelated to this change.
- No CODEOWNERS cover the touched files.
- The bug-hunting system found no issues this round.
What
FSWatchTaskPosix.enqueue()was writing only.taskon the embeddedconcurrent_task, leaving.next(the packedPackedNextPtrthat carries theauto_deleteflag in its low bit) undefined:UnboundedQueue.pushBatchroutes throughPackedNextPtr.setPtr, which preserves the existing low bit:If that bit reads as 1,
EventLoop.tickConcurrentWithCountseesautoDelete() == trueand callsdest.deinit()→bun.destroy(&fswatch_task.concurrent_task). Butconcurrent_taskis an embedded field (non-zero offset) inside the heap-allocatedFSWatchTask, so this is an interior-pointer free → mimalloc heap corruption.This was the only call site in the codebase that partially initialized a
jsc.ConcurrentTask. Every other site usesConcurrentTask.from(this, .manual_deinit)or full struct initialization, both of which set.next.Fix
Use
that.concurrent_task.from(that, .manual_deinit)to fully initialize both.taskand.next, matchinghot_reloader.zig,s3/,shell/, and all other enqueue sites.Also add a debug assertion in
enqueueTaskConcurrentthattask.next.getPtr() == nullbefore push. Any future caller that leaves.nextundefined with observable garbage pointer bits will trip this assertion in debug builds instead of silently corrupting the heap in release.Test note
The regression test at
test/js/node/watch/fs.watch.concurrent-task.test.tsstress-tests fs.watch under load (64 watchers × 50 rounds of writes → hundreds ofFSWatchTask.enqueuecalls). It provides coverage for the enqueue path and will fail if the new assertion fires.I verified empirically that in the current debug build, Zig happens to lower the
= undefineddefault such that.nextreads as0x0(the compiler's temporary for the struct literal lands on zeroed memory in this codegen), and0xAAwhere written has low bit 0 anyway — so the UB is presently benign in both debug and release. This means the test does not deterministically crash without the fix; the bug is latent UB that could surface with any compiler/allocator change. The fix + assertion together eliminate it and guard against recurrence.