Skip to content

fs.watch: fully initialize ConcurrentTask in FSWatchTask.enqueue#29950

Open
robobun wants to merge 7 commits into
mainfrom
farm/ddb03758/fswatch-concurrent-task-init
Open

fs.watch: fully initialize ConcurrentTask in FSWatchTask.enqueue#29950
robobun wants to merge 7 commits into
mainfrom
farm/ddb03758/fswatch-concurrent-task-init

Conversation

@robobun

@robobun robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator

What

FSWatchTaskPosix.enqueue() was writing only .task on the embedded concurrent_task, leaving .next (the packed PackedNextPtr that carries the auto_delete flag in its low bit) undefined:

var that = FSWatchTask.new(this.*);            // copies undefined concurrent_task
that.concurrent_task.task = jsc.Task.init(that); // sets only .task; .next stays undefined
this.ctx.enqueueTaskConcurrent(&that.concurrent_task);

UnboundedQueue.pushBatch routes through PackedNextPtr.setPtr, which preserves the existing low bit:

pub inline fn setPtr(self: *PackedNextPtr, ptr: ?*ConcurrentTask) void {
    const auto_del = @intFromEnum(self.*) & 1;   // reads undefined memory
    ...
}

If that bit reads as 1, EventLoop.tickConcurrentWithCount sees autoDelete() == true and calls dest.deinit()bun.destroy(&fswatch_task.concurrent_task). But concurrent_task is an embedded field (non-zero offset) inside the heap-allocated FSWatchTask, so this is an interior-pointer free → mimalloc heap corruption.

This was the only call site in the codebase that partially initialized a jsc.ConcurrentTask. Every other site uses ConcurrentTask.from(this, .manual_deinit) or full struct initialization, both of which set .next.

Fix

Use that.concurrent_task.from(that, .manual_deinit) to fully initialize both .task and .next, matching hot_reloader.zig, s3/, shell/, and all other enqueue sites.

Also add a debug assertion in enqueueTaskConcurrent that task.next.getPtr() == null before push. Any future caller that leaves .next undefined with observable garbage pointer bits will trip this assertion in debug builds instead of silently corrupting the heap in release.

Test note

The regression test at test/js/node/watch/fs.watch.concurrent-task.test.ts stress-tests fs.watch under load (64 watchers × 50 rounds of writes → hundreds of FSWatchTask.enqueue calls). It provides coverage for the enqueue path and will fail if the new assertion fires.

I verified empirically that in the current debug build, Zig happens to lower the = undefined default such that .next reads as 0x0 (the compiler's temporary for the struct literal lands on zeroed memory in this codegen), and 0xAA where written has low bit 0 anyway — so the UB is presently benign in both debug and release. This means the test does not deterministically crash without the fix; the bug is latent UB that could surface with any compiler/allocator change. The fix + assertion together eliminate it and guard against recurrence.

@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 11:55 AM PT - May 4th, 2026

@robobun, your commit daa04db has 3 failures in Build #51156 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29950

That installs a local version of the PR into your bun-29950 executable, so you can run:

bun-29950 --bun

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds an assertion (guarded by Environment.allow_assert) in EventLoop.enqueueTaskConcurrent that validates ConcurrentTask.next by reading raw enum bits; refactors POSIX FSWatchTask enqueue to construct jsc.ConcurrentTask in-place; adds a POSIX-only fs.watch concurrency regression test.

Changes

Cohort / File(s) Summary
Event loop assertion
src/bun.js/event_loop.zig
Under Environment.allow_assert, enqueueTaskConcurrent now validates ConcurrentTask.next initialization by casting the enum to raw integer bits and asserting the preserved low auto_delete pointer-bit pattern, avoiding a getPtr() call that could trigger Zig alignment checks.
FS watcher enqueue refactor
src/bun.js/node/node_fs_watcher.zig
POSIX enqueue path updated to construct jsc.ConcurrentTask in-place using from(that, .manual_deinit), replacing the prior separate init-then-enqueue sequence; control flow and behavior unchanged.
Regression test (POSIX-only)
test/js/node/watch/fs.watch.concurrent-task.test.ts
Adds a POSIX-only test that spawns a Bun child creating 64 fs.watch watchers, performs initial warm-up writes then 50 stress rounds of rewrites with setImmediate pacing, collects callback counts, and asserts empty stderr, stdout starting with OK , and exit code 0.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately captures the main fix: fully initializing ConcurrentTask in FSWatchTask.enqueue, which is the primary change across all files.
Description check ✅ Passed The PR description comprehensively covers the what, why, and how, with detailed technical context and fix explanation; verification is addressed through the regression test.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread test/js/node/watch/fs.watch.concurrent-task.test.ts Outdated
Comment thread test/js/node/watch/fs.watch.concurrent-task.test.ts Outdated
@robobun

robobun commented Apr 29, 2026

Copy link
Copy Markdown
Collaborator Author

CI failures across builds #49196 / #49229 / #49235 are pre-existing flakes unrelated to this change:

Test Failure Also seen in
test/cli/install/bun-create.test.ts GitHub 403 rate limit builds 49189, 49185, 49180, 49173, 49229, 49235
test/js/web/fetch/fetch-http2-client.test.ts ASSERTION FAILED: wasRemoved in WTF::AtomStringImpl::remove builds 49181, 49177
test/js/node/watch/fs.watch.test.ts closed FSWatcher is collectable segfault in mimalloc free-list during PathWatcher.init builds 49189, 49181, 49177, 49173
x64-musl / x64-musl-baseline link-stage link bun-profile failed (infra) build 49227
test/cli/inspect/HTTPServerAgent.test.ts flaky, passed on retry

The fs.watch.test.ts crash (GDB trace from 49189 shows mi_page_thread_collect_to_local corruption at 0x75622F706D742F = ASCII "/tmp/bu") is a pre-existing PathWatcher lifecycle issue in the test added by #29907 — likely related to the race being fixed in #29936. That test doesn't write files so it never reaches FSWatchTask.enqueue() (the path this PR fixes).

The new test fs.watch.concurrent-task.test.ts passed on all platforms across all builds. The new enqueueTaskConcurrent assertion did not fire anywhere in the suite.


Review feedback addressed:

  • 0672f5a: test waits for first event before stress phase (macOS FSEvents latency)
  • e9bd128: assertion checks raw bits instead of getPtr() (avoids @ptrFromInt alignment panic)
  • ca83cc9: test header comment synced, explicit 30s timeout added

Comment thread src/bun.js/event_loop.zig Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the one-line switch to ConcurrentTask.from(that, .manual_deinit) matches every other enqueue site, and the debug-only assertion is a nice guardrail. Two nits inline (test comment wording + local test timeout), neither blocking.

Extended reasoning...

Overview

This PR fixes latent UB in FSWatchTaskPosix.enqueue() (src/bun.js/node/node_fs_watcher.zig) where the embedded concurrent_task.next field was left undefined, replacing a partial field write with the canonical ConcurrentTask.from(that, .manual_deinit) initializer used by every other concurrent-enqueue call site (FetchTasklet, WorkTask, napi, s3, etc.). It also adds a debug-only assertion in EventLoop.enqueueTaskConcurrent (src/bun.js/event_loop.zig) that the pointer bits of .next are zero before push, plus a new POSIX stress test.

Security risks

None. No user input handling, auth, crypto, or permissions are touched. The runtime change is a struct-initialization correctness fix and a debug-build-only assertion; release behavior is identical except that .next is now deterministically .none instead of undefined.

Level of scrutiny

Moderate-low. While event_loop.zig is core infrastructure, the addition there is purely additive and gated by Environment.allow_assert (no release-build impact). The node_fs_watcher.zig change is a 2→1 line mechanical refactor to the established pattern — I verified ConcurrentTask.from (ConcurrentTask.zig:99-107) writes both .task and .next, which is exactly what's needed. The PR description's analysis of PackedNextPtr.setPtr preserving the low bit and the interior-pointer-free hazard is sound.

Other factors

All three of my prior inline comments (macOS FSEvents flakiness, unused path import, getPtr() alignment panic before assertf) were addressed in commits 0672f5a, cbd4571, and e9bd128 respectively and are resolved. CI failures were triaged by the author as pre-existing flakes; the new test and assertion passed on all platforms. The two remaining inline nits from this run — a stale comment referencing getPtr() and the 10s phase-1 fallback exceeding the 5s local default test timeout — are cosmetic / local-dev-only and do not affect correctness or CI.

Comment thread test/js/node/watch/fs.watch.concurrent-task.test.ts
Comment thread test/js/node/watch/fs.watch.concurrent-task.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all prior feedback addressed; the one-line .from(that, .manual_deinit) fix matches the established pattern across the codebase and the debug-only assertion is correctly gated.

Extended reasoning...

Overview

This PR fixes latent UB in FSWatchTaskPosix.enqueue() where concurrent_task.next (a PackedNextPtr whose low bit encodes auto_delete) was left undefined. Three files touched:

  • src/bun.js/node/node_fs_watcher.zig: 1-line production fix — replace partial field assignment with that.concurrent_task.from(that, .manual_deinit), identical to the pattern used in napi.zig, FetchTasklet.zig, s3/, WorkTask.zig, ConcurrentPromiseTask.zig, etc. I verified ConcurrentTask.from (ConcurrentTask.zig:99) writes both .task and .next.
  • src/bun.js/event_loop.zig: debug-only assertion inside the existing Environment.allow_assert block in enqueueTaskConcurrent, checking (@intFromEnum(task.next) & ~1) == 0. No release-build code path is affected.
  • test/js/node/watch/fs.watch.concurrent-task.test.ts: new POSIX-only stress test with explicit 30s timeout.

Security risks

None. No user-input handling, auth, crypto, or network surface is touched. The change tightens initialization of an internal queue node and adds a debug invariant check.

Level of scrutiny

The production change is mechanical: it swaps a partial struct write for the canonical helper used at every other enqueueTaskConcurrent call site in the tree. The event-loop assertion is gated behind Environment.allow_assert and sits alongside an existing has_terminated panic in the same block, so it cannot affect release builds. The reasoning in the PR description (interior-pointer free via preserved low bit in setPtr) is sound and matches the PackedNextPtr/tickConcurrentWithCount code.

Other factors

I left three rounds of inline feedback on earlier revisions (macOS FSEvents latency flake, getPtr() alignment-panic vs. raw-bits check, stale test comment, missing explicit test timeout). All were addressed in 0672f5a, e9bd128, and ca83cc9; the final 12264f8 is a prettier-only autofix of the test file. The bug-hunting system found no issues in the current revision.

The robobun CI comment for ca83cc9 lists widespread build-zig/build-cpp failures across all platforms; since ca83cc9 only edited a TypeScript test file (and build-cpp does not compile Zig or TS tests), those are infrastructure failures unrelated to this change — consistent with the earlier robobun analysis on build #49196.

robobun and others added 6 commits May 4, 2026 10:27
FSWatchTaskPosix.enqueue() was only setting .task on concurrent_task,
leaving .next (PackedNextPtr) undefined. PackedNextPtr.setPtr preserves
the low auto_delete bit from the existing value, so if the undefined
memory had the low bit set, tickConcurrentWithCount would call
bun.destroy on the embedded ConcurrentTask — an interior pointer inside
the heap-allocated FSWatchTask — corrupting the mimalloc heap.

Use ConcurrentTask.from(that, .manual_deinit) to fully initialize both
.task and .next, matching every other enqueueTaskConcurrent call site.

Add a debug assertion in enqueueTaskConcurrent that .next.getPtr() is
null before push, to catch this class of partial-initialization bug at
any call site going forward.
…tency)

On macOS, directory fs.watch routes through FSEvents which has ~50ms
coalescing latency and async stream registration via the CFRunLoop
thread. A fixed 50-round setImmediate loop can complete before any
events are delivered.

Wait for received > 0 before starting the stress phase, and drop the
received === 0 failure check — the regression signal is the debug
assertion / no heap corruption in FSWatchTask.enqueue(), not event
delivery count.
getPtr() does @ptrFromInt(addr); for misaligned garbage (e.g. 0xAA..AA
which is 2 mod 8), Zig's alignment safety check panics with 'incorrect
alignment' before the assertf diagnostic can print. Test the raw bits
directly so the intended message is delivered, and include the actual
value in the output.
The enqueueTaskConcurrent assertion now checks raw bits rather than
calling getPtr(); update the test header comment to match.

Pass an explicit 30s test timeout so the 10s phase-1 fallback (for
environments where fs events never arrive) is reachable under the
5s local default — keeps local behavior consistent with CI.
@Jarred-Sumner Jarred-Sumner force-pushed the farm/ddb03758/fswatch-concurrent-task-init branch from 12264f8 to ebfa3ff Compare May 4, 2026 10:27
Comment thread test/js/node/watch/fs.watch.concurrent-task.test.ts Outdated
close() runs on the JS thread and emits 'close' via emitJS() directly;
it does not route through FSWatchTask.enqueue()/enqueueTaskConcurrent().
Reword the two fixture comments that implied otherwise.
@robobun robobun force-pushed the farm/ddb03758/fswatch-concurrent-task-init branch from a82b09e to daa04db Compare May 4, 2026 10:55

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🔴 src/bun.js/bindings/GeneratedBindings.zig:1-3 — Commit a82b09e accidentally added two stale codegen artifacts (src/bun.js/bindings/GeneratedBindings.zig and GeneratedJS2Native.zig, 779 lines) under a phantom src/bun.js/ directory that no longer exists post-reorg — they're unrelated to the comment fix or this PR's fs.watch change. .gitignore covers these names at src/jsc/bindings/ and src/bun.zig imports them from ./jsc/bindings/, so the committed copies are unreferenced dead code; please drop both files from the PR.

    Extended reasoning...

    What was committed

    Commit a82b09e ("test: correct comments — FSWatcher.close() emits synchronously on POSIX") added two new files:

    • src/bun.js/bindings/GeneratedBindings.zig (391 lines)
    • src/bun.js/bindings/GeneratedJS2Native.zig (388 lines)

    The commit's stated purpose was rewording two test comments. These 779 lines of generated Zig have nothing to do with that, nor with this PR's actual change (the FSWatchTask.enqueue() ConcurrentTask init fix).

    Why these are stale dead code, not real source

    1. .gitignore already excludes them — at the correct path. Lines 128-129 of .gitignore ignore src/jsc/bindings/GeneratedJS2Native.zig and src/jsc/bindings/GeneratedBindings.zig. The committed copies landed at src/bun.js/bindings/ instead, which the ignore pattern does not cover, so git picked them up.

    2. Nothing references them. src/bun.zig:209,214 imports GeneratedBindings.zig and GeneratedJS2Native.zig from ./jsc/bindings/, not ./bun.js/bindings/. No file in the tree imports from src/bun.js/bindings/.

    3. The directory itself is a phantom. After the source-tree reorg (src/bun.js/src/jsc/ + src/runtime/), src/bun.js/ no longer exists. ls src/bun.js/ shows it now contains only bindings/ with these two files — the commit recreated a removed directory tree solely to hold them.

    4. They self-identify as generated. GeneratedJS2Native.zig:1 reads "This file is generated by src/codegen/generate-js2native.ts". And the relative @import paths inside (e.g. ../node/node_fs_binding.zig, ../api/ffi.zig, ../event_loop.zig) resolve relative to src/bun.js/, where those subdirectories no longer exist — so the files would not even compile if anything did reference them.

    Step-by-step proof

    1. git log --diff-filter=A -- src/bun.js/bindings/GeneratedBindings.zig → first appears in a82b09e.
    2. git show --stat a82b09e → commit message is "test: correct comments…", but the stat shows +779 lines under src/bun.js/bindings/ plus the two-line test-comment edit.
    3. grep -rn 'bun.js/bindings/Generated' src/ → no hits; grep -rn 'jsc/bindings/GeneratedBindings' src/bun.zig → matches at lines 209, 214. The build uses src/jsc/bindings/, not these copies.
    4. ls src/bun.js/bindings/ and nothing else; ls src/bun.js/bindings/ → exactly the two generated files.
    5. .gitignore:128-129src/jsc/bindings/GeneratedJS2Native.zig and src/jsc/bindings/GeneratedBindings.zig are ignored; src/bun.js/bindings/* is not.

    The most likely cause is a stale local codegen run (from before the src/bun.js/src/jsc/ rename) that wrote to the old output path, then got swept up by git add -A when staging the comment fix.

    Impact

    No build breakage — the files are unreferenced — but this commits 779 lines of stale generated code and resurrects a directory tree that the codebase deliberately removed. It will confuse anyone reading the source layout and risks future stale-codegen drift if the path ever gets wired up again. Worth removing before merge.

    Fix

    Delete both files (and the now-empty src/bun.js/bindings/ / src/bun.js/ directories) from the PR:

    git rm -r src/bun.js/

    Optionally, broaden the .gitignore entry to **/GeneratedBindings.zig / **/GeneratedJS2Native.zig so a stale codegen output at the old path can't slip in again.

@robobun

robobun commented May 4, 2026

Copy link
Copy Markdown
Collaborator Author

Re the stale src/bun.js/bindings/Generated*.zig files flagged in the review of a82b09e: already caught and removed — that commit was amended to daa04db before the review landed. Current PR diff is exactly 3 files:

src/jsc/event_loop.zig                             | 10 +++
src/runtime/node/node_fs_watcher.zig               |  3 +-
test/js/node/watch/fs.watch.concurrent-task.test.ts | 89 +++

No src/bun.js/ artifacts in the diff.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all review feedback addressed; the fix matches the established ConcurrentTask.from pattern and the new assertion is debug-only.

Extended reasoning...

Overview

This PR fixes latent UB in FSWatchTaskPosix.enqueue() where concurrent_task.next was left undefined before being pushed onto the concurrent task queue. Three files touched:

  • src/runtime/node/node_fs_watcher.zig (-2/+1): replaces partial field assignment with that.concurrent_task.from(that, .manual_deinit). I verified ConcurrentTask.from (src/event_loop/ConcurrentTask.zig:99) fully initializes both .task and .next, and this exact call shape is used at 8+ other sites (s3, fetch, napi, WorkTask, ConcurrentPromiseTask, etc.).
  • src/jsc/event_loop.zig (+10): adds a debug-only assertion inside the existing if (comptime Environment.allow_assert) block in enqueueTaskConcurrent, checking the raw pointer bits of .next are zero. No release-build impact.
  • test/js/node/watch/fs.watch.concurrent-task.test.ts (new, +89): POSIX-only stress test spawning a child with 64 watchers × 50 write rounds.

Security risks

None. No user-controlled input reaches the changed code paths; the fix tightens initialization of an internal struct and adds a defensive debug assertion. No auth/crypto/permissions involved.

Level of scrutiny

Low-to-moderate. While enqueueTaskConcurrent is on the hot event-loop path, the only addition there is inside a comptime-gated debug block — release builds are byte-identical. The watcher change is a mechanical one-liner conforming to the codebase's universal pattern for embedded ConcurrentTask enqueue. The PR description correctly identifies this as the only call site that partially initialized a jsc.ConcurrentTask.

Other factors

  • This PR has been through five review rounds. Every inline comment I raised (macOS FSEvents flakiness, unused import, getPtr() alignment-panic in the assertion, test timeout vs. local default, two comment-accuracy nits) was addressed in follow-up commits and all threads are marked resolved. The latest commit (daa04db) fixed my final doc-only nit.
  • robobun confirmed the new test passes on all platforms across multiple builds and the new assertion did not fire anywhere in the existing suite; remaining CI failures are documented pre-existing flakes unrelated to this change.
  • No CODEOWNERS cover the touched files.
  • The bug-hunting system found no issues this round.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant