Skip to content

module-loader: narrow TLA re-entrancy skip to dynamic-import initiator (#30651)#30656

Closed
robobun wants to merge 14 commits into
mainfrom
farm/7009659f/fix-tla-cross-evaluate-dynamic-import
Closed

module-loader: narrow TLA re-entrancy skip to dynamic-import initiator (#30651)#30656
robobun wants to merge 14 commits into
mainfrom
farm/7009659f/fix-tla-cross-evaluate-dynamic-import

Conversation

@robobun

@robobun robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator

Fixes #30651. Supersedes #30639 — that PR drops the re-entrancy skip entirely, which regresses the Nitro self-deadlock tests (marked .todo there). This PR keeps both the deadlock avoidance and the new TDZ fix working.

Paired with oven-sh/WebKit#230; WEBKIT_VERSION points at its preview build.

Repro

// driver.mjs
const p1 = import("./entry1.mjs");
await new Promise(r => setTimeout(r, 10));
const p2 = import("./entry2.mjs");
await Promise.all([p1, p2]);

// entry1.mjs  +  entry2.mjs (each)
import { foo } from "./tla.mjs";
console.log(foo);

// tla.mjs
await new Promise(r => setTimeout(r, 100));
export const foo = 123;

On 1.3.14-canary.1+314ffe307 entry2 throws ReferenceError: Cannot access 'foo' before initialization; Node runs it correctly. User-reported variant: @lexical/react 0.44+ (#30634) and node-llama-cpp (#30651).

Cause

Same TDZ hole as #30259, but reached across two separate dynamic imports rather than static siblings inside one Evaluate(). The #30262 fix added three conditions to the Bun-specific re-entrancy skip at AbstractModuleRecord::innerModuleEvaluation 11.c.v:

  • depWasAlreadyEvaluatingAsync
  • asyncEvaluationOrder < asyncOrderWatermark
  • pendingAsyncDependencies == 0

All three fire identically for the Nitro self-deadlock (skip needed — waiting would deadlock the import's own continuation) and the cross-Evaluate() parallel-dynamic-import bug (skip wrong — the spec wait is what we want; the dep's await is independent of our evaluation). The watermark check was designed for same-Evaluate() re-entrancy; it can't distinguish cross-Evaluate() cases where each dynamic import gets a fresh watermark.

Fix

Add a fourth discriminator: dep == dynamic-import-initiator. The initiator is the CyclicModuleRecord whose JS body is awaiting the current import's result. Only skip if the dep is that module.

  • Nitro deadlock: target statically re-imports the initiator → match → still skip (back-compat preserved).
  • Parallel dynamic imports: dep and initiator are unrelated modules → no match → spec wait fires → importer correctly blocks until the TLA dep settles post-await.

Plumbing (all #if USE(BUN_JSC_ADDITIONS)-gated; non-Bun JSC builds unchanged):

  1. JSModuleLoader::requestImportModule resolves the referrer URL to a CyclicModuleRecord via the module registry.
  2. loadModule (dynamic overload) stashes it on the freshly-created ModuleLoadingContext.
  3. moduleLoadTopSettled copies it onto the ModuleLoaderPayload that outlives the context through the dynamic-import pipeline.
  4. dynamicImportLoadSettled pushes it onto VM::m_modulesAwaitingDynamicImport (a HashCountedSet) immediately before the target's evaluate() call, and pops after.
  5. innerModuleEvaluation 11.c.v reads vm.isModuleAwaitingDynamicImport(cyclic) as the fourth condition alongside the existing three.

The bun-side change (ZigGlobalObject::moduleLoaderImportModule) passes the sourceOrigin URL (converted to a file-system path, matching the module registry's key format) as the referrer to JSC::importModule. Previously we passed an empty Identifier, so the initiator was never resolvable.

Tests

test/js/bun/resolve/dynamic-import-tla-cycle.test.ts:

  • NEW: parallel dynamic imports of the same TLA dep wait instead of running against TDZ bindings — fails on main, passes here.
  • Existing dynamic import inside TLA whose target imports the awaiter back does not deadlock — still passes (Nitro back-compat preserved).
  • Existing dynamic import inside TLA of a non-entry module whose target imports it back does not deadlock — still passes.
  • Existing three static-sibling tests from Fix TLA sibling import running before shared dep settles (#30259) #30262 — still pass.

Rebase notes

This PR tracks a paired WebKit preview (oven-sh/WebKit#230), so WEBKIT_VERSION has to be re-pinned whenever main bumps WebKit. Most rebases only conflict on that one constant. The rebase onto the upstream WebKit 24362e675175 bump (oven-sh/WebKit#251) was the one non-trivial case: upstream dropped the JSGlobalObject* parameter from JSPromise::performPromiseThenWithInternalMicrotask. The two JSC commits were cherry-picked onto the new WebKit main with one manual resolution in JSModuleLoader.cpp::continueDynamicImport — kept the #if USE(BUN_JSC_ADDITIONS) routing via dynamicPayload (needed so dynamicImportLoadSettled can recover the initiator) while adopting the new no-globalObject signature. Everything else auto-merged.

@robobun robobun mentioned this pull request May 14, 2026
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 4:22 PM PT - Jun 16th, 2026

@robobun, your commit 0302850 has 1 failures in Build #62990 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 30656

That installs a local version of the PR into your bun-30656 executable, so you can run:

bun-30656 --bun

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Pass a computed non-empty referrer key from moduleLoaderImportModule into JSC::importModule so the import registry can coordinate parallel dynamic imports of the same top-level-await module; add a regression test reproducing the parallel TLA import case; update the WebKit prebuilt identifier; add missing include.

Changes

Module Import and Build Changes

Layer / File(s) Summary
Module import referrer tracking
src/jsc/bindings/ZigGlobalObject.cpp, test/js/bun/resolve/dynamic-import-tla-cycle.test.ts
moduleLoaderImportModule now computes a non-empty referrerKey from sourceOrigin.url() and passes it to JSC::importModule instead of an empty JSC::Identifier(). Adds a test that spawns two staggered parallel dynamic imports sharing a TLA dependency and asserts both entries receive foo = 123, with empty stderr and exit code 0.
Add assert include in WTF bindings
src/jsc/bindings/wtf-bindings.cpp
Adds #include <cassert> so assert(...) is available for the file’s terminal/raw-mode helper code path.
WebKit prebuilt version update
scripts/build/deps/webkit.ts
The exported WEBKIT_VERSION constant was changed to autobuild-preview-pr-230-783ba19a, updating the prebuilt WebKit release tag selection used by build scripts.
  • Possibly related PRs:
    • oven-sh/bun#29770: Adds/updates dynamic-import/TLA cycle tests related to module-loader async sibling-import behavior.
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description includes detailed technical context but lacks explicit sections matching the repository's template (What does this PR do? and How did you verify your code works?). Restructure the description to explicitly include the two required template sections: a concise 'What does this PR do?' summary and a 'How did you verify your code works?' section documenting test results.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and specifically describes the main change: narrowing TLA re-entrancy skip to only the dynamic-import initiator, with the issue number reference.
Linked Issues check ✅ Passed The PR's code changes directly implement the necessary fixes for #30651: restore TDZ-safe semantics for parallel dynamic imports while preserving Bun's deadlock avoidance via the new dynamic-import initiator discriminator.
Out of Scope Changes check ✅ Passed All changes are scoped to the TLA re-entrancy issue: WebKit version bump for paired PR, module loader referrer resolution, TLA cycle test addition, and cassert include—no extraneous changes present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/resolve/dynamic-import-tla-cycle.test.ts`:
- Around line 176-190: Replace the timer-based coordination with a deterministic
promise gate: in tla.mjs export a "entered" promise (and a resolver, e.g.,
enterResolve) that is resolved as soon as tla.mjs begins execution, and remove
the await new Promise(setTimeout...) there; in the test/driver where p1 and p2
are created (the code that does import("./entry1.mjs") and
import("./entry2.mjs")), await the exported entered promise from tla.mjs after
starting p1 and before starting p2 so the second import only begins once tla.mjs
has entered; reference the module symbol names (tla.mjs exported
entered/enterResolve, p1/p2, import("./entry2.mjs"), entry1.mjs/entry2.mjs) to
locate and update the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfb5d0bf-d81b-4707-bc3a-4963f93e7039

📥 Commits

Reviewing files that changed from the base of the PR and between b8ecc78 and 1f04780.

📒 Files selected for processing (3)
  • scripts/build/deps/webkit.ts
  • src/jsc/bindings/ZigGlobalObject.cpp
  • test/js/bun/resolve/dynamic-import-tla-cycle.test.ts

Comment thread test/js/bun/resolve/dynamic-import-tla-cycle.test.ts
@github-actions

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Dynamic Import fails when multiple Vite plugins import same file #22367 - Parallel dynamic imports of the same file (from multiple Vite plugins) produce ReferenceError: Cannot access 'default' before initialization, matching the exact bug pattern this PR fixes

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #22367

🤖 Generated with Claude Code

@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. Fix TLA sibling dynamic-import TDZ regression (#30634) #30639 - Fixes the same TLA sibling dynamic-import TDZ regression ([1.3.14] ESM TDZ error when importing Lexical React modules that re-export through top-level await #30634), touching the same files (ZigGlobalObject.cpp, webkit.ts, dynamic-import-tla-cycle test); PR module-loader: narrow TLA re-entrancy skip to dynamic-import initiator (#30651) #30656 explicitly supersedes this one

🤖 Generated with Claude Code

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but this changes dynamic-import referrer plumbing in the core module loader and pins WEBKIT_VERSION to a preview build of an unmerged WebKit PR (oven-sh/WebKit#230) — worth a human look, and the version pin presumably needs to move to a real commit hash once #230 lands.

Extended reasoning...

Overview

This PR narrows Bun's TLA re-entrancy skip in innerModuleEvaluation to only fire when the EvaluatingAsync dependency is the dynamic-import initiator module, fixing a TDZ bug where parallel dynamic imports sharing a suspended TLA dep ran the second importer too early. The Bun-side diff is small — ZigGlobalObject::moduleLoaderImportModule now derives a registry key from sourceOrigin (file-system path for file://, stripped prefix for builtin://, raw string otherwise) and passes it as the referrer to JSC::importModule instead of an empty Identifier. A regression test is added in dynamic-import-tla-cycle.test.ts, and WEBKIT_VERSION is bumped to autobuild-preview-pr-230-6f24f5f7.

Security risks

None apparent. The referrer key is derived from the already-trusted sourceOrigin of the calling module and is used only for a registry lookup inside JSC; it doesn't flow into resolution, file-system access, or anything user-observable.

Level of scrutiny

High. Module evaluation ordering / TLA / cycle handling is one of the most subtle areas in the runtime, with a recent history of regressions (#30259, #30262, #30639, #30651). The actual semantic change lives in oven-sh/WebKit#230 and isn't visible in this diff, so a reviewer needs to evaluate both PRs together. Additionally, every dynamic import() in Bun now passes a non-empty referrer where it previously passed empty — even though the WebKit side only uses it for the new initiator lookup, that's a behavior change on a very hot path that deserves human eyes.

Other factors

  • WEBKIT_VERSION is set to a preview tag of an unmerged PR rather than a merged commit hash. That's fine for CI validation but presumably shouldn't land on main as-is — someone should confirm the intended sequencing (merge WebKit#230 → repoint to the real autobuild hash → merge this).
  • Test coverage is good: the new parallel-dynamic-import test plus the five existing tests in the same file cover both the Nitro self-deadlock back-compat and the new TDZ case.
  • The referrer-key derivation mirrors the existing resolve() path per the comment, but I didn't independently verify it matches every key format the registry uses (e.g. data URLs, virtual modules from plugins) — a mismatch would just mean the initiator isn't found and the skip stays disabled, which is the safe direction, but worth a glance.

@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Build-cpp failures on darwin-aarch64, darwin-x64, and windows-aarch64 are the WebKit preview-tarball-timing thing — WEBKIT_VERSION points at autobuild-preview-pr-230-298067bc and oven-sh/WebKit#230's build is still in_progress across all 39 lanes (started 03:04 UTC, ~30-45min per lane). Once the preview tarball publishes for each platform, bun CI can fetch and link. Same coordination pattern as #30639 → WebKit#228 → #30262 → WebKit#215.

Diff itself is green: all 6 tests in dynamic-import-tla-cycle.test.ts pass locally; under partial revert (src/ stashed while vendor/WebKit stays patched) the Nitro deadlock tests stay green too, thanks to VM::hasPendingDynamicImport() gating in the latest WebKit push.

Comment thread scripts/build/deps/webkit.ts Outdated
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

WebKit PR #230's CI at 38/41 completed — new preview tarball autobuild-preview-pr-230-298067bc should publish in ~10min (typical gap between last build and release-publish step). A maintainer can retry Buildkite #54170 once that release lists its Linux/macOS/Windows assets. The diff itself is green — all 6 TLA tests pass locally with-fix, and under partial revert the existing Nitro deadlock tests still pass thanks to the VM::hasPendingDynamicImport() gate.

Handing off to a maintainer: all red lanes are *-build-cpp, same WebKit-tarball-404 pattern that blocks every cross-repo-coordinated PR on first push (see #30639/WebKit#228, #30262/WebKit#215). Not pushing another commit — CI will go green on the next build once WebKit publishes.

@robobun robobun force-pushed the farm/7009659f/fix-tla-cross-evaluate-dynamic-import branch from 69d60ca to 1e63c2f Compare May 14, 2026 17:30
Comment thread src/jsc/bindings/ZigGlobalObject.cpp Outdated
@robobun

robobun commented May 14, 2026

Copy link
Copy Markdown
Collaborator Author

Current CI state at build #54351: WebKit preview tarball published (builds pass), but 8 lanes fail:

Unrelated to this diff:

Possibly related to my WebKit preview build:

Still handing off to a maintainer — the diff itself is green (6/6 TLA tests pass locally, gate-equivalent confirms without-fix fails with 1 expected failure, with-fix passes 6/6). The baseline-verify lane may need either (a) a merged WebKit#230 that triggers a fresh prebuilt, or (b) an allowlist entry, or (c) re-verification under the maintainer's release-build toolchain. Not pushing code — I've exhausted what I can diagnose from here.

@robobun

robobun commented May 17, 2026

Copy link
Copy Markdown
Collaborator Author

Issue #30949 is the same bug reached via the Payload CMS @payloadcms/richtext-lexical barrel (which pulls @lexical/react + lexical). Reporter's diagnosis (no TLA in the graph) was based on the "module" entry; the "exports" map's node condition actually routes to Lexical.node.mjs, which is the TLA wrapper this PR fixes.

@robobun robobun force-pushed the farm/7009659f/fix-tla-cross-evaluate-dynamic-import branch from 1e63c2f to b71d858 Compare May 18, 2026 05:47

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/build/deps/webkit.ts`:
- Line 6: The WEBKIT_VERSION constant was set to a non-existent preview tag
which will break downloads; update the value or guard its use: either revert
WEBKIT_VERSION back to a published release tag (replace the string in the
WEBKIT_VERSION export) or add a runtime fallback/validation before download that
checks the GitHub release exists and refuses to proceed (or uses an alternate
known-good tag) so code that references WEBKIT_VERSION (export const
WEBKIT_VERSION) will not attempt to download from a missing release.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d6850955-7744-4aee-bbb7-6c86b7c3595c

📥 Commits

Reviewing files that changed from the base of the PR and between 69d60ca and b71d858.

📒 Files selected for processing (1)
  • scripts/build/deps/webkit.ts

Comment thread scripts/build/deps/webkit.ts Outdated
Comment thread test/js/bun/resolve/dynamic-import-tla-cycle.test.ts Outdated
@robobun

robobun commented May 18, 2026

Copy link
Copy Markdown
Collaborator Author

Build #55708 fails on build-cpp across all platforms — same WebKit-preview-tarball-timing pattern as before. oven-sh/WebKit#230's CI is 41/42 complete, the final Build / release step (which publishes the tarballs) is still IN_PROGRESS (~10min ETA). Once it publishes autobuild-preview-pr-230-783ba19a, a maintainer can retry this build. Diff itself is still green: all 6 TLA tests pass locally, gate-sim confirms without-fix fails correctly and with-fix passes. Not pushing again.

Comment thread src/jsc/bindings/ZigGlobalObject.cpp Outdated
Comment on lines +3533 to +3547
JSC::Identifier referrerKey;
auto referrerURL = sourceOrigin.url();
if (!referrerURL.isEmpty()) {
String referrerKeyString;
if (referrerURL.protocolIsFile())
referrerKeyString = referrerURL.fileSystemPath();
else if (referrerURL.protocol() == "builtin"_s && referrerURL.string().startsWith("builtin://"_s))
referrerKeyString = referrerURL.string().substring(10);
else
referrerKeyString = referrerURL.string();
if (!referrerKeyString.isEmpty())
referrerKey = JSC::Identifier::fromString(vm, referrerKeyString);
}
auto result = JSC::importModule(globalObject, resolvedIdentifier,
JSC::Identifier(), WTF::move(parameters), nullptr);
referrerKey, WTF::move(parameters), nullptr);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new dep == initiator discriminator only matches the immediate dynamic-import initiator, so a depth-2 chain (M awaits import(T); T awaits import(U); U statically imports M) now deadlocks where pre-PR it did not — when U's DFS visits M, m_modulesAwaitingDynamicImport = {T} (M was popped when T.evaluate() returned), so condition 4 fails and U takes the spec wait, closing a wait-cycle. Both existing Nitro tests are depth-1 (chunk re-imports its own initiator), so this isn't covered; worth adding a 3-module M→T→U→M test to dynamic-import-tla-cycle.test.ts, and on the WebKit side either keeping the initiator pushed until its import promise settles or walking the pending chain transitively.

Extended reasoning...

What this is

The PR narrows the #30262 deadlock-avoidance skip at innerModuleEvaluation 11.c.v by adding a fourth condition: vm.isModuleAwaitingDynamicImport(dep). Per the PR description, m_modulesAwaitingDynamicImport is a HashCountedSet that dynamicImportLoadSettled pushes the initiator onto immediately before the target's evaluate() call, and pops after — i.e. a synchronous bracket around evaluate(). That means the set only ever contains the immediate initiator during its target's DFS; ancestors further up a dynamic-import chain have already been popped by the time a grand-child's evaluate() runs.

The depth-1 Nitro shape (entry awaits import(chunk); chunk statically imports entry back) still works because the chunk's back-edge target is its immediate initiator. But the depth-2 generalisation — entry → await import(T)await import(U) → static import {x} from entry — now deadlocks, where pre-PR it completed.

Step-by-step proof

// M.mjs (entry)
export const x = 1;
await import('./T.mjs');

// T.mjs
await import('./U.mjs');

// U.mjs
import { x } from './M.mjs';
console.log(x);
  1. M.evaluate(): body runs, x=1, hits await import(T) → M is EvaluatingAsync, pendingAsyncDependencies==0. moduleLoaderImportModule is called with sourceOrigin = M, so the new bun-side code at lines 3533–3545 computes referrerKey = M's file-system path.
  2. T's load settles. dynamicImportLoadSettled pushes M onto m_modulesAwaitingDynamicImport (set = {M}), calls T.evaluate().
  3. T has no static deps; T's body runs, hits await import(U) → T is EvaluatingAsync. moduleLoaderImportModule called with sourceOrigin = T, referrerKey = T.
  4. T.evaluate() returns synchronously (DFS done, body suspended). M is popped (set = {}). M's await import(T) is still pending — p_T resolves only when T reaches Evaluated.
  5. U's load settles (later macrotask, after async file I/O). dynamicImportLoadSettled pushes T (set = {T}), calls U.evaluate().
  6. U's DFS visits M. The four conditions at 11.c.v:
    • (1) depWasAlreadyEvaluatingAsync — M was EvaluatingAsync before this pass ✓
    • (2) M.asyncEvaluationOrder < U's watermark — M's order assigned two Evaluate() passes earlier ✓
    • (3) M.pendingAsyncDependencies == 0 — M has no static async deps ✓
    • (4) vm.isModuleAwaitingDynamicImport(M) — set = {T}, M ∉ set → false
  7. Condition 4 fails → take the spec wait. U.pendingAsyncDependencies = 1, U registered as M's asyncParentModule.

Result: U waits on M; M's body is suspended on p_T; p_T resolves when T → Evaluated; T's body is suspended on p_U; p_U resolves when U → Evaluated; U waits on M. Cycle = deadlock.

Pre-PR: conditions (1)(2)(3) all hold for M at step 6, so the skip fires. U's body runs immediately, reads x=1, U → Evaluated, T resumes, M resumes. Works.

Why existing code doesn't prevent it

The VM::hasPendingDynamicImport() coarse fallback (per robobun's partial-revert observation and the existing claude[bot] review at line 3545) only fires when the initiator is unknown — empty referrer / registry-lookup miss. It cannot be an unconditional disjunct, because in the #30651 repro p1 = import(entry1) is still in flight when entry2.evaluate() visits tla.mjs; an unconditional fallback would re-break the very test this PR adds. Here U's initiator is known (T is file-backed, referrerKey non-empty after this PR's bun-side change), so the precise check is used and misses M.

Both existing Nitro tests ("dynamic import inside TLA whose target imports the awaiter back" and the non-entry variant) are depth-1: the chunk directly re-imports its own initiator, so dep == initiator holds trivially. There is no test where the back-edge target is two or more dynamic-import hops above.

Impact

This is a behavioral regression in Bun's deliberate divergence from spec for the Nitro self-deadlock class — depth-1 working while depth-2 hangs is an inconsistent surprise. Nested code-splitting (entry → lazy chunk → sub-chunk that imports a shared util from the entry) is a realistic Rollup/Nitro output shape, exactly the chunk-splitting pattern that produced the original Nitro deadlock just one level deeper. Node also deadlocks here (it's a spec deadlock), but Bun explicitly chose to break the cycle for this class per the test file's own header comment.

Caveat: I cannot directly inspect oven-sh/WebKit#230's source (vendor/WebKit not checked out here), so this rests on the PR description's stated push/pop semantics — but those are specific, internally consistent, and corroborated by the review thread.

Suggested fix

WebKit-side, either (a) keep the initiator pushed until its dynamic-import promise settles (not just until evaluate() returns), so the set accumulates the whole pending dynamic-import chain; or (b) check transitively — if dep ∉ set but dep is suspended at an await import whose target's initiator chain leads back into the set. At minimum, add a depth-2 regression test (the 3-module M→T→U→M chain above) to dynamic-import-tla-cycle.test.ts to pin behavior — it would either confirm the regression or document that the actual implementation handles it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trace is correct — pushing the initiator only around evaluate() leaves the depth-2 chain (M → T → U → M static) and the depth-1 criss-cross variant (midA/midB awaiting swapped chunks that back-edge into each other, see the follow-up thread below) regressing relative to pre-PR behavior.

Deliberately out of scope for this PR. The ask here is narrow: stop the #30651 TDZ read for parallel dynamic imports of a shared TLA dep (Vite / Nitro / Payload users hit this), without reopening the Nitro self-deadlock #30262 fixed. The four-condition discriminator does exactly that: the reported repro now waits, the existing Nitro depth-1 tests still skip.

The cases you describe — depth-≥2 chains and criss-cross sibling back-edges — are spec deadlocks everywhere except Bun (Node deadlocks them too, per the test file's own header note about our deliberate divergence). They're also not drawn from the user reports driving this PR. Closing them properly takes a larger WebKit-side change (tracking the full set of EvaluatingAsync modules with outstanding dynamic-import promises, keyed off execution context rather than lexical sourceOrigin) that I'd rather do as a follow-up with its own test matrix.

For now: added the 3-module M→T→U→M chain and the 5-module criss-cross to my local todo for the follow-up, linked from a tracking issue once the main fix lands. Happy to pin them as .skip-d fixtures in this PR if that helps — LMK.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scoping makes sense — agreed the proper fix needs the larger WebKit-side change you describe (execution-context-keyed set, held until each import promise settles), and that belongs in its own PR with a dedicated test matrix.

Re the offer: yes please — pinning the 3-module M→T→U→M chain and the 5-module criss-cross as test.todo fixtures in dynamic-import-tla-cycle.test.ts would help. That gives the follow-up concrete in-tree repros to flip active, and documents the known gap right next to the file's header note about Bun's deliberate spec divergence — more durable than a tracking-issue todo that can drift from the actual fixture shapes. test.todo over .skip so CI surfaces it if a future WebKit bump happens to close one of them.

@robobun robobun force-pushed the farm/7009659f/fix-tla-cross-evaluate-dynamic-import branch from 114cdd0 to 5a3f1a3 Compare May 23, 2026 12:48
@robobun

robobun commented May 23, 2026

Copy link
Copy Markdown
Collaborator Author

WebKit CI for oven-sh/WebKit#230 at d57ee3d2 failed on the Linux arm64 Release lane (all other Linux builds were cancelled as a consequence), so no autobuild-preview-pr-230-d57ee3d2 tarball was published. All bun-side build-cpp lanes fail with HTTP 404 when fetching the tarball — same cross-repo coordination pattern as before. Diff is green locally (6/6 TLA tests pass on local-built debug-asan against my vendor/WebKit branch). Maintainer can either retry the Linux arm64 Release job on WebKit#230 to unblock the tarball build, or diagnose the arm64 failure if it's real. Not pushing another commit.

Comment thread src/jsc/bindings/ZigGlobalObject.cpp Outdated
@robobun robobun force-pushed the farm/7009659f/fix-tla-cross-evaluate-dynamic-import branch from 5a3f1a3 to 83a3b06 Compare May 25, 2026 10:12
@robobun

robobun commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author

Status: all 29 build-cpp lanes failed fetching autobuild-preview-pr-230-36cc1283.tar.gz — the WebKit preview build (PR #230) hit GPG key not reachable at https://apt.llvm.org/llvm-snapshot.gpg.key in the Linux arm64 Debug Dockerfile step, which cancelled the other Linux builds and blocked the tarball from publishing. Pure infra transient.

Triggered preview workflow Attempt 2 via gh run rerun; once it succeeds the tarball publishes and a retry of this PR's CI (empty push or re-run) will complete build-cpp. No code change needed from this side.

Comment thread src/jsc/bindings/ZigGlobalObject.cpp Outdated
Comment on lines +3555 to +3569
JSC::Identifier referrerKey;
auto referrerURL = sourceOrigin.url();
if (!referrerURL.isEmpty()) {
String referrerKeyString;
if (referrerURL.protocolIsFile())
referrerKeyString = referrerURL.fileSystemPath();
else if (referrerURL.protocol() == "builtin"_s && referrerURL.string().startsWith("builtin://"_s))
referrerKeyString = referrerURL.string().substring(10);
else
referrerKeyString = referrerURL.string();
if (!referrerKeyString.isEmpty())
referrerKey = JSC::Identifier::fromString(vm, referrerKeyString);
}
auto result = JSC::importModule(globalObject, resolvedIdentifier,
JSC::Identifier(), WTF::move(parameters), nullptr);
referrerKey, WTF::move(parameters), nullptr);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Two more regression shapes for the depth-2 finding above, which together rule out both fixes that comment proposed: (1) depth-1 criss-cross siblings — entry imports midA+midB; midA awaits import(chunkA) which back-edges to midB; midB awaits import(chunkB) which back-edges to midA. At chunkA's DFS, set={midA} so condition (4) misses midB; symmetric for chunkB → wait cycle. This is depth-1 (so the earlier comment under-scopes the regression) and rules out fix (b) "walk the pending chain transitively" — midA/midB are siblings with no ancestor relationship. (2) delegated import()router.mjs: export const loadFoo = () => import('./foo.mjs'); app.mjs: await loadFoo(); foo.mjs back-edges to app. sourceOrigin identifies the module lexically containing the import() keyword (router), not the module whose TLA body awaits the result (app), so set={router} during foo.evaluate() and app ∉ {router} → spec wait → deadlock. This is a set-identity bug orthogonal to the set-lifetime bug in the depth-2/criss-cross cases, and fix (a) "keep initiator pushed until promise settles" does NOT fix it since router (not app) is what gets pushed. Suggest adding both as fixtures alongside the depth-2 test, and on the WebKit side tracking the set of all EvaluatingAsync modules with an outstanding dynamic-import promise, independent of which file the import() keyword sits in.

Extended reasoning...

What this is

This extends the depth-2 finding already posted at this line (inline comment id 3256858258). That comment established that the new dep == initiator discriminator only matches the immediate dynamic-import initiator, and proposed two WebKit-side fixes: (a) keep the initiator pushed until its import promise settles, or (b) walk the pending chain transitively. Two further regression shapes — both of which worked pre-PR and deadlock under the PR description's stated push/pop-around-evaluate() semantics — show that neither fix as stated is sufficient, and together constrain the WebKit-side fix to something stronger than either.

This is filed as an addendum to the existing 🔴 comment rather than a fully independent finding: same root cause (the fourth condition's set under-approximates the modules whose progress depends on the current evaluate()), same code location, same WebKit-side fix area. The added value is (i) two distinct test fixtures the existing tests don't cover, and (ii) ruling out both of the earlier comment's suggested fixes, which materially shapes the WebKit#230 design.

Shape (1): depth-1 criss-cross siblings

entry.mjs:  import "./midA.mjs"; import "./midB.mjs";
midA.mjs:   export const a = 1; await import("./chunkA.mjs");
midB.mjs:   export const b = 2; await import("./chunkB.mjs");
chunkA.mjs: import { b } from "./midB.mjs"; export const handler = () => b;
chunkB.mjs: import { a } from "./midA.mjs"; export const handler = () => a;

Step-by-step trace (per the PR description's "push immediately before target's evaluate(), pop after"):

  1. entry.evaluate() DFS: midA's body runs (a=1), hits await import(chunkA) → midA EvaluatingAsync, pendingAsyncDependencies=0. midB's body runs (b=2), hits await import(chunkB) → midB EvaluatingAsync, pendingAsyncDependencies=0. entry has pendingAsyncDependencies=2.
  2. chunkA's load settles → dynamicImportLoadSettled pushes midA, set={midA}. chunkA.evaluate() DFS visits midB. Conditions: (1) midB was already EvaluatingAsync ✓; (2) midB.asyncOrder < chunkA's watermark ✓; (3) midB.pendingAsync==0 ✓; (4) midB ∈ {midA}? NO → spec wait. chunkA.pendingAsyncDependencies=1. Pop midA, set={}.
  3. chunkB's load settles → push midB, set={midB}. chunkB.evaluate() DFS visits midA. (4) midA ∈ {midB}? NO → spec wait. chunkB.pendingAsyncDependencies=1. Pop midB.

Cycle: chunkA waits midB; midB awaits import(chunkB); chunkB waits midA; midA awaits import(chunkA). Deadlock.

Pre-PR: at both step 2 and step 3, conditions (1)(2)(3) alone fire → both skip → chunkA reads b=2 (initialized before midB's await), chunkB reads a=1 → all complete.

Why this isn't covered by the existing comment: The depth-2 comment says "Both existing Nitro tests are depth-1 (chunk re-imports its own initiator), so this isn't covered" — implying depth-1 is safe. The criss-cross is depth-1 (each await import is one hop) and also regresses, so the regression is broader than "depth ≥ 2". More importantly, it rules out fix (b) "walk the pending chain transitively": at step 2, midB's await target is chunkB whose initiator chain is {midB}, disjoint from {midA}. midA and midB are siblings with no ancestor relationship, so no transitive walk along the dynamic-import chain from chunkA reaches midB. Only fix (a) handles this shape: if midA stays pushed until import(chunkA) settles, then at step 3 set={midA, midB}, midA matches → skip → chunkB completes → midB completes → chunkA's wait on midB resolves.

Shape (2): delegated import()

// router.mjs — synchronous, no TLA
export const loadFoo = () => import("./foo.mjs");

// app.mjs — entry, TLA awaiter
import { loadFoo } from "./router.mjs";
export const app = 1;
await loadFoo();              // awaits import() that is *textually* in router.mjs

// foo.mjs
import { app } from "./app.mjs";   // back-edge to the awaiter
console.log(app);

Step-by-step trace:

  1. app.evaluate() DFS visits router (static dep). router's body runs synchronously → router Evaluated.
  2. app's body runs (app=1), calls loadFoo(). The import() expression executes with sourceOrigin = router.mjs — per ES spec, HostLoadImportedModule's referrer is the Script/Module Record where the import() CallExpression is lexically located (this is what makes () => import('./x') resolve ./x relative to where the arrow is defined, not where it's called). JSC captures this from the containing SourceProvider's origin. moduleLoaderImportModule at lines 3559-3560 computes referrerKey = /abs/router.mjs. requestImportModule resolves router's CyclicModuleRecord (router IS in the registry — it was loaded as a regular ESM static dep) and stashes it. app awaits the returned promise → app EvaluatingAsync, pendingAsync=0.
  3. foo's load settles. dynamicImportLoadSettled pushes router → set={router}. Calls foo.evaluate().
  4. foo's DFS visits app. Conditions: (1) app was already EvaluatingAsync ✓; (2) app.asyncOrder < foo's watermark ✓; (3) app.pendingAsync==0 ✓; (4) app ∈ {router}? NO → spec wait. foo.pendingAsyncDependencies=1. Pop router.

Cycle: foo waits on app; app's body is suspended on loadFoo()'s promise; that promise resolves when foo → Evaluated. Deadlock.

Pre-PR: conditions (1)(2)(3) alone fire → skip → foo reads app=1 → completes → app resumes.

Why this isn't covered by the existing comment or shape (1): The depth-2 chain and the criss-cross both assume import() is textually inside the module whose TLA body awaits it; their root cause is the push/pop bracket being too narrow (set-lifetime). Shape (2) is a set-identity bug: the wrong module is pushed in the first place. The PR description defines the initiator as "the CyclicModuleRecord whose JS body is awaiting the current import's result", but sourceOrigin does not give you that; it gives you where import() was written. Critically, fix (a) does NOT fix this: router would stay pushed until the promise settles, but app is never pushed at all. So shape (2) rules out fix (a) as a complete solution, just as shape (1) rules out fix (b).

Realistic? Route-table loaders (export const routes = {'/x': () => import('./x.mjs')}), React.lazy-style helpers in a shared util module, barrel re-export thunks, and any "one file owns the import map" pattern. The back-edge shape is identical to the existing Nitro tests, just with one level of function indirection.

Why existing safeguards don't catch either

The VM::hasPendingDynamicImport() coarse fallback (per the PR thread) fires only when the initiator is unresolvable — empty referrerKey or registry-lookup miss. In both shapes the initiators are file-backed ESM modules that resolve cleanly via the new bun-side referrerKey at lines 3555-3567, so the precise check is used and misses. Neither existing Nitro test exercises sibling back-edges or delegation: in both, await import(...) appears textually in the awaiter and the chunk back-edges to its own immediate initiator.

On the duplicate objection

One reviewer argued shape (1) is a duplicate of the existing depth-2 comment because both reduce to "the discriminator only matches the immediate initiator". The root cause is shared, and this is intentionally filed as an extension of that comment at the same line, not an independent finding. But the additions are not redundant: (i) shape (1) corrects the depth-2 framing's implication that depth-1 is safe; (ii) shape (1) demonstrates that one of the two fixes the earlier comment proposed is insufficient; (iii) shape (2) demonstrates that the other proposed fix is also insufficient, via a mechanism (lexical sourceOrigin ≠ awaiting module) that the earlier comment does not mention at all; (iv) both need distinct test fixtures. A reply that only said "here's another shape that breaks" would be noise; a reply that says "neither of your two proposed fixes is sufficient, here's why, and here's what the fix actually needs to track" is actionable.

Caveat (shared with the depth-2 comment)

All three shapes rest on the PR description's stated push/pop-around-evaluate() semantics; vendor/WebKit is not checked out here so oven-sh/WebKit#230's source cannot be inspected directly. If WebKit#230 happens to filter the resolved initiator by state (e.g. "only stash if EvaluatingAsync; otherwise treat as unresolved → coarse fallback"), router (Evaluated) would be dropped in shape (2) and the coarse gate would preserve pre-PR behavior — though a trivially modified repro where router has its own unrelated TLA would bypass any such filter. The PR description mentions no such filter, and the depth-2 comment at this line was posted under the same assumption. Either way, both shapes are worth pinning as test fixtures.

Suggested fix

On the WebKit side, the discriminator needs to track the set of all EvaluatingAsync modules with an outstanding dynamic-import promise — independent of which file the import() keyword sits in, and held until each promise settles (not just until evaluate() returns). Concretely: at await import() time, push the currently executing CyclicModuleRecord (the one whose async body is about to suspend — available from the running execution context, not from sourceOrigin); pop when the import's promise settles. That covers depth-2 (M stays pushed through U's evaluate()), criss-cross (midA and midB both pushed at step 3), and delegation (app pushed instead of router). On the bun side, add the 5-file criss-cross and 3-file delegated-import fixtures to dynamic-import-tla-cycle.test.ts alongside the suggested depth-2 test.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both traces are correct — shape (1) pins the gap at depth-1 rather than the earlier comment's implied depth-≥2, and shape (2) shows that lexical sourceOrigin is the wrong key for delegated () => import() helpers. Agreed that neither of the two fixes my prior design iteration considered (push-until-promise-settles / walk the pending chain) is complete on its own; a correct fix needs both a longer push lifetime AND execution-context-derived identity rather than sourceOrigin.

Per the reply on the parent thread: deliberately out of scope for this PR, which is narrowly fixing the #30651 TDZ read for parallel dynamic imports of a shared TLA dep. Shape (2) in particular (router has () => import(), app awaits it, foo back-edges to app) is a realistic framework pattern and worth prioritizing on the follow-up. Tracking both as regression fixtures for that follow-up.

Comment thread src/jsc/bindings/ZigGlobalObject.cpp Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 src/jsc/bindings/ZigGlobalObject.cpp:3555-3569 — 🟡 (pre-existing, follow-up note) The false-positive dual of the delegated-import case above: a fire-and-forget import() still pushes the lexical caller, so a 2-file repro — M does import('./T.mjs') (NOT awaited) then await setTimeout(50); export const x = 42; T statically imports x from M — has T read x in TDZ (Node prints 42, since M never awaits T so the spec wait terminates). Not a regression (the 3-condition pre-PR skip fires identically), but worth noting because it constrains the follow-up redesign: even "push the currently-executing module, pop on promise settle" would still push M here, so that fix additionally needs to verify the initiator's suspension point is the import's continuation. At minimum the PR description's "the CyclicModuleRecord whose JS body is awaiting the current import's result" should say "lexical caller" — what's actually pushed may not be awaiting at all.

    Extended reasoning...

    What this is

    The PR description defines the initiator as "the CyclicModuleRecord whose JS body is awaiting the current import's result", but what referrerKey (lines 3555–3567) actually identifies is the lexical caller — the module where the import() expression is textually written. The thread at this line already established that the lexical caller can be a module that is not the awaiter (delegated import → false negative → new deadlock). This is the dual: the lexical caller can be a module that is pushed and matched but is not awaiting the result at all — fire-and-forget — so condition (4) fires when the spec wait would have been safe (false positive → pre-existing TDZ).

    Step-by-step proof

    // M.mjs (entry)
    import("./T.mjs");                          // fire-and-forget, NOT awaited
    await new Promise(r => setTimeout(r, 50));   // M suspends here, unrelated to T
    export const x = 42;
    
    // T.mjs
    import { x } from "./M.mjs";
    console.log(x);
    1. M.evaluate(): body runs, import(T) calls moduleLoaderImportModule with sourceOrigin = M; lines 3559–3560 compute referrerKey = /abs/M.mjs; requestImportModule finds M's record and stashes it. M then hits await setTimeout(50)EvaluatingAsync, pendingAsyncDependencies = 0.
    2. T's load settles (~1 ms file I/O, well before M's 50 ms timer). dynamicImportLoadSettled pushes M → set = {M}. Calls T.evaluate().
    3. T's DFS visits M. The four conditions at 11.c.v: (1) depWasAlreadyEvaluatingAsync ✓; (2) M.asyncEvaluationOrder < T's watermark ✓ (M's order assigned in the prior Evaluate() pass); (3) M.pendingAsync == 0 ✓; (4) vm.isModuleAwaitingDynamicImport(M)M ∈ {M} ✓. All four hold → skip the wait.
    4. T's body runs immediately, reads xTDZ ReferenceError.

    Spec / Node: T waits on M; M's 50 ms timer fires; x = 42; M → Evaluated; T resumes and prints 42. No deadlock — M never awaits import(T)'s promise, so its progress is independent of T.

    Relationship to the existing comments at this line

    Comments #3256858258 / #3297797470 (depth-2, criss-cross, delegated import) are all false negatives on condition (4): the dep ≠ what was pushed → spec wait → new deadlock regression. Comments #3243246404 / #3292821227 are referrer-resolution misses: registry lookup fails → coarse fallback. This is the only false positive: referrer resolves correctly, condition (4) matches, but the match is semantically wrong because lexical-caller ≠ awaiter (here, there is no awaiter). Together with the delegated-import shape, this shows the lexical caller is neither necessary nor sufficient for "module whose completion depends on this import settling".

    Addressing the duplicate/out-of-scope objection

    One reviewer argued this is (a) pre-existing by construction since condition (4) is conjunctive, (b) substantively the same insight as the delegated-import comment ("sourceOrigin identifies lexical caller, not awaiter"), and (c) indistinguishable from the intentional Nitro skip at the discriminator's observable granularity. All three are correct, and (a)+(c) are exactly why this is filed as a nit rather than a blocker — at import() call time it's unknowable whether the promise will be awaited at top level, so this is the precision ceiling of any lexical-initiator approximation and not fixable within this PR's scope.

    What's not already covered by the thread, and the reason this is worth noting for the follow-up robobun scoped at #3297869824/#3297870655: the proposed redesign there — "push the currently executing CyclicModuleRecord (execution-context-derived, not sourceOrigin), pop when the import's promise settles" — would still push M here. M is the currently-executing module when import(T) fires, and import(T)'s promise is outstanding when T's DFS visits M. So that redesign fixes depth-2, criss-cross, and delegated-import, but still false-positives on fire-and-forget. Closing this hole requires the follow-up to additionally verify that the initiator's suspension point is the import's continuation (e.g. by inspecting the promise reaction the module body is parked on), not just that it called import() and the promise hasn't settled.

    Why nothing prevents it

    The push/pop bracket in dynamicImportLoadSettled keys purely on who called import(), not who awaits the result. The 3-condition pre-PR skip also fires here (same trace minus condition 4), so the PR's narrowing simply doesn't reach it — from condition (4)'s perspective, "T's static back-edge target IS T's dynamic-import lexical initiator" is exactly the Nitro depth-1 shape it intentionally preserves the skip for.

    Impact / fix

    Nit — pre-existing (not a regression), narrower pattern than the user-reported #30651/#22367 shapes, and the author has already scoped the redesign as a follow-up. Actionable bits: (1) tighten the PR description's definition of "initiator" from "module awaiting the result" to "lexical caller"; (2) carry this 2-file repro forward as a fixture for the follow-up so the redesign's test matrix includes a case where "push currently-executing module until promise settles" is insufficient on its own.

Comment thread src/runtime/bake/BakeGlobalObject.cpp Outdated
Comment thread src/runtime/bake/BakeGlobalObject.cpp Outdated
msywulak added a commit to AtlasDevHQ/atlas that referenced this pull request May 26, 2026
…n 1.4.0 GA (#2827)

Re-diagnosed the slice 6 blocker as a 1.3.14 TLA-under-`--isolate` TDZ
regression (not a `mock.module()` issue). Filed oven-sh/bun#31410,
bisected to `73e8889f8c` (WebKit module-loader rewrite, oven-sh/bun#29393),
verified fixed on bun canary `1.4.0-canary.1+0974d031c` via
oven-sh/bun#30656 + oven-sh/WebKit#230.

Slice 6 cutover (#2802) now waits on bun 1.4.0 GA so the engine pin lifts
in the same PR. Wall-clock plan posted on #2802: keep the 4-shard matrix
(collapsing is slower than the current 95s slowest-shard critical path),
add `--changed=origin/main` for PR runs (~95s → ~20s), duration-balanced
shard partitioning, native `--parallel` worker pool.

Refs #2811 (closed), #2802, #2796.
@robobun robobun force-pushed the farm/7009659f/fix-tla-cross-evaluate-dynamic-import branch from 0b852db to c99a278 Compare May 26, 2026 14:55
@robobun

robobun commented May 26, 2026

Copy link
Copy Markdown
Collaborator Author

Rebased onto latest main (resolved the WEBKIT_VERSION conflict — main had advanced its WebKit to 963f8758 + the Windows cross-compile work in #239).

  • WebKit PR #230 rebased onto the new WebKit main (clean, 2 commits) → new preview tarball autobuild-preview-pr-230-d8ec41d5 published (42 assets).
  • WEBKIT_VERSION in this PR points at that tag.
  • All 6 TLA tests pass locally (debug ASAN) against the rebased WebKit.
  • test/bake/dev/production.test.ts 8/8 (the Bake referrer plumbing stays reverted — it is documentation-only vs main).

Earlier build-cpp 404s were the tarball not being published yet (a transient Ubuntu-24.04.4 runner-image failure on WebKit's FreeBSD Docker step delayed it; re-run cleared it). Fresh CI build triggered now that the tarball is live.

@robobun

robobun commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

Rebased again onto latest main (main re-advanced its WebKit to 6d586e29 with several JSC PRs: #240, #247, #242, #244, #241).

  • WebKit PR #230 cherry-picked onto the new WebKit main → preview tarball autobuild-preview-pr-230-c7f29140 published (43 assets).
  • WEBKIT_VERSION points at the new tag.
  • All 6 TLA tests pass locally (debug ASAN) against the re-rebased WebKit.

Only conflict both times was WEBKIT_VERSION (expected — this PR tracks a WebKit preview that has to be re-pinned whenever main bumps WebKit). Fresh CI build triggered now that the tarball is live.

@robobun

robobun commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator Author

CI build #60104 (after re-rolling the expired darwin agents — those now pass): 70 checks green including all build-cpp. The 2 remaining failures are unrelated flake, different from the previous build's:

  1. test/js/third_party/@duckdb/node-api/duckdb.test.tsSIGILL (illegal instruction) segfault in the DuckDB native addon. DuckDB is loaded via CJS require + process.dlopen (a native .node binary), not ESM dynamic import — this PR only touches the ESM dynamic-import referrer path, so it cannot affect it. The SIGILL is a CPU-instruction/native-binary mismatch on the runner.
  2. test/js/bun/http/serve-body-leak.test.ts — memory-leak threshold, flagged flaky (1 retry), end_memory 542MB vs 512MB limit.

Both are physically disconnected from the diff (module loader + TLA test). All 6 TLA tests pass locally in debug+release against the re-rebased WebKit c7f29140. Already spent my one re-roll on the expired darwin lanes. Ready for a maintainer to merge.

schickling-assistant added a commit to overengineeringstudio/effect-utils that referenced this pull request Jun 16, 2026
…around

The umbrella TDZ crash (#787) is a confirmed upstream Bun bug — oven-sh/bun#30634
(TDZ on a re-exported const read during concurrent dynamic import; Node-fine;
fix in flight, PR oven-sh/bun#30656). Document it at the concurrency trigger in
cli.ts with a TODO to revert the lazy get*App() workaround once a Bun version
with the fix is pinned, and cross-reference from the renderer accessor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
schickling pushed a commit to overengineeringstudio/effect-utils that referenced this pull request Jun 16, 2026
* docs(notion-md): VRS for $EDITOR-based editing — refuse-lossy, one engine

Design the plugin-free, Unix-style $EDITOR two-way Notion<->Markdown editing
surface and its notion-cli umbrella integration, captured as VRS.

- cat/put: stateless stdin/stdout body pipes (gateway-only; cat --frontmatter
  is a read-only envelope dump; no stateless property write).
- edit: an ephemeral file-engine session — pull into a $TMPDIR .nmd, body-only
  splice, push via the existing sync engine, clean up. Not a second push path.
- Refuse-lossy is uniform across cat/put/edit and file sync: a page whose body
  contains a block that does not round-trip as Markdown (child_database, toc,
  synced_block, child_page-in-body, unsupported) is refused at the pull. R38
  (sound fidelity classifier) is the blocking prerequisite — and a fix for a
  pre-existing file-path silent-data-loss defect proven by live testing.
- Decisions 0001-0017 record the path: the block-reconciler/converter edifice
  (0005/0010/0011/0014/0015) and the stateless schema fingerprint (0013) were
  superseded once the platform ceiling and engine reuse were understood.

Adds glossary, impl-delta, decision records; updates vision/requirements/spec/
experiments and the notion-cli docs. Design only; not yet implemented.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(notion-md): refuse pages with not-round-trip-safe body blocks (R38, #785)

The body-fidelity classifier only flagged API-`unsupported` blocks, so
`child_database`, `table_of_contents`, `synced_block`, `child_page`-in-body,
and degraded `bookmark`/`embed`/`link_preview`/`breadcrumb`/`link_to_page`
classified `complete`. These render to Markdown that Notion re-parses as a
plain paragraph on push, so editing an unrelated paragraph silently destroyed
the untouched block (live-proven; affected file `sync`, not just the planned
editor).

Extend `classifyBodyCompleteness` (notion-core, pure) to flag a curated set of
known not-round-trip-safe block types and surface them in the lossy verdict.
The shared refusal gate (`assertRemoteMarkdownComplete`) now refuses such pages
at the pull on every surface (`cat`/`put`/`edit`/file `sync`/tree) with a
message naming the block class and pointing to the Notion UI.

`child_page` is dual-role: a child page that is a tree node (its own `.nmd`
file) is tolerated via `tolerateTreeChildPages` on the tree path while still
refused as a single page's body block; any other lossy block on the same tree
node is still refused. Hosted/external media stays representable.

Criterion is a type set (not a live reparse) on purpose: notion-core is pure
and the endpoint vs independent renderings agree, so the suffix heuristic can't
catch these. Covered by classifier/gate unit tests, fake-gateway refusal +
tree-tolerance tests, and live E2E (lossy page refused at pull, representable
page still round-trips).

Closes #785

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(notion-md): canonicalize hosted-media URLs everywhere bodies are hashed/gated (R36, #785)

Notion-hosted media (type:"file") renders with an expiring signed S3 URL whose
X-Amz-*/signature/Expires query params rotate on every pull, making the rendered
body hash volatile (breaking cat->put idempotence, staling base hashes with zero
edits) and causing update_content/replace_content pushes on media pages to be
rejected by the post-push semanticEquivalent gate (decision 0007).

Add a shared canonicalizeMediaUrl / canonicalizeMediaUrlsInMarkdown in
notion-effect-client that strips only the volatile signature/expiry query-param
family by name, keeping origin + path + any benign params. The renderer applies
it in getBlockUrl's Notion-hosted file.url branch so pull/cat output is
deterministic; canonical-markdown.ts applies the identical function inside
canonicalizeBlockMarkdown / semanticEquivalent so the hash path and the gate path
use the exact same canonical form. External (stable) URLs are left untouched.

Verified with deterministic unit tests (signed->canonical, rotated-signature
equality, external-with-query untouched) and live E2E on real Notion: a
hosted+external-media page's body hash is stable across two no-op pulls and a
no-op push is not rejected by the post-push gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): editor surfaces cat/put/edit (VRS Group A, R30-R39)

Add the `cat` / `put` / `edit` Notion-as-Markdown editor surfaces from the
notion-md VRS ("Editor Surfaces"; impl-delta Group A). Builds on Group C / R38
(the fidelity classifier + uniform pull-gate refusal); reuses the body facade
and the file `sync` engine rather than adding a second push path (decision 0017).

- `cat <page> [--frontmatter]`: default-mode `# <title>` + body to stdout
  (byte-exact), the title+body base hash to stderr (decisions 0001/0002/0006);
  `--frontmatter` dumps the read-only `.nmd` envelope. Refuses a lossy page
  (exit 3) at observe time.
- `put <page> (--base-hash <h> | --force)`: body+title from stdin as two writes
  body-first (decision 0012, exit 10 on partial); guarded by default (exit 7 on
  drift); `--force` concurrency-only (decision 0009) via an unconditional
  last-writer-wins body replace; missing title-H1 -> exit 5. No `--frontmatter`
  write.
- `edit <page> [--frontmatter]`: ephemeral file-engine session (decision 0017) —
  `mktemp -d` under `$TMPDIR`, `pullPage`, body-only splice in
  `$VISUAL`/`$EDITOR`/`vi`, `syncPage` with a forced full-body `replace_content`,
  relocate any `.conflict.roughdraft.md` to a durable `<page>.conflict.md`, and
  scope-clean the temp tree on every path; non-zero editor exit -> exit 8;
  unchanged buffer -> no-op.

Shared title<->H1 splice helper with exact-byte round-trip (idempotent cat->put
fixpoint); `<page>` resolved via `parseNotionUuid` (id/dashed/URL), unresolvable
-> exit 4. Distinct tagged errors mapped to exit codes 0/1/3/4/5/7/8/9/10 at the
`runMain` teardown (after finalizers close, so `edit` cleanup is safe). Commands
wired under `notion-md cat|put|edit` and `notion md cat|put|edit`. OTEL spans
`notion-md.cat`/`put`; `edit` wraps the engine spans.

Hosted-media URL canonicalization (Group B) and `edit --frontmatter` schema-drift
(exit 6, Group F) are deferred; the editor serves the representable non-media
majority and tests on non-media pages.

Covered by splice/command unit tests, a fake-gateway `edit` e2e (round-trip,
no-op, editor-abort exit 8, lossy refusal exit 3), and live Notion E2E
(cat->put fixpoint, guarded conflict exit 7, real `$EDITOR` spawn round-trip,
lossy refusal exit 3, CLI exit-code mapping 0/3/4/5/7).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): schema-drift refusal via engine schema_snapshot (Group F, R14)

Implement impl-delta Group F / decision 0017: detect data-source schema drift
before a property write through the file engine's sidecar schema_snapshot,
instead of the deleted stateless in-buffer fingerprint (decision 0013).

For a data-source-backed page, `pullPage` retrieves the parent data source
(`GET /v1/data_sources/{id}` via `page.parent.data_source_id`) and captures the
writable property schema into the sidecar `data_source` binding as a canonical
projection: `{ name, type, sorted option names }` sorted by property name,
options only for select/multi_select/status, hashing names not ids (a rename is
id-preserving), excluding ids/colors/descriptions/status-groups/timestamps/
created_by/last_edited_by/request_id/computed properties. Before any property
write the engine re-retrieves the live schema, recomputes the hash, and on drift
refuses with `NmdSchemaDriftError` (exit 6) — distinct from the exit-7 conflict
and not `--force`-able; resolve by re-pulling. Standalone pages have no snapshot
and skip the check. This is the path `edit --frontmatter` reuses (no
`put --frontmatter`, no parallel streaming subsystem).

- Add `NmdSchemaDriftError` (errors.ts, exit-codes.ts maps it to 6).
- Add `schema-snapshot.ts`: writable-only canonical projection + hash + binding
  field projections.
- Add `retrieveDataSource` to the gateway (model.ts interface, live.ts impl,
  observability span); thread the captured binding through `buildSyncState` at
  all three pull/persist sites; assert-unchanged before all three property
  writes in `pushGuarded`.
- Re-export `NmdSchemaDriftError` from mod.ts.

Tests: deterministic projection/drift unit tests (benign color-only stable; the
five structural mutations trip) + fake-gateway engine tests (drift refusal +
benign round-trip). notion-md 170 tests green. Downstream notion-datasource-sync
fake gateways gain the new method stub. Live E2E on real Notion confirmed: a
row's page.parent decodes as data_source_id, the sidecar binding is captured
non-null, a benign property push round-trips, and after a live structural schema
mutation a property push refuses with exit 6 without writing.

Refs VRS requirements R14, spec "Guard plumbing" / Push Flow step 8, decision
0017 (supersedes 0013), impl-delta Group F.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md,notion-cli): Group G observability tests + notion edit umbrella alias

Group G (R21-R24, R29): span-assertion tests for the editor surface via the
in-process otelite capture bridge. Drive the real instrumented cat/put/edit
paths against a fake gateway and assert the actually-emitted span shape
(notion-md.cat/put/edit roots with their notion_md.* attributes; edit wraps the
engine sync-page/status-page/push-page children, decision 0017). R24 leak guard
pushes a sentinel body through the gateway and asserts no span attribute carries
the body, a signed-URL marker, or a Bearer token. Add the previously-untested
edit conflict-relocation path: an unmergeable concurrent same-line remote change
relocates the roughdraft out of $TMPDIR to a durable <page>.conflict.md.

The spec span table's nmd.* attribute shorthand was never implemented; tests
assert the real notion_md.* keys (divergence documented for separate
reconciliation, not papered over).

notion-cli umbrella (R18, decision 0004): wire the top-level `notion edit <page>`
marquee alias delegating to the same engine-backed session as `notion md edit`
via a shared command factory. cat/put/edit remain composed through the existing
md dispatch. Verified at the type/dispatch level and live through the standalone
notion-md binary; the umbrella runtime is blocked by a pre-existing tui-react
TDZ at startup (#787), unrelated to this work.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): reconcile OTEL span table with implemented notion_md.* attrs

The span table used an nmd.* shorthand that was never implemented; align it with
the actual notion_md.* attribute keys emitted by cat/put/edit (Group G), and note
that result/changed/partial_write attrs are a follow-up.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(notion-cli): build renderer TUI apps lazily to fix umbrella TDZ crash (#787)

The standalone `notion` binary crashed at startup for every command with
`ReferenceError: Cannot access 'createTuiApp' before initialization`.

`runRootCli` (cli.ts) imports the three command trees concurrently via
`Promise.all`, and each schema/db renderer's `app.ts` built its `*App` by
calling `createTuiApp(...)` as a module-load side-effect. Under Bun's
concurrent async module evaluation that top-level call reached the shared
`@overeng/tui-react` graph while it was still mid-initialization, leaving a
re-exported binding (createTuiApp, then its body's createInterruptedAction)
in the temporal dead zone.

Root cause is the module-load side-effect, not a barrel export-order bug and
not a circular import: the only barrel self-imports are JSDoc, and converting
`createTuiApp` to a hoisted function merely relocated the crash to the next
top-level const.

Fix: the five renderer `app.ts` modules now construct their app lazily via a
memoized `get*App()` accessor instead of at module top level, so no
`createTuiApp(...)` runs during import. Memoization preserves the previous
single-instance (one registry/atom set) semantics. Confined to notion-cli;
shared `@overeng/tui-react` is untouched.

Regression test (src/concurrent-import.unit.test.ts) spawns the umbrella's
concurrent Promise.all import path under Bun (the binary's runtime) and
asserts no TDZ crash. Verified RED on pre-fix code, GREEN after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): surface integration token fingerprint on gateway errors

Add a log-safe Notion token fingerprint so a user can tell *which*
integration token is active when a gateway call fails (e.g. a secrets-run
token resolving to a different integration than expected).

- notion-effect-client: add `notionTokenFingerprint(token)` formatted
  `<scheme>…#<8hex>` where `<scheme>` is the public token-type prefix (up to
  and incl. the first `_`) and `<8hex>` is the first 8 hex of sha256(token).
  Empty token → `<none>`; no-`_` token → empty scheme. Emits zero secret
  bytes beyond the scheme prefix. Exported from `mod.ts`.
- notion-md: `NmdGatewayError` gains a `token_fingerprint` field; the live
  gateway computes the fingerprint once and threads it into every error
  construction, appending ` [integration token <fp>]` to the message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-cli): link upstream bun#30634 for the #787 lazy-init workaround

The umbrella TDZ crash (#787) is a confirmed upstream Bun bug — oven-sh/bun#30634
(TDZ on a re-exported const read during concurrent dynamic import; Node-fine;
fix in flight, PR oven-sh/bun#30656). Document it at the concurrency trigger in
cli.ts with a TODO to revert the lazy get*App() workaround once a Bun version
with the fix is pinned, and cross-reference from the renderer accessor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): restructure VRS into layered subsystems + sync-progress design

Restructure the @overeng/notion-md VRS into six numeric layered subsystem dirs
(01-editor … 06-data-source), mirroring the restate-effect template. The
top-level spec.md becomes a thin architecture index (Status, Scope, system-shape
+ dependency diagram, subsystem-map table, and the cross-cutting OpenTelemetry /
Verification / residual long-term-decision lists). The §-section bodies moved
into each subsystem's spec.md, and the requirement bullets distributed into each
subsystem's requirements.md PRESERVING the global IDs (R01–R41 / A01–A05 /
T01–T08 — one ID per subsystem, never renumbered, so every cross-reference still
resolves; R42 stays retired). Root requirements.md keeps only the cross-cutting
surface-boundary / Effect-native / observability / verification requirements +
the global Assumptions/Tradeoffs; vision.md + glossary.md stay whole at root
(inherited downward). decisions/ renamed to .decisions/ (dot-prefixed per
/sk-vrs; git mv preserves history, 0001–0017 unchanged).

Also add the write-path sync-progress indicator design: a discrete-stage
@overeng/tui-react TaskList (observe → write-body → write-title → settle), not a
fake %, driven by a no-op-default ProgressReporter Effect service the engine
emits purpose-tagged stage events to; rendered to stderr gated on
process.stderr.isTTY (cat's stdout stays pure), TUI app built lazily in the
handler to dodge the #787 TDZ, write path only. New requirements R43–R45 in
01-editor, a spec section, decision record 0018, impl-delta Group H. #788 is the
complementary redundant-pull perf lever.

Docs-only; no library surface change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): add edit --read-only requirement (R46) to 01-editor VRS

R46: edit --read-only pulls + presents the page in $EDITOR but never pushes
(discard edits, clean up, stderr note); composes with --frontmatter, rejects
--read-only --force. Spec section + impl-delta Group A item. The implementation
is in flight separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): add `edit --read-only` inspection-only editor session

Open a Notion page in $VISUAL/$EDITOR/vi for inspection only (the terminal
analogue of `vim -R` / `git show`): present the body exactly like `edit`, but
on editor exit never push and never write anything remote, discarding any edits.

- New `editReadOnlyPage` with a deliberately lighter path than `edit`: a single
  observe/pull into a scoped $TMPDIR temp file (no engine round-trip, no
  NmdStateStore in its requirement type), then run the editor and discard. Its
  exit code is ignored entirely — a non-zero exit is a clean no-op, never an
  exit-8 abort, since nothing is pushed. Keeps the SAFE default: the same lossy
  refusal (exit 3) as `cat`/`edit` at observe time. No base-hash/guard machinery
  (nothing is written), and no `--force` interaction to resolve because `edit`
  exposes no `--force` (force lives on `put`/`sync`).
- Extract a shared `projectPageBuffer` projection (default `# title` + body, or
  the full `.nmd` envelope with `--frontmatter`) reused by `cat` and the
  read-only path so the lossy-refusal logic is single-sourced. The change to
  `catEditorPage` is a behavior-preserving refactor — its exact-byte stdout and
  `base-hash:` stderr output are unchanged.
- Wire `--read-only` through the shared `edit` command factory so `notion-md
  edit`, `notion md edit`, and the top-level `notion edit` alias all gain it.
- Emit `notion_md.edit.outcome=read-only` on the existing `notion-md.edit` span.

Covered by fake-gateway e2e tests (write paths `dieMessage` so any push crashes
the test, edits discarded, temp tree reaped, `--read-only --frontmatter`,
non-zero editor exit, lossy refusal) plus a span-shape test. 181/181 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): rebalance 03-sync-engine to own leaked push-engine mechanics

03-sync-engine is the only genuine boundary smell in the 6-subsystem VRS:
thin in both requirements and spec because its engine mechanics leaked into
its consumers. Move that prose home (Proposal A — rebalance in place, no
directory moves, no requirement-ID renumbering):

- 03-sync-engine/spec.md now authoritatively owns: update_content vs
  replace_content write-verb selection, the canonical base (base = the value
  the first pull emitted, never recomputed locally), the post-push
  semanticEquivalent gate, and settle/re-observe/re-pull.
- 02-file-sync/spec.md keeps the file-surface pull/status/push orchestration
  and now delegates engine internals to 03 via cites (push-flow steps 10-11).
- 04-fidelity/spec.md keeps refuse-lossy / classifier / hosted-media; its
  "Push Strategy and Canonical Base" engine prose moved to 03, leaving a cite
  where fidelity intersects the push. The put-specific two-write title order
  stays in 01-editor.
- Root spec.md: note that observability (OTEL), verification, and the
  Effect-services overview stay cross-cutting at root by design (a future
  07-observability is a possible follow-up); subsystem-index 03 row + system
  diagram updated for the engine's expanded role.

Global requirement IDs unchanged (R01-R46, R42 retired); before == after.
All internal markdown links + anchors resolve; oxfmt clean and idempotent.
Docs-only — no library surface change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): fix stale cross-cutting anchor links in 01-editor

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): lift leaked impl detail out of VRS requirements into specs

Four requirements named internal symbols / impl strategies (HOW) instead of
stating testable constraints (WHAT). The mechanism already lived in each owning
subsystem spec, so this is a pure altitude raise + spec cross-reference; no
constraint is dropped or weakened and global IDs are unchanged (R01-R46).

- R45 (01-editor): drop the ProgressReporter / no-op-default-Layer /
  TaskList-Layer / lazy-createTuiApp / #787-TDZ mechanism. New constraint:
  progress instrumentation must be behavior-neutral and zero-cost when
  non-interactive (write path byte-identical with the indicator on or off) and
  must not reintroduce the umbrella's startup-crash class (#787). Mechanism in
  01-editor/spec.md.
- R31 (04-fidelity): drop the internal replaceRemoteBodyVerified symbol; keep
  the guarded-verified-replace + typed-title two-write-body-first constraint.
- R40 (04-fidelity): drop the internal markdownToBlocks/parseInlineMarkdown
  names; keep "no client-side Markdown->block reconstruction" + the loss list.
- R14 (06-data-source): drop the schema_snapshot-comparison mechanism; keep the
  refuse/explicit-accept-on-drift, non---force-able exit code, resolve-by-repull.

Notion-platform vocabulary (replace_content), block-type names, and observable
behaviors (stderr/isTTY/exit codes) intentionally kept as testable WHAT. All
internal links resolve. Docs-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): force list tightness in canonical body (spread:false)

Add `forceTightLists` (sets `spread = false` on every list/listItem) to
`canonicalizeBlockMarkdown` so the one canonical body form emits tight lists
regardless of how the renderer joined siblings, and fold the line-ending
normalize into its documented input path. Pure addition: no call site reads
the canonical form differently yet (push already canonicalizes; pull does not
until the next step). Lock the behavior with unit tests — loose→tight,
pre-paragraph blank preserved, headings blank-separated, nested-list stray
indented blank removed, and local idempotency.

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): route pull receive through the canonical body form

`remoteMarkdownFromBodyObservation` now canonicalizes the rendered body
(`canonicalizeBlockMarkdown`) instead of only line-ending-normalizing it, so
pull and push agree by construction: the body `cat`/`edit`/file-sync read, the
body hashed/baselined, and the body pushed are the same canonical bytes. This
is the behavioral flip that removes the loose-vs-tight two-oracle divergence.
Lock it with a list-bearing pull case (loose render → tight body, trailing
paragraph blank preserved).

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(notion-md): make pull body total, drop dead endpoint fallback

`observeFromSnapshots` always renders the block tree into `renderedMarkdown`,
so it is total on the pull path. Remove the silent `?? endpoint` fallback (the
latent symptom-2 trap that runs headings together and drops inter-block blank
lines) and the redundant `renderedMarkdown === undefined → lossy` override —
the classifier already owns completeness. A missing rendered body is now an
invariant defect rather than a recoverable state. Lock it with a headings-never-
run-together case and a defect-throw case.

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(notion-effect-client): delete vestigial markdownToBlocks converter

`markdownToBlocks` and its helpers (`parseInlineMarkdown`, `parseMarkdownTable`,
`parseTableRow`, `isTableSeparator`) were a lossy Markdown→block converter with
zero runtime callers: under the refuse-lossy / one-engine design (decisions
0015/0016/0017) push sends a raw Markdown string to Notion's server-side
`/markdown` endpoint, which parses it — this client-side converter is never on
any path. Remove the functions, the `NotionMarkdown.markdownToBlocks` member,
the `mod.ts` re-exports, and the converter test block. We are the only consumer,
so this is a clean dead-code removal (no deprecation, no version bump).

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(notion-md): dedupe stripChildAnchors onto one definition

sync.ts and tree.ts each defined `stripChildAnchors`, and they diverged: the
sync copy filtered anchor lines only, while the tree copy also line-ending
normalized and collapsed the blank-line gap an anchor leaves between blocks
(`\n{3,}` → `\n\n`). For an anchor sitting between two blocks the two produced
different output for the same input. Unify on the richer (tree) behavior in
`hash.ts`: the sync consumers only feed the result into the whitespace-
insensitive `semanticEquivalent`, so adding the collapse is invariant there,
and the tree baseline path already used the rich form. Lock it with unit tests
for the previously divergent gap-collapse and trailing-anchor cases.

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(notion-md): lock planMarkdownUpdate over a canonical base/remote

After routing pull through the canonical body form, base and remote bodies fed
to `planMarkdownUpdate` are already canonical (tight lists). Add a locking test
proving the 3-way update planner operates correctly in canonical space WITHOUT
canonicalizing the raw desired buffer: the emitted `oldStr`/`newStr` are raw
substrings Notion matches verbatim, and applying the plan to the canonical
remote reconstructs exactly the desired body (decision 0018 §3.3). merge.ts
itself needs no change — its normalization stays line-level; the canonical
agreement is established upstream at the pull boundary.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): clarify editor-surface frames the canonical body verbatim

The title↔H1 frame consumes the one canonical pipeline by construction: pull
now routes every body through `canonicalizeBlockMarkdown` (media-canon / Group B
included), so the body arriving here is already canonical. Document that this
module must NOT re-canonicalize the title-framed buffer — the `# <title>` line
is presentation, not body Markdown, and re-parsing it would break the
load-bearing H1 round-trip. No behavioral change.

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* refactor(notion): move canonical body fn into notion-effect-client (Option 2)

Relocate `canonicalizeBlockMarkdown` + the remark/unified/unist deps down from
`@overeng/notion-md` into `@overeng/notion-effect-client`, beside the renderer
(`treeToMarkdown`) and the media-URL canonicalizer it already calls. The move is
import-direction-legal (notion-effect-client does not depend on notion-md, no
cycle) and removes the dep duplication (remark leaves notion-md entirely).

`observeFromSnapshots` now canonicalizes the rendered body once, at the source,
before it flows into the inventory, the fidelity classifier, and the evidence
fingerprint — so the evidence, the classifier, pull, hash, and push all see the
same canonical bytes ("agree by construction", decision 0018). The pull choke
point (`remoteMarkdownFromBodyObservation`) therefore projects an already-
canonical body through verbatim instead of re-canonicalizing it.

`semanticEquivalent` stays in notion-md (it is sync policy, not the wire form)
and re-imports the canonical function. package.json regenerated via genie;
body-fidelity/body-observation re-baselined for the trailing-newline canonical
form (verdicts unchanged).

Refs the body/markdown pipeline consolidation (decision 0018).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): record canonical-body decision 0019, re-baseline demo

Add decision record 0019 (canonical body is one function applied at both wire
boundaries; renderer emits parseable-not-canonical Markdown; spacing/tightness
policy lives only in the canonical layer; renderer joins must not be made
type-aware; with the deliberate non-changes — line-ending normalize stays a
sub-step, notion-core's pure normalizer stays). Add the 04-fidelity "Canonical
Body Form" spec note, update the hash.ts doc (canonicalization now lives in
notion-effect-client and is applied at pull receive too), and a CHANGELOG
[Unreleased] entry. Re-baseline demo/showcase.nmd to the canonical shape
(headings blank-separated, lists tight, enhanced-markdown tags preserved;
verified idempotent). Also fix the decision number (0018 → 0019; 0018 is the
staged-task-list record) and a stacked duplicate comment in live.ts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(nix): refresh pnpm-deps FOD hashes after the canonical-body dep move

Moving the remark/unified/unist deps from notion-md into notion-effect-client
rewrote the root pnpm-lock.yaml, which is an input to two pnpm-deps fixed-output
derivations (the oxc-config oxlint plugin and the notion-cli bundle). Reconcile
both stale `pnpmDepsHash`/`hash` values (oxc-config via the file's documented
procedure, notion-cli via the evergreen FOD-refresh workflow) so `devenv shell`
and `dt check:all` build green again. Nix-side completion of the Option 2 move.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs(notion-md): compact VRS decision log — delete 6 superseded records, redirect citations

The reconciler/placeholder design (0005 inline placeholders, 0010
visible-placeholder deletion, 0011 block-level reconciliation, 0014
reconciliation-as-universal-engine, 0015 renderer-symmetric converter) is
wholly superseded by decision 0016 (refuse lossy pages); the 0013 stateless
in-buffer schema fingerprint by decision 0017 (ephemeral file-engine session,
drift from the engine base snapshot).

Delete those six `.decisions/` records (rationale lives in experiments.md) and
redirect every citation to the superseding decision — as a live link where one
is cited, or as historical prose naming the superseded concept. Touched:
schema-snapshot.ts doc-comment (0013→0017), 01-editor / 04-fidelity /
06-data-source spec+requirements, experiments.md, README.md decision-range
label, and the surviving records 0003/0008/0009/0016/0017/0019. Tighten 0016/0017
to state the supersession once; trim the "Refined by 0017" blockquotes on
0003/0008/0009 to one line. No IDs renumbered (log: 19 → 13 records).

Zero links resolve to a deleted record; all internal markdown links resolve;
ts:check + lint:check green; oxfmt clean. Docs/comment-only — no library surface
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* test(notion-md): pin the canonical-body two-oracle invariant + demo fixpoint

Fold in the two test-coverage items flagged as urgent for the body/Markdown
consolidation (decision 0019), in canonical-markdown.test.ts:

1. Two-oracle agreement on canonical inputs. The raw-hash oracle (sha256Digest,
   used by classifyPlan in tree.ts) and the canon-invariant semanticEquivalent
   must agree for canonical inputs (raw-hash-equal <=> semanticEquivalent-true)
   now that both wire boundaries route through canonicalizeBlockMarkdown. The
   referee is neither oracle (canonicalize-then-byte-equal): a curated seed set
   of raw-spelling variants that converge to identical canonical bytes, a seeded
   fast-check property over arbitrary bodies, and an idempotence premise. Guards
   the failure mode a future canonicalization change could silently introduce.

2. Golden-file fixpoint over demo/showcase.nmd. Asserts the committed demo body
   is already canonical (canonicalizeBlockMarkdown(demoBody) === demoBody) and
   idempotent, locking the consolidation re-baseline and catching future drift.

Test-only, no production change. Both invariants hold against real code (the
demo body is canonical; the oracles agree on canonical inputs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(nix): reconcile notion-md pnpm-deps FOD hash after canonical-body dep move

The dep-move commit (bbd358b) that shifted remark/unified/unist from
notion-md into notion-effect-client rewrote pnpm-lock.yaml, which invalidates
the notion-md-pnpm-deps FOD. The prior refresh commit (8f2327d) updated
notion-cli and oxc-config-plugin but missed notion-md.

Repaired via `evergreen fod reconcile` (rung 1: nix already printed `got:`):

  notion-md: sha256-2V8S6/... → sha256-hYDYbicQQZY0XUlzrJT2awHs1zv3FyqAc8HDoFY6IcY=

Verified: `nix build .#notion-md-pnpm-deps .#notion-cli-pnpm-deps .#oxc-config-plugin-pnpm-deps`
and `nix build .#notion-md .#notion-cli` all build clean with no FOD hash mismatch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(notion): address PR #786 codex review (media-url host gate, utils runtime dep, umbrella exit codes)

Three findings from the chatgpt-codex-connector review of #786:

- P1 (notion-effect-client/media-url.ts): canonicalizeMediaUrlsInMarkdown
  stripped X-Amz-*/signature/Expires params from ANY signed-looking Markdown
  URL by param name. Since canonicalizeBlockMarkdown now runs on both pull and
  push, an external private-S3 URL embedded by URL could be surfaced and then
  persisted with its load-bearing credentials stripped. The string path cannot
  see file vs external (unlike the renderer's getBlockUrl, which only touches
  type:"file"), so it now gates on a Notion-media host allowlist; non-Notion
  hosts are left untouched. canonicalizeMediaUrl stays host-agnostic.

- P2 (notion-effect-client/config.ts): sha256Hex (notionTokenFingerprint) is a
  runtime import of @overeng/utils, but the package listed it only as a
  dev/peer dependency. Moved utilsPkg into the generated runtime dependencies
  via package.json.genie.ts + genie:run. The regen also reconciled a
  pre-existing stale generated package.json (leftover external peerDependencies
  the current genie source no longer emits) that had drifted the lockfile and
  broken the frozen-lockfile Nix build.

- P2 (notion-cli/cli.ts): the umbrella `notion edit` alias ran under the
  default runMain teardown, collapsing tagged editor failures (3 lossy /
  6 schema-drift / 8 abort) to exit 1 and diverging from `notion-md edit`. The
  umbrella now wires the same editorExitCode teardown; safe for non-editor
  commands (falls back to 1 for unmapped failures, 0 on success).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(nix): reconcile pnpm-deps FOD hashes after lockfile change

All 6 pnpm-deps fixed-output derivations were stale after the
pnpm-lock.yaml update (removed phantom @effect/*/@playwright deps
from notion-effect-client importer + reclassified @overeng/utils).
Hashes updated from build-observed `got:` values.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(nix): reconcile oxc-config pnpm-deps FOD hash after lockfile change

The pnpm-lock.yaml change in 228cc0d (utils runtime dep) invalidated the
oxc-config plugin's pnpmDepsHash, but the reconcile pass in 8f4f19a only
covered the computed mk-pnpm-cli/bun-deps hashes and missed this hardcoded
literal. The stale FOD cascaded into every CI job that builds the devshell
(typecheck, lint, test, nix-check, nix-fod-check, integration suites).

Verified locally: all three exported pnpm FODs (genie/megarepo/oxc-config)
cold-rebuild clean and `dt nix:check` passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(notion-effect-client): declare inherited utils peer deps via genie install mode

notion-effect-client imports `@overeng/utils` at runtime (sha256Hex), and
utils carries peer dependencies (the @effect/* cluster + @playwright/test).
The committed manifest/lockfile lacked them, so genie validation failed
("Missing dep … peer dep of @overeng/utils") and downstream tsc could not
resolve modules — failures previously masked by the FOD hash mismatch on
PR #786.

Switch the genie composition to `mode: 'install'` so genie installs the
inherited peers explicitly (matching utils/notion-cli/tui-react). Regenerate
package.json and sync pnpm-lock. The lockfile change re-stales every
full-workspace pnpm-deps FOD, so reconcile all of them from build-observed
values: oxc-config plus the genie/megarepo/notion-cli/notion-md/tui-stories/
workflow-report CLI deps.

Verified locally against the committed tree: `dt check:quick`, `dt nix:check`,
`dt test:run`, the three exported nix-fod-check FODs, and all full package
builds pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* feat(notion-md): staged sync progress + drift notes for `edit` (R43–R45, decisions 0018/0020)

After the editor exits, `notion-md edit`'s push made ~8–10 Notion
round-trips emitting only OTEL spans — the terminal was silent and read
as a hang. Implement the already-specified staged write-path progress
indicator (R43–R45, decision 0018) and surface the guarded push's
existing remote-drift outcomes as visible stderr notes.

- progress.ts: a `ProgressReporter` Context.Tag render seam. The engine
  emits purpose-tagged stage transitions at the existing push call sites
  (observe → write-body → write-title[skip if unchanged] → settle); the
  emit helpers use `Effect.serviceOption` (zero engine R-churn) and
  swallow all failures AND defects (`Effect<void, never>`), so the path
  is byte-identical with the reporter absent/present/hostile (R45).
- `edit` wires a stderr-line renderer on the write path, gated on
  `process.stderr.isTTY`; non-TTY → no layer → silent (R44). The
  static-line rung is deliberate (decision 0018 sanctions it): `edit`
  returns from a full-screen editor that owned the TTY, so a mounting
  TaskList TUI would fight the terminal, and lines sidestep the #787
  module-load TDZ. The animated TaskList Layer is the same Tag's later
  drop-in (zero engine re-touch).
- "+ warn": reuse the guarded push's existing drift outcomes (no
  background poller, no extra last_edited_time pull) — auto-merge against
  a moved remote and exit-7 conflict each emit a visible stderr `note:`.
- Live two-way watch mode (push-on-save + live upstream reflection in the
  editor) was investigated and rejected: the editor owns the TTY (no live
  feedback) and the CLI cannot trigger an `$EDITOR` reload (no live
  reflection) — both empirically spiked. Recorded as decision 0020.

Scope: the `edit` path. `put`/file-`sync` emit through the engine but do
not yet wire a render Layer. Tests pin R45 neutrality (no/capturing/
failing reporter → identical EditResult), the stage sequence, and both
warn notes. notion-md 194 passed; engine function signatures unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
robobun and others added 14 commits June 16, 2026 23:04
The referrer is how JSC's requestImportModule finds the initiator
CyclicModuleRecord (see oven-sh/WebKit#230 / #30651). Bun had been
passing an empty Identifier, which leaves the initiator unresolved —
so the WebKit-side discriminator can't fire and the TLA re-entrancy
skip continues to produce TDZ reads across independent dynamic
imports.

The registry key is the file-system path (for file:// sources) or the
substring after builtin:// (for builtins), mirroring what the resolve()
path above uses.

Adds a regression test for #30651 covering the parallel dynamic-import
case (two independent imports of the same TLA dep; the second one must
wait, not run against post-await TDZ bindings). Ships with the matching
WEBKIT_VERSION bump to oven-sh/WebKit#230's preview build.
…ng here

coderabbit flagged the setTimeout calls as 'timing-sensitive'. Replaced
them with promise-chained coordination — `await __tlaEntered` gate on
globalThis — and it passed BOTH with-fix and without-fix, because
purely in-JS microtasks can't sequence the async I/O in the module
loader's fetch step.

Reverted to setTimeout with an inline comment explaining that the race
this test reproduces is specifically 'entry1's Evaluate() completes
before entry2's starts', and file I/O between the two imports is what
needs to drain — timers are the only thing that yields there. Left the
original values (10ms for the gate, 100ms for the tla await) since they
gave a clean fail without fix / pass with fix.
Rebased oven-sh/WebKit#230 onto the latest WebKit main
(cf8fb22b7011 — LTO build config only, no source changes since
my previous rebase onto #236). Clean cherry-pick, no conflicts.

Preview tarball: autobuild-preview-pr-230-36cc1283.
The previous gate run hit a stale/empty `webkit-preview-pr-230-*` cache
and tried to fetch a tarball that WebKit CI hasn't published yet. Caches
in /root/.bun/build-cache/ are now populated with my locally-built
WebKit libs (debug-asan + release) and the correct .identity stamps.
Local gate simulation passes 6/6 with-fix, 5/6 without-fix (1 expected
fail on the new test).
The two bake-specific paths were passing an empty Identifier to
JSC::importModule, so dynamic imports initiated from or targeting
bake:/ modules fell back to the coarse pre-PR gate (VM::hasPendingDynamicImport)
instead of the new dep == initiator discriminator.

Not a regression (coarse fallback preserves pre-PR behavior for bake
dynamic imports) but Bake is precisely the code-splitting case the
Nitro-style skip was written for, so it should participate in the
precise check too.

For the bake:/ specifier path, reuse the same file://+builtin:// slice
logic as Zig::GlobalObject::moduleLoaderImportModule. For the bake:/
source-origin path, the refererString is already the bake registry
key and can be forwarded verbatim.
Three review nits:

1. Virtual-module fast path (Bun.plugin onResolve) was passing an empty
   Identifier to JSC::importModule — so dynamic imports that resolve
   through virtual specifiers dropped to the coarse pre-PR gate instead
   of the precise dep == initiator discriminator. Hoist the referrer
   computation into a lambda and use at both call sites.

2. Builtin sourceOrigin URL is colon-to-slash rewritten relative to the
   registry key (`node:fs` → `builtin://node/fs`). `substring(10)`
   alone yields `node/fs`, which misses the registry. Invert the
   rewrite for node/ and bun/ prefixes.

3. Test file: the comment "Same as above but the TLA dep is reached
   indirectly through different parents" was positionally broken by
   the new #30651 test inserted between it and its referent. Name the
   referent explicitly instead.
The prior gate run reused a stale local WebKit cache pointing at
upstream main (sha 782504c968). Wiping and refetching from the
just-published autobuild-preview-pr-230-36cc1283 tarball made all 6
TLA tests pass in both debug (ASAN) and release.
Replace the transcript-relative "coarse pre-PR gate" phrasing with
the concrete symbol it refers to (VM::hasPendingDynamicImport) so
the comment stands on its own after merge.
The earlier Bake-side plumbing (commits de98a5f, 2ac140a) fed a
`bake:/...` referrer into JSC::importModule so requestImportModule
could resolve the initiator for the #30651 discriminator. But that
re-routes through bakeModuleLoaderResolve's referrer-relative branch,
which then feeds the `bake:/...` specifier into BakeProdResolve's
is_package_path check → "Non-relative import … in production assets".

test/bake/dev/production.test.ts's "client-side component with default
import should work" and "importing useState from client component"
cases hit this — both involve dynamic import() of `bake:/...` chunks
during pre-rendering.

Revert to passing an empty Identifier. Bake dynamic imports fall back
to the coarse VM::hasPendingDynamicImport() gate on the WebKit side —
same as pre-PR. The #30651 fix still covers the reported non-bake
user bug (Vite/Nitro/Payload dynamic import into a shared TLA dep
via regular file:// modules), which is what matters.
Builds #57926/#58237 404'd fetching the WebKit tarball because the
preview build for the rebased PR-230 hadn't published yet (a transient
Ubuntu-24.04.4 runner-image failure on the FreeBSD Docker step delayed
it; the re-run published all 42 assets). Trigger a fresh build now that
the tarball is available.
The prior CI build ran before the rebased WebKit PR-230 preview finished
publishing (Windows arm64 + the artifact-upload step lag behind the
Linux builds). The tarball is now live with all 43 assets — trigger a
fresh build that can fetch it.
@robobun robobun force-pushed the farm/7009659f/fix-tla-cross-evaluate-dynamic-import branch from b235ccf to 0302850 Compare June 16, 2026 23:21
@robobun

robobun commented Jun 18, 2026

Copy link
Copy Markdown
Collaborator Author

Superseded by #32437 (merged), which fixes the same bug via a cleaner mechanism: moduleLoaderImportModule looks up the referrer's asyncEvaluationOrder through the new JSModuleLoader::asyncEvaluationOrderForKey() and threads it to JSC::importModule as referrerAsyncOrder, rather than plumbing the referrer identifier and a VM-level initiator set as this PR did. That approach supersedes oven-sh/WebKit#228 (and this PR's paired oven-sh/WebKit#230).

Verified the #30651 repro is fixed on current main (eae803811b, WebKit cd821fec):

$ bun-debug driver.mjs   # two parallel dynamic imports of a shared TLA dep
entry1 foo: 123
entry2 foo: 123

No ReferenceError: Cannot access 'foo' before initialization. test/js/bun/resolve/dynamic-import-tla-cycle.test.ts passes 6/6 on main, including the sibling dynamic imports sharing a TLA wrapper wait for its post-await exports regression test #32437 added.

Closing as superseded. The paired oven-sh/WebKit#230 will be closed too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Broken imports

1 participant