Skip to content

sourcemap: bit-packed InternalSourceMap (~2.4 B/mapping, no decode)#29358

Merged
Jarred-Sumner merged 12 commits into
mainfrom
claude/internal-sourcemap-varint
Apr 16, 2026
Merged

sourcemap: bit-packed InternalSourceMap (~2.4 B/mapping, no decode)#29358
Jarred-Sumner merged 12 commits into
mainfrom
claude/internal-sourcemap-varint

Conversation

@Jarred-Sumner

@Jarred-Sumner Jarred-Sumner commented Apr 16, 2026

Copy link
Copy Markdown
Collaborator

What

Replace the runtime/standalone source-map storage with a private binary format (InternalSourceMap) that js_printer writes directly and find(line, col) reads in place — no whole-file decode into Mapping.List, no VLQ round-trip.

Mappings are stored in K=64 windows. Each window is a 32-byte fixed header (count, flags, three u16 section lengths, three always-present 8-byte equality bitmasks) followed by varint-delta streams. The format exploits the structure of transpiler output:

  • d_src_idx is 0 for ~100% of runtime-transpiled mappings → omitted unless flag_has_src_idx
  • d_gen_line is 0 for ~81% → 1-bit gen_line_mask per mapping, varint exceptions for the rare >1
  • d_orig_line == d_gen_line for ~62% → 1-bit orig_line_eq_mask, varint where they differ
  • d_orig_col == d_gen_col for ~77% → 1-bit orig_col_eq_mask, varint where they differ
  • d_gen_col is the only lane stored for every delta (zigzag varint)

A 24-byte SyncEntry per window holds the absolute (gen_line, gen_col, byte_offset, orig_line, orig_col, src_idx) for binary search. find(line, col) bsearches SyncEntry[], parses one window header, then advances ≤63 deltas via a WindowReader. A 16-slot fully-associative FindCache (256-byte key array + ~21 KB payload, one per VM) keeps decoded window prefixes warm so successive frames in the same window skip the decode entirely.

Full design rationale + blob layout is in the doc comment at the top of src/sourcemap/InternalSourceMap.zig.

Memory

_tsc.js (563k mappings) resident after first .stack
Mapping.List (main) ~11.3 MB (20 B/mapping)
LEB128 stream (commit 1) 2.92 MB (5.4 B/mapping)
bit-packed windows (this) 1.29 MB (2.41 B/mapping)

--compile binary for _tsc.js: −1.8 MB vs LEB128, −several MB vs main (no Mapping.parse at runtime; LazySourceMap.load is a view into the embedded blob).

The format degrades gracefully: minified single-line input (react-dom.production.min.js, 37.6k mappings, 132 KB source) packs to ~80 KB at ~2.2 B/mapping vs ~750 KB Mapping.List. Multi-source bundles pay O(files) for src_idx boundary exceptions, not O(mappings).

Perf (Apple M4 Max, release, vs bun-1.3.12)

this PR 1.3.12 Δ
bench/snippets/error-capturestack.mjs (mitata, multi-window) 1.37–1.41 µs 1.27–1.32 µs +6–8%
plain while(1) new Error().stack loop 657 ns ~810 ns −19%
5-frame multi-window synthetic (1500-line file) 818 ns 769 ns +6%
first .stack on a 150k-line module ~0.1 ms ~5 ms −98%
RSS load→first-stack (150k-line module) +0.06 MB +2.3 MB

The mitata-case +6–8% is the per-window header parse + ≤16-way associative scan; the plain-loop −19% is the C++ batching removing per-frame mutex+hashmap.

Scope

  • Runtime transpile path (Chunk.Builder with prepend_count): writes InternalSourceMap directly. The SavedMappings raw-VLQ variant is deleted.
  • --compile: serializeJsonSourceMapForStandalone runs the bundler's VLQ through fromVLQ() once at build time and embeds the blob; LazySourceMap.load is a view, no Mapping.parse.
  • CodeCoverage: uses Cursor.moveTo() so the per-byte loop decodes ~0–1 segments/step instead of bsearching the whole list.
  • C++: FormatStackTraceForJS.cpp formatStackTrace / computeErrorInfoWithPrepareStackTrace now batch frames into a Vector<ZigStackFrame, 8> and call Bun__remapStackFramePositions once; the Zig side caches (path_hash → ISM) across the batch.
  • Inspector / debug dump: re-encode to standard VLQ on demand via appendVLQTo().
  • Bundler external .map emission: unchanged (still spec VLQ).
  • RuntimeTranspilerCache version 18 → 20.

Test plan

  • bun run zig:check-all (16 targets)
  • bun bd test test/js/bun/sourcemap/internal-sourcemap.test.ts (incl. ~30K-column single-line minified-shape case)
  • bun bd test test/js/bun/test/stack.test.ts
  • bun bd test test/cli/test/coverage.test.ts
  • bun bd test test/bundler/compile-sourcemap-internal.test.ts
  • bun bd test test/js/node/module/node-module-module.test.js
  • cold/warm RuntimeTranspilerCache (210 KB file): identical :line:col
  • bun build --sourcemap=external still emits valid VLQ
  • K=64 boundary, N=1, N=0 mapping files
  • 50-deep recursive stack (batched remap) — all frames correct
  • CI

Replace the runtime source-map pipeline (js_printer emits VLQ text,
SavedSourceMap stores it, first .stack lookup decodes the whole thing
into a 20-byte/mapping Mapping.List) with a private binary format that
the printer writes directly and find() reads in place.

The format is a zig-zag LEB128 delta stream with absolute SyncPoints
every 64 mappings, laid out as a single []u8. Lookup is bsearch over
N/64 sync points + at most 64 varint decodes; no full-file
materialization ever happens. Memory stays at ~5-6 B/mapping instead of
expanding to 20 B/mapping on first use.

For typescript.js (~843k mappings) this keeps the resident map at
~4.6 MB instead of ~17 MB, and first-lookup latency drops from ~5 ms to
~0.1 ms (release).

- src/sourcemap/InternalSourceMap.zig: new format, Builder, find(),
  Cursor, fromVLQ(), appendVLQTo()
- Chunk.zig: runtime path (prepend_count) writes InternalSourceMap;
  bundler/.map output unchanged
- SavedSourceMap.zig: SavedMappings VLQ variant removed; lazy
  re-materialization to ParsedSourceMap kept for findSourceMap/dump
- CodeCoverage.zig: use Cursor.moveTo() instead of per-byte bsearch
- StandaloneModuleGraph.zig: --compile embeds the blob (via fromVLQ at
  build time) and LazySourceMap.load views it directly; ParsedSourceMap
  gains an optional InternalSourceMap backing
- VirtualMachine.zig: debugger inline-map path re-encodes via
  appendVLQTo on demand
- RuntimeTranspilerCache: bump version 18 -> 19
@robobun

robobun commented Apr 16, 2026

Copy link
Copy Markdown
Collaborator
Updated 3:11 AM PT - Apr 16th, 2026

@Jarred-Sumner, your commit db32176 has 5 failures in Build #45914 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 29358

That installs a local version of the PR into your bun-29358 executable, so you can run:

bun-29358 --bun

@github-actions

Copy link
Copy Markdown
Contributor

Found 1 issue this PR may fix:

  1. Bun VsCode Debugger Fails to Bind Breakpoints in Big Files #20964 - PR's 46x speedup on large-file source map lookup and debugger-specific re-encode-on-demand path directly addresses why breakpoints fail to bind in big files (slow/timing-out inspector source map resolution)

If this is helpful, copy the block below into the PR description to auto-close this issue on merge.

Fixes #20964

🤖 Generated with Claude Code

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds an in-process InternalSourceMap binary sourcemap format and wires it through serialization, runtime storage, lookups, JSON emission, VM/C++ batched remapping, tests, and a microbenchmark, replacing VLQ/SavedMappings with a compact blob, cursor/cache APIs, and builder/fromVLQ tooling.

Changes

Cohort / File(s) Summary
New InternalSourceMap module
src/sourcemap/InternalSourceMap.zig
Adds InternalSourceMap blob format, validation, sync-window index, WindowReader/Cursor, find/findWithCache, Builder, fromVLQ/appendVLQ, deinit and memoryCost APIs.
Parsed/Export surface
src/sourcemap/ParsedSourceMap.zig, src/sourcemap/sourcemap.zig
ParsedSourceMap can hold internal: ?InternalSourceMap; adds findMapping()/internalCursor(), adapts deinit/memoryCost/writeVLQs to use internal blobs; exports InternalSourceMap.
SavedSourceMap & runtime cache
src/bun.js/SavedSourceMap.zig, src/bun.js/RuntimeTranspilerCache.zig
Replaces SavedMappings/VLQ storage with InternalSourceMap blobs, updates deinit/overwrite, adds cache fields and getValueLocked, and bumps runtime cache expected_version (18→20).
Chunk, VLQ emission & format
src/sourcemap/Chunk.zig
Removes offset-based VLQ emission, adds printSourceMapContentsFromInternal/printSourceMapContentsJSON, makes VLQSourceMap optionally build an InternalSourceMap via Builder, and changes getBuffer receiver.
Standalone graph serialization
src/StandaloneModuleGraph.zig
SerializedSourceMap now exposes mappingBlob() (nullable) holding an InternalSourceMap blob; LazySourceMap.load() uses the blob and initializes ParsedSourceMap.internal without VLQ decoding.
VM remapping and C++ bindings
src/bun.js/VirtualMachine.zig, src/bun.js/bindings/FormatStackTraceForJS.cpp
Implements batched remapping, holds locks across batches, adds cached fast-paths for InternalSourceMap lookups, and changes C++ to two-pass batched remap model that calls Bun__remapStackFramePositions once per batch.
Code coverage & JSSourceMap
src/sourcemap/CodeCoverage.zig, src/sourcemap/JSSourceMap.zig
Coverage and JS API prefer internal cursor/findMapping() when internal is present; fallbacks to existing mapping-list lookups preserved.
Bench & ignore
bench/sourcemap/internal-sourcemap-bench.ts, bench/sourcemap/.gitignore
Adds microbenchmark that generates a ~150k-line module to exercise InternalSourceMap paths and ignores big-module.generated.ts.
Tests
test/bundler/compile-sourcemap-internal.test.ts, test/js/bun/sourcemap/internal-sourcemap.test.ts
Adds integration and unit tests validating remapped stack traces for compiled output and multiple InternalSourceMap lookup scenarios.
Misc
test/internal/ban-limits.json
Small fixture tweak: increments a ban-count value.

Possibly related PRs

🚥 Pre-merge checks | ✅ 2
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately and concisely describes the main change: introducing a bit-packed InternalSourceMap format that reduces memory footprint to ~2.4 bytes per mapping without requiring full VLQ decode at runtime.
Description check ✅ Passed The PR description is comprehensive and exceeds template requirements. It includes detailed sections on what changed (binary format design, integration across runtime/compile/coverage/C++ paths), memory improvements, performance benchmarks, scope clarification, and test plan.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/SavedSourceMap.zig`:
- Around line 137-140: The duplicated allocation stored in blob is leaked if
putValue(...) fails; update putMappings (in SavedSourceMap) to free the
duplicated buffer on error by using a defer or try/catch around the call to
this.putValue(source.path.text, Value.init(...)) so that
bun.default_allocator.free(blob.ptr) (or the appropriate allocator free) runs
when putValue returns an error; ensure the defer is canceled only on success so
successful inserts keep the blob ownership transferred to the map.

In `@src/sourcemap/CodeCoverage.zig`:
- Around line 419-423: The borrowed internal mapping from
bun.jsc.VirtualMachine.get().source_mappings.getInternal(source_url.slice())
must be pinned (or cloned to an owned buffer) while still holding the
source_mappings table lock so it cannot be freed by a concurrent get() that
swaps the entry; modify the code in the block that computes
internal_mapping_/parsed_mappings_ to, when internal_mapping_ is non-null,
either increment its refcount or make an owned copy (e.g., call the mapping's
pin/retain API or clone the blob) inside the lock and use that pinned/owned
object outside the lock; ensure you reference internal_mapping_ (the pinned
variant) and parsed_mappings_ consistently so the subsequent cursor walk uses
the pinned/owned data rather than a borrowed view.

In `@src/sourcemap/InternalSourceMap.zig`:
- Around line 155-157: deinit currently always frees memory with
bun.default_allocator which is unsafe for views over non-owned memory; update
InternalSourceMap by documenting ownership semantics and/or adding an owned:
bool field and gating the free in deinit. Specifically, add a brief doc comment
on the InternalSourceMap type and its deinit method stating callers must only
call deinit on owned heap allocations, or add an owned: bool member to
InternalSourceMap and change pub fn deinit(self: InternalSourceMap) void to only
call bun.default_allocator.free(...) when self.owned is true (leave behavior
unchanged for existing owned instances). Ensure references to deinit and
bun.default_allocator are updated accordingly so non-owned views (mmap/embedded
blobs) won't be freed.

In `@src/sourcemap/ParsedSourceMap.zig`:
- Around line 96-99: writeVLQs() currently always iterates map.mappings and
therefore emits empty mappings for ParsedSourceMap instances that use the
.internal path; update writeVLQs() to check ParsedSourceMap.internal and, when
present, call the internal object's appendVLQTo(...) (or equivalent method on
the internal type) to write the pre-encoded VLQ blob instead of iterating
.mappings, otherwise fall back to iterating .mappings; alternatively, if you
prefer finalizing earlier, add a step (e.g., in formatVLQs() or before
serialization) that converts an internal-backed ParsedSourceMap into its
.mappings representation so writeVLQs() can remain unchanged.

In `@src/StandaloneModuleGraph.zig`:
- Around line 1377-1380: The mappingBlob function must validate that the
computed start and the requested map_bytes_length are within map.bytes bounds
before performing the slice: in pub fn mappingBlob(map: SerializedSourceMap)
[]const u8 verify that start <= map.bytes.len and start + head.map_bytes_length
<= map.bytes.len and return an empty or safe value (or an explicit error)
instead of slicing out-of-bounds; also update LazySourceMap.load() to treat a
null/invalid mappingBlob result as .none (i.e., if mappingBlob indicates missing
or truncated data, return .none rather than assuming non-null), referencing
mappingBlob, SerializedSourceMap, and LazySourceMap.load for the changes.

In `@test/js/bun/sourcemap/internal-sourcemap.test.ts`:
- Around line 74-84: Replace the unconditional stderr emptiness check with a
conditional that only asserts stderr is empty when the child exited non‑zero:
instead of expect(stderr).toBe(""), wrap it as if (exited !== 0) {
expect(stderr).toBe(""); } immediately before the expect(exited).toBe(0)
assertion; apply the same change to the similar block around lines 98-102. Use
the existing local variables (stderr, stdout, exited) and leave
extractPositions/positions assertions unchanged.
- Around line 76-82: The test under extractPositions currently expects only the
final frames and omits the intermediate call sites for beta() and gamma();
update the expected positions array in internal-sourcemap.test.ts to include the
remapped frames for the beta() and gamma() call sites (i.e., add the two
additional "index.ts:line:col" entries that represent the return alpha() and
return beta() frames) so the expected array matches every match returned by
extractPositions().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 9743e614-b069-4604-9b6d-3956771afa44

📥 Commits

Reviewing files that changed from the base of the PR and between 942c656 and b18fac2.

📒 Files selected for processing (13)
  • bench/sourcemap/.gitignore
  • bench/sourcemap/internal-sourcemap-bench.ts
  • src/StandaloneModuleGraph.zig
  • src/bun.js/RuntimeTranspilerCache.zig
  • src/bun.js/SavedSourceMap.zig
  • src/bun.js/VirtualMachine.zig
  • src/sourcemap/Chunk.zig
  • src/sourcemap/CodeCoverage.zig
  • src/sourcemap/InternalSourceMap.zig
  • src/sourcemap/ParsedSourceMap.zig
  • src/sourcemap/sourcemap.zig
  • test/bundler/compile-sourcemap-internal.test.ts
  • test/js/bun/sourcemap/internal-sourcemap.test.ts

Comment on lines 137 to 140
pub fn putMappings(this: *SavedSourceMap, source: *const logger.Source, mappings: MutableString) !void {
try this.putValue(source.path.text, Value.init(bun.cast(*SavedMappings, try bun.default_allocator.dupe(u8, mappings.list.items))));
const blob = try bun.default_allocator.dupe(u8, mappings.list.items);
try this.putValue(source.path.text, Value.init(bun.cast(*InternalSourceMap, blob.ptr)));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Free the duplicated blob if putValue() fails.

The allocation succeeds before the hash-table insert, so an error from putValue() leaks blob on the failure path.

Suggested fix
 pub fn putMappings(this: *SavedSourceMap, source: *const logger.Source, mappings: MutableString) !void {
     const blob = try bun.default_allocator.dupe(u8, mappings.list.items);
+    errdefer bun.default_allocator.free(blob);
     try this.putValue(source.path.text, Value.init(bun.cast(*InternalSourceMap, blob.ptr)));
 }

As per coding guidelines, "In Zig code, be careful with allocators and use defer for cleanup".

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
pub fn putMappings(this: *SavedSourceMap, source: *const logger.Source, mappings: MutableString) !void {
try this.putValue(source.path.text, Value.init(bun.cast(*SavedMappings, try bun.default_allocator.dupe(u8, mappings.list.items))));
const blob = try bun.default_allocator.dupe(u8, mappings.list.items);
try this.putValue(source.path.text, Value.init(bun.cast(*InternalSourceMap, blob.ptr)));
}
pub fn putMappings(this: *SavedSourceMap, source: *const logger.Source, mappings: MutableString) !void {
const blob = try bun.default_allocator.dupe(u8, mappings.list.items);
errdefer bun.default_allocator.free(blob);
try this.putValue(source.path.text, Value.init(bun.cast(*InternalSourceMap, blob.ptr)));
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/SavedSourceMap.zig` around lines 137 - 140, The duplicated
allocation stored in blob is leaked if putValue(...) fails; update putMappings
(in SavedSourceMap) to free the duplicated buffer on error by using a defer or
try/catch around the call to this.putValue(source.path.text, Value.init(...)) so
that bun.default_allocator.free(blob.ptr) (or the appropriate allocator free)
runs when putValue returns an error; ensure the defer is canceled only on
success so successful inserts keep the blob ownership transferred to the map.

Comment thread src/sourcemap/CodeCoverage.zig Outdated
Comment on lines +419 to +423
const internal_mapping_ = bun.jsc.VirtualMachine.get().source_mappings.getInternal(source_url.slice());
const parsed_mappings_ = if (internal_mapping_ == null)
bun.jsc.VirtualMachine.get().source_mappings.get(source_url.slice())
else
null;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Pin the internal sourcemap before using it outside the table lock.

getInternal() only hands back a borrowed view. Another thread can call get() for the same path, swap the entry to ParsedSourceMap, and free the blob while the cursor below is still walking it. That turns the new internal-coverage path into a UAF race.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sourcemap/CodeCoverage.zig` around lines 419 - 423, The borrowed internal
mapping from
bun.jsc.VirtualMachine.get().source_mappings.getInternal(source_url.slice())
must be pinned (or cloned to an owned buffer) while still holding the
source_mappings table lock so it cannot be freed by a concurrent get() that
swaps the entry; modify the code in the block that computes
internal_mapping_/parsed_mappings_ to, when internal_mapping_ is non-null,
either increment its refcount or make an owned copy (e.g., call the mapping's
pin/retain API or clone the blob) inside the lock and use that pinned/owned
object outside the lock; ensure you reference internal_mapping_ (the pinned
variant) and parsed_mappings_ consistently so the subsequent cursor walk uses
the pinned/owned data rather than a borrowed view.

Comment thread src/sourcemap/InternalSourceMap.zig
Comment thread src/StandaloneModuleGraph.zig Outdated
Comment on lines +74 to +84
expect(stderr).toBe("");

const positions = extractPositions(stdout);
expect(positions).toEqual([
"6:13", // throw new Error("boom")
"24:3", // gamma() at top level
"31:17", // new Error("here").stack
"34:13", // captureViaCaptureStackTrace()
]);

expect(exited).toBe(0);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Don’t require stderr to be empty on successful subprocess runs.

ASAN builds can print JSC signal-handler warnings to stderr even when the child exits successfully, so these assertions make the tests flaky in CI.

Suggested fix
-    expect(stderr).toBe("");
+    if (exited !== 0) {
+      expect(stderr).toBe("");
+    }
@@
-    expect(stderr).toBe("");
+    if (exited !== 0) {
+      expect(stderr).toBe("");
+    }

Based on learnings, in test/**/*.test.ts spawned-process assertions should use if (exitCode !== 0) { expect(stderr).toBe(""); } immediately before the exit-code assertion to avoid ASAN-only stderr flakes.

Also applies to: 98-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/bun/sourcemap/internal-sourcemap.test.ts` around lines 74 - 84,
Replace the unconditional stderr emptiness check with a conditional that only
asserts stderr is empty when the child exited non‑zero: instead of
expect(stderr).toBe(""), wrap it as if (exited !== 0) { expect(stderr).toBe("");
} immediately before the expect(exited).toBe(0) assertion; apply the same change
to the similar block around lines 98-102. Use the existing local variables
(stderr, stdout, exited) and leave extractPositions/positions assertions
unchanged.

Comment on lines +76 to +82
const positions = extractPositions(stdout);
expect(positions).toEqual([
"6:13", // throw new Error("boom")
"24:3", // gamma() at top level
"31:17", // new Error("here").stack
"34:13", // captureViaCaptureStackTrace()
]);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Include the intermediate beta() and gamma() frames in the expected stack.

extractPositions() keeps every index.ts:line:col match. The thrown-error stack will also include the return alpha() and return beta() call sites, so Line 77-82 undercounts the remapped frames and will fail on a normal stack.

Suggested fix
     expect(positions).toEqual([
       "6:13", // throw new Error("boom")
+      "10:10", // return alpha()
+      "14:10", // return beta()
       "24:3", // gamma() at top level
       "31:17", // new Error("here").stack
       "34:13", // captureViaCaptureStackTrace()
     ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/bun/sourcemap/internal-sourcemap.test.ts` around lines 76 - 82, The
test under extractPositions currently expects only the final frames and omits
the intermediate call sites for beta() and gamma(); update the expected
positions array in internal-sourcemap.test.ts to include the remapped frames for
the beta() and gamma() call sites (i.e., add the two additional
"index.ts:line:col" entries that represent the return alpha() and return beta()
frames) so the expected array matches every match returned by
extractPositions().

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/sourcemap/ParsedSourceMap.zig (1)

13-17: ⚠️ Potential issue | 🟠 Major

.internal mode is introduced without matching VLQ serialization support.

After Line 17, ParsedSourceMap can be internal-backed, but writeVLQs() still emits from map.mappings only. Internal-backed maps will serialize as empty "mappings" unless this path calls the internal encoder first.

Suggested fix
 pub fn writeVLQs(map: *const ParsedSourceMap, writer: anytype) !void {
+    if (map.internal) |ism| {
+        try ism.appendVLQTo(writer);
+        return;
+    }
+
     var last_col: i32 = 0;
     var last_src: i32 = 0;
     var last_ol: i32 = 0;
     var last_oc: i32 = 0;
     var current_line: i32 = 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sourcemap/ParsedSourceMap.zig` around lines 13 - 17, ParsedSourceMap now
supports an internal: ?InternalSourceMap field but writeVLQs() still serializes
only from map.mappings, so internal-backed maps get an empty "mappings" output;
update writeVLQs() (or the serialization code path that produces the "mappings"
field) to detect when ParsedSourceMap.internal is non-null and call the
InternalSourceMap's VLQ encoder/serializer (e.g., internal.encodeVLQs() or the
equivalent method on InternalSourceMap) instead of reading map.mappings, falling
back to the existing map.mappings path when internal is null so both
materialized and internal-backed maps serialize correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sourcemap/ParsedSourceMap.zig`:
- Around line 96-99: The new ParsedSourceMap.findMapping method should be used
everywhere instead of calling .mappings.find(...) directly; update all callers
(e.g., the code in JSSourceMap that currently calls this.mappings.find, the
ErrorReportRequest remapping usage, and the sourcemap module code) to call
ParsedSourceMap.findMapping(this, line, column) (or the appropriate receiver) so
internal-backed maps are respected; ensure you propagate the ?Mapping return
(handle null/none the same way the previous .mappings.find result was handled)
and remove direct .mappings.find usages.

---

Duplicate comments:
In `@src/sourcemap/ParsedSourceMap.zig`:
- Around line 13-17: ParsedSourceMap now supports an internal:
?InternalSourceMap field but writeVLQs() still serializes only from
map.mappings, so internal-backed maps get an empty "mappings" output; update
writeVLQs() (or the serialization code path that produces the "mappings" field)
to detect when ParsedSourceMap.internal is non-null and call the
InternalSourceMap's VLQ encoder/serializer (e.g., internal.encodeVLQs() or the
equivalent method on InternalSourceMap) instead of reading map.mappings, falling
back to the existing map.mappings path when internal is null so both
materialized and internal-backed maps serialize correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d30ec725-b1a1-4693-9742-4a7c66fa7be8

📥 Commits

Reviewing files that changed from the base of the PR and between b18fac2 and 631a655.

📒 Files selected for processing (1)
  • src/sourcemap/ParsedSourceMap.zig

Comment on lines +96 to +99
pub fn findMapping(this: *const ParsedSourceMap, line: bun.Ordinal, column: bun.Ordinal) ?Mapping {
if (this.internal) |ism| return ism.find(line, column);
return this.mappings.find(line, column);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

findMapping() is correct here, but migration is incomplete in current callers.

Line 96 adds the right dispatch API, but other paths still call .mappings.find(...) directly (src/sourcemap/JSSourceMap.zig Line 248, src/bake/DevServer/ErrorReportRequest.zig Line 148+, src/sourcemap/sourcemap.zig Line 220). Those paths will bypass internal-backed maps and can return incorrect remaps.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sourcemap/ParsedSourceMap.zig` around lines 96 - 99, The new
ParsedSourceMap.findMapping method should be used everywhere instead of calling
.mappings.find(...) directly; update all callers (e.g., the code in JSSourceMap
that currently calls this.mappings.find, the ErrorReportRequest remapping usage,
and the sourcemap module code) to call ParsedSourceMap.findMapping(this, line,
column) (or the appropriate receiver) so internal-backed maps are respected;
ensure you propagate the ?Mapping return (handle null/none the same way the
previous .mappings.find result was handled) and remove direct .mappings.find
usages.

Comment thread src/sourcemap/InternalSourceMap.zig Outdated
Comment thread src/sourcemap/CodeCoverage.zig Outdated
Comment thread src/StandaloneModuleGraph.zig
Comment thread src/sourcemap/ParsedSourceMap.zig
Comment thread src/StandaloneModuleGraph.zig
Comment thread src/bun.js/SavedSourceMap.zig Outdated
Comment thread src/bun.js/SavedSourceMap.zig
Replace the per-mapping LEB128 stream with K=64-mapping windows that
exploit transpiler-output structure: a 32B fixed header (count, flags,
3 section lengths, 3 always-present 8-byte equality masks for
d_gen_line / d_orig_line==d_gen_line / d_orig_col==d_gen_col) followed
by a d_gen_col varint for every delta and varints for the other lanes
only where the masks say they differ. SyncEntry stays 24B and carries
the absolute first-mapping state for bsearch.

For _tsc.js (563k mappings): 1.29 MB resident vs 2.92 MB LEB128 vs
~11.3 MB Mapping.List on main; --compile binary -1.8 MB.

FindCache becomes a 16-slot fully-associative set with a contiguous
key array so the per-lookup scan is one 256-byte sweep; the heavy
slot payloads sit in a parallel array. Fixes the per-frame window
re-decode that showed up when a single stack touches several distinct
windows (e.g. mitata's runner.mjs).

error-capturestack.mjs vs bun-1.3.12: 1.37-1.41 vs 1.27-1.32 us
(+4-7%); plain-loop new Error().stack is ~19% faster than 1.3.12.

Also: expanded codec field/variable names (d_gen_col, orig_line_eq_mask,
etc.) for legibility, hoisted Builder.flushWindow's pending[] read to a
sliding prev, RuntimeTranspilerCache version 19 -> 20, and added a
single-line ~30K-column minified-shape test.
@Jarred-Sumner Jarred-Sumner changed the title sourcemap: varint+sync InternalSourceMap for runtime & --compile sourcemap: bit-packed InternalSourceMap (~2.4 B/mapping, no decode) Apr 16, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/bun.js/bindings/FormatStackTraceForJS.cpp (1)

488-501: ⚠️ Potential issue | 🟠 Major

Restore the first-frame line/column/sourceURL updates in the prepareStackTrace path.

This loop rewrites each CallSite, but it never assigns the function’s line, column, or sourceURL out-parameters. computeErrorInfoWrapperToJSValue() still writes those refs back to the caller afterward, so any Error.prepareStackTrace user now gets stale/default top-frame location metadata even when the batch remap succeeded. Mirror the hasSet behavior from formatStackTrace(...) here and populate the refs from the first displayed frame.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/bun.js/bindings/FormatStackTraceForJS.cpp` around lines 488 - 501, The
loop that remaps frames (using remappedFrames, didRemap, sourceURLs, callSites
and ZigStackFrame) updates each CallSite but does not populate the first-frame
out-parameters (line/column/sourceURL) that computeErrorInfoWrapperToJSValue()
expects; mirror the hasSet logic from formatStackTrace(...) and, when processing
the first displayed frame (remappedFrames[0] / callSites.at(0)), set the
function-level line, column and sourceURL refs from that frame (using the same
conditions used for callsite->setLineNumber/setColumnNumber and setSourceURL) so
Error.prepareStackTrace consumers receive the correct top-frame metadata.
♻️ Duplicate comments (2)
test/js/bun/sourcemap/internal-sourcemap.test.ts (2)

76-82: ⚠️ Potential issue | 🟠 Major

Add the missing beta() and gamma() frames to this expected stack.

extractPositions() keeps every index.ts:line:col match. The thrown stack from gamma() -> beta() -> alpha() also includes the return alpha() and return beta() call sites, so the array on Line 77 undercounts the remapped frames.

Suggested fix
     expect(positions).toEqual([
       "6:13", // throw new Error("boom")
+      "10:10", // return alpha()
+      "14:10", // return beta()
       "24:3", // gamma() at top level
       "31:17", // new Error("here").stack
       "34:13", // captureViaCaptureStackTrace()
     ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/bun/sourcemap/internal-sourcemap.test.ts` around lines 76 - 82, The
expected positions array under the extractPositions(stdout) assertion is missing
the two intermediate frames produced by the gamma() -> beta() -> alpha() call
chain; update the expected list in the test to include the additional remapped
call-site positions produced by those return alpha()/return beta() frames so it
matches extractPositions() output (look for references to extractPositions,
gamma, beta, and alpha in the test to locate and adjust the expected positions
array).

74-75: ⚠️ Potential issue | 🟡 Minor

Make the empty-stderr assertions conditional on nonzero exits.

ASAN jobs can print JSC signal-handler warnings to stderr on successful subprocess runs, so these checks will keep flaking even when the remap output is correct.

Suggested fix
-    expect(stderr).toBe("");
+    if (exited !== 0) {
+      expect(stderr).toBe("");
+    }

Based on learnings, in test/**/*.test.ts spawned-process assertions should use if (exitCode !== 0) { expect(stderr).toBe(""); } immediately before the exit-code assertion to avoid ASAN-only stderr flakes.

Also applies to: 123-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/js/bun/sourcemap/internal-sourcemap.test.ts` around lines 74 - 75, The
test currently asserts expect(stderr).toBe("") unconditionally which flakes
under ASAN; change it to only assert empty stderr when the subprocess failed by
replacing that line with a conditional: if (exitCode !== 0) {
expect(stderr).toBe(""); } and place it immediately before the existing
exit-code assertion (the code that checks exitCode), updating both occurrences
that match the pattern (the expect(stderr).toBe("") at the current spot and the
duplicate at the other location).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sourcemap/InternalSourceMap.zig`:
- Around line 100-103: InternalSourceMap currently uses the header-derived
totalLen() to free memory in deinit(), which can free the wrong span if the blob
is truncated/corrupted; add a dedicated field (e.g., ownedLen or an ownedSlice)
to InternalSourceMap to track the actual owned byte length when the struct is
constructed, use that owned length for deallocation in deinit() and only use
totalLen() for validation checks, and when wrapping raw bytes assert that
totalLen() == ownedLen (or map.totalLen() == map.len) so corrupted headers are
detected rather than relied on for free; update all places that construct
InternalSourceMap (including the other occurrences noted around the 162-167
area) to set the new owned length field.

In `@src/sourcemap/ParsedSourceMap.zig`:
- Around line 13-17: The ParsedSourceMap instances created with an
InternalSourceMap blob must mark the blob as borrowed to avoid deinit() freeing
it; locate the places that set ParsedSourceMap.internal = someInternal (the two
sites that also set input_line_count from the blob) and add
is_standalone_module_graph = true on those ParsedSourceMap initializers so the
deinit path treats the blob as borrowed rather than freeing it.

In `@src/StandaloneModuleGraph.zig`:
- Around line 270-283: The code stores an InternalSourceMap from mappingBlob()
without validating the embedded header fields, which can allow truncated/corrupt
blobs to be cached and later cause invalid offset walks in findMapping() /
appendVLQTo(); fix by validating the InternalSourceMap header and bounds before
assigning into ParsedSourceMap: after obtaining blob and before constructing
SourceMap.InternalSourceMap and storing into stored.internal, call the same
header/length checks used by Mapping.parse() (verify total_len against blob.len,
validate sync table/span offsets and stream offset ranges) and reject (set
this.* = .none and return null) if any inconsistency is found so only fully
self-consistent InternalSourceMap instances are cached.

---

Outside diff comments:
In `@src/bun.js/bindings/FormatStackTraceForJS.cpp`:
- Around line 488-501: The loop that remaps frames (using remappedFrames,
didRemap, sourceURLs, callSites and ZigStackFrame) updates each CallSite but
does not populate the first-frame out-parameters (line/column/sourceURL) that
computeErrorInfoWrapperToJSValue() expects; mirror the hasSet logic from
formatStackTrace(...) and, when processing the first displayed frame
(remappedFrames[0] / callSites.at(0)), set the function-level line, column and
sourceURL refs from that frame (using the same conditions used for
callsite->setLineNumber/setColumnNumber and setSourceURL) so
Error.prepareStackTrace consumers receive the correct top-frame metadata.

---

Duplicate comments:
In `@test/js/bun/sourcemap/internal-sourcemap.test.ts`:
- Around line 76-82: The expected positions array under the
extractPositions(stdout) assertion is missing the two intermediate frames
produced by the gamma() -> beta() -> alpha() call chain; update the expected
list in the test to include the additional remapped call-site positions produced
by those return alpha()/return beta() frames so it matches extractPositions()
output (look for references to extractPositions, gamma, beta, and alpha in the
test to locate and adjust the expected positions array).
- Around line 74-75: The test currently asserts expect(stderr).toBe("")
unconditionally which flakes under ASAN; change it to only assert empty stderr
when the subprocess failed by replacing that line with a conditional: if
(exitCode !== 0) { expect(stderr).toBe(""); } and place it immediately before
the existing exit-code assertion (the code that checks exitCode), updating both
occurrences that match the pattern (the expect(stderr).toBe("") at the current
spot and the duplicate at the other location).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1834369a-bba0-4d6b-a9ff-d4d39aa6bbf6

📥 Commits

Reviewing files that changed from the base of the PR and between 631a655 and bd62e3f.

📒 Files selected for processing (11)
  • src/StandaloneModuleGraph.zig
  • src/bun.js/RuntimeTranspilerCache.zig
  • src/bun.js/SavedSourceMap.zig
  • src/bun.js/VirtualMachine.zig
  • src/bun.js/bindings/FormatStackTraceForJS.cpp
  • src/sourcemap/CodeCoverage.zig
  • src/sourcemap/InternalSourceMap.zig
  • src/sourcemap/JSSourceMap.zig
  • src/sourcemap/ParsedSourceMap.zig
  • src/sourcemap/sourcemap.zig
  • test/js/bun/sourcemap/internal-sourcemap.test.ts

Comment on lines +100 to +103
/// The blob is stored in the SavedSourceMap table as a tagged pointer to its
/// first byte. This struct is a thin view over that pointer; it owns no
/// separate allocation.
data: [*]const u8,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Track the owned slice length separately from total_len.

deinit() currently frees self.data[0..self.totalLen()], but totalLen() is read from the blob itself. If a cached/internal blob is truncated or corrupted, cleanup can free the wrong span instead of the allocation that was actually loaded. Please carry the real owned length (or the owned slice) on InternalSourceMap and only use the header length for validation, not for free().

[suggested fix]

Proposed direction
- data: [*]const u8,
+ data: [*]const u8,
+ len: usize,
...
 pub fn deinit(self: InternalSourceMap) void {
-    bun.default_allocator.free(`@constCast`(self.data[0..self.totalLen()]));
+    bun.default_allocator.free(`@constCast`(self.data[0..self.len]));
 }
// When wrapping raw bytes:
bun.assert(map.totalLen() == map.len);

Also applies to: 162-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sourcemap/InternalSourceMap.zig` around lines 100 - 103,
InternalSourceMap currently uses the header-derived totalLen() to free memory in
deinit(), which can free the wrong span if the blob is truncated/corrupted; add
a dedicated field (e.g., ownedLen or an ownedSlice) to InternalSourceMap to
track the actual owned byte length when the struct is constructed, use that
owned length for deallocation in deinit() and only use totalLen() for validation
checks, and when wrapping raw bytes assert that totalLen() == ownedLen (or
map.totalLen() == map.len) so corrupted headers are detected rather than relied
on for free; update all places that construct InternalSourceMap (including the
other occurrences noted around the 162-167 area) to set the new owned length
field.

Comment on lines +13 to +17
/// Set when this map's mappings are backed by an InternalSourceMap blob (e.g.
/// embedded in a `bun build --compile` executable) instead of a materialized
/// `Mapping.List`. The blob's bytes are borrowed (they live in the standalone
/// module graph's section), so `deinit` does not free them.
internal: ?InternalSourceMap = null,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n -C4 '\.internal\s*=' --type zig src

Repository: oven-sh/bun

Length of output: 17153


🏁 Script executed:

cat -n src/sourcemap/ParsedSourceMap.zig | head -120

Repository: oven-sh/bun

Length of output: 5407


🏁 Script executed:

sed -n '190,210p' src/bun.js/SavedSourceMap.zig

Repository: oven-sh/bun

Length of output: 1039


🏁 Script executed:

sed -n '270,295p' src/StandaloneModuleGraph.zig

Repository: oven-sh/bun

Length of output: 1312


Both .internal initializers fail to mark the blob as borrowed.

SavedSourceMap.zig:198-205 and StandaloneModuleGraph.zig:279-283 each create a ParsedSourceMap with borrowed InternalSourceMap blobs but omit the is_standalone_module_graph = true flag. Since the field defaults to false, deinit() will attempt to free borrowed memory, causing heap corruption. Both sites correctly set input_line_count from the blob but must also set is_standalone_module_graph = true to prevent freeing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/sourcemap/ParsedSourceMap.zig` around lines 13 - 17, The ParsedSourceMap
instances created with an InternalSourceMap blob must mark the blob as borrowed
to avoid deinit() freeing it; locate the places that set
ParsedSourceMap.internal = someInternal (the two sites that also set
input_line_count from the blob) and add is_standalone_module_graph = true on
those ParsedSourceMap initializers so the deinit path treats the blob as
borrowed rather than freeing it.

Comment thread src/StandaloneModuleGraph.zig

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The review-fix commits (0d6bbb1, bd62e3f) address all the issues I previously raised: readVarint bounds/shift overflow, errdefer in putMappings, fromVLQ max-line tracking, writeVLQs internal dispatch, mappingBlob bounds validation, and the coverage cursor using findMapping. This is a substantial algorithm change touching critical paths — human sign-off seems appropriate before merge.

Extended reasoning...

Overview

This PR replaces the runtime/standalone sourcemap storage (VLQ string → Mapping.List) with InternalSourceMap: a bit-packed window codec with varint delta streams and sync points. 15 files changed, including the hot-path stack remapping loop in VirtualMachine.zig, code coverage, the --compile standalone binary format, and the transpiler cache version bump (18→19→20).

Review fixes applied

My prior comments flagged six concrete defects; all are resolved in the review commits:

  1. readVarint – shift widened to , bounds guard added
  2. putMappings – between dupe and putValue
  3. fromVLQ input_line_count – now tracks and uses it even when hint=0
  4. writeVLQs – dispatches to when
  5. mappingBlob() – header-size guard and start/length bounds checks before slicing
  6. CodeCoverage cursor – both call sites use / correctly

Security risks

No auth/crypto/permissions surface. The new blob format is read from disk (.pile cache) and from embedded executable sections, both of which are trusted-internal paths. The new bounds guards in readVarint and mappingBlob reduce exposure for disk-corrupted files.

Level of scrutiny

High: this is a new binary wire format in a hot path. The design is well-documented, tests cover sync-point crossing and long-line columns, and the author ran the full test matrix. CI is still pending.

Other factors

The batch-remap refactor in FormatStackTraceForJS.cpp (pass-1 collect, pass-2 format) is a clean improvement that avoids per-frame lock/unlock. The FindCache LRU set is a reasonable optimization for typical multi-frame stacks. The cache version bump correctly invalidates old blobs.

Jarred-Sumner and others added 2 commits April 15, 2026 21:23
…e blob

The pass-1 loop in formatStackTrace/computeErrorInfoWithPrepareStackTrace
left remappedFrame.position.byte_position at memset 0 for frames that
skip the remap block (node:vm cross-global frames, frames without
line/col). ZigStackFramePosition.isInvalid() byte-compares against
{-1,-1,-1}, so these frames passed !isInvalid() with a zeroed source_url,
hit the path.len==0 -> remapped=true Zig path, and pass 2 then read the
zeroed frame instead of the originals. Set byte_position = -1 alongside
line/col so isInvalid() fires; add the missing memset in formatStackTrace.

Also: InternalSourceMap.isValidBlob() checks the embedded blob's
self-described total_len/stream_offset/sync_count against its actual
length so a corrupted --compile section degrades to "no sourcemap".
Bump ban-limits for the three guarded-undefined defaults in
FindCacheSlot/Builder.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/sourcemap/InternalSourceMap.zig`:
- Around line 320-335: The loop that scans for the gen_line_exc terminator when
(flags & flag_has_gen_line_exceptions) != 0 can overrun because it reads
bytes[pos] without bounds checking; update the scan in InternalSourceMap.zig
(the block that sets r.gen_line_exc_pos, r.gen_line_exc_next_idx and calls
readVarint) to check pos < bytes.len on each iteration and abort/return an error
(or mark the blob invalid) if the terminator 0xFF is not found before bytes.len,
and ensure any subsequent uses of r.gen_line_exc_next_idx and r.gen_line_exc_pos
handle the early-failure path; keep readVarint usage protected by the same
bounds checks and consider reusing isValidBlob/Builder.flushWindow assumptions
only for optimistic fast paths but enforce the defensive check here.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7d04fc8c-389c-435d-a518-98c9f49523ea

📥 Commits

Reviewing files that changed from the base of the PR and between bd62e3f and dc4ad33.

📒 Files selected for processing (4)
  • src/StandaloneModuleGraph.zig
  • src/bun.js/bindings/FormatStackTraceForJS.cpp
  • src/sourcemap/InternalSourceMap.zig
  • test/internal/ban-limits.json

Comment thread src/sourcemap/InternalSourceMap.zig

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/internal/ban-limits.json`:
- Line 13: The three struct fields (FindCacheSlot.reader, FindCacheSlot.decoded,
Builder.pending) were added with explicit "= undefined" defaults which triggers
the banned pattern; remove those explicit "= undefined" assignments in
InternalSourceMap.zig and instead give the fields a proper default or
deferred-init pattern: either initialize them to a valid zero/default value
appropriate for their types, change their type to an optional and initialize to
null, or leave them unassigned but add a clear comment and ensure they are set
in the struct's initializer (constructor) before use; update any constructors or
init functions that create these structs (e.g., where FindCacheSlot and Builder
instances are created) to perform the deferred initialization so the code no
longer relies on the discouraged explicit "= undefined" default.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 177fdc8f-f6ea-4f3e-ba12-318628d8a9c9

📥 Commits

Reviewing files that changed from the base of the PR and between dc4ad33 and 371cbc2.

📒 Files selected for processing (1)
  • test/internal/ban-limits.json

".stdFile()": 16,
"// autofix": 148,
": [^=]+= undefined,$": 256,
": [^=]+= undefined,$": 259,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find all instances of the banned pattern to identify the new ones.
# Expected: 259 total instances, with 3 likely in InternalSourceMap.zig or related files.

rg -n ': [^=]+= undefined,$' --type=zig \
  -g '!src/deps/**' \
  -g '!src/codegen/**' \
  -g '!src/unicode/uucode/**' \
  | grep -E '(InternalSourceMap|sourcemap|SourceMap)' \
  | head -20

Repository: oven-sh/bun

Length of output: 567


🏁 Script executed:

# Find the ban-words test file to understand validation
find . -name '*ban-words*' -type f 2>/dev/null | head -10

Repository: oven-sh/bun

Length of output: 88


🏁 Script executed:

# Count all instances of the banned pattern to validate against 259
rg -c ': [^=]+= undefined,$' --type=zig 2>/dev/null | \
  awk -F: '{sum += $NF} END {print "Total count: " sum}'

Repository: oven-sh/bun

Length of output: 71


🏁 Script executed:

# Get comprehensive list of all files with this pattern (first pass)
rg -l ': [^=]+= undefined,$' --type=zig 2>/dev/null | sort

Repository: oven-sh/bun

Length of output: 3405


🏁 Script executed:

cat test/internal/ban-words.test.ts

Repository: oven-sh/bun

Length of output: 8461


🏁 Script executed:

# Get the PR/commit diff to see which files were modified
git diff --name-only HEAD~1 HEAD | head -30

Repository: oven-sh/bun

Length of output: 241


🏁 Script executed:

# Search specifically in the modified Zig files for the pattern
git diff HEAD~1 HEAD -- '*.zig' | grep -B2 -A2 ': [^=]+= undefined,$' | head -50

Repository: oven-sh/bun

Length of output: 83


🏁 Script executed:

sed -n '385,392p; 648,652p' src/sourcemap/InternalSourceMap.zig | cat -n

Repository: oven-sh/bun

Length of output: 660


🏁 Script executed:

# Get context around those lines to understand if they're necessary
sed -n '380,395p; 645,655p' src/sourcemap/InternalSourceMap.zig

Repository: oven-sh/bun

Length of output: 1117


Consider refactoring the 3 new instances to avoid the banned pattern.

This change increases the limit for : [^=]+= undefined,$ from 256 to 259, properly acknowledging 3 new struct field defaults in src/sourcemap/InternalSourceMap.zig (lines 387–388, 650). The pattern is banned because explicit = undefined defaults are redundant in Zig—fields are undefined by default.

The 3 instances are:

  • FindCacheSlot.reader: WindowReader = undefined, (line 387)
  • FindCacheSlot.decoded: [sync_interval]State = undefined, (line 388)
  • Builder.pending: [sync_interval]State = undefined, (line 650)

While the limit increase is correct, consider whether these fields can be initialized more explicitly (e.g., with a default/zero value) or documented with comments explaining the need for deferred initialization, rather than relying on the discouraged explicit undefined pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/internal/ban-limits.json` at line 13, The three struct fields
(FindCacheSlot.reader, FindCacheSlot.decoded, Builder.pending) were added with
explicit "= undefined" defaults which triggers the banned pattern; remove those
explicit "= undefined" assignments in InternalSourceMap.zig and instead give the
fields a proper default or deferred-init pattern: either initialize them to a
valid zero/default value appropriate for their types, change their type to an
optional and initialize to null, or leave them unassigned but add a clear
comment and ensure they are set in the struct's initializer (constructor) before
use; update any constructors or init functions that create these structs (e.g.,
where FindCacheSlot and Builder instances are created) to perform the deferred
initialization so the code no longer relies on the discouraged explicit "=
undefined" default.

Comment thread src/sourcemap/InternalSourceMap.zig
Comment thread src/sourcemap/InternalSourceMap.zig
WindowReader.parse() read count as a raw u8 and ran the
gen-line-exception scan until 0xFF without a length guard. Well-formed
blobs never violate either, but a tampered standalone-graph section
could drive next() past FindCacheSlot.decoded[64] or scan past
stream() in ReleaseFast. Clamp count to sync_interval and add
pos < bytes.len to the scan; both are free on the parse path.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fixes — both critical bugs I flagged (unbounded gen-line-exception scan and the findWithCache OOB decode-loop write) are addressed in commit 3af7239. Given the scope of this PR (new binary sourcemap format, 16 files, critical stack-trace path), this still deserves a human architectural review before merge.

Extended reasoning...

Overview

This PR introduces a new binary sourcemap format (InternalSourceMap) across 16 files, replacing VLQ+Mapping.List storage with a bit-packed window codec (~2.4 B/mapping). Changes touch runtime stack-trace remapping, standalone binary loading, code coverage, the transpiler cache, and a new C++ two-pass stack-frame collector.

Bugs Found and Addressed

Two out-of-bounds issues I flagged in prior inline comments were fixed in commit 3af7239:

  1. WindowReader.parse() unbounded scan – fixed by guarding pos < bytes.len before accessing bytes[pos] in the gen-line-exception scan loop.
  2. findWithCache OOB decode-loop write – fixed by clamping r.count = @min(b[win_hdr.count_off], sync_interval), preventing a per-window count byte of >64 from writing past FindCacheSlot.decoded[63].

Security Risks

The deinit() path reads totalLen() directly from the blob header for the free size — a corrupted heap-allocated blob could pass a wrong size to free(). This is mitigated by a doc comment restricting callers and by isValidBlob() gating the external-data paths. No auth/crypto concerns.

Level of Scrutiny

High. This is a new binary format on the critical runtime path (stack traces, code coverage, standalone binaries). The algorithm is non-trivial (bit-packed windows, varint deltas, bsearch sync entries, 16-slot per-VM FindCache). A human reviewer familiar with Bun's sourcemap architecture should verify the overall design, not just individual bug fixes.

Other Factors

CI shows ASAN failures on broadcast-channel-worker-gc.test.ts (appears pre-existing) and widespread process.test.js failures on Windows/Linux (also appear pre-existing). The PR includes new test files (compile-sourcemap-internal.test.ts, internal-sourcemap.test.ts) and a benchmark. The ban-limits.json bump (256→259) for three = undefined uses is documented as intentional deferred-init.

Jarred-Sumner and others added 4 commits April 16, 2026 00:15
…und-trip tests

Adds InternalSourceMap.TestingAPIs with three JSC-bound functions and a
round-trip test (1825 assertions) covering the codec paths nothing else
hits: multi-source flag_has_src_idx, 5-field segments (names, dropped),
1-field segments (skipped), blank-line runs (flag_has_gen_line_exceptions
with d_gen_line > 1), plus a real bundler .map. fromVLQ -> toVLQ ->
decode matches the input exactly; find() at every mapping + 50 between-
mapping probes matches a JS reference decoder.

No bugs found in appendVLQTo or fromVLQ.
…t windows

20 functions spread ~125 padding lines apart in one file so each frame
lands in a different K=64 window; 20 > FindCache.slot_count (16) forces
eviction mid-stack. Captured twice from the same call site and compared
byte-for-byte, then assert all 20 expected line numbers appear. Tail
calls disabled via BUN_JSC_useTailCalls=0 so the chain survives.
@Jarred-Sumner Jarred-Sumner merged commit b2d3f5d into main Apr 16, 2026
57 of 63 checks passed
@Jarred-Sumner Jarred-Sumner deleted the claude/internal-sourcemap-varint branch April 16, 2026 09:19
Comment on lines 267 to 285
.none => null,
.parsed => |map| map,
.serialized => |serialized| {
var stored = switch (SourceMap.Mapping.parse(
bun.default_allocator,
serialized.mappingVLQ(),
null,
std.math.maxInt(i32),
std.math.maxInt(i32),
.{},
)) {
.success => |x| x,
.fail => {
this.* = .none;
return null;
},
const blob = serialized.mappingBlob() orelse {
this.* = .none;
return null;
};
if (!SourceMap.InternalSourceMap.isValidBlob(blob)) {
this.* = .none;
return null;
}
const ism = SourceMap.InternalSourceMap{ .data = blob.ptr };
var stored: SourceMap.ParsedSourceMap = .{
.ref_count = .init(),
.internal = ism,
.input_line_count = ism.inputLineCount(),
};

const source_files = serialized.sourceFileNames();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 In ParsedSourceMap.deinit(), the external_source_names cleanup block (lines 113–117 of src/sourcemap/ParsedSourceMap.zig) lacks the is_standalone_module_graph guard that the adjacent ism.deinit() call correctly has. For standalone module graph PSMs, external_source_names elements are borrowed sub-slices of serialized.bytes (not heap-allocated), and the backing allocation is alloc(?[]u8, source_files.len * 2) while external_source_names only covers the first half — so if deinit() were called, it would both free non-heap pointers and pass a size-N slice to the allocator when the allocation was 2N, causing heap corruption. Currently latent because parsed.ref() at StandaloneModuleGraph.zig:306 keeps the refcount ≥ 2; add the same \!this.is_standalone_module_graph guard to the external_source_names block.

Extended reasoning...

What the bug is and how it manifests

In LazySourceMap.load() (StandaloneModuleGraph.zig:286), a single allocation of 2N elements is made:

const slices = bun.default_allocator.alloc(?[]u8, source_files.len * 2);

The first half becomes file_names (a slice of length N), and each element is set to src.slice(serialized.bytes) — a borrowed sub-slice pointing into the standalone binary blob, not a separate heap allocation. This is then stored:

stored.external_source_names = file_names;  // N borrowed pointers
stored.is_standalone_module_graph = true;
parsed.ref(); // never free

The specific code path that triggers it

ParsedSourceMap.deinit() (src/sourcemap/ParsedSourceMap.zig:113–117) contains:

if (this.external_source_names.len > 0) {
    for (this.external_source_names) |name|
        allocator.free(name);              // (1) frees borrowed blob pointers
    allocator.free(this.external_source_names); // (2) size-N free of a 2N allocation
}

This has no is_standalone_module_graph guard. Contrast with the ism.deinit() call immediately above it (line 109), which this PR correctly guarded:

if (this.internal) |ism| {
    if (\!this.is_standalone_module_graph) ism.deinit();  // ← guarded
}

The inconsistency — adding the guard for ISM but not for external_source_names — is the bug introduced by this PR.

Why existing code doesn't prevent it

The protection is entirely behavioral: parsed.ref() // never free at StandaloneModuleGraph.zig:306 leaves the refcount at 2, so deinit() never reaches zero in normal operation. But this is a fragile invariant: any future refcount refactor, test harness that calls deref(), or leak-finder that attempts balanced ref/deref would trigger both heap corruptions silently in ReleaseFast builds.

What the impact would be

If deinit() were called on a standalone PSM:

  1. allocator.free(name) for each name in external_source_names — each name is a slice into serialized.bytes (mmap'd or embedded binary data), not a heap allocation. Freeing a non-heap pointer is immediate heap corruption.
  2. allocator.free(this.external_source_names) passes a []const u8 of length N to the allocator, but the original allocation was alloc(?[]u8, 2*N). Passing the wrong size to free corrupts allocator metadata in tcmalloc/mimalloc.

Step-by-step proof

  1. bun build --compile --sourcemap=inline produces an executable with embedded SerializedSourceMap.
  2. At runtime, LazySourceMap.load() reaches the .serialized branch: allocates slices[0..2N], sets file_names = @ptrCast(slices[0..N]), fills each element with src.slice(serialized.bytes) (borrowed pointer into the embedded section), sets stored.external_source_names = file_names, stored.is_standalone_module_graph = true, then calls parsed.ref() bringing refcount to 2.
  3. Suppose a future deref() is added (or the ref() is removed as a cleanup): refcount reaches 0, deinit() is called.
  4. ism.deinit() is skipped (correctly, because is_standalone_module_graph == true).
  5. allocator.free(name) runs on each borrowed pointer → heap corruption.
  6. allocator.free(this.external_source_names) passes the wrong size → heap corruption.

How to fix it

Add the same guard to the external_source_names block:

if (this.external_source_names.len > 0 and \!this.is_standalone_module_graph) {
    for (this.external_source_names) |name|
        allocator.free(name);
    allocator.free(this.external_source_names);
}

This mirrors exactly the pattern already used for ism.deinit() on line 109.

Comment on lines +248 to +268
if (first < 0x80) {
pos.* = i;
return zigzagDecode(first);
}
var result: u32 = first & 0x7f;
var shift: u6 = 7;
while (true) {
if (i >= bytes.len or shift > 28) break;
const byte = bytes[i];
i += 1;
result |= @as(u32, byte & 0x7f) << @as(u5, @intCast(shift));
if (byte & 0x80 == 0) break;
shift += 7;
}
pos.* = i;
return zigzagDecode(result);
}

inline fn testBit(base: [*]const u8, idx: usize) bool {
return (base[idx >> 3] >> @as(u3, @intCast(idx & 7))) & 1 != 0;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 When the shift > 28 guard fires at line 255 of readVarint(), the loop breaks BEFORE executing const byte = bytes[i]; i += 1;, so pos.* = i points to the unread continuation byte (which always has MSB set, 0x80). The next readVarint() call starts from that byte and treats it as the start of a new multi-byte varint, silently misaligning all subsequent varint reads in that window stream. For valid data this path is unreachable (i32 fits in ≤5 bytes), but for a corrupted RuntimeTranspilerCache entry or tampered --compile binary, a single oversized varint causes wrong source line/column offsets for the remainder of the window without any error signal. The fix is to consume remaining continuation bytes (or return an error) before breaking when shift > 28 fires.

Extended reasoning...

What the bug is and how it manifests

readVarint() (src/sourcemap/InternalSourceMap.zig:244-264) has the following loop structure:

var shift: u6 = 7;
while (true) {
    if (i >= bytes.len or shift > 28) break;  // guard at TOP of loop
    const byte = bytes[i];
    i += 1;
    result |= @as(u32, byte & 0x7f) << @as(u5, @intCast(shift));
    if (byte & 0x80 == 0) break;
    shift += 7;
}
pos.* = i;

When a malformed varint has 6+ continuation bytes, the shift > 28 guard fires at the top of the loop — BEFORE bytes[i] is read and before i is incremented. The loop breaks with i still pointing at the unread continuation byte. Then pos.* = i stores that position as the new stream cursor.

Step-by-step trace with a 6-byte overlong varint [b0, b1, b2, b3, b4, b5] where all have MSB set

  • Fast-path reads b0 (first byte), sets shift=7.
  • Iteration 1: shift=7 ≤ 28 → read b1, i→b2, shift→14
  • Iteration 2: shift=14 ≤ 28 → read b2, i→b3, shift→21
  • Iteration 3: shift=21 ≤ 28 → read b3, i→b4, shift→28
  • Iteration 4: shift=28 NOT > 28 → read b4, i→b5, shift→35
  • Iteration 5: shift=35 > 28 → BREAK before reading b5, i stays at b5
  • pos.* = i → points to unread b5

The next readVarint() call starts at b5. Since b5 has MSB=1 (it is a continuation byte), the fast-path check if (first < 0x80) fails, and readVarint() treats b5 as the first byte of a multi-byte sequence. This silently decodes an entirely wrong value, and all subsequent varint reads in that window stream are now misaligned.

Why existing code does not prevent this

For valid data, a zig-zag encoded i32 fits in at most 5 LEB128 bytes (ceil(32/7) = 5). The 5th byte always terminates with MSB=0 (continuation bit clear), so shift += 7 is never evaluated with shift=28 on a well-formed stream. The guard is therefore dead code for valid inputs.

The two entry points that can supply externally-influenced blob data are: (a) RuntimeTranspilerCache loads the ISM blob from disk — the sourcemap_hash stored in metadata is never verified on load, so a disk-corrupted or modified .pile file passes validation. (b) LazySourceMap.load() for --compile standalone binaries calls isValidBlob(), which only validates the top-level blob header fields (total_len, stream_offset), not per-window stream content or varint encoding lengths.

Impact

For a corrupted RuntimeTranspilerCache entry or tampered --compile binary, a single overlong varint in any window's stream causes the remainder of that window's varint reads to produce wrong values. The result is incorrect source line/column offsets for every mapping decoded after the corrupt varint — potentially affecting every stack frame in a multi-frame trace that lands in that window. There is no error signal; the decode silently returns wrong mapping data.

How to fix it

Before breaking on shift > 28, consume remaining continuation bytes so the cursor advances past the overlong varint:

if (i >= bytes.len or shift > 28) {
    // skip remaining continuation bytes to avoid leaving pos at an
    // unread byte with MSB set, which would misalign the next readVarint
    while (i < bytes.len and bytes[i] & 0x80 != 0) i += 1;
    if (i < bytes.len) i += 1; // consume terminating byte
    break;
}

Alternatively, return an error sentinel or 0 immediately, since values requiring more than 5 bytes cannot represent a valid i32.

Comment on lines +375 to +387
d_gen_line.* = readVarint(self.bytes, &p);
if (testBit(self.base + win_hdr.orig_line_eq_mask_off, delta_idx)) d_orig_line.* = d_gen_line.*;
self.gen_line_exc_pos = p;
self.gen_line_exc_next_idx = self.bytes[p];
}
if (self.flags & flag_has_src_idx != 0 and !testBit(self.src_idx_mask, delta_idx)) {
state.source_index += readVarint(self.bytes, &self.src_idx_exc_pos);
}
}
};

/// One decoded-window prefix. See `FindCache` for the multi-slot wrapper that
/// callers actually hold.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 In WindowReader.nextRare(), after readVarint(self.bytes, &p) advances p, the code unconditionally accesses self.bytes[p] (line 378) without a bounds check. For corrupted blobs — a missing 0xFF terminator in the gen_line_exc section — readVarint can legally advance p to bytes.len (by consuming the 1-byte stream_tail_pad), making self.bytes[bytes.len] a one-past-end OOB read: a clean index-out-of-bounds panic in Debug/ReleaseSafe builds and silent undefined behavior in ReleaseFast (production). The scan loop in parse() was explicitly fixed with a while (pos < bytes.len and bytes[pos] != 0xFF) guard, but the analogous read in nextRare() was not updated with the same protection. The fix is to guard the self.bytes[p] read: self.gen_line_exc_next_idx = if (p < self.bytes.len) self.bytes[p] else 0xFF;

Extended reasoning...

The bug lives in WindowReader.nextRare() (src/sourcemap/InternalSourceMap.zig:371-383). When a gen_line_exc entry is matched (self.gen_line_exc_next_idx == delta_idx), the function reads the varint-encoded d_gen_line and then updates the "next" cursor:

var p = self.gen_line_exc_pos + 1;
d_gen_line.* = readVarint(self.bytes, &p);
// ...
self.gen_line_exc_pos = p;
self.gen_line_exc_next_idx = self.bytes[p];   // ← line 378: no bounds check

For well-formed blobs, after the last gen_line_exc pair, p lands on the 0xFF terminator which sits before the 1-byte stream_tail_pad, so self.bytes[p] is always in-bounds. However, for a blob where the gen_line_exc section lacks a 0xFF terminator (corrupted RuntimeTranspilerCache .pile file, or a tampered --compile standalone binary), readVarint is free to consume the tail-pad byte (0x00, which has its MSB clear and thus terminates the varint), advancing p to bytes.len. The subsequent self.bytes[bytes.len] is then a one-past-end read.

The design comment for stream_tail_pad (line 94–98) explains: "exception cursors in WindowReader advance to one byte past their last varint, so a 1-byte tail pad keeps that read in-bounds." This protects readVarint's own unconditional fast-path byte read (bytes[i] at line 246). It does not protect the subsequent self.bytes[p] access in nextRare() — that read is one position further than what the tail pad was designed to cover.

The parse() function already handles this correctly: the scan loop that walks the gen_line_exc section to find its end is guarded with while (pos < bytes.len and bytes[pos] != 0xFF). That fix was applied to the initialization path but the same pattern in the hot-path reader nextRare() was not updated in parallel.

Step-by-step proof with corrupted data: (1) A .pile cache file or --compile binary has a gen_line_exc section whose last pair ends exactly at bytes.len - 2 (the byte just before the tail pad), with no 0xFF terminator. (2) parse() initializes gen_line_exc_pos = bytes.len - 2 and gen_line_exc_next_idx = bytes[bytes.len - 2] (the last valid idx byte). (3) During nextRare(), p = (bytes.len - 2) + 1 = bytes.len - 1 (the tail pad position). readVarint reads bytes[bytes.len - 1] = 0x00 (< 0x80, fast path), sets p = bytes.len. (4) self.bytes[bytes.len] — one past the end. In Debug/ReleaseSafe this panics; in ReleaseFast (Bun's production build mode) it reads an arbitrary byte from adjacent heap memory and stores it as gen_line_exc_next_idx, silently corrupting subsequent gen-line delta decoding.

The two concrete exploit paths: (a) RuntimeTranspilerCache load reads the ISM blob from disk; sourcemap_hash is stored in Metadata but never verified on load, so disk-corrupted .pile files pass without detection. (b) LazySourceMap.load() calls isValidBlob(), which validates only the top-level blob header — it explicitly does NOT walk per-window section structure — so a tampered --compile binary with a malformed gen_line_exc section passes validation and reaches nextRare().

The fix is a one-liner: replace self.gen_line_exc_next_idx = self.bytes[p] with self.gen_line_exc_next_idx = if (p < self.bytes.len) self.bytes[p] else 0xFF, which matches the semantics that 0xFF means "no more exceptions in this window."

structwafel pushed a commit to structwafel/bun that referenced this pull request Apr 25, 2026
…ven-sh#29358)

## What

Replace the runtime/standalone source-map storage with a private binary
format (`InternalSourceMap`) that `js_printer` writes directly and
`find(line, col)` reads in place — no whole-file decode into
`Mapping.List`, no VLQ round-trip.

Mappings are stored in K=64 **windows**. Each window is a 32-byte fixed
header (count, flags, three `u16` section lengths, three always-present
8-byte equality bitmasks) followed by varint-delta streams. The format
exploits the structure of transpiler output:

- `d_src_idx` is 0 for ~100% of runtime-transpiled mappings → omitted
unless `flag_has_src_idx`
- `d_gen_line` is 0 for ~81% → 1-bit `gen_line_mask` per mapping, varint
exceptions for the rare `>1`
- `d_orig_line == d_gen_line` for ~62% → 1-bit `orig_line_eq_mask`,
varint where they differ
- `d_orig_col == d_gen_col` for ~77% → 1-bit `orig_col_eq_mask`, varint
where they differ
- `d_gen_col` is the only lane stored for every delta (zigzag varint)

A 24-byte `SyncEntry` per window holds the absolute `(gen_line, gen_col,
byte_offset, orig_line, orig_col, src_idx)` for binary search.
`find(line, col)` bsearches `SyncEntry[]`, parses one window header,
then advances ≤63 deltas via a `WindowReader`. A 16-slot
fully-associative `FindCache` (256-byte key array + ~21 KB payload, one
per VM) keeps decoded window prefixes warm so successive frames in the
same window skip the decode entirely.

Full design rationale + blob layout is in the doc comment at the top of
`src/sourcemap/InternalSourceMap.zig`.

## Memory

| `_tsc.js` (563k mappings) | resident after first `.stack` |
|---|---|
| `Mapping.List` (main) | ~11.3 MB (20 B/mapping) |
| LEB128 stream (commit 1) | 2.92 MB (5.4 B/mapping) |
| **bit-packed windows (this)** | **1.29 MB (2.41 B/mapping)** |

`--compile` binary for `_tsc.js`: **−1.8 MB** vs LEB128, **−several MB**
vs main (no `Mapping.parse` at runtime; `LazySourceMap.load` is a view
into the embedded blob).

The format degrades gracefully: minified single-line input
(`react-dom.production.min.js`, 37.6k mappings, 132 KB source) packs to
~80 KB at ~2.2 B/mapping vs ~750 KB `Mapping.List`. Multi-source bundles
pay O(files) for `src_idx` boundary exceptions, not O(mappings).

## Perf (Apple M4 Max, release, vs `bun-1.3.12`)

| | this PR | 1.3.12 | Δ |
|---|---|---|---|
| `bench/snippets/error-capturestack.mjs` (mitata, multi-window) |
1.37–1.41 µs | 1.27–1.32 µs | **+6–8%** |
| plain `while(1) new Error().stack` loop | 657 ns | ~810 ns | **−19%**
|
| 5-frame multi-window synthetic (1500-line file) | 818 ns | 769 ns |
+6% |
| first `.stack` on a 150k-line module | ~0.1 ms | ~5 ms | **−98%** |
| RSS load→first-stack (150k-line module) | +0.06 MB | +2.3 MB | |

The mitata-case +6–8% is the per-window header parse + ≤16-way
associative scan; the plain-loop −19% is the C++ batching removing
per-frame mutex+hashmap.

## Scope

- **Runtime transpile path** (`Chunk.Builder` with `prepend_count`):
writes `InternalSourceMap` directly. The `SavedMappings` raw-VLQ variant
is deleted.
- **`--compile`**: `serializeJsonSourceMapForStandalone` runs the
bundler's VLQ through `fromVLQ()` once at build time and embeds the
blob; `LazySourceMap.load` is a view, no `Mapping.parse`.
- **CodeCoverage**: uses `Cursor.moveTo()` so the per-byte loop decodes
~0–1 segments/step instead of bsearching the whole list.
- **C++**: `FormatStackTraceForJS.cpp` `formatStackTrace` /
`computeErrorInfoWithPrepareStackTrace` now batch frames into a
`Vector<ZigStackFrame, 8>` and call `Bun__remapStackFramePositions`
once; the Zig side caches `(path_hash → ISM)` across the batch.
- **Inspector / debug dump**: re-encode to standard VLQ on demand via
`appendVLQTo()`.
- **Bundler external `.map` emission**: unchanged (still spec VLQ).
- `RuntimeTranspilerCache` version 18 → 20.

## Test plan

- [x] `bun run zig:check-all` (16 targets)
- [x] `bun bd test test/js/bun/sourcemap/internal-sourcemap.test.ts`
(incl. ~30K-column single-line minified-shape case)
- [x] `bun bd test test/js/bun/test/stack.test.ts`
- [x] `bun bd test test/cli/test/coverage.test.ts`
- [x] `bun bd test test/bundler/compile-sourcemap-internal.test.ts`
- [x] `bun bd test test/js/node/module/node-module-module.test.js`
- [x] cold/warm `RuntimeTranspilerCache` (210 KB file): identical
`:line:col`
- [x] `bun build --sourcemap=external` still emits valid VLQ
- [x] K=64 boundary, N=1, N=0 mapping files
- [x] 50-deep recursive stack (batched remap) — all frames correct
- [ ] CI

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
xhjkl pushed a commit to xhjkl/bun that referenced this pull request May 14, 2026
…ven-sh#29358)

## What

Replace the runtime/standalone source-map storage with a private binary
format (`InternalSourceMap`) that `js_printer` writes directly and
`find(line, col)` reads in place — no whole-file decode into
`Mapping.List`, no VLQ round-trip.

Mappings are stored in K=64 **windows**. Each window is a 32-byte fixed
header (count, flags, three `u16` section lengths, three always-present
8-byte equality bitmasks) followed by varint-delta streams. The format
exploits the structure of transpiler output:

- `d_src_idx` is 0 for ~100% of runtime-transpiled mappings → omitted
unless `flag_has_src_idx`
- `d_gen_line` is 0 for ~81% → 1-bit `gen_line_mask` per mapping, varint
exceptions for the rare `>1`
- `d_orig_line == d_gen_line` for ~62% → 1-bit `orig_line_eq_mask`,
varint where they differ
- `d_orig_col == d_gen_col` for ~77% → 1-bit `orig_col_eq_mask`, varint
where they differ
- `d_gen_col` is the only lane stored for every delta (zigzag varint)

A 24-byte `SyncEntry` per window holds the absolute `(gen_line, gen_col,
byte_offset, orig_line, orig_col, src_idx)` for binary search.
`find(line, col)` bsearches `SyncEntry[]`, parses one window header,
then advances ≤63 deltas via a `WindowReader`. A 16-slot
fully-associative `FindCache` (256-byte key array + ~21 KB payload, one
per VM) keeps decoded window prefixes warm so successive frames in the
same window skip the decode entirely.

Full design rationale + blob layout is in the doc comment at the top of
`src/sourcemap/InternalSourceMap.zig`.

## Memory

| `_tsc.js` (563k mappings) | resident after first `.stack` |
|---|---|
| `Mapping.List` (main) | ~11.3 MB (20 B/mapping) |
| LEB128 stream (commit 1) | 2.92 MB (5.4 B/mapping) |
| **bit-packed windows (this)** | **1.29 MB (2.41 B/mapping)** |

`--compile` binary for `_tsc.js`: **−1.8 MB** vs LEB128, **−several MB**
vs main (no `Mapping.parse` at runtime; `LazySourceMap.load` is a view
into the embedded blob).

The format degrades gracefully: minified single-line input
(`react-dom.production.min.js`, 37.6k mappings, 132 KB source) packs to
~80 KB at ~2.2 B/mapping vs ~750 KB `Mapping.List`. Multi-source bundles
pay O(files) for `src_idx` boundary exceptions, not O(mappings).

## Perf (Apple M4 Max, release, vs `bun-1.3.12`)

| | this PR | 1.3.12 | Δ |
|---|---|---|---|
| `bench/snippets/error-capturestack.mjs` (mitata, multi-window) |
1.37–1.41 µs | 1.27–1.32 µs | **+6–8%** |
| plain `while(1) new Error().stack` loop | 657 ns | ~810 ns | **−19%**
|
| 5-frame multi-window synthetic (1500-line file) | 818 ns | 769 ns |
+6% |
| first `.stack` on a 150k-line module | ~0.1 ms | ~5 ms | **−98%** |
| RSS load→first-stack (150k-line module) | +0.06 MB | +2.3 MB | |

The mitata-case +6–8% is the per-window header parse + ≤16-way
associative scan; the plain-loop −19% is the C++ batching removing
per-frame mutex+hashmap.

## Scope

- **Runtime transpile path** (`Chunk.Builder` with `prepend_count`):
writes `InternalSourceMap` directly. The `SavedMappings` raw-VLQ variant
is deleted.
- **`--compile`**: `serializeJsonSourceMapForStandalone` runs the
bundler's VLQ through `fromVLQ()` once at build time and embeds the
blob; `LazySourceMap.load` is a view, no `Mapping.parse`.
- **CodeCoverage**: uses `Cursor.moveTo()` so the per-byte loop decodes
~0–1 segments/step instead of bsearching the whole list.
- **C++**: `FormatStackTraceForJS.cpp` `formatStackTrace` /
`computeErrorInfoWithPrepareStackTrace` now batch frames into a
`Vector<ZigStackFrame, 8>` and call `Bun__remapStackFramePositions`
once; the Zig side caches `(path_hash → ISM)` across the batch.
- **Inspector / debug dump**: re-encode to standard VLQ on demand via
`appendVLQTo()`.
- **Bundler external `.map` emission**: unchanged (still spec VLQ).
- `RuntimeTranspilerCache` version 18 → 20.

## Test plan

- [x] `bun run zig:check-all` (16 targets)
- [x] `bun bd test test/js/bun/sourcemap/internal-sourcemap.test.ts`
(incl. ~30K-column single-line minified-shape case)
- [x] `bun bd test test/js/bun/test/stack.test.ts`
- [x] `bun bd test test/cli/test/coverage.test.ts`
- [x] `bun bd test test/bundler/compile-sourcemap-internal.test.ts`
- [x] `bun bd test test/js/node/module/node-module-module.test.js`
- [x] cold/warm `RuntimeTranspilerCache` (210 KB file): identical
`:line:col`
- [x] `bun build --sourcemap=external` still emits valid VLQ
- [x] K=64 boundary, N=1, N=0 mapping files
- [x] 50-deep recursive stack (batched remap) — all frames correct
- [ ] CI

---------

Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants