Skip to content

ICU: per-item zstd decompression hook for compressed libicudata#31200

Merged
Jarred-Sumner merged 20 commits into
mainfrom
claude/icu-decompress-hook
May 23, 2026
Merged

ICU: per-item zstd decompression hook for compressed libicudata#31200
Jarred-Sumner merged 20 commits into
mainfrom
claude/icu-decompress-hook

Conversation

@dylan-conway

@dylan-conway dylan-conway commented May 22, 2026

Copy link
Copy Markdown
Member

Runtime side of oven-sh/WebKit#237 (now merged as 782504c968e2). That change repacks ICU's libicudata.a with most items per-item zstd-compressed and patches udata.cpp to call bun_icu_maybe_decompress between TOC lookup and header validation. This PR provides that function, the test coverage, and bumps WEBKIT_VERSION.

What's in this PR

  • src/jsc/bindings/bun_icu_decompress.cppBun::ICUDecompressor singleton (LazyNeverDestroyed + std::call_once, WTF::Lock/HashMap). The hook checks ZSTD_MAGICNUMBER and returns immediately for raw items; zstd frames are decompressed once with the shared DDict into 16-aligned mimalloc and cached by .rodata address. Linux-only (#if OS(LINUX)).
  • test/js/web/intl/ — 31 tests / 5,902 assertions. Snapshot tests for 12 locales × every Intl.* API (captured against unmodified ICU); structural sweep over all locales × 5 trees; plus icu-locales.txt (the full locale list).
  • scripts/build/deps/webkit.tsWEBKIT_VERSION782504c968e2; fix prebuiltDestDir cache key for autobuild-* tags.

Runtime cost

Nothing is removed — same ICU code, same data, just stored compressed and decoded on first read, then cached for the process lifetime.

Before After
Binary size (linux-x64, stripped) 83.8 MB 72.2 MB −11.5 MB
First new Intl.NumberFormat("ru", {style:"unit"}) 1.00 ms 1.30 ms +0.30 ms once
First i18n library init for one non-en locale 0.59 ms 0.82 ms +0.23 ms once
First time every Intl API × every locale is used 196.5 ms 218.0 ms +21.4 ms once
Any of the above, second time onward 185.9 ms 184.0 ms −1.9 ms

Intl.*("en"), Date.toString(), URL parsing, String.normalize, regex \p{} are unchanged (kept uncompressed). All Intl outputs byte-identical — the snapshots are the unmodified-ICU reference.

No popular i18n package iterates more than the app's own locale at runtime (luxon, @formatjs/intl, i18next, dayjs, date-fns all checked); the every-locale row is the absolute upper bound, not a realistic workload.

30 fresh processes per binary, median; baseline = autobuild-0d85951a, this PR = autobuild-782504c968e2.

Memory

Total RSS is flat. Up to ~11 MB shifts from evictable .rodata to pinned heap as items are decompressed — reachable only via DisplayNames/unit/timeZoneName/currencyDisplay:"name" across many locales.

Scope

Linux glibc + musl only. Windows (build-icu.ps1, ICU 73.2), FreeBSD, Android Dockerfiles, and macOS (system libicucore) are untouched. The hook compiles everywhere but is dead code where the prebuilt's udata.cpp isn't patched.

@robobun

robobun commented May 22, 2026

Copy link
Copy Markdown
Collaborator
Updated 1:43 AM PT - May 23rd, 2026

@dylan-conway, your commit 2900814 has 1 failures in Build #57143 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 31200

That installs a local version of the PR into your bun-31200 executable, so you can run:

bun-31200 --bun

Comment thread src/jsc/bindings/bun_icu_decompress.cpp Outdated
@dylan-conway dylan-conway marked this pull request as ready for review May 22, 2026 05:32
@coderabbitai

coderabbitai Bot commented May 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This PR adds ICU zstd decompression support for Linux, a comprehensive ECMA-402 Intl test suite with locale fixtures, and updates WebKit build configuration. The decompression hook memoizes buffers keyed by input pointer and conditionally uses weak zstd dictionary symbols. The Intl test suite covers DisplayNames, NumberFormat, DateTimeFormat, Collator, Segmenter, and related APIs with snapshot testing and an exhaustive locale sweep verifying outputs across 600+ locales. WebKit configuration is updated to use preview tags with distinct cache key derivation.

Changes

ICU Decompression Hook

Layer / File(s) Summary
Decompression hook implementation and entry point
src/jsc/bindings/bun_icu_decompress.cpp
Declares weak zstd-dictionary symbols and wraps implementation in OS(LINUX) guard. Defines Bun::ICUDecompressor with lock-protected pointer-keyed cache, mimalloc-aligned output buffers, frame sizing and zstd decompression paths with optional dictionary support. Implements bun_icu_maybe_decompress entry point that checks zstd magic and updates length.

Intl Behavior Test Suite

Layer / File(s) Summary
Test setup and locale fixture
test/js/web/intl/intl.test.ts, test/js/web/intl/icu-locales.txt
Test header, Linux-only snapshot version gating, shared locale constants, imports, and 600-line ICU locale identifier fixture.
Core Intl API snapshot tests
test/js/web/intl/intl.test.ts
Snapshot tests for Intl.DisplayNames, Intl.NumberFormat (currency, symbol, name display, units), Intl.DateTimeFormat (default, timeZoneName extraction), Intl.Collator (collation, sensitivity), Intl.Segmenter (grapheme, word), plus supplemental Intl.PluralRules, Intl.ListFormat, Intl.RelativeTimeFormat, non-Intl Unicode normalization, Turkish casing, IDNA punycode, and Intl.getCanonicalLocales with deprecated BCP-47 mappings.
Exhaustive locale sweep and cache regression tests
test/js/web/intl/intl.test.ts
Loads icu-locales.txt, filters via Intl.getCanonicalLocales, derives supported locales, probes five ICU tree types (region, language, currency, unit, zone) across all locales asserting non-empty results and diversity. Verifies cache consistency by repeating Intl.DisplayNames calls return identical strings.

WebKit Build Configuration Update

Layer / File(s) Summary
WebKit version pin and cache-key derivation
scripts/build/deps/webkit.ts
Updates WEBKIT_VERSION to autobuild preview tag and modifies prebuiltDestDir cache-key derivation to strip autobuild prefixes before substring selection, ensuring preview pins and standard commit hashes generate distinct cache keys.

Suggested reviewers

  • Jarred-Sumner
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding a decompression hook for zstd-compressed ICU data items, which is the core functionality of this PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description is comprehensive and well-structured, providing clear context for the WebKit runtime integration, implementation details, test coverage, performance metrics, and scope limitations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/web/intl/icu-locales.txt`:
- Line 434: Remove the stray non-locale entry "res-index" from the icu-locales
fixture file so it no longer gets filtered out by Intl.getCanonicalLocales;
alternatively, replace "res-index" with a valid BCP47 locale tag if it was
intended to be a test case, and re-generate the fixture so
test/js/web/intl/intl.test.ts will exercise actual Intl behavior instead of
discarding this entry.
- Around line 291-292: The fixture includes deprecated BCP‑47 tags (e.g.
"in"/"in-ID", "iw"/"iw-IL", "mo", "no*", "sh*", "tl*") but the test harness
(Intl.getCanonicalLocales in test/js/web/intl/intl.test.ts) doesn't assert
alias→canonical mappings; either (A) add explicit assertions in intl.test.ts
that for each deprecated tag Intl.getCanonicalLocales(tag) returns the expected
modern tag(s) (build a small map of deprecated→canonical and assert equality) or
(B) remove/trim deprecated entries from test/js/web/intl/icu-locales.txt so the
fixture only contains canonical/modern tags—pick one approach and apply
consistently across the listed deprecated symbols.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f1e23297-222c-4b5b-9798-99d7df2bd552

📥 Commits

Reviewing files that changed from the base of the PR and between 74e191b and d0e7449.

⛔ Files ignored due to path filters (1)
  • test/js/web/intl/__snapshots__/intl.test.ts.snap is excluded by !**/*.snap
📒 Files selected for processing (3)
  • src/jsc/bindings/bun_icu_decompress.cpp
  • test/js/web/intl/icu-locales.txt
  • test/js/web/intl/intl.test.ts

Comment thread test/js/web/intl/icu-locales.txt
Comment thread test/js/web/intl/icu-locales.txt Outdated
Comment thread src/jsc/bindings/bun_icu_decompress.cpp Outdated
Comment thread scripts/build/deps/webkit.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/build/deps/webkit.ts`:
- Around line 6-11: Add a build-time guard that rejects WEBKIT_VERSION values
beginning with "autobuild-preview-" unless an explicit override flag is present:
detect the exported constant WEBKIT_VERSION at startup (e.g., in the configure
or CI entrypoint) and if it matches /^autobuild-preview-/ exit non‑zero with a
clear error unless an environment variable or CLI flag like ALLOW_PREVIEW_WEBKIT
/ --allow-preview-webkit is set; ensure the error message references
WEBKIT_VERSION and instructs how to set the override for temporary testing so
CI/local builds fail fast on accidental preview pins.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 85628d3e-9043-433a-ae81-6d57b4024b61

📥 Commits

Reviewing files that changed from the base of the PR and between dd4abbb and 43f5569.

📒 Files selected for processing (1)
  • scripts/build/deps/webkit.ts

Comment thread scripts/build/deps/webkit.ts Outdated
Comment thread scripts/build/deps/webkit.ts Outdated
… preview WEBKIT_VERSION, drop res-index from locale fixture

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/jsc/bindings/bun_icu_decompress.cpp (1)

87-93: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle ZSTD_createDCtx allocation failure.

ZSTD_createDCtx() returns nullptr on memory exhaustion. With m_dctx null, subsequent calls to ZSTD_decompressDCtx or ZSTD_decompress_usingDDict will dereference null, causing UB/crash.

Add a null check in decompress() to fail gracefully.

Proposed fix
     const void* decompress(const void* p, int32_t* length)
     {
+        if (!m_dctx) [[unlikely]]
+            return p;
+
         Locker locker { m_lock };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/jsc/bindings/bun_icu_decompress.cpp` around lines 87 - 93, The
constructor may leave m_dctx null if ZSTD_createDCtx() fails; update
ICUDecompressor::decompress() to check if m_dctx is nullptr before calling
ZSTD_decompressDCtx or ZSTD_decompress_usingDDict and fail gracefully (e.g.,
return an error/false or set an error code) instead of dereferencing it;
reference the ICUDecompressor class, the m_dctx member, the decompress() method,
and the ZSTD_createDCtx / ZSTD_decompressDCtx / ZSTD_decompress_usingDDict calls
so you add an early null check and an appropriate error path (and ensure any
allocated m_ddict is still cleaned up as before).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/build/deps/webkit.ts`:
- Around line 83-91: The preview-guard uses the unnormalized value but the URL
uses a normalized tag, so inputs like "preview-pr-..." bypass the check; compute
the normalized tag first (e.g., const tag = version.startsWith("autobuild-") ?
version : `autobuild-${version}`) and then check
tag.startsWith("autobuild-preview-") against cfg.allowPreviewWebkit, throwing
the BuildError if needed, and use that same tag to build the download URL
(references: version, tag, cfg.allowPreviewWebkit, BuildError).

---

Outside diff comments:
In `@src/jsc/bindings/bun_icu_decompress.cpp`:
- Around line 87-93: The constructor may leave m_dctx null if ZSTD_createDCtx()
fails; update ICUDecompressor::decompress() to check if m_dctx is nullptr before
calling ZSTD_decompressDCtx or ZSTD_decompress_usingDDict and fail gracefully
(e.g., return an error/false or set an error code) instead of dereferencing it;
reference the ICUDecompressor class, the m_dctx member, the decompress() method,
and the ZSTD_createDCtx / ZSTD_decompressDCtx / ZSTD_decompress_usingDDict calls
so you add an early null check and an appropriate error path (and ensure any
allocated m_ddict is still cleaned up as before).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b5e251ac-2724-4221-9912-664ed762dd05

📥 Commits

Reviewing files that changed from the base of the PR and between 43f5569 and 277eca8.

📒 Files selected for processing (6)
  • scripts/build.ts
  • scripts/build/config.ts
  • scripts/build/deps/webkit.ts
  • src/jsc/bindings/bun_icu_decompress.cpp
  • test/js/web/intl/icu-locales.txt
  • test/js/web/intl/intl.test.ts
💤 Files with no reviewable changes (1)
  • test/js/web/intl/icu-locales.txt

Comment thread scripts/build/deps/webkit.ts Outdated
Comment on lines 83 to 91
if (version.startsWith("autobuild-preview-") && !cfg.allowPreviewWebkit) {
throw new BuildError(
`WEBKIT_VERSION is pinned to a PR preview tag (${version}). ` +
`Preview prebuilts are deleted when the PR closes. ` +
`Pass --allow-preview-webkit=on to build anyway, or revert to a stable hash before merging.`
);
}
const tag = version.startsWith("autobuild-") ? version : `autobuild-${version}`;
return `https://github.com/oven-sh/WebKit/releases/download/${tag}/${name}.tar.gz`;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize preview detection before enforcing the guard.

At Line 83, the guard checks raw cfg.webkitVersion, but at Line 90 the value is normalized into tag. This allows bypassing the guard with inputs like --webkit-version=preview-pr-237-..., which still become preview tags after normalization.

Suggested fix
 function prebuiltUrl(cfg: Config): string {
   const os = cfg.windows ? "windows" : cfg.darwin ? "macos" : cfg.freebsd ? "freebsd" : "linux";
   const arch = cfg.arm64 ? "arm64" : "amd64";
   const name = `bun-webkit-${os}-${arch}${prebuiltSuffix(cfg)}`;
   const version = cfg.webkitVersion;
-  if (version.startsWith("autobuild-preview-") && !cfg.allowPreviewWebkit) {
+  const tag = version.startsWith("autobuild-") ? version : `autobuild-${version}`;
+  if (tag.startsWith("autobuild-preview-") && !cfg.allowPreviewWebkit) {
     throw new BuildError(
       `WEBKIT_VERSION is pinned to a PR preview tag (${version}). ` +
         `Preview prebuilts are deleted when the PR closes. ` +
         `Pass --allow-preview-webkit=on to build anyway, or revert to a stable hash before merging.`
     );
   }
-  const tag = version.startsWith("autobuild-") ? version : `autobuild-${version}`;
   return `https://github.com/oven-sh/WebKit/releases/download/${tag}/${name}.tar.gz`;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build/deps/webkit.ts` around lines 83 - 91, The preview-guard uses
the unnormalized value but the URL uses a normalized tag, so inputs like
"preview-pr-..." bypass the check; compute the normalized tag first (e.g., const
tag = version.startsWith("autobuild-") ? version : `autobuild-${version}`) and
then check tag.startsWith("autobuild-preview-") against cfg.allowPreviewWebkit,
throwing the BuildError if needed, and use that same tag to build the download
URL (references: version, tag, cfg.allowPreviewWebkit, BuildError).

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional findings (outside current diff — PR may have been updated during review):

  • 🟡 scripts/build/deps/webkit.ts:99-101 — The fix only partially achieves its stated goal: after stripping autobuild-, "preview-pr-237-72c07745".slice(0, 16) yields "preview-pr-237-7" — the fixed 15-char preview-pr-237- prefix leaves only one hex char of the sha in the cache key, so two preview pins of the same PR whose shas share a first hex digit (1/16 chance) still collapse to the same dir. Since this is just a cache dirname, you could drop the .slice(0, 16) entirely for autobuild--prefixed pins (or slice more chars / use the full cfg.webkitVersion).

    Extended reasoning...

    What the bug is

    Commit 277eca8 addressed the earlier review comment by stripping /^autobuild-/ before slicing in prebuiltDestDir() (scripts/build/deps/webkit.ts:99-101):

    // Strip the autobuild- prefix (matches prebuiltUrl) so two different
    // preview/tag pins don't collapse to the same "autobuild-previe" cache key.
    const version16 = cfg.webkitVersion.replace(/^autobuild-/, "").slice(0, 16);

    The comment says the goal is "so two different preview/tag pins don't collapse to the same cache key." But for the actual preview-tag format used in this PR, the fix leaves only a single hex character of entropy.

    Step-by-step proof

    Take the value currently pinned at line 11: "autobuild-preview-pr-237-72c07745".

    1. .replace(/^autobuild-/, "")"preview-pr-237-72c07745" (23 chars).
    2. .slice(0, 16)"preview-pr-237-7".
    3. The fixed prefix "preview-pr-237-" is exactly 15 characters, so only the first hex digit of the commit sha (7) contributes to version16.

    Now consider two preview builds of the same WebKit PR — say autobuild-preview-pr-237-72c07745 and a hypothetical autobuild-preview-pr-237-7a3f0012. Both reduce to version16 = "preview-pr-237-7", and prebuiltDestDir() returns the same path for both. With 16 possible first hex digits, any two same-PR preview pins have a 1/16 chance of colliding.

    The two concrete pins this PR has actually cycled through (cbfdad14c, 72c077457) happen to differ in the first digit, so the immediate collision that motivated the original comment is resolved — but only by luck, not by construction.

    Why existing code only partially mitigates it

    • The Source.identity stamp (webkit.ts:199) does carry the full cfg.webkitVersion + suffix, so within a single worktree a stale extraction would be detected and re-fetched. But per the original review comment's analysis, the multi-worktree-shared-$BUN_INSTALL/build-cache scenario can still see ninja's restat skip the fetch when the colliding destDir's outputs already exist with newer mtimes.
    • The new --allow-preview-webkit=on guard (webkit.ts:83-89) makes preview pins explicitly opt-in, so this only affects developers who deliberately enable it.
    • The normal autobuild-<40hexsha> post-merge form is unaffected: stripping autobuild- leaves a 40-hex sha, and .slice(0, 16) gives the full intended 16 hex chars.
    • Different PR numbers (preview-pr-237- vs preview-pr-238-) don't collide — only same-PR re-pins do.

    Impact

    Low. This is a residual 1/16 same-PR collision in an opt-in, dev-only flow gated behind --allow-preview-webkit=on, and the WEBKIT_VERSION value carrying it is marked DO-NOT-MERGE. But the line was just changed specifically to fix this class of collision, and its own comment overstates what it achieves, so it's worth tightening while you're here.

    How to fix

    Any of these one-liners fully satisfies the comment's stated intent:

    • Drop the slice for tag-form pins: const v = cfg.webkitVersion.replace(/^autobuild-/, ""); const version16 = v.length === 40 ? v.slice(0, 16) : v; — bare 40-hex shas keep the short form, anything else (preview tags) uses the full string.
    • Slice more characters (e.g. .slice(0, 32) covers preview-pr-NNN- + 8 hex).
    • Just use the full cfg.webkitVersion in the dirname — it's a cache path, length doesn't matter.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/build/deps/webkit.ts`:
- Around line 86-89: Update the comment above the version slicing to accurately
reflect the implemented logic: it checks cfg.webkitVersion for any "autobuild-"
prefix (not only "autobuild-preview-") and uses the full remainder of that
prefix, otherwise it takes the first 16 characters of the version string;
reference the variables v and version16 and the cfg.webkitVersion check so
reviewers know the behavior matches the code.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: afd20132-2676-4338-b941-3ec8cb0da0a0

📥 Commits

Reviewing files that changed from the base of the PR and between 277eca8 and 2900814.

📒 Files selected for processing (1)
  • scripts/build/deps/webkit.ts

Comment on lines +86 to +89
// For 40-hex shas, 16 chars is plenty. For autobuild-preview-* tags, the
// meaningful sha is at the end, so use the whole thing.
const v = cfg.webkitVersion;
const version16 = v.startsWith("autobuild-") ? v.slice("autobuild-".length) : v.slice(0, 16);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider clarifying the comment to match code behavior.

The comment mentions "autobuild-preview-* tags" specifically, but the code checks for any "autobuild-" prefix (line 89). While preview tags are the primary use case, the logic applies to any autobuild-prefixed version string.

📝 Suggested comment refinement
-  // For 40-hex shas, 16 chars is plenty. For autobuild-preview-* tags, the
-  // meaningful sha is at the end, so use the whole thing.
+  // For 40-hex shas, 16 chars is plenty. For autobuild-prefixed tags
+  // (e.g., preview tags), use the full suffix after the prefix.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// For 40-hex shas, 16 chars is plenty. For autobuild-preview-* tags, the
// meaningful sha is at the end, so use the whole thing.
const v = cfg.webkitVersion;
const version16 = v.startsWith("autobuild-") ? v.slice("autobuild-".length) : v.slice(0, 16);
// For 40-hex shas, 16 chars is plenty. For autobuild-prefixed tags
// (e.g., preview tags), use the full suffix after the prefix.
const v = cfg.webkitVersion;
const version16 = v.startsWith("autobuild-") ? v.slice("autobuild-".length) : v.slice(0, 16);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/build/deps/webkit.ts` around lines 86 - 89, Update the comment above
the version slicing to accurately reflect the implemented logic: it checks
cfg.webkitVersion for any "autobuild-" prefix (not only "autobuild-preview-")
and uses the full remainder of that prefix, otherwise it takes the first 16
characters of the version string; reference the variables v and version16 and
the cfg.webkitVersion check so reviewers know the behavior matches the code.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My earlier concerns (debug tryAlignedMalloc assert, preview WEBKIT_VERSION pin, prebuiltDestDir cache-key collision) are all addressed and no new issues turned up, but this adds new runtime C++ on the ICU data-load path tied to a coordinated WebKit bump — worth a human pass before merging.

Extended reasoning...

Overview

This PR is the Bun-side half of oven-sh/WebKit#237: it adds src/jsc/bindings/bun_icu_decompress.cpp (a LazyNeverDestroyed singleton with a WTF::Lock-guarded HashMap cache that zstd-decompresses per-item ICU display-name data into 16-aligned mimalloc), bumps WEBKIT_VERSION to the now-merged 782504c968e2… sha, adds a ~300-line Intl test suite with 20 snapshots and a 599-locale sweep fixture, and tweaks prebuiltDestDir() in scripts/build/deps/webkit.ts to handle autobuild--prefixed version strings.

Since my last inline comments the author has: rounded the alloc size with WTF::roundUpToMultipleOf<16> (fixing the debug assert), replaced the preview autobuild-preview-pr-237-* pin with the real merged sha, fixed the prebuiltDestDir cache-key truncation, gated the whole hook body under #if OS(LINUX), and added/reverted the allowPreviewWebkit guard per discussion. All prior review threads are substantively resolved.

Security risks

The hook only runs on pointers handed to it by ICU's patched udata.cpp from .rodata, gated by a ZSTD_MAGICNUMBER check, with zstd doing bounded decompression into a buffer sized from the frame header. Inputs are not user-controlled (they're baked into libicudata.a at WebKit build time), so there's no injection or untrusted-data surface. The weak-symbol pattern and OS(LINUX) gate keep non-ELF platforms out of scope. I don't see a security concern here.

Level of scrutiny

High. This is new C++ on a path that every Intl.DisplayNames / NumberFormat({style:'unit'}) / DateTimeFormat({timeZoneName}) call will traverse once the WebKit bump activates it, with process-lifetime caching, locking, and a deliberate RssFile→RssAnon memory tradeoff. It's also tightly coupled to an external repo's data layout (the repacked libicudata.a and shared zstd dict from oven-sh/WebKit#237). The test coverage is extensive and the design has already been iterated with Jarred, but the combination of runtime hot-path code + cross-repo ABI coordination + a WEBKIT_VERSION bump is well outside the "mechanical change" bar for auto-approval.

Other factors

  • Jarred reviewed an earlier revision and his structural feedback was incorporated; he hasn't signed off on the final shape.
  • robobun reported CI failures on 9b70a2ae; the subsequent 2900814c (the real WEBKIT_VERSION bump) hasn't been reflected in the CI status comment yet.
  • The one remaining unresolved CodeRabbit thread (normalize-before-guard in prebuiltUrl) is moot since the guard it referenced was reverted.
  • The prebuiltDestDir change slightly alters cache-dir naming for the normal 40-hex-sha case too (no behavioral difference, but worth a glance).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants