Skip to content

css: hash full Ref identity in IdentOrRef::hash#30784

Open
robobun wants to merge 12 commits into
mainfrom
farm/fab3bf1c/css-ident-hash-full-bytes
Open

css: hash full Ref identity in IdentOrRef::hash#30784
robobun wants to merge 12 commits into
mainfrom
farm/fab3bf1c/css-ident-hash-full-bytes

Conversation

@robobun

@robobun robobun commented May 15, 2026

Copy link
Copy Markdown
Collaborator

Repro

IdentOrRef::hash (src/css/values/ident.rs:433) hashed only the first 2 of the 16 bytes of the packed u128 in the ref-tagged branch. The port carried the behavior verbatim from Zig (src/css/values/ident.zig:241-249slice_u8[0..2]) with a TODO flagging it as likely an upstream bug.

The binding added in this PR reproduces the bug directly:

import { cssInternals } from 'bun:internal-for-testing';
const [h1, h2] = cssInternals.identOrRefHashRefs(1, 0, 2, 0);
// Before fix (release): h1 === h2 (every ref hashes to the same 2 bytes `[0x00, 0x00]`)
// Before fix (debug):   h1 !== h_with_same_ref_and_different_debug_slice (ptrbits lane leaks in)

Cause

IdentOrRef is #[repr(transparent)] u128 with the layout:

bits  0..63  ptrbits (u63) — 0 in release; debug-ident heap pointer in debug
bit   63     ref_bit
bits 64..128 len — for a ref, exactly `Ref::to_raw_bits()`

The 2-byte hash covered only the low bits of ptrbits:

  • Release: ptrbits = 0 → every ref-tagged value hashed [0x00, 0x00], collapsing the entire ref space onto a single bucket.
  • Debug: ptrbits is a fresh heap pointer per from_ref call, so two logically-equal refs (same Ref, constructed via separate calls) hashed differently — a broken hash/eql invariant.

The actual Ref identity lives in the upper u64 (self.len_bits()), which the old hash never touched.

Fix

Hash Ref::as_u64() — the identity bits with the user-bit lane masked off. This is what Ref::eql/Ref::hash64 already compare against, so the IdentOrRef hash/eql contract now holds:

  • Equal refs (even across construction calls with different debug slices) produce identical hashes.
  • Distinct refs produce distinct hashes (wyhash over 8 bytes).

Verification

test/js/bun/css/ident-or-ref-hash.test.ts uses a new test-only binding cssInternals.identOrRefHashRefs to assert both invariants directly:

  • [0] == [1] and [2] == [3] — same ref, different debug slices hash identically.
  • [0] != [2] — distinct refs hash distinctly.
  • Plus a 16-ref distribution check — all 16 distinct refs get 16 distinct hashes.
$ bun bd test test/js/bun/css/ident-or-ref-hash.test.ts
(pass) IdentOrRef::hash distinguishes distinct refs and is invariant over debug-only ptrbits
(pass) IdentOrRef::hash over many distinct refs yields many distinct hashes
 2 pass, 0 fail, 4 expect() calls

Reverting the hash fix while keeping the binding reproduces the breakage:

(fail) IdentOrRef::hash distinguishes distinct refs and is invariant over debug-only ptrbits
  Expected: 7642627633208095000
  Received: 12918967221962394000
(fail) IdentOrRef::hash over many distinct refs yields many distinct hashes
  Expected: 16
  Received: 1

test/js/bun/css/css.test.ts (1032 pass) and test/bundler/css/css-modules.test.ts (3 pass) stay green.

Closes #30772.

Rebase notes

Rebased twice onto main. The second rebase required a real resolution: main's hardening pass made DebugIdent/debug_ident pub(crate), so the test binding in bun_css_jsc could no longer construct the from_ref argument. The construction + hashing now lives in IdentOrRef::hash_from_ref_for_testing inside bun_css (which owns those types), the binding calls it, and the bun_wyhash dependency was dropped from bun_css_jsc. Main had also annotated the old 2-byte hash with a comment pointing at this PR; that comment is superseded by the fix itself.

@coderabbitai

coderabbitai Bot commented May 15, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR fixes a hash collision bug in IdentOrRef::hash where only 2 bytes of a 16-byte value were hashed, and adds test infrastructure to verify the fix. The core change uses the full 64-bit Ref identity via to_ne_bytes(). A JSC binding and TypeScript wrapper expose the hash function to tests, which verify invariance across debug-only bits and proper distribution across distinct identities.

Changes

IdentOrRef Hash Fix and Testing

Layer / File(s) Summary
Core hash implementation fix
src/css/values/ident.rs
IdentOrRef::hash now hashes the full 64-bit Ref identity using r.as_u64().to_ne_bytes() instead of unsafely reading only 2 bytes from the packed u128 representation.
Testing infrastructure setup
src/css_jsc/Cargo.toml, src/css_jsc/css_internals.rs, src/js/internal-for-testing.ts, src/runtime/dispatch_js2native.rs
Add bun_wyhash dependency, implement JSC binding ident_or_ref_hash_refs to construct Ref values and compute their hashes, expose TypeScript wrapper identOrRefHashRefs, and export the binding via js2native dispatch for test access.
Regression test suite
test/js/bun/css/ident-or-ref-hash.test.ts
Tests verify IdentOrRef::hash is invariant across debug-only pointer bits and identical for logically equal refs, distinct for different refs, and distributes without collision across 16 distinct inner identities.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'css: hash full Ref identity in IdentOrRef::hash' directly and clearly describes the core fix: changing the hash to use the full Ref identity instead of only 2 bytes.
Linked Issues check ✅ Passed The PR fully addresses issue #30772 by replacing the hardcoded 2-byte hash with proper hashing of Ref::as_u64(), fixing hash collisions and the hash/eql contract violation.
Out of Scope Changes check ✅ Passed All changes directly support the hash fix: the core Ref hashing logic in ident.rs, test-only bindings to verify the fix, and test cases demonstrating the fix works correctly.
Description check ✅ Passed The pull request description is comprehensive and well-structured, covering the bug reproduction, root cause analysis, fix implementation, and verification steps.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@robobun

robobun commented May 15, 2026

Copy link
Copy Markdown
Collaborator Author

Comment thread test/js/bun/css/ident-or-ref-hash.test.ts Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the test-placement nit. The core fix and binding look correct to me, but given this adds a new native→JS binding and changes CSS-module hash semantics (plus CI for 685072b hasn't reported yet), I'll leave this for a human to sign off.

Extended reasoning...

Overview

This PR fixes IdentOrRef::hash in src/css/values/ident.rs to hash the full 8-byte Ref::as_u64() identity (matching Ref::eql's masked comparison) instead of the first 2 bytes of the packed u128 — which in release builds were always [0x00, 0x00], collapsing every ref-tagged value onto one hash bucket. It also adds a test-only native binding ident_or_ref_hash_refs (wired through css_jsc, dispatch_js2native, and internal-for-testing.ts) plus a regression test under test/js/bun/css/. The remaining ~10 files are autofix.ci whitespace reformatting of multi-clause #[cfg(...)]/#[cfg_attr(...)] attributes with no semantic change.

Security risks

None identified. The new binding is gated behind bun:internal-for-testing (debug/CI only), takes four numbers, and returns four int32 hash values — no user-controlled paths, no FFI surface beyond what already exists for the other cssInternals test helpers. The hash change affects only CSS-module identifier deduplication, not anything auth- or crypto-related.

Level of scrutiny

Medium. The 2-line core fix is clearly correct (verified Ref::as_u64() at src/ast/lib.rs:319 masks the user-bit lane, so hash now agrees with eql). However, the PR also introduces a new native→JS binding with arena allocation and JSC array construction, and changes hash distribution for the CSS-module ref path — both worth a human glance even though they look right.

Other factors

  • My earlier inline nit (test belonged in test/js/bun/css/ rather than test/regression/issue/) was addressed in 9aedc85.
  • robobun reported the new test failing on 6 platforms at 9aedc85; 685072b adds the int32 fold in the binding to fix that, but CI for 685072b isn't reflected in the timeline yet.
  • The bulk of the diff (crash_handler, errno, tracy, run_command, etc.) is pure rustfmt/autofix attribute reflow — zero behavior change.
  • No bugs flagged by the bug-hunting system.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the core fix is small and obviously correct (hash the 8-byte Ref identity instead of 2 garbage bytes), and the test-placement nit from my earlier review was addressed.

Extended reasoning...

Overview

The substantive change is in src/css/values/ident.rs:427-441: IdentOrRef::hash now hashes r.as_u64().to_ne_bytes() (the 8-byte Ref identity with the user-bit lane masked off) instead of an unsafe 2-byte slice of the packed u128. This aligns the hash with eql (which compares via Ref::eql, also masking the user-bit lane) and removes an unsafe block. Supporting changes: a new test-only JSC binding ident_or_ref_hash_refs in css_internals.rs (~90 lines, following the same patterns as the existing attr_test/testing_impl bindings), its TS/dispatch glue, a bun_wyhash workspace dep for css_jsc, and a 52-line test file. The remaining ~10 files are pure rustfmt reformatting of #[cfg(...)]/#[cfg_attr(...)] attributes from the autofix.ci commit (9756f92) — no semantic change, verified by inspection of each hunk.

Security risks

None. The fix replaces unsafe pointer arithmetic with safe code. The new binding is test-only (gated behind bun:internal-for-testing), takes four numbers, and returns four numbers — no user-controlled paths, no FFI surface beyond what the existing cssInternals bindings already expose.

Level of scrutiny

Low-to-moderate. The hash function feeds CSS-modules local-identifier deduplication in the bundler, but the change is a strict correctness improvement: before, every ref-tagged value collided to the same bucket in release builds (and broke the hash/eql contract in debug). I verified Ref::as_u64() at src/ast/lib.rs:319 masks USER_BITS_MASK, matching what Ref::eql compares — so the hash/eql invariant now holds. The test binding follows established patterns in the same file and is unreachable outside the test harness.

Other factors

  • My earlier nit (test belonged in test/js/bun/css/ rather than test/regression/issue/) was addressed in 9aedc85 and the thread is resolved.
  • The bug-hunting system found no issues.
  • No CODEOWNERS paths are touched.
  • Existing CSS test suites (css.test.ts, css-modules.test.ts) stay green per the PR description; the new test demonstrates both failure modes of the old hash.
  • The one CI failure (worker.test.ts core dump on Alpine x64) is unrelated to anything this PR touches.

@robobun

robobun commented May 15, 2026

Copy link
Copy Markdown
Collaborator Author

Heads up for whoever reviews: CI is green across all 65+ test-bun jobs, but the internal mechanical gate keeps rejecting due to a pre-existing bake-codegen infra issue in its container. The baked bun it uses for codegen scripts (/workspace/bun/build/release/bun) is rev 19d8ade2c, pre-fix #30679 for the JSON lexer, so bake-codegen.ts fails on the --modal-bg:... in the minified overlay CSS (-- trips "Operators are not allowed in JSON" in the old lexer). That blocks the gate's debug build, so it can never verify the fail-before/pass-after cycle automatically.

Locally (with a newer ~/.bun/bin/bun that findBun picks up first) the cycle works cleanly:

  • git checkout origin/main -- src/ && bun bd test test/js/bun/css/ident-or-ref-hash.test.ts → 2 fail (TypeError, binding missing)
  • git reset --hard HEAD && bun bd test ... → 2 pass

The fix itself is the 2-line change in src/css/values/ident.rs (hasher.update(&r.as_u64().to_ne_bytes())). Everything else is the test-only cssInternals.identOrRefHashRefs binding that makes the hash directly observable from JS.

Comment thread src/css/values/ident.rs
Comment thread src/css/values/ident.rs
Comment thread src/css/values/ident.rs Outdated

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the core fix is a small, verifiably-correct change (hashing Ref::as_u64() matches what Ref::eql compares), and the rest is a test-only binding that follows the existing cssInternals pattern.

Extended reasoning...

Overview

The functional change is ~10 lines in src/css/values/ident.rs: IdentOrRef::hash now feeds r.as_u64().to_ne_bytes() into wyhash instead of the first 2 bytes of the packed u128 (which were always [0x00, 0x00] in release, or a non-deterministic heap address in debug). I verified against src/ast/lib.rs that Ref::as_u64() returns self.0 & !USER_BITS_MASK — exactly the bits Ref::eql compares — so the hash/eql contract is now satisfied in both build configs.

The remaining ~150 lines are test scaffolding: a new ident_or_ref_hash_refs JSC binding in css_internals.rs, its re-export in dispatch_js2native.rs, a TS wrapper in internal-for-testing.ts, a regression test under test/js/bun/css/, and a bun_wyhash workspace dep for bun_css_jsc. The binding is reachable only via bun:internal-for-testing and mirrors the eight sibling cssInternals bindings already in the same file.

Security risks

None. No auth/crypto/permissions code; the new surface is a test-only internal binding gated behind bun:internal-for-testing (debug/CI only per the module header). The hash change affects CSS-modules selector dedup bucketing — a correctness/perf concern, not a security boundary.

Level of scrutiny

Low-to-moderate. The production change is tiny and its correctness is mechanically checkable against Ref::as_u64/Ref::eql. The test binding is larger but purely additive, follows established precedent in the same file, and is not user-reachable. No CODEOWNERS entries match the touched paths.

Other factors

All prior review threads (my test-placement nit and Decodetalkers' three comments) are marked resolved — the test was moved to test/js/bun/css/, the unwrap() was rewritten as if let Some(r) = self.as_ref(), and the other two comments concerned pre-existing code from #30412 rather than this diff. robobun reports CI green across all test-bun jobs (the one failing gate is a pre-existing bake-codegen infra issue unrelated to this change), and the existing css.test.ts / css-modules.test.ts suites stay green.

robobun and others added 7 commits June 6, 2026 05:35
The ref-tagged branch of IdentOrRef::hash only mixed the first 2 bytes of
the 16-byte packed u128. In release builds those bytes are always
[0x00, 0x00] (ptrbits is 0 since from_ref packs ptr=0, ref_bit=1,
len=ref.to_raw_bits()), so every ref-tagged IdentOrRef collapsed to the
same hash — severely degrading HashMap bucket distribution in the CSS
modules local-identifier dedup path. In debug builds the first 2 bytes
were the low byte of a heap pointer to the debug-ident slice, which made
logically-equal refs (same Ref, different construction call) hash
differently — a broken hash/eql invariant.

Hash Ref::as_u64() (identity bits with the user-bit lane masked off),
matching Ref::eql and Ref::hash64. This keeps equal refs hashing
identically, gives distinct refs distinct hashes, and ignores the
debug-only ptrbits lane.

Adds a bun:internal-for-testing binding identOrRefHashRefs(aInner,
aSource, bInner, bSource) that hashes four IdentOrRef::from_ref values —
two with ref a and two with ref b, each with a different debug-ident
slice — so the regression test can assert both the (equal refs ->
equal hashes) and (distinct refs -> distinct hashes) invariants without
needing a bundler-level observable.

Zig original (src/css/values/ident.zig:241-249) carried the same 2-byte
slice; comment on the new implementation flags it as the upstream bug
the port inherited.
Per CLAUDE.md, test/regression/issue/ is reserved for bugs that worked
in a previous release and then broke. #30772 was a long-standing
carry-over from the Zig original (flagged with a TODO in the port), not
a regression — so the test belongs alongside the existing CSS tests.
CI release builds on Linux x64 (debian/ubuntu/alpine, baseline + normal)
reproducibly saw all 16 returned u64-as-f64 hash values collapse to the
same JS Number, failing the 16-distinct-hashes assertion. The fix
folds the u64 wyhash output into the low 30 bits and returns it via
JSValue::js_number_from_int32, so the JS equality check compares
precise int32 values instead of potentially-lossy f64s. Local release
builds happened to dodge the collapse, but the int32 form is strictly
more robust.
Previous CI hit a flaky SIGABRT crash in WTF::CrashOnOverflow during
test/js/web/workers/worker.test.ts on a single alpine x64 shard. That
test is unrelated to this PR's CSS changes. My regression test passed
on all six shards where it ran.
Inline the ref pattern-match into the else arm instead of calling
`as_ref().unwrap()`. Semantically identical — IdentOrRef is always
either ident-tagged or ref-tagged, so the unwrap was safe — but the
direct pattern match reads cleaner and avoids the infallible unwrap.
Main made DebugIdent/debug_ident pub(crate) in the hardening pass, so
the bun_css_jsc binding can no longer construct the from_ref argument.
Move the hash-two-refs-for-testing logic into IdentOrRef::
hash_from_ref_for_testing inside bun_css (which owns the types), have
the binding call it, and drop the now-unused bun_wyhash dependency from
bun_css_jsc.
@robobun robobun force-pushed the farm/fab3bf1c/css-ident-hash-full-bytes branch from 8741fce to 6efa8c6 Compare June 6, 2026 05:42
Comment thread src/css_jsc/css_internals.rs Outdated
Comment thread src/css/values/ident.rs Outdated
Replace the bug-history paragraph in IdentOrRef::hash with a one-line
deviation marker pointing at #30772 (the failure-mode analysis lives in
the PR description), and correct the eat_u32 macro rationale: it exists
so return Err early-exits the enclosing fn, not for ? trait resolution.
Comment thread test/js/bun/css/ident-or-ref-hash.test.ts Outdated
Per CLAUDE.md, regression tests carry the issue URL; the failure-mode
analysis lives in the PR description. Keep the assertion-intent lines
(the invariants each expect enforces).
Comment thread src/css_jsc/css_internals.rs
The distinctness-preservation claim was false (swapped halves fold
identically). State the true property: per-pair collision probability
after the 30-bit fold is ~2^-30, and the deterministic inputs keep a
passing run passing.
Comment thread test/js/bun/css/ident-or-ref-hash.test.ts Outdated
The values compared are the binding's 30-bit folds, not raw 64-bit
wyhash outputs, so the relevant per-pair bound is ~2^-30.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

css/values/ident.rs: hash only uses 2 bytes of u128

2 participants