Skip to content

Use direct property lookup instead of HashSet in FormData/CookieMap toJSON#28338

Open
alii wants to merge 4 commits into
mainfrom
claude/cookiemap-formdata-tojson-no-hashset
Open

Use direct property lookup instead of HashSet in FormData/CookieMap toJSON#28338
alii wants to merge 4 commits into
mainfrom
claude/cookiemap-formdata-tojson-no-hashset

Conversation

@alii

@alii alii commented Mar 20, 2026

Copy link
Copy Markdown
Member

Follow-up to #28314 — the HashSet is unnecessary. We can check whether a key was already written by looking it up directly in the output object's own storage:

  • For named properties, getDirect(vm, ident) is a single Structure lookup that returns an empty JSValue if the property doesn't exist and cannot throw.
  • For numeric keys, properties live in the butterfly rather than the Structure, so use getDirectIndex(globalObject, index).

Parsing the index once and branching also avoids the double-parse that putDirectMayBeIndex would do in the put path.

This also fixes a crash in FormData.toJSON() when the same numeric key is appended more than once. The duplicate-key path called getDirect, which only looks at the Structure and missed indexed properties, so it got an empty value and hit RELEASE_ASSERT_NOT_REACHED.

const fd = new FormData();
fd.append("42", "a");
fd.append("42", "b");
fd.toJSON(); // crashed with SIGSEGV

Supersedes #28319.

@robobun

robobun commented Mar 20, 2026

Copy link
Copy Markdown
Collaborator

Rebased onto latest main — PR is now MERGEABLE.

  • JSDOMFormData.cpp conflict resolved: main had landed a different numeric-key fix (kept the HashSet + RELEASE_ASSERT), this supersedes it with the HashSet-free version + graceful OOM on top of main.
  • Reworked the two toJSON loops, both test files, and the src/bun.js → src/jsc path move all carried over cleanly.
  • Addressed the open review thread: added a CookieMap test that actually hits the numeric-key skip path (duplicate keys in the cookie string itself, not via set()), verified it fails without the continue.

Built locally (debug+ASAN) — FormData/CookieMap toJSON tests green. CI building: buildkite #59220.

@coderabbitai

coderabbitai Bot commented Mar 20, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b6db533b-5632-4863-ba6e-d2ef37471981

📥 Commits

Reviewing files that changed from the base of the PR and between 5511d3f and 81f9997.

📒 Files selected for processing (1)
  • src/bun.js/bindings/webcore/JSDOMFormData.cpp

Walkthrough

Replaces HashSet-based duplicate-key tracking in CookieMap and JSDOMFormData with direct JS object property lookups (using getDirectIndex/getDirect). Adjusted merge/write logic accordingly. Added tests for CookieMap.toJSON and FormData.toJSON covering numeric-named and duplicate-key serialization cases.

Changes

Cohort / File(s) Summary
Duplicate-key handling refactor
src/bun.js/bindings/CookieMap.cpp, src/bun.js/bindings/webcore/JSDOMFormData.cpp
Removed HashSet seen-keys usage. Both now check existing properties via getDirectIndex (numeric index) or getDirect (identifier) and conditionally insert. JSDOMFormData centralizes writes into a put lambda and bases merge behavior on the looked-up existing value.
Serialization tests added
test/js/bun/cookie/cookie-map.test.ts, test/js/web/html/FormData.test.ts
Added tests verifying CookieMap.toJSON() reflects modifications to numeric-named entries and that FormData.toJSON() serializes duplicate numeric and mixed keys into arrays.
🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description provides clear context for the changes, technical rationale, and documents a bug fix for a crash scenario, but does not follow the template structure requiring 'What does this PR do?' and 'How did you verify your code works?' sections. Restructure the description to include the required template sections: explicitly answer 'What does this PR do?' and 'How did you verify your code works?' in the specified format.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: replacing HashSet usage with direct property lookups in FormData/CookieMap toJSON methods.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

Found 5 issues this PR may fix:

  1. panic(main thread): Segmentation fault at address 0x5 #27463 - Direct SIGSEGV pattern match - panic at segmentation fault address 0x5
  2. Seg fault at address 0x5 #19650 - Matching segfault address pattern that could be FormData related
  3. crash: panic: Segmentation fault at address 0x50 #24539 - Similar SIGSEGV crash pattern that may be resolved by assertion fixes
  4. Bun.readableStreamToFormData return malformed formdata #18853 - FormData serialization issue that could be related to duplicate key handling improvements
  5. Multipart Form Upload Fails Intermittently with Bun #19097 - Intermittent FormData failures that may be caused by the crash this PR addresses

If this is helpful, consider adding Fixes number to the PR description to auto-close the issue on merge.

🤖 Generated with Claude Code

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HashSet removal and numeric-key fix look correct, but the JSC butterfly/structure split and the retained RELEASE_ASSERT(array) on OOM deserve a human eye before merging.

Extended reasoning...

Overview

This PR modifies FormData.toJSON() and CookieMap.toJSON() in two C++ files to replace a HashSet duplicate-key tracker with direct property lookups on the output JSObject. For named properties it uses getDirect(vm, ident) (structure-only, cannot throw); for numeric/indexed keys it uses getDirectIndex(globalObject, index) (reads the butterfly). It also fixes a real crash: the old duplicate-key path always used getDirect, which misses indexed properties in the butterfly, so numeric duplicate keys hit RELEASE_ASSERT_NOT_REACHED. Tests are added for both the numeric-duplicate and CookieMap numeric-override cases.

Security risks

No authentication, crypto, or permissions code is touched. The primary risk is a process-abort under OOM: RELEASE_ASSERT(array) at the array-upgrade branch is pre-existing, but is now more reachable because numeric duplicates no longer crash earlier on RELEASE_ASSERT_NOT_REACHED.

Level of scrutiny

The change relies on a correct mental model of JSC internals -- specifically that named properties live in the Structure while indexed properties live in the butterfly, and that getDirectIndex can trigger GC/exception paths (hence the RETURN_IF_EXCEPTION guards). The put lambda also silently discards putDirectIndex's bool return value, which warrants confirmation that errors are not swallowed. These are subtle points that benefit from a human reviewer familiar with JSC conventions.

Other factors

No prior reviews exist on this PR. The CI build was still in progress at review time. The logic appears correct but the combination of JSC internals, a retained RELEASE_ASSERT that could abort a server process on OOM, and the discarded putDirectIndex return value make this worth a human look.

Comment thread src/jsc/bindings/webcore/JSDOMFormData.cpp
@alii

alii commented Mar 21, 2026

Copy link
Copy Markdown
Member Author

@robobun address review comments

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the OOM concern from the previous review has been addressed, and the refactoring is correct and well-tested.

Extended reasoning...

Overview

This PR touches CookieMap.cpp and JSDOMFormData.cpp, refactoring toJSON in both to drop a HashSet<String> and instead check for duplicate keys via direct JSObject storage lookups (getDirect for named properties, getDirectIndex for numeric/indexed properties). It also fixes a crash where FormData.toJSON() with duplicate numeric keys would hit RELEASE_ASSERT_NOT_REACHED because the old code used getDirect (which only searches the Structure, not the butterfly) when looking up an already-seen indexed key. Four test files are updated with regression coverage.

Security risks

None. This is internal serialization logic with no auth, crypto, or permission surface. The change reduces process-abort risk (OOM) and eliminates a crash path.

Level of scrutiny

Moderate — the change is in JSC/C++ binding code but is self-contained and well-reasoned. The key correctness invariants (getDirect vs getDirectIndex, putDirect vs putDirectIndex) are JSC conventions that the code follows correctly. The put lambda captures index and ident by reference from the enclosing scope, which is fine since both outlive the lambda.

Other factors

My previous review flagged RELEASE_ASSERT(array) as an OOM-crash risk. That concern was addressed in commit 81f9997, replacing it with a graceful throwException + return {} consistent with the pattern in JSURLSearchParams.cpp. The inline comment is marked resolved. The bug hunting system found no issues. New tests cover the numeric-key crash scenario and mixed duplicate-key cases.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the OOM concern from my prior review has been addressed, and the direct-lookup approach is correct.

Extended reasoning...

Overview

This PR touches CookieMap.cpp and JSDOMFormData.cpp, replacing a WTF::HashSet<String> used to track seen keys with direct JSC property lookups on the output object. Two test files add coverage for the fixed behavior.

Security risks

No security-sensitive paths are involved. The change does not affect auth, crypto, or permissions — it only affects serialization of FormData/CookieMap to a plain JS object.

Level of scrutiny

The change is a self-contained performance/correctness fix in JSC binding code. The logic is well-reasoned: named properties live in the Structure, so getDirect(vm, ident) suffices; indexed properties live in the butterfly, so getDirectIndex(globalObject, index) is required. The prior crash (RELEASE_ASSERT_NOT_REACHED for numeric duplicate keys) was caused by using only getDirect and missing the indexed-property path — this PR fixes it cleanly.

Other factors

My previous inline comment flagged RELEASE_ASSERT(array) as a process-abort risk under OOM. The author addressed this in commit 81f99974 by replacing it with the graceful throwScope.throwException(…createOutOfMemoryError(…)) pattern from JSURLSearchParams.cpp, and the inline comment is now resolved. The current code in the preloaded file confirms the fix is present. New tests cover the crash scenario (numeric duplicate keys → array) and mixed key types. No further open concerns.

@alii

alii commented Apr 3, 2026

Copy link
Copy Markdown
Member Author

@robobun get it mergable

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — the OOM crash I flagged was addressed, and the numeric-key duplicate handling is now correct for both FormData and CookieMap.

Extended reasoning...

Overview

This PR replaces a HashSet used for seen-key tracking in CookieMap::toJSON and FormData::getInternalProperties with direct object property lookups (getDirect/getDirectIndex). It also fixes a SIGSEGV crash when numeric keys are duplicated in FormData.toJSON() — the old code called getDirect which only looks at the Structure, missing butterfly-stored indexed properties, and then hit RELEASE_ASSERT_NOT_REACHED. Tests are added for both fixed scenarios.

Security risks

None. This is internal serialization logic with no auth or data-exposure surface.

Level of scrutiny

Medium — this touches C++ JSC internals (butterfly vs. Structure property storage), but the logic is well-understood JSC API usage and the PR description clearly explains the rationale. The bug hunting system found nothing, and the one pre-existing issue I raised (OOM hard-crash via RELEASE_ASSERT(array)) was resolved in commit 81f9997 by throwing a catchable JS exception instead.

Other factors

The inline comment I left was resolved by the author. The refactoring is a strict improvement: fewer allocations (no HashSet), correct handling of numeric keys in both read and write paths, and graceful OOM handling. Tests cover the crash case and mixed numeric/named key scenarios.

@alii

alii commented May 30, 2026

Copy link
Copy Markdown
Member Author

@robobun rebase and get this mergable

…oJSON

The HashSet used to track seen keys is unnecessary — whether a key was
already written can be checked by looking it up directly in the output
object's own storage:

- For named properties, getDirect(vm, ident) is a single Structure lookup
  that returns an empty JSValue if the property doesn't exist and cannot
  throw.
- For numeric keys, properties live in the butterfly rather than the
  Structure, so use getDirectIndex(globalObject, index).

Parsing the index once and branching also avoids the double-parse that
putDirectMayBeIndex would do in the put path.

Also replaces RELEASE_ASSERT(array) in the string/Blob-to-array upgrade
path with a catchable out-of-memory error. tryCreateUninitializedRestricted
can return nullptr under allocation failure, and this path is reachable
from JavaScript (formData.toJSON() on a FormData with a duplicate key), so
a hard process abort there would drop every in-flight request on a server
under memory pressure.
@robobun robobun force-pushed the claude/cookiemap-formdata-tojson-no-hashset branch from 1a1029b to 9e1bf34 Compare May 30, 2026 09:44
@robobun

robobun commented May 30, 2026

Copy link
Copy Markdown
Collaborator
Updated 6:22 AM PT - May 30th, 2026

@autofix-ci[bot], your commit 9a3b937 has 1 failures in Build #59239 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 28338

That installs a local version of the PR into your bun-28338 executable, so you can run:

bun-28338 --bun

Comment thread test/js/bun/cookie/cookie-map.test.ts
The existing numeric test used set(), which calls removeInternal() first,
so the original-cookie loop never saw an index already on the output
object — the getDirectIndex + continue skip path went untested. Add
duplicate numeric and named keys within the cookie string itself, which
keeps both entries in the original list and exercises the skip.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — both prior review comments addressed (graceful OOM + skip-path test coverage), and the HashSet→direct-lookup refactor is semantically equivalent.

Extended reasoning...

Overview

This PR refactors CookieMap::toJSON() and getInternalProperties() (FormData's toJSON) to drop the per-call HashSet<String> used for duplicate-key tracking, instead checking the freshly-constructed output object's own storage via getDirect (Structure lookup, cannot throw) or getDirectIndex (butterfly lookup) depending on whether the key parses as an array index. It also replaces RELEASE_ASSERT(array) after tryCreateUninitializedRestricted with the graceful OOM throw pattern already used in JSURLSearchParams.cpp, and adds regression tests for duplicate numeric/named keys in both FormData and CookieMap (including a subprocess crash-regression test that already existed on main).

Security risks

None. The output object is freshly constructed by this function, so getDirect/getDirectIndex only ever observe values this function wrote (the only other property is Symbol.toStringTag, which can't collide with a string identifier). No user-controlled getters or prototype walks are introduced. The OOM change strictly hardens a pre-existing abort into a catchable exception.

Level of scrutiny

Moderate — this is C++ JSC bindings code with GC-adjacent constructs (GCDeferralContext, ObjectInitializationScope), but the refactor is mechanical and the semantic equivalence is straightforward: seenKeys.contains(key) ⇔ "we already putDirect*'d this key on obj" ⇔ getDirect*/getDirectIndex returns non-empty. The early return {} in the OOM path is inside RAII scopes that destruct cleanly, matching the reference pattern in JSURLSearchParams.cpp:443-451 exactly.

Other factors

Both of my earlier inline comments (OOM crash on RELEASE_ASSERT, and the cookie-map test not actually exercising the getDirectIndex skip path) were addressed in 81f9997 and 6487963 respectively, and are marked resolved. The new tests cover the numeric-index branch, the named branch, the 2→array upgrade, and the array-push path. The bug-hunting system found no issues on the current revision. A maintainer has been actively shepherding this PR toward merge.

robobun and others added 2 commits May 30, 2026 10:37
The 100k-iteration loop in the name-string refcount regression test runs
~15x slower under the debug+ASAN build and blew the 5s default timeout.
Scale the budget for debug builds instead of reducing iterations, which
would weaken the bug's reproducibility.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found and both earlier threads are addressed, but this reworks JSC property-storage lookup semantics in two bindings, so I'd like a human to sign off rather than auto-approving.

Extended reasoning...

Overview

This PR removes the HashSet<String> seen-key tracking from CookieMap::toJSON() and JSDOMFormData getInternalProperties(), replacing it with direct lookups on the output object's own storage (getDirect for named properties, getDirectIndex for numeric keys). It also swaps a RELEASE_ASSERT(array) for a graceful OOM throw, adds test coverage for duplicate numeric/named keys in both toJSON() paths, and bumps the timeout on an allocation-heavy FormData stress test for the debug+ASAN lane.

Security risks

None identified. The output object is freshly constructed via constructEmptyObject, so getDirect/getDirectIndex only see properties this code wrote — no prototype-chain or user-controlled-object interaction. FormData/CookieMap key strings are user-controlled, but the change doesn't introduce new parsing or interpretation of them beyond what parseIndex already did.

Level of scrutiny

Moderate. The logic is self-contained and well-commented, but it hinges on JSC storage-layout invariants (named properties in the Structure, indexed in the butterfly) and on getDirectIndex returning an empty JSValue for unset indices on a plain object. Getting this wrong would either re-introduce the RELEASE_ASSERT_NOT_REACHED crash or silently drop/duplicate keys, so a maintainer familiar with the JSC bindings conventions should confirm the approach.

Other factors

Both of my earlier inline threads (graceful OOM on tryCreateUninitializedRestricted, and the CookieMap test that didn't actually hit the skip path) were addressed in 81f9997 and 6487963 respectively, and the new tests look correct. The latest CI run (#59237) shows build-rust / -no-pie linker failures across several lanes; these look like infrastructure noise unrelated to the C++ changes here, but the suite hasn't been observed green on the current head. Given the JSC-bindings subtlety plus non-green CI, deferring to a human is the safer call.

@robobun

robobun commented May 30, 2026

Copy link
Copy Markdown
Collaborator

@alii heads up on mergeability: the branch is rebased and CI is green (ASAN test lane passes), but I want to flag what this PR now is after the rebase.

The numeric-key toJSON crash this PR originally fixed has since landed on main independently via #31417 (FormData) — so on current main, fd.append("42","a"); fd.append("42","b"); fd.toJSON() no longer crashes. I verified empirically that this branch and main produce identical output across every edge case I could construct (duplicate numeric/named keys, 01 vs 1, >u32 indices, modified-vs-original precedence).

So what remains here is:

  1. Dropping the HashSet in both toJSON paths in favor of direct getDirect/getDirectIndex lookups — a behavior-preserving refactor.
  2. Replacing RELEASE_ASSERT(array) with a catchable out-of-memory throw (matching the JSURLSearchParams.cpp pattern). This is a genuine robustness improvement, but the path is only reachable under real allocation failure — there is no deterministic way to trigger tryCreateUninitializedRestricted returning null from JS for a 2-element array (no existing test in the tree covers any of the ~9 sites using this pattern).

Net: it is a clean, semantically-equivalent refactor + OOM hardening, not a bug fix (the bug is already fixed). It builds and all FormData/CookieMap tests pass. Since there is no JS-observable behavior change, there is no test that would fail without the src diff. Happy to leave it as a code-quality PR for your call to merge, or close it as superseded by #31417 if you prefer.

@robobun

robobun commented May 30, 2026

Copy link
Copy Markdown
Collaborator

CI note: the red buildkite/bun aggregate on build #59239 is a stale roll-up. The only hard-failed step was windows-aarch64 test-bun, which passed on retry — buildkite/bun/windows-aarch64-build-bun now reports SUCCESS, as do all 69 individual build/test lanes (including the x64 ASAN test lane). The aggregate status just did not recompute to green after the retry. Nothing in this diff (two platform-agnostic JSC binding files + tests) is connected to a Windows aarch64 failure, so this is infra flake, not a real regression.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants