Skip to content

yaml: fold merge-key property budget into alias-expansion budget#32385

Open
robobun wants to merge 4 commits into
mainfrom
farm/9d093b8c/yaml-merge-budget
Open

yaml: fold merge-key property budget into alias-expansion budget#32385
robobun wants to merge 4 commits into
mainfrom
farm/9d093b8c/yaml-merge-budget

Conversation

@robobun

@robobun robobun commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Reproduction

let y = "base: &base\n";
for (let i = 0; i < 64; i++) y += `  k${i}: ${i}\n`;
y += "out:\n";
for (let i = 0; i < 16500; i++) y += `  m${i}:\n    <<: *base\n`;
Bun.YAML.parse(y);

On current main this throws RangeError: Out of memory for a ~380 KB document.

Cause

Parser.merge_props_budget (MappingProps::MAX_MERGED_PROPERTIES = 1,048,576, added in #31417) is a per-stream counter decremented for every property copied by a <<: merge. 16,500 mappings each merging a 64-key anchor is 1,056,000 merged properties, which trips the budget and surfaces as AllocErrorYamlParseError::OutOfMemory.

The Zig reference (src/parsers/yaml.zig MappingProps.merge) has no such limit.

Fix

Remove the separate merge_props_budget and instead charge every merge() call's input slice against alias_expansion_budget (MAX_ALIAS_EXPANSION = 16M, added in #31495), returning ParseError::ExcessiveAliasing when exhausted.

For the direct case (<<: *anchor) this is already nearly redundant: every *anchor reference is charged its full subtree via charge_alias_expansion before merge() runs. The reproduction above charges ~2.1M for the alias references plus ~1.1M for the merge copies, well under 16M.

The merge charge is still needed for nested inline wrappers: {<<: {<<: ... {<<: *big}}} resolves *big once but re-materializes its properties at every nesting level. Without a per-copy charge, a ~1.5 MB document could allocate tens of millions of G::Property structs. With the merge charge each level draws from the same budget.

Verification

$ USE_SYSTEM_BUN=1 bun test test/js/bun/yaml/yaml.test.ts -t "merge keys across many mappings"
(fail) RangeError: Out of memory

$ USE_SYSTEM_BUN=1 bun test test/js/bun/yaml/yaml.test.ts -t "nested inline wrappers"
(fail) expected to throw /Excessive aliasing/

$ bun bd test test/js/bun/yaml/yaml.test.ts
606 pass, 23 todo, 0 fail

The per-stream merge_props_budget (1,048,576) added in #31417 rejected
legitimate documents that merge a modest anchor into many mappings: a
380 KB file with 16,500 mappings each merging a 64-key anchor exceeds
the cap and throws RangeError("Out of memory").

The alias_expansion_budget added in #31495 already bounds this case:
every `*anchor` in a `<<: *anchor` is charged its full subtree
(1 + 2*props nodes) before merge() runs, so the total number of
properties materialized through merge keys is bounded by
MAX_ALIAS_EXPANSION/2. Inline merge values (`<<: {...}`) are bounded
by the input size. The separate merge budget is redundant and too
tight; the Zig reference implementation has no such limit.

Replace the hardening test that asserted the old cap with one that
verifies the 380 KB / 64x16,500 document parses correctly. The
exponential-expansion guard (alias_expansion_budget) is unchanged and
its test still passes.
@robobun

robobun commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author
Updated 4:14 AM PT - Jun 16th, 2026

@robobun, your commit 186daa6 has 2 failures in Build #62782 (All Failures):


🧪   To try this PR locally:

bunx bun-pr 32385

That installs a local version of the PR into your bun-32385 executable, so you can run:

bun-32385 --bun

@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4a97d5bf-dab4-4b6f-8aff-b28bb0d1841b

📥 Commits

Reviewing files that changed from the base of the PR and between 33ed661 and 4380606.

📒 Files selected for processing (1)
  • test/js/bun/yaml/yaml.test.ts

Walkthrough

Consolidates the YAML merge-key property budget from a dedicated parser field into the existing alias_expansion_budget. The Parser::merge_props_budget field and MappingProps::MAX_MERGED_PROPERTIES constant are removed. MappingProps::merge() now charges a single upfront budget cost and returns ParseError instead of AllocError. All four parse-function call sites are updated to pass alias_expansion_budget. Tests are updated to validate both large-merge success and nested-merge overflow detection.

Changes

Consolidate merge property budget to alias expansion budget

Layer / File(s) Summary
Parser state consolidation and initialization
src/parsers/yaml.rs
Removes merge_props_budget field from Parser struct and its initialization in Parser::init(), consolidating merge expansion budget tracking to the existing alias_expansion_budget.
MappingProps merge method refactor and constant removal
src/parsers/yaml.rs
Removes MappingProps::MAX_MERGED_PROPERTIES constant. Refactors MappingProps::merge() to perform a single upfront budget charge via checked_sub(merge_props.len()) instead of per-property decrements in the reconstruction loop. Changes the return type from AllocError to ParseError for budget exhaustion. Updates MappingProps::append_maybe_merge() signature to return ParseError instead of AllocError.
Parse function call site updates
src/parsers/yaml.rs
Updates all four merge-expansion call sites in parse_flow_sequence(), parse_flow_mapping(), and parse_block_mapping() (two merge insertion paths) to pass &mut self.alias_expansion_budget instead of the removed merge_props_budget when invoking append_maybe_merge().
Merge-key stress and bounding tests
test/js/bun/yaml/yaml.test.ts
Replaces the old cap-failure test with a stress test that constructs a 64-key anchor merged into 16,500 separate mappings, verifies successful parsing, validates per-mapping key counts, and confirms structure and value identity. Adds a second test using nested inline merge-key wrapper structures that asserts moderate nesting depth succeeds with expected property counts, then verifies a larger nested payload is rejected with an "excessive aliasing" error.

Possibly related PRs

  • oven-sh/bun#31561: Both PRs change YAML parsing around MappingProps::append_maybe_merge/merge handling—main PR refactors the merge/alias budget plumbing and error surface, while the retrieved PR alters parse_flow_mapping's timing and context for calling append_maybe_merge, so the modifications directly intersect at the same merge-call-site logic.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: removing a separate merge property budget and consolidating it into the existing alias-expansion budget.
Description check ✅ Passed The description provides detailed context including reproduction case, root cause analysis, the fix rationale, and verification results, exceeding the template requirements.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/js/bun/yaml/yaml.test.ts`:
- Line 4513: The test at line 4513 includes an explicit per-test timeout value
of 120_000, which violates Bun test policy. Remove the timeout parameter
(120_000) from the closing parenthesis of the test or describe block, leaving
only the closing braces and semicolon so the test relies on the suite-level
timeout policy instead.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6f8bfe43-45e3-4a73-9ed8-2010711cd972

📥 Commits

Reviewing files that changed from the base of the PR and between 78f0fff and 5262ac4.

📒 Files selected for processing (2)
  • src/parsers/yaml.rs
  • test/js/bun/yaml/yaml.test.ts

Comment thread test/js/bun/yaml/yaml.test.ts
@github-actions

Copy link
Copy Markdown
Contributor

This PR may be a duplicate of:

  1. Input validation and bounds-checking hardening #30681 - Both PRs modify MappingProps in src/parsers/yaml.rs to address YAML << merge-key amplification; Input validation and bounds-checking hardening #30681 adds pointer-identity deduplication while this PR removes the merge_props_budget cap as redundant

🤖 Generated with Claude Code

Comment thread src/parsers/yaml.rs Outdated
Comment thread test/js/bun/yaml/yaml.test.ts
Nested inline merge wrappers (`{<<: {<<: ... {<<: *big}}}`) resolve the
innermost `*big` once but re-materialize its properties at every
nesting level, so depth * keyCount properties are allocated while
alias_expansion_budget is only charged once. Charge each merge() call's
input slice against the same budget so nested merges are bounded the
same way direct alias references are, and surface the limit as
ParseError::ExcessiveAliasing rather than OutOfMemory.

The 64 x 16,500 reproduction consumes ~3.2M of the 16M budget and still
parses; the nested-wrapper attack is now rejected.
@robobun robobun changed the title yaml: drop redundant merge-key property budget yaml: fold merge-key property budget into alias-expansion budget Jun 16, 2026
Comment thread test/js/bun/yaml/yaml.test.ts
robobun added 2 commits June 16, 2026 09:50
The bomb pre-consumes ~99.9% of MAX_ALIAS_EXPANSION via tuned
multipliers. If node-counting or the constant ever shifts, the bomb
alone would throw ExcessiveAliasing and the final assertion would pass
for the wrong reason. Append an unresolved-alias probe that only
reaches the Unresolved alias error if the parser got past the bomb
without exhausting the budget (and avoids materializing ~16M JS values
on the success path).

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks — the unresolved-alias probe is a nicer precondition check than what I suggested. I have no remaining concerns; deferring only because this reshapes DoS-protection limits for untrusted-input parsing (1M dedicated cap → shared 16M alias budget), which is worth a human sign-off on the threshold choice.

Extended reasoning...

Overview

This PR removes the dedicated merge_props_budget (1,048,576) from the Rust YAML parser and instead charges every <<: merge-key copy against the existing alias_expansion_budget (16,777,216). The motivation is a false positive: a ~380 KB document with 16,500 mappings each merging a 64-key anchor tripped the 1M cap and surfaced as RangeError: Out of memory. The Zig reference parser has no such separate cap.

Touched: src/parsers/yaml.rs (struct field + constant removed, merge() charges upfront via checked_sub(merge_props.len()) and returns ParseError::ExcessiveAliasing instead of AllocError, four call sites updated) and test/js/bun/yaml/yaml.test.ts (replaces the old cap test with a positive 64×16,500 case plus a nested-inline-wrapper bomb test with a precondition probe).

Review history

This is my third look. Round 1 found a real DoS-protection regression (nested {<<: {<<: ... {<<: *big}}} wrappers re-materialize K properties at each of D levels with only one alias charge); fixed in 33ed661 by charging merge_props.len() per merge() call. Round 2 flagged that the bomb test's hand-tuned 18/29 multipliers sit within 0.1% of MAX_ALIAS_EXPANSION and could silently rot; fixed in 4380606 with a derivation comment and an unresolved-alias probe (bomb + "probe: *nope" throws Unresolved alias only if pad: parsed without exhausting the budget — cheaper and tighter than my .not.toThrow() suggestion). All threads resolved; the bug-hunting system found nothing on the current revision.

Security risks

This is DoS-protection code for a parser that handles untrusted input (Bun.YAML.parse). The change loosens the merge-key materialization bound from a dedicated 1M to a shared 16M, but keeps it bounded — the nested-wrapper attack I raised is now rejected via the same budget. The error surface improves (ExcessiveAliasing instead of the misleading Out of memory). I verified oom_from_alloc!(ParseError) provides From<AllocError> so the remaining get_or_put()? calls inside merge() still propagate correctly through the new Result<(), ParseError> signature. No auth/crypto/permissions surface.

Level of scrutiny

Medium-high. The implementation is small and I'm confident it's correct, but it's a deliberate threshold redesign (fold two budgets into one, raise the effective merge cap 16×) for adversarial-input handling. That's the kind of design call a maintainer should ratify even when the mechanics are verified.

Other factors

No CODEOWNERS cover these paths. Test coverage is solid: positive case proves the false-positive is fixed, negative case proves nested-merge amplification is still bounded, and the precondition probe ensures the negative case can't pass for the wrong reason. Suite reported 606 pass / 0 fail. The 120s timeout on the first new test was justified with measured debug+ASAN timings (~31s, irreducible since the workload must exceed the removed 1M cap).

@robobun

robobun commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator Author

CI status: the diff is green on every test lane across both builds; yaml.test.ts passes everywhere. The remaining failures are infrastructure/environment issues unrelated to this change:

  • Build #62780: 285/285 test jobs passed. Only hard failure is linux-aarch64-android - verify-baseline: rustup timed out connecting to static.rust-lang.org (tcp connect error: Connection timed out).

  • Build #62782 (retrigger): 284 test jobs passed. Two hard failures, both on darwin 14 aarch64:

    • test/js/web/websocket/autobahn.test.ts: the autobahn Docker image is linux/amd64 and fails with exec format error on the arm64 host.
    • test/js/web/fetch/fetch-leak.test.ts: should not leak using readable stream RSS threshold off by 0.42 MB (expected ≥56, got 55.578).

    The remaining annotations are retry-passed flakes in bun-install (Windows EBADF), node-http-connect (Windows), serve-body-leak (ASAN), and rm.test.ts (ASAN timeout).

None of these touch src/parsers/yaml.rs or test/js/bun/yaml/. Ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant