yaml: fold merge-key property budget into alias-expansion budget #32385
robobun wants to merge 4 commits into
The per-stream `merge_props_budget` (1,048,576) added in #31417 rejects legitimate documents that merge a modest anchor into many mappings: a 380 KB file with 16,500 mappings, each merging a 64-key anchor, exceeds the cap and throws `RangeError("Out of memory")`.

The `alias_expansion_budget` added in #31495 already bounds this case. Every `*anchor` in a `<<: *anchor` is charged its full subtree (1 + 2 × props nodes) before `merge()` runs, so the total number of properties materialized through merge keys is bounded by `MAX_ALIAS_EXPANSION / 2`. Inline merge values (`<<: {...}`) are bounded by the input size. The separate merge budget is therefore redundant and too tight; the Zig reference implementation has no such limit.

This PR replaces the hardening test that asserted the old cap with one that verifies the 380 KB / 64 × 16,500 document parses correctly. The exponential-expansion guard (`alias_expansion_budget`) is unchanged and its test still passes.
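The budget arithmetic in this description can be checked with a few lines of Rust. This is a minimal sketch, not the parser's code: the constant names mirror the ones quoted in this PR, `alias_charge` is a hypothetical helper, and the anchor is assumed to be a flat mapping with scalar keys and values.

```rust
// Constants mirroring the numbers quoted in this PR (names assumed).
const MAX_MERGED_PROPERTIES: u64 = 1_048_576; // old per-stream merge cap (#31417)
const MAX_ALIAS_EXPANSION: u64 = 16_777_216; // shared alias budget (#31495)

/// Hypothetical helper: nodes charged for one alias reference to a flat
/// mapping with `props` keys (the mapping node plus one key node and one
/// value node per property).
fn alias_charge(props: u64) -> u64 {
    1 + 2 * props
}

fn main() {
    let (mappings, keys) = (16_500u64, 64u64);

    // Old scheme: every merged property decrements a dedicated counter.
    let merged = mappings * keys;
    println!("merged properties: {merged} (old cap {MAX_MERGED_PROPERTIES})");
    assert!(merged > MAX_MERGED_PROPERTIES); // the false positive

    // Alias budget: each `<<: *anchor` reference is charged its whole subtree.
    let alias_total = mappings * alias_charge(keys);
    println!("alias charge: {alias_total} of {MAX_ALIAS_EXPANSION}");
    assert!(alias_total < MAX_ALIAS_EXPANSION);

    // A reference charged 1 + 2*keys units materializes only `keys` properties,
    // so properties merged through charged aliases stay below half the budget.
    assert!(merged <= MAX_ALIAS_EXPANSION / 2);
}
```

The 16,500 × 129 = 2,128,500 alias charge matches the "~2.1M for the alias references" figure quoted later in the PR.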
Updated 4:14 AM PT - Jun 16th, 2026
❌ @robobun, your commit 186daa6 has 2 failures in
🧪 To try this PR locally: `bunx bun-pr 32385`. That installs a local version of the PR as `bun-32385`, which you can then run, e.g. `bun-32385 --bun`.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID:
📒 Files selected for processing (1)
Walkthrough: Consolidates the YAML merge-key property budget from a dedicated parser field into the existing …
Changes: Consolidate merge property budget to alias expansion budget
🚥 Pre-merge checks: ✅ 4 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/js/bun/yaml/yaml.test.ts`:
- Line 4513: The test at line 4513 includes an explicit per-test timeout value
of 120_000, which violates Bun test policy. Remove the timeout parameter
(120_000) from the closing parenthesis of the test or describe block, leaving
only the closing braces and semicolon so the test relies on the suite-level
timeout policy instead.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6f8bfe43-45e3-4a73-9ed8-2010711cd972
📒 Files selected for processing (2)
src/parsers/yaml.rs
test/js/bun/yaml/yaml.test.ts
This PR may be a duplicate of:
🤖 Generated with Claude Code
Nested inline merge wrappers (`{<<: {<<: ... {<<: *big}}}`) resolve the innermost `*big` once but re-materialize its properties at every nesting level, so depth × keyCount properties are allocated while `alias_expansion_budget` is charged only once. Charge each `merge()` call's input slice against the same budget so nested merges are bounded the same way direct alias references are, and surface the limit as `ParseError::ExcessiveAliasing` rather than `OutOfMemory`.

The 64 × 16,500 reproduction consumes ~3.2M of the 16M budget and still parses; the nested-wrapper attack is now rejected.
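A toy model makes the amplification concrete. The sketch below is not Bun's parser: `nested_merge` is a hypothetical function that assumes `*big` is a flat mapping charged `1 + 2 × keys` once, and that each of `depth` wrapper levels copies all `keys` properties. Without the per-merge charge the budget barely moves while `depth × keys` properties are allocated; with it, the same shape exhausts the budget.

```rust
const MAX_ALIAS_EXPANSION: u64 = 16_777_216;

#[derive(Debug, PartialEq)]
enum ParseError {
    ExcessiveAliasing,
}

/// Toy model, not Bun's parser: `*big` is a flat mapping with `keys`
/// properties wrapped in `depth` levels of `{<<: ...}`.
/// Returns (properties allocated, budget consumed).
fn nested_merge(depth: u64, keys: u64, budget: u64, charge_merges: bool) -> Result<(u64, u64), ParseError> {
    // The single `*big` reference is charged its subtree exactly once.
    let mut remaining = budget
        .checked_sub(1 + 2 * keys)
        .ok_or(ParseError::ExcessiveAliasing)?;
    let mut allocated = 0;
    for _ in 0..depth {
        if charge_merges {
            // Per-merge charge: each level's copy draws from the same budget.
            remaining = remaining
                .checked_sub(keys)
                .ok_or(ParseError::ExcessiveAliasing)?;
        }
        // Every level re-materializes all `keys` properties.
        allocated += keys;
    }
    Ok((allocated, budget - remaining))
}

fn main() {
    // Hypothetical attack shape: 100,000 nesting levels around a 1,000-key anchor.
    let (depth, keys) = (100_000, 1_000);
    println!("no merge charge:   {:?}", nested_merge(depth, keys, MAX_ALIAS_EXPANSION, false));
    println!("with merge charge: {:?}", nested_merge(depth, keys, MAX_ALIAS_EXPANSION, true));
}
```

In this model the uncharged run reports 100,000,000 properties allocated against only 2,001 budget units, while the charged run returns `ExcessiveAliasing`.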
The bomb pre-consumes ~99.9% of MAX_ALIAS_EXPANSION via tuned multipliers. If node-counting or the constant ever shifts, the bomb alone would throw ExcessiveAliasing and the final assertion would pass for the wrong reason. Append an unresolved-alias probe that only reaches the Unresolved alias error if the parser got past the bomb without exhausting the budget (and avoids materializing ~16M JS values on the success path).
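The probe works because the parser reports the first error it reaches in document order. A toy model of that reasoning follows; `Item`, `parse`, and the tiny budget are hypothetical stand-ins, not Bun types. `Charge(n)` represents budget-consuming content such as the `pad:` section, and `Alias` an alias lookup. The probe document yields `UnresolvedAlias` only if the padding fit in the budget, so a drifted constant shows up as `ExcessiveAliasing` instead of silently satisfying the bomb assertion.

```rust
#[derive(Debug, PartialEq)]
enum ParseError {
    ExcessiveAliasing,
    UnresolvedAlias(String),
}

/// Hypothetical event stream for the toy model: content that charges the
/// alias budget, or an alias lookup by name.
#[derive(Clone, Copy)]
enum Item<'a> {
    Charge(u64),
    Alias(&'a str),
}

/// Processes items in document order and returns the first error reached.
fn parse(items: &[Item<'_>], budget: u64, anchors: &[&str]) -> Result<(), ParseError> {
    let mut remaining = budget;
    for item in items {
        match item {
            Item::Charge(n) => {
                remaining = remaining
                    .checked_sub(*n)
                    .ok_or(ParseError::ExcessiveAliasing)?;
            }
            Item::Alias(name) if !anchors.contains(name) => {
                return Err(ParseError::UnresolvedAlias(name.to_string()));
            }
            Item::Alias(_) => {}
        }
    }
    Ok(())
}

fn main() {
    let budget = 1_000;
    let pad = Item::Charge(999); // stands in for `pad:` using ~99.9% of the budget
    // Precondition probe: reaching UnresolvedAlias proves the padding fit.
    let probe = parse(&[pad, Item::Alias("nope")], budget, &[]);
    assert_eq!(probe, Err(ParseError::UnresolvedAlias("nope".to_string())));
    // The bomb itself: a small extra charge pushes past the budget.
    let bomb = parse(&[pad, Item::Charge(10)], budget, &[]);
    assert_eq!(bomb, Err(ParseError::ExcessiveAliasing));
    println!("probe: {probe:?}, bomb: {bomb:?}");
}
```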
Thanks — the unresolved-alias probe is a nicer precondition check than what I suggested. I have no remaining concerns; deferring only because this reshapes DoS-protection limits for untrusted-input parsing (1M dedicated cap → shared 16M alias budget), which is worth a human sign-off on the threshold choice.
Overview
This PR removes the dedicated merge_props_budget (1,048,576) from the Rust YAML parser and instead charges every <<: merge-key copy against the existing alias_expansion_budget (16,777,216). The motivation is a false positive: a ~380 KB document with 16,500 mappings each merging a 64-key anchor tripped the 1M cap and surfaced as RangeError: Out of memory. The Zig reference parser has no such separate cap.
Touched: src/parsers/yaml.rs (struct field + constant removed, merge() charges upfront via checked_sub(merge_props.len()) and returns ParseError::ExcessiveAliasing instead of AllocError, four call sites updated) and test/js/bun/yaml/yaml.test.ts (replaces the old cap test with a positive 64×16,500 case plus a nested-inline-wrapper bomb test with a precondition probe).
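The upfront charge described above can be sketched as follows. This is an illustration under assumptions, not the code in `src/parsers/yaml.rs`: `Parser`, the `HashMap`-backed mapping, and the string properties are hypothetical stand-ins; only the `checked_sub(merge_props.len())` pattern and the `ExcessiveAliasing` error come from the review. Charging before copying means a rejected merge allocates nothing and leaves the budget untouched.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum ParseError {
    ExcessiveAliasing,
}

/// Hypothetical stand-in for the parser state; only the budget matters here.
struct Parser {
    alias_expansion_budget: usize,
}

impl Parser {
    /// Sketch of the new merge(): charge the whole input slice up front,
    /// then copy only the keys the target mapping does not already define.
    fn merge(
        &mut self,
        target: &mut HashMap<String, String>,
        merge_props: &[(String, String)],
    ) -> Result<(), ParseError> {
        self.alias_expansion_budget = self
            .alias_expansion_budget
            .checked_sub(merge_props.len())
            .ok_or(ParseError::ExcessiveAliasing)?;
        for (key, value) in merge_props {
            // Keys already present in the target take precedence over merged ones.
            target.entry(key.clone()).or_insert_with(|| value.clone());
        }
        Ok(())
    }
}

fn main() {
    let mut parser = Parser { alias_expansion_budget: 3 };
    let mut mapping = HashMap::from([("a".to_string(), "explicit".to_string())]);
    let props = vec![
        ("a".to_string(), "merged".to_string()),
        ("b".to_string(), "merged".to_string()),
    ];
    assert_eq!(parser.merge(&mut mapping, &props), Ok(()));
    // Only 1 unit left, so merging the same 2 properties again is rejected.
    assert_eq!(parser.merge(&mut mapping, &props), Err(ParseError::ExcessiveAliasing));
    println!("{mapping:?}, budget left: {}", parser.alias_expansion_budget);
}
```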
Review history
This is my third look. Round 1 found a real DoS-protection regression (nested {<<: {<<: ... {<<: *big}}} wrappers re-materialize K properties at each of D levels with only one alias charge); fixed in 33ed661 by charging merge_props.len() per merge() call. Round 2 flagged that the bomb test's hand-tuned 18/29 multipliers sit within 0.1% of MAX_ALIAS_EXPANSION and could silently rot; fixed in 4380606 with a derivation comment and an unresolved-alias probe (bomb + "probe: *nope" throws Unresolved alias only if pad: parsed without exhausting the budget — cheaper and tighter than my .not.toThrow() suggestion). All threads resolved; the bug-hunting system found nothing on the current revision.
Security risks
This is DoS-protection code for a parser that handles untrusted input (Bun.YAML.parse). The change loosens the merge-key materialization bound from a dedicated 1M to a shared 16M, but keeps it bounded — the nested-wrapper attack I raised is now rejected via the same budget. The error surface improves (ExcessiveAliasing instead of the misleading Out of memory). I verified oom_from_alloc!(ParseError) provides From<AllocError> so the remaining get_or_put()? calls inside merge() still propagate correctly through the new Result<(), ParseError> signature. No auth/crypto/permissions surface.
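The error-propagation point can be illustrated with a self-contained sketch. Here `AllocError`, `get_or_put`, and the hand-written `From` impl are hypothetical stand-ins for what `oom_from_alloc!(ParseError)` is described as providing. The point is only that `?` applies `From::from`, so an allocation failure inside a function returning `Result<(), ParseError>` still surfaces as `OutOfMemory`, while budget exhaustion surfaces as `ExcessiveAliasing`.

```rust
#[derive(Debug, PartialEq)]
struct AllocError;

#[derive(Debug, PartialEq)]
enum ParseError {
    OutOfMemory,
    ExcessiveAliasing,
}

// Hand-written stand-in for the From<AllocError> impl that
// oom_from_alloc!(ParseError) is described as providing.
impl From<AllocError> for ParseError {
    fn from(_: AllocError) -> Self {
        ParseError::OutOfMemory
    }
}

/// Hypothetical allocation step that may fail.
fn get_or_put(fail: bool) -> Result<(), AllocError> {
    if fail { Err(AllocError) } else { Ok(()) }
}

fn merge(budget: &mut usize, props: usize, alloc_fails: bool) -> Result<(), ParseError> {
    *budget = budget.checked_sub(props).ok_or(ParseError::ExcessiveAliasing)?;
    for _ in 0..props {
        // `?` calls From::from, turning AllocError into ParseError::OutOfMemory.
        get_or_put(alloc_fails)?;
    }
    Ok(())
}

fn main() {
    let mut budget = 10;
    println!("{:?}", merge(&mut budget, 4, false)); // within budget
    println!("{:?}", merge(&mut budget, 4, true)); // allocation failure
    println!("{:?}", merge(&mut budget, 4, false)); // budget exhausted
}
```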
Level of scrutiny
Medium-high. The implementation is small and I'm confident it's correct, but it's a deliberate threshold redesign (fold two budgets into one, raise the effective merge cap 16×) for adversarial-input handling. That's the kind of design call a maintainer should ratify even when the mechanics are verified.
Other factors
No CODEOWNERS cover these paths. Test coverage is solid: positive case proves the false-positive is fixed, negative case proves nested-merge amplification is still bounded, and the precondition probe ensures the negative case can't pass for the wrong reason. Suite reported 606 pass / 0 fail. The 120s timeout on the first new test was justified with measured debug+ASAN timings (~31s, irreducible since the workload must exceed the removed 1M cap).
CI status: the diff is green on every test lane across both builds;
None of these touch
Reproduction
On current main this throws `RangeError: Out of memory` for a ~380 KB document.

Cause
`Parser.merge_props_budget` (`MappingProps::MAX_MERGED_PROPERTIES` = 1,048,576, added in #31417) is a per-stream counter decremented for every property copied by a `<<:` merge. 16,500 mappings, each merging a 64-key anchor, amount to 1,056,000 merged properties, which trips the budget and surfaces as `AllocError` → `YamlParseError::OutOfMemory`. The Zig reference (`src/parsers/yaml.zig` `MappingProps.merge`) has no such limit.

Fix
Remove the separate `merge_props_budget` and instead charge every `merge()` call's input slice against `alias_expansion_budget` (`MAX_ALIAS_EXPANSION` = 16M, added in #31495), returning `ParseError::ExcessiveAliasing` when it is exhausted.

For the direct case (`<<: *anchor`) this is already nearly redundant: every `*anchor` reference is charged its full subtree via `charge_alias_expansion` before `merge()` runs. The reproduction above charges ~2.1M for the alias references plus ~1.1M for the merge copies, well under 16M.

The merge charge is still needed for nested inline wrappers: `{<<: {<<: ... {<<: *big}}}` resolves `*big` once but re-materializes its properties at every nesting level. Without a per-copy charge, a ~1.5 MB document could allocate tens of millions of `G::Property` structs. With the merge charge, each level draws from the same budget.

Verification