Skip to content

yaml: remove raw parser back-pointers from plain-scalar string builder#31847

Merged
Jarred-Sumner merged 4 commits into
mainfrom
claude/complex-16-yaml-remove-raw-parser-back-pointers-fro
Jun 5, 2026
Merged

yaml: remove raw parser back-pointers from plain-scalar string builder#31847
Jarred-Sumner merged 4 commits into
mainfrom
claude/complex-16-yaml-remove-raw-parser-back-pointers-fro

Conversation

@Jarred-Sumner

Copy link
Copy Markdown
Collaborator

The YAML plain-scalar scanner used raw *mut Parser back-references in both StringBuilder and ScalarResolverCtx, with a parser!() macro re-deriving &mut Parser per expansion to dodge the borrow checker. Any future edit that interleaved a derived &mut Parser with a live builder borrow risked aliased-&mut UB. This change makes StringBuilder own the pending-whitespace buffer plus a shared &'i view of the input (the parser keeps a reusable whitespace_buf Vec: string_builder() moves it in, done() hands it back cleared, preserving capacity reuse), and restructures ScalarResolverCtx so methods that need parser state take &Parser/&mut Parser explicitly. All unsafe code and raw pointers in the plain-scalar path are gone; scan_plain_scalar now uses self directly. No behavior change intended. Tested with new plain-scalar whitespace-buffer coverage in test/js/bun/yaml/yaml.test.ts: internal/trailing/pre-newline whitespace handling, buffer reuse across many sibling scalars, number/keyword resolution followed by more content, special float forms and near-misses, flow context, UTF-16 input, and folded multiline scalars.

Verification

Implemented and verified on a unified integration branch: full debug build (linux-x64, ASAN), cargo check across the workspace, and the affected test files run against the debug build (failures cross-checked against main's build to exclude pre-existing issues). Each change was reviewed twice (compile/API correctness and GC/concurrency/semantics lenses) with findings repaired before landing.

@robobun

robobun commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs, but the core change is a ~400-line ownership/lifetime restructuring of the YAML plain-scalar state machine (replacing raw *mut Parser backrefs with explicit &Parser/&mut Parser threading and moving whitespace_buf ownership into the builder) — worth a human pass to confirm no subtle whitespace-handling drift, and CI is currently red on a couple of build targets.

Extended reasoning...

Overview

The PR has two layers: (1) an autofix.ci formatting sweep across ~200 files (trailing-newline removal, #[derive] consolidation, import reordering — all no-ops), and (2) the substantive change in src/parsers/yaml.rs plus new tests in test/js/bun/yaml/yaml.test.ts. The yaml.rs change removes the raw *mut Parser back-references from StringBuilder and ScalarResolverCtx, drops the parser!() re-derive macro, has StringBuilder own input: &'i [Enc::Unit] and the whitespace_buf Vec directly (moved in by Parser::string_builder(), handed back cleared by done(&mut Parser)), and rewrites scan_plain_scalar / try_resolve_number to thread self / &mut Parser explicitly. The newline handler's parser!().whitespace_buf.clear() becomes ctx.str_builder.clear_whitespace(), which targets the same buffer under the new ownership.

Security risks

None identified. This is internal parser state management with no I/O, auth, or untrusted-input surface beyond what the YAML parser already exposes; the change strictly removes unsafe blocks rather than adding any.

Level of scrutiny

Medium-high. The intent (eliminate a Stacked-Borrows hazard with no behavior change) is clearly correct and well-tested, and the mechanical parser!()self / &mut parser substitution is straightforward to verify. But scan_plain_scalar is a dense character-level state machine with many branches, and the buffer-ownership move (parser → builder, with mem::take on construction and hand-back on done()) changes when/where the whitespace buffer lives. I checked that StringBuilder is only constructed via the new string_builder() helper (sole call site at scan_plain_scalar), that other scanners (scan_literal_scalar etc.) still touch self.whitespace_buf directly and are unaffected, and that the ctx.done()? calls elsewhere belong to a different block-scalar ctx type. The one behavioral edge I noticed — on an AllocError early-return the buffer's capacity is dropped with ctx rather than retained on the parser — is a non-issue (OOM aborts the parse anyway).

Other factors

The new test suite is thorough (internal/trailing/pre-newline whitespace, buffer reuse across 256 sibling scalars, special-float near-misses, flow context, UTF-16, folded multiline). CI currently shows two build-rust failures (x64-musl, freebsd-x64) plus a linker warning on aarch64; they look infrastructural rather than related to this diff, but I can't confirm from here. Given the size of the state-machine rewrite and the red CI, a human should sign off.

Base automatically changed from claude/todo-audit-fixes to main June 5, 2026 03:30
@Jarred-Sumner Jarred-Sumner requested a review from alii as a code owner June 5, 2026 03:30
Jarred-Sumner and others added 2 commits June 5, 2026 03:31
The YAML plain-scalar scanner used raw `*mut Parser` back-references in both `StringBuilder` and `ScalarResolverCtx`, with a `parser!()` macro re-deriving `&mut Parser` per expansion to dodge the borrow checker. Any future edit that interleaved a derived `&mut Parser` with a live builder borrow risked aliased-&mut UB. This change makes `StringBuilder` own the pending-whitespace buffer plus a shared `&'i` view of the input (the parser keeps a reusable `whitespace_buf` Vec: `string_builder()` moves it in, `done()` hands it back cleared, preserving capacity reuse), and restructures `ScalarResolverCtx` so methods that need parser state take `&Parser`/`&mut Parser` explicitly. All unsafe code and raw pointers in the plain-scalar path are gone; `scan_plain_scalar` now uses `self` directly. No behavior change intended. Tested with new plain-scalar whitespace-buffer coverage in test/js/bun/yaml/yaml.test.ts: internal/trailing/pre-newline whitespace handling, buffer reuse across many sibling scalars, number/keyword resolution followed by more content, special float forms and near-misses, flow context, UTF-16 input, and folded multiline scalars.
@Jarred-Sumner Jarred-Sumner force-pushed the claude/complex-16-yaml-remove-raw-parser-back-pointers-fro branch from d026c05 to c77f746 Compare June 5, 2026 03:31
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@Jarred-Sumner, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 13 minutes and 48 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e1be855e-e151-4b71-bc41-8eeca671c718

📥 Commits

Reviewing files that changed from the base of the PR and between 8add345 and 41c3fe0.

📒 Files selected for processing (1)
  • test/js/bun/yaml/yaml.test.ts

Walkthrough

The PR refactors YAML plain-scalar parsing to eliminate unsafe raw *mut Parser pointers from helper structs. StringBuilder now owns its whitespace buffer and input slice, returning the buffer via an explicit done(parser) call. ScalarResolverCtx threads parser references through method signatures instead of storing raw pointers. scan_plain_scalar and try_resolve_number are rewritten to use the new APIs while preserving their state-machine and numeric-lexing logic. Comprehensive tests validate whitespace handling, number/keyword resolution, and edge cases.

Changes

YAML parser safety refactoring

Layer / File(s) Summary
StringBuilder structure and buffer ownership handoff
src/parsers/yaml.rs
StringBuilder owns whitespace_buf and stores input: &'i [Enc::Unit] directly. The done(self, parser: &mut Parser) method hands the whitespace buffer back to the parser via assignment instead of relying on raw-pointer mutation.
StringBuilder append and slice methods
src/parsers/yaml.rs
Methods append_source_slice, append_expected_source_slice, append, append_slice, and append_n_times now cache and use the builder-owned input slice instead of dereferencing a raw pointer, while preserving YamlString extension logic.
ScalarResolverCtx structure and core methods
src/parsers/yaml.rs
ScalarResolverCtx removes the raw parser back-reference field. Methods done, check_append, and append_source are updated to accept &Parser or &mut Parser parameters for reading line state and finalizing the string.
ScalarResolverCtx append helper methods
src/parsers/yaml.rs
All append helpers (append_source_slice, append_unknown_source_slice, append, append_slice, append_n_times) thread parser references through signatures and forward them to check_append and StringBuilder, eliminating raw-pointer reliance.
Number resolution rewrite with parser threading
src/parsers/yaml.rs
ScalarResolverCtx::try_resolve_number now accepts &mut Parser and uses direct parser.pos/parser.next()/parser.inc() calls and parser.slice() instead of macro-based raw-pointer access. Numeric lexing loop, termination tracking, and post-lex schema validation preserve permissive behavior through threaded references.
scan_plain_scalar integration and Parser::string_builder helper
src/parsers/yaml.rs
New Parser::string_builder(&mut self) helper introduced. scan_plain_scalar is completely rewritten to construct/use new StringBuilder/ScalarResolverCtx APIs (ctx.done(self), ctx.append_source(self, ...), ctx.try_resolve_number(self, ...)). Preserves labeled-switch state-machine structure while threading self through all method calls for proper borrow tracking.
Test coverage for whitespace and number resolution
test/js/bun/yaml/yaml.test.ts
New comprehensive test block covers plain-scalar whitespace preservation, trailing whitespace dropping, multiline folding, whitespace buffer non-leakage across scalars, number/keyword becoming strings when followed by content, special floats (.nan/.inf), flow-context whitespace, UTF-16 folding, document-marker termination, interleaved mixed documents, and error case for dash after mapping key.

Possibly related PRs

  • oven-sh/bun#31569: Both PRs modify YAML plain-scalar parsing in src/parsers/yaml.rs, specifically the scan_plain_scalar and ScalarResolverCtx::try_resolve_number code paths—one refactoring safety/API design and the other adjusting number resolution logic.
🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: removing raw parser back-pointers from the plain-scalar string builder, which is the primary refactoring in src/parsers/yaml.rs.
Description check ✅ Passed The description comprehensively addresses both required template sections: 'What does this PR do?' with detailed explanation of the unsafe code removal and API restructuring, and 'How did you verify your code works?' with verification methodology including debug builds, ASAN, cargo check, and new tests.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any issues — the parser!()self/&mut Parser translation looks faithful and the unsafe removal is a clear win — but this is a ~400-line reshape of the plain-scalar hot path, so it's worth a human pass to confirm the mechanical changes (especially the whitespace_buf ownership move and the newline-handler's switch to ctx.str_builder.clear_whitespace()).

Extended reasoning...

Overview

This PR refactors src/parsers/yaml.rs to eliminate the raw *mut Parser back-pointers that StringBuilder and ScalarResolverCtx previously carried, along with the parser!() macro that re-derived &mut Parser on every access. StringBuilder now owns the pending-whitespace Vec (moved in from Parser::whitespace_buf via string_builder() and handed back cleared in done()) plus a shared &'i [Enc::Unit] view of the input. ScalarResolverCtx methods that need parser state now take &Parser/&mut Parser explicitly, and scan_plain_scalar uses self directly. All unsafe blocks in this path are removed. A new test block in test/js/bun/yaml/yaml.test.ts covers internal/trailing/pre-newline whitespace, buffer reuse across siblings, special-float resolution and near-misses, flow context, UTF-16, and folding.

Security risks

None introduced. The YAML parser handles untrusted input, but this change strictly removes unsafe raw-pointer dereferences in favor of borrow-checker-verified references, which is a net safety improvement. No new parsing surface, no allocation-pattern changes beyond moving a Vec between owner and parser (capacity preserved on the happy path; lost only on ? error returns, which is harmless).

Level of scrutiny

Medium-high. The diff is large (~700 lines) but almost entirely mechanical token substitution (parser!()self / parser, self.input()self.input, self.parser_mut().whitespace_bufself.whitespace_buf). I traced the non-mechanical bits — done() now clears and mem::takes the buffer back into the parser (equivalent to the old in-place clear), and the \n handler's parser!().whitespace_buf.clear() correctly became ctx.str_builder.clear_whitespace() since the buffer now lives in the builder. I confirmed StringBuilder is constructed only via the new string_builder() and consumed only via ScalarResolverCtx::done; the other ctx.done() calls in the file belong to the unrelated local LiteralScalarCtx. Still, scan_plain_scalar is the largest state machine in the parser and sits on a hot path for untrusted input, so a human should skim the mechanical translation rather than rely solely on automated review.

Other factors

The only CI failure is test/cli/install/bunx.test.ts, unrelated to YAML. No CODEOWNERS entry covers src/parsers/. The added tests are thorough and target exactly the buffer-ownership semantics this refactor touches. No prior reviews from me on this PR.

Comment thread test/js/bun/yaml/yaml.test.ts

@claude claude Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tightening the assertion — no further issues from my side. The refactor looks mechanically sound (and removing the raw-pointer aliasing is a clear win), but at ~500 lines through the plain-scalar state machine it's worth a human pass to confirm no evaluation-order shifts slipped in.

Extended reasoning...

Overview

This PR refactors src/parsers/yaml.rs to eliminate raw *mut Parser back-pointers from StringBuilder and ScalarResolverCtx. The whitespace buffer is now moved into the builder via string_builder() and handed back via done(&mut Parser), and ScalarResolverCtx methods take &Parser/&mut Parser explicitly instead of dereferencing a stored raw pointer. scan_plain_scalar and try_resolve_number are rewritten to use self/parser directly in place of the old parser!() macro. ~500 lines of diff in the parser plus ~120 lines of new targeted tests in test/js/bun/yaml/yaml.test.ts.

Security risks

None introduced. The change strictly removes unsafe blocks and raw-pointer dereferences, replacing SAFETY-comment invariants with borrow-checker-enforced ones. The YAML parser handles untrusted input, so any change here is sensitive, but this is a net reduction in UB surface. The one subtle behavioral edge — buffer capacity not being returned to the parser on the ?-error path — is harmless (parser is done after an error; whitespace_buf is left as a fresh empty Vec from take).

Level of scrutiny

Medium-high. The transformation is highly mechanical (every parser!()self or parser, every self.parser_mut().whitespace_bufself.whitespace_buf), and I spot-checked several call sites for evaluation-order preservation (e.g. self.pos read before self.inc(1), clear_whitespace() replacing the old direct whitespace_buf.clear() in the \n arm). All looked correct. But this is the core plain-scalar state machine — it runs on every unquoted YAML value — and the diff is large enough that a second pair of human eyes on the hot loop is warranted before merge.

Other factors

  • My earlier nit (bare .toThrow()) was addressed in 41c3fe0 — the assertion now pins SyntaxError. The inline thread is resolved.
  • The bug-hunting pass found no issues on the latest revision.
  • CI failures (bunx.test.ts, bun-install-security-provider.test.ts) are in unrelated install code, not YAML.
  • No CODEOWNERS entry covers src/parsers/yaml.rs.
  • New test coverage is thorough: internal/trailing/pre-newline whitespace, buffer reuse across 256 sibling scalars, number/keyword fallback-to-string, special-float near-misses, flow context, UTF-16 input, and document-marker termination.

@Jarred-Sumner Jarred-Sumner merged commit 55f6c89 into main Jun 5, 2026
72 of 75 checks passed
@Jarred-Sumner Jarred-Sumner deleted the claude/complex-16-yaml-remove-raw-parser-back-pointers-fro branch June 5, 2026 05:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants