yaml: remove raw parser back-pointers from plain-scalar string builder#31847
Conversation
There was a problem hiding this comment.
I didn't find any bugs, but the core change is a ~400-line ownership/lifetime restructuring of the YAML plain-scalar state machine (replacing raw *mut Parser backrefs with explicit &Parser/&mut Parser threading and moving whitespace_buf ownership into the builder) — worth a human pass to confirm no subtle whitespace-handling drift, and CI is currently red on a couple of build targets.
Extended reasoning...
Overview
The PR has two layers: (1) an autofix.ci formatting sweep across ~200 files (trailing-newline removal, #[derive] consolidation, import reordering — all no-ops), and (2) the substantive change in src/parsers/yaml.rs plus new tests in test/js/bun/yaml/yaml.test.ts. The yaml.rs change removes the raw *mut Parser back-references from StringBuilder and ScalarResolverCtx, drops the parser!() re-derive macro, has StringBuilder own input: &'i [Enc::Unit] and the whitespace_buf Vec directly (moved in by Parser::string_builder(), handed back cleared by done(&mut Parser)), and rewrites scan_plain_scalar / try_resolve_number to thread self / &mut Parser explicitly. The newline handler's parser!().whitespace_buf.clear() becomes ctx.str_builder.clear_whitespace(), which targets the same buffer under the new ownership.
Security risks
None identified. This is internal parser state management with no I/O, auth, or untrusted-input surface beyond what the YAML parser already exposes; the change strictly removes unsafe blocks rather than adding any.
Level of scrutiny
Medium-high. The intent (eliminate a Stacked-Borrows hazard with no behavior change) is clearly correct and well-tested, and the mechanical parser!() → self / &mut parser substitution is straightforward to verify. But scan_plain_scalar is a dense character-level state machine with many branches, and the buffer-ownership move (parser → builder, with mem::take on construction and hand-back on done()) changes when/where the whitespace buffer lives. I checked that StringBuilder is only constructed via the new string_builder() helper (sole call site at scan_plain_scalar), that other scanners (scan_literal_scalar etc.) still touch self.whitespace_buf directly and are unaffected, and that the ctx.done()? calls elsewhere belong to a different block-scalar ctx type. The one behavioral edge I noticed — on an AllocError early-return the buffer's capacity is dropped with ctx rather than retained on the parser — is a non-issue (OOM aborts the parse anyway).
Other factors
The new test suite is thorough (internal/trailing/pre-newline whitespace, buffer reuse across 256 sibling scalars, special-float near-misses, flow context, UTF-16, folded multiline). CI currently shows two build-rust failures (x64-musl, freebsd-x64) plus a linker warning on aarch64; they look infrastructural rather than related to this diff, but I can't confirm from here. Given the size of the state-machine rewrite and the red CI, a human should sign off.
The YAML plain-scalar scanner used raw `*mut Parser` back-references in both `StringBuilder` and `ScalarResolverCtx`, with a `parser!()` macro re-deriving `&mut Parser` per expansion to dodge the borrow checker. Any future edit that interleaved a derived `&mut Parser` with a live builder borrow risked aliased-&mut UB. This change makes `StringBuilder` own the pending-whitespace buffer plus a shared `&'i` view of the input (the parser keeps a reusable `whitespace_buf` Vec: `string_builder()` moves it in, `done()` hands it back cleared, preserving capacity reuse), and restructures `ScalarResolverCtx` so methods that need parser state take `&Parser`/`&mut Parser` explicitly. All unsafe code and raw pointers in the plain-scalar path are gone; `scan_plain_scalar` now uses `self` directly. No behavior change intended. Tested with new plain-scalar whitespace-buffer coverage in test/js/bun/yaml/yaml.test.ts: internal/trailing/pre-newline whitespace handling, buffer reuse across many sibling scalars, number/keyword resolution followed by more content, special float forms and near-misses, flow context, UTF-16 input, and folded multiline scalars.
d026c05 to
c77f746
Compare
|
Warning Review limit reached
More reviews will be available in 13 minutes and 48 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughThe PR refactors YAML plain-scalar parsing to eliminate unsafe raw ChangesYAML parser safety refactoring
Possibly related PRs
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
I didn't find any issues — the parser!() → self/&mut Parser translation looks faithful and the unsafe removal is a clear win — but this is a ~400-line reshape of the plain-scalar hot path, so it's worth a human pass to confirm the mechanical changes (especially the whitespace_buf ownership move and the newline-handler's switch to ctx.str_builder.clear_whitespace()).
Extended reasoning...
Overview
This PR refactors src/parsers/yaml.rs to eliminate the raw *mut Parser back-pointers that StringBuilder and ScalarResolverCtx previously carried, along with the parser!() macro that re-derived &mut Parser on every access. StringBuilder now owns the pending-whitespace Vec (moved in from Parser::whitespace_buf via string_builder() and handed back cleared in done()) plus a shared &'i [Enc::Unit] view of the input. ScalarResolverCtx methods that need parser state now take &Parser/&mut Parser explicitly, and scan_plain_scalar uses self directly. All unsafe blocks in this path are removed. A new test block in test/js/bun/yaml/yaml.test.ts covers internal/trailing/pre-newline whitespace, buffer reuse across siblings, special-float resolution and near-misses, flow context, UTF-16, and folding.
Security risks
None introduced. The YAML parser handles untrusted input, but this change strictly removes unsafe raw-pointer dereferences in favor of borrow-checker-verified references, which is a net safety improvement. No new parsing surface, no allocation-pattern changes beyond moving a Vec between owner and parser (capacity preserved on the happy path; lost only on ? error returns, which is harmless).
Level of scrutiny
Medium-high. The diff is large (~700 lines) but almost entirely mechanical token substitution (parser!() → self / parser, self.input() → self.input, self.parser_mut().whitespace_buf → self.whitespace_buf). I traced the non-mechanical bits — done() now clears and mem::takes the buffer back into the parser (equivalent to the old in-place clear), and the \n handler's parser!().whitespace_buf.clear() correctly became ctx.str_builder.clear_whitespace() since the buffer now lives in the builder. I confirmed StringBuilder is constructed only via the new string_builder() and consumed only via ScalarResolverCtx::done; the other ctx.done() calls in the file belong to the unrelated local LiteralScalarCtx. Still, scan_plain_scalar is the largest state machine in the parser and sits on a hot path for untrusted input, so a human should skim the mechanical translation rather than rely solely on automated review.
Other factors
The only CI failure is test/cli/install/bunx.test.ts, unrelated to YAML. No CODEOWNERS entry covers src/parsers/. The added tests are thorough and target exactly the buffer-ownership semantics this refactor touches. No prior reviews from me on this PR.
There was a problem hiding this comment.
Thanks for tightening the assertion — no further issues from my side. The refactor looks mechanically sound (and removing the raw-pointer aliasing is a clear win), but at ~500 lines through the plain-scalar state machine it's worth a human pass to confirm no evaluation-order shifts slipped in.
Extended reasoning...
Overview
This PR refactors src/parsers/yaml.rs to eliminate raw *mut Parser back-pointers from StringBuilder and ScalarResolverCtx. The whitespace buffer is now moved into the builder via string_builder() and handed back via done(&mut Parser), and ScalarResolverCtx methods take &Parser/&mut Parser explicitly instead of dereferencing a stored raw pointer. scan_plain_scalar and try_resolve_number are rewritten to use self/parser directly in place of the old parser!() macro. ~500 lines of diff in the parser plus ~120 lines of new targeted tests in test/js/bun/yaml/yaml.test.ts.
Security risks
None introduced. The change strictly removes unsafe blocks and raw-pointer dereferences, replacing SAFETY-comment invariants with borrow-checker-enforced ones. The YAML parser handles untrusted input, so any change here is sensitive, but this is a net reduction in UB surface. The one subtle behavioral edge — buffer capacity not being returned to the parser on the ?-error path — is harmless (parser is done after an error; whitespace_buf is left as a fresh empty Vec from take).
Level of scrutiny
Medium-high. The transformation is highly mechanical (every parser!() → self or parser, every self.parser_mut().whitespace_buf → self.whitespace_buf), and I spot-checked several call sites for evaluation-order preservation (e.g. self.pos read before self.inc(1), clear_whitespace() replacing the old direct whitespace_buf.clear() in the \n arm). All looked correct. But this is the core plain-scalar state machine — it runs on every unquoted YAML value — and the diff is large enough that a second pair of human eyes on the hot loop is warranted before merge.
Other factors
- My earlier nit (bare
.toThrow()) was addressed in 41c3fe0 — the assertion now pinsSyntaxError. The inline thread is resolved. - The bug-hunting pass found no issues on the latest revision.
- CI failures (
bunx.test.ts,bun-install-security-provider.test.ts) are in unrelated install code, not YAML. - No CODEOWNERS entry covers
src/parsers/yaml.rs. - New test coverage is thorough: internal/trailing/pre-newline whitespace, buffer reuse across 256 sibling scalars, number/keyword fallback-to-string, special-float near-misses, flow context, UTF-16 input, and document-marker termination.
The YAML plain-scalar scanner used raw
*mut Parserback-references in bothStringBuilderandScalarResolverCtx, with aparser!()macro re-deriving&mut Parserper expansion to dodge the borrow checker. Any future edit that interleaved a derived&mut Parserwith a live builder borrow risked aliased-&mut UB. This change makesStringBuilderown the pending-whitespace buffer plus a shared&'iview of the input (the parser keeps a reusablewhitespace_bufVec:string_builder()moves it in,done()hands it back cleared, preserving capacity reuse), and restructuresScalarResolverCtxso methods that need parser state take&Parser/&mut Parserexplicitly. All unsafe code and raw pointers in the plain-scalar path are gone;scan_plain_scalarnow usesselfdirectly. No behavior change intended. Tested with new plain-scalar whitespace-buffer coverage in test/js/bun/yaml/yaml.test.ts: internal/trailing/pre-newline whitespace handling, buffer reuse across many sibling scalars, number/keyword resolution followed by more content, special float forms and near-misses, flow context, UTF-16 input, and folded multiline scalars.Verification
Implemented and verified on a unified integration branch: full debug build (linux-x64, ASAN), cargo check across the workspace, and the affected test files run against the debug build (failures cross-checked against main's build to exclude pre-existing issues). Each change was reviewed twice (compile/API correctness and GC/concurrency/semantics lenses) with findings repaired before landing.