yaml: reject tab in s-indent position before structural tokens (DK95/06, Y79Y/005, Y79Y/008 — 402/402)#31527
…tructural token

[62]/[63] s-indent(n) is spaces only. A tab between s-indent and the next token is s-separate-in-line: valid before [197] flow-in-block content ([80]/[69] s-flow-line-prefix(n) ::= s-indent(n) s-separate-in-line?), never before a [184]/[192]/[195] structural sibling (`-`/`?`/`:`/key), which sits immediately at s-indent(n).

Scanner: `Parser.tab_after_indent: bool` records "tab seen between line-start (or post-indicator additional_parent_indent) and the current token". Set in scan() tab arm + fold_lines() tab arm; reset on newline().

Parser: checked at every structural-sibling recognition site:
- parse_block_sequence loop (each `-` at s-indent(n))
- parse_block_mapping loop (each entry at s-indent(n))
- explicit `:` indent check (first + subsequent)
- parse_node MappingKey/MappingValue arms (first block entry)
- parse_node Scalar arm (scalar → implicit key)
- parse_block_indented same-line compact construct

Content paths (flow context, s-separate after indicator, plain-scalar fold continuation) correctly do not check it.

Activates DK95/06, Y79Y/005, Y79Y/008. yaml-test-suite is now 402/402.
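The taint described in this commit message can be illustrated with a small standalone sketch. This is not the code in `src/parsers/yaml.rs`: `LineStart`, `scan_line_start`, `is_block_indicator`, and `check_line` are hypothetical names, the sketch works on a single line, and it only recognizes the `-`/`?`/`:` indicators (implicit keys need lookahead and are left out).

```rust
/// What a toy scanner learns from one line's leading whitespace.
#[derive(Debug, PartialEq)]
struct LineStart {
    /// Count of leading spaces, i.e. the `s-indent(n)` column.
    indent: usize,
    /// True if a tab appears between the indent and the first token.
    tab_after_indent: bool,
    /// Byte offset of the first non-whitespace character.
    token_offset: usize,
}

fn scan_line_start(line: &str) -> LineStart {
    let bytes = line.as_bytes();
    let mut i = 0;
    // s-indent(n): spaces only.
    while i < bytes.len() && bytes[i] == b' ' {
        i += 1;
    }
    let indent = i;
    // Anything after that is separation whitespace; remember whether it held a tab.
    let mut tab_after_indent = false;
    while i < bytes.len() && (bytes[i] == b' ' || bytes[i] == b'\t') {
        tab_after_indent |= bytes[i] == b'\t';
        i += 1;
    }
    LineStart { indent, tab_after_indent, token_offset: i }
}

/// `-`, `?` or `:` followed by whitespace or end of input.
fn is_block_indicator(rest: &str) -> bool {
    let mut chars = rest.chars();
    matches!(chars.next(), Some('-' | '?' | ':'))
        && matches!(chars.next(), None | Some(' ' | '\t' | '\n' | '\r'))
}

/// Reject a structural indicator that is preceded by a tab; accept other content.
fn check_line(line: &str) -> Result<(), &'static str> {
    let start = scan_line_start(line);
    if start.tab_after_indent && is_block_indicator(&line[start.token_offset..]) {
        return Err("Tab characters cannot be used as indentation");
    }
    Ok(())
}
```

In this sketch a tab before flow content such as `[a, b]` is accepted, while the same tab before `- b` is rejected, which is the distinction the commit message draws between content paths and structural siblings.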
Updated 11:05 PM PT - May 28th, 2026
❌ @dylan-conway, your commit 8e4f8ee has some failures in
🧪 To try this PR locally: bunx bun-pr 31527
That installs a local version of the PR into your bun-31527 --bun
Walkthrough
This PR adds a parser-level tab-after-indent taint, detects tabs in indentation-sensitive scanner positions, centralizes token initialization, propagates the taint through block sequence/mapping/implicit-key flows and folded-scalar scans, and updates/activates tests to assert the new TabIndentation failures.
Changes
YAML tab-position indentation validation
🚥 Pre-merge checks | ✅ 4 passed
Warning: Review ran into problems
🔥 Problems
Stopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/parsers/yaml.rs`:
- Around line 3608-3612: The code clears self.tab_after_indent before calling
self.scan(), which loses the leading-tab taint for the token and allows invalid
tab-indented sibling tokens to bypass s-indent checks; fix by preserving the
taint across the re-scan — remove or defer clearing self.tab_after_indent until
after self.scan(ScanOptions::default()) completes (i.e., do not reset
self.tab_after_indent before calling scan()), so that scan() and subsequent
s-indent validation see the original tab state of self.token.
In `@test/js/bun/yaml/yaml.test.ts`:
- Around line 1342-1412: Several nearby tests (the rejection cases covering
"rejects tab before sibling block-seq `-`", "rejects tab before sibling
block-map entry", and "rejects tab before explicit `:` continuation" etc.)
repeat the same pattern; refactor them into a single parameterized test using
test.each that iterates over an array of invalid YAML strings and asserts
expect(() => YAML.parse(input)).toThrow("Unexpected token") for each; keep the
other acceptance tests as-is and ensure the test name for the test.each clearly
indicates these are tab-in-s-indent rejection cases so they remain discoverable.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: dd03a9c0-1f5d-4d1a-a4d3-bdcd6d3ce288
📒 Files selected for processing (3)
- src/parsers/yaml.rs
- test/js/bun/yaml/yaml-test-suite.test.ts
- test/js/bun/yaml/yaml.test.ts
…x FH7J/3GZX/C4HZ tests
parse_node MappingValue arm built `first_key = E::Null` directly,
dropping any tag/anchor recorded in node_props. `!!str : x` gave
`{null:x}` instead of `{"":x}`. The end-of-function resolve_null only
applies to the returned node (the mapping), not the key.
Now `first_key = node_props.tag().resolve_null(loc)` and the anchor is
registered on the e-node key before parse_block_mapping. Also flips the
`anchor as implicit-key e-node` test.todo.
Test-suite audit fixes (vs upstream yaml-test-suite@data 6ad3d2c):
- FH7J: item[2] expected was `{null:null}`; events say key is
`!!str` empty → `""`, so `{"":null}`. The test was locking in the
wrong parser output.
- 3GZX/C4HZ: shared-reference `.toBe()` assertions used non-existent
keys (asserted `undefined === undefined`, vacuous). Fixed to use the
actual key paths.
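A minimal sketch of the tag-aware empty-node resolution this commit describes, assuming a two-tag subset. `Tag`, `Value`, and `resolve_empty_key` are hypothetical stand-ins for illustration and not the parser's actual `resolve_null` signature.

```rust
/// Hypothetical subset of YAML core-schema tags.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tag {
    Str,
    Null,
}

/// Hypothetical value type for an empty (e-node) key.
#[derive(Debug, PartialEq)]
enum Value {
    Null,
    Str(String),
}

/// Resolve an empty node: `!!str` turns it into "", anything else stays null.
fn resolve_empty_key(tag: Option<Tag>) -> Value {
    match tag {
        Some(Tag::Str) => Value::Str(String::new()),
        Some(Tag::Null) | None => Value::Null,
    }
}
```

Under this model `!!str : x` yields a mapping whose key is the empty string, while an untagged `: x` keeps a null key.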
Actionable comments posted: 0
…teraction matrix

The rewind+rescan re-resolves the abandoned scalar's value (tag-neutral), not its leading whitespace; the original scan's tab_after_indent is the source of truth and the re-scan (in_indent_position=false, positioned past the tab) cannot re-detect it.

Adds a 24-case matrix (3 indicators × 4 property prefixes × 2 tab positions) covering every property-loop × tab-taint interaction.
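The shape of such a matrix can be sketched as a plain Cartesian product. Only the three indicators come from this PR; the four prefix labels and the two tab-position labels below are placeholders chosen for illustration, since the commit message does not list them.

```rust
/// One cell of the (hypothetical) test matrix.
#[derive(Debug, Clone, PartialEq)]
struct Case {
    indicator: &'static str,
    prefix: &'static str,
    tab_position: &'static str,
}

fn tab_matrix() -> Vec<Case> {
    // The three structural indicators named in the PR.
    let indicators = ["-", "?", ":"];
    // ASSUMPTION: placeholder property prefixes, not the ones used in the real tests.
    let prefixes = ["&a", "!!str", "&a !!str", "!!str &a"];
    // ASSUMPTION: placeholder labels for where the tab is placed.
    let tab_positions = ["before-properties", "after-properties"];

    let mut cases = Vec::new();
    for indicator in indicators.iter() {
        for prefix in prefixes.iter() {
            for tab_position in tab_positions.iter() {
                cases.push(Case {
                    indicator: *indicator,
                    prefix: *prefix,
                    tab_position: *tab_position,
                });
            }
        }
    }
    cases
}
```

Enumerating the product rather than hand-writing cases makes it harder to miss one combination when a new indicator or prefix is added.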
Actionable comments posted: 0
Additional findings (outside current diff — PR may have been updated during review):
- 🔴 `src/parsers/yaml.rs:4239-4252`: The block-scalar body scanner (`scan_auto_indented_literal_scalar`, shared by `|` and `>`) is the third leading-whitespace consumer and is not instrumented, so a tab before a sibling that immediately follows block-scalar content is still accepted. For example, `YAML.parse("- |\n x\n\t- y\n")` returns `["x\n","y"]` instead of throwing, and likewise for `a: |\n x\n\tb: y` and `? |\n x\n\t: y`. The phase-2 loop's `other =>` arm (and/or the post-space arm) needs to set `self.tab_after_indent = true` when it sees a tab on the terminating line, mirroring what was done for `fold_lines()`.

Extended reasoning...

What the bug is

This PR instruments two leading-whitespace consumers to set `tab_after_indent`: the `scan()` tab arm (gated on `in_indent_position`) and `fold_lines()` (lines 4242-4244 and 4251). But there is a third consumer that owns the line after a block scalar's content: `scan_auto_indented_literal_scalar` (yaml.rs:4951-5061), which is the body scanner shared by `scan_literal_scalar` (`|`) and `scan_folded_scalar` (`>`). It has its own newline / leading-space handling and never calls `fold_lines()`, so the taint is never recorded for the line that terminates the block scalar.

Step-by-step trace of `"- |\n x\n\t- y\n"`

1. `parse_block_sequence` → `parse_block_indented` → `scan(additional_parent_indent=Some(1))` hits `0x7C` → `scan_literal_scalar` → `scan_auto_indented_literal_scalar`.
2. Phase 2 appends `x`, then hits the outer `0x0A` arm at line 4964: `self.newline()` (line 4966) resets `tab_after_indent = false` and `line_indent = NONE`.
3. The inner "newlines" loop reads `nc = 0x09`. The tab checks at lines 4980/4989 only fire after inner newlines (i.e. for the 2nd+ consecutive blank line), so a single `\n` before `\t` falls through to the `other =>` arm at lines 5008-5012, which just sets `__c = 0x09` and breaks, with no taint recorded.
4. The outer catch-all arm at line 5047 sees `line_indent (0/NONE) < min_indent (2)` and returns `ctx.done()`. Scanner state on return: `pos` is at the tab, `line_indent = 0`, `tab_after_indent = false`.
5. Back in `parse_node`'s `Scalar` arm, line 4071 captures `scalar_tab_after_indent = false`, then line 4089 calls `self.scan(ScanOptions { tag, outside_context: true, ..Default::default() })`. Defaults give `first_scan = false`, `additional_parent_indent = None`, so at lines 5531-5536 `count_indentation = false` and `in_indent_position = false`.
6. `scan()` reaches the `0x09` arm at line 5826: `count_indentation` is false (BlockIn error skipped) and `in_indent_position` is false, so the new taint assignment at line 5842 is skipped. The tab is consumed; the next char is `-` → `Token::sequence_entry` with `indent = self.line_indent = 0`.
7. `parse_block_sequence` loop iteration 2 (line 2864): token is `SequenceEntry`, `token.indent (0) == sequence_indent (0)`, and the new check at line 2868 reads `self.tab_after_indent` → false. The sibling is wrongly accepted; result is `["x\n", "y"]`.

The exact same gap applies to:

- `"a: |\n x\n\tb: y\n"`: accepted as `{a:"x\n", b:"y"}` because `parse_block_mapping`'s check at line 3062 sees `tab_after_indent = false`.
- `"? |\n x\n\t: y\n"`: accepted as `{"x\n":"y"}` because the explicit-`:` check at line 2998 sees `tab_after_indent = false`.
- All of the above with `>` in place of `|` (same body scanner).

Why existing code doesn't prevent it

- `fold_lines()` is instrumented (lines 4242-4252) but is only used by plain/quoted scalars. Block scalars bypass it entirely.
- `scan()`'s tab arm only taints when `in_indent_position` is true, which requires `first_scan` or `additional_parent_indent` or having crossed a newline inside that `scan()` call. The post-scalar scan at line 4089 starts mid-line (block-scalar scanner already consumed the newline + spaces), so none of those hold.
- The block-scalar scanner's own tab checks at lines 4980/4989 only cover consecutive newlines (empty lines inside the scalar), not the first newline after content.

Compare with the plain-scalar case the PR does cover: `YAML.parse("- a\n\t- b\n")` correctly throws because `scan_plain_scalar` calls `fold_lines()` for its terminating line, whose tab arm (line 4251) sets the flag, which then survives to line 2868.

Impact

The PR description states this change "covers every grammar position where s-indent(n) precedes a structural token," but post-block-scalar siblings are exactly such a [184]/[191]/[195] position and are missed. Reference parsers (PyYAML, ruamel, libyaml) reject all of these inputs.

Fix

Instrument `scan_auto_indented_literal_scalar`'s phase-2 line handling the same way `fold_lines()` was: in the inner loop, when the post-space character (line 5003) or the immediate post-newline character (`other =>` arm, line 5008) is `0x09`, set `self.tab_after_indent = true` before breaking. That flag will then survive the return (it's only reset by `newline()` or by entering `scan()` in indent position) and be read by the structural-sibling checks at lines 2868 / 2998 / 3062 / 3090.
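The proposed instrumentation can be sketched in isolation. The function below is a simplified, hypothetical model of a block-scalar body scan (`BlockBody` and `scan_block_body` are not names from `yaml.rs`): it ignores chomping and folding and assumes the content indent is already known, but it shows the one addition the finding asks for, tainting when the line that ends the scalar has a tab right after its leading spaces.

```rust
#[derive(Debug, PartialEq)]
struct BlockBody {
    /// Collected scalar text (no chomping applied in this sketch).
    text: String,
    /// Bytes consumed by body lines; the terminating line is not consumed.
    consumed: usize,
    /// True if the terminating line has a tab right after its leading spaces.
    tab_after_indent: bool,
}

fn scan_block_body(input: &str, min_indent: usize) -> BlockBody {
    let mut body = BlockBody { text: String::new(), consumed: 0, tab_after_indent: false };
    for line in input.split_inclusive('\n') {
        let spaces = line.bytes().take_while(|&b| b == b' ').count();
        let rest = &line[spaces..];
        if rest.trim_end_matches(|c: char| c == '\n' || c == '\r').is_empty() {
            // Blank (spaces-only) line inside the scalar.
            body.text.push('\n');
        } else if spaces >= min_indent {
            // Regular body line: strip the content indent.
            body.text.push_str(&line[min_indent..]);
        } else {
            // Dedent ends the scalar. Record the tab so the caller can
            // reject a structural sibling that follows it.
            body.tab_after_indent = rest.starts_with('\t');
            break;
        }
        body.consumed += line.len();
    }
    body
}
```

A caller modeled on the parser-side checks would then refuse a `-`, `?`, `:` or implicit key at the terminating line whenever `tab_after_indent` is set.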
"Tab characters cannot be used as indentation" — emitted from every
tab_after_indent check site (parser + scanner). Matches what eemeli/yaml
("Tabs are not allowed as indentation"), js-yaml ("tab characters must
not be used in indentation"), and PyYAML/ruamel ("found character \t
that cannot start any token") do.
The compact-construct check in parse_block_indented is split: only the
tab_after_indent path emits TabIndentation; the indent<=n path (e.g.
`a: ? x` — no tab, no api) keeps UnexpectedToken.
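The split described above can be sketched as two separate checks with distinct errors. The enum below mirrors the two diagnostics named in this thread, but `check_compact_construct` and its parameters are hypothetical, and checking the tab before the indent is an assumption about ordering.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
enum ParseError {
    TabIndentation,
    UnexpectedToken,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::TabIndentation => {
                write!(f, "Tab characters cannot be used as indentation")
            }
            ParseError::UnexpectedToken => write!(f, "Unexpected token"),
        }
    }
}

/// Hypothetical stand-in for the same-line compact-construct check.
/// `indent` is the token's indent and `n` is the parent block indent.
fn check_compact_construct(
    tab_after_indent: bool,
    indent: usize,
    n: usize,
) -> Result<(), ParseError> {
    // ASSUMPTION: the tab check runs first; the source only says the paths are split.
    if tab_after_indent {
        return Err(ParseError::TabIndentation);
    }
    if indent <= n {
        return Err(ParseError::UnexpectedToken);
    }
    Ok(())
}
```

Keeping the two failure paths separate means an input with no tab at all never reports a tab diagnostic.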
… into claude/yaml-tab-indent

# Conflicts:
#	test/js/bun/yaml/yaml.test.ts
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/parsers/yaml.rs (1)
5840-5845: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Preserve the tab error location before returning from `scan()`.

`ParseResult::fail()` resolves `ParseError::TabIndentation` from `parser.token.start`, but this branch returns before a new token is installed. On this path the diagnostic points at the previous token instead of the offending tab.

🔧 Minimal fix

      if count_indentation && additional_parent_indent.is_none() && self.context.get() == Context::BlockIn {
    +     self.token.start = self.pos;
          return Err(ParseError::TabIndentation);
      }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/parsers/yaml.rs` around lines 5840 - 5845, the branch in scan() returns Err(ParseError::TabIndentation) before a new token is installed, so ParseResult::fail() uses the previous token.location; fix by capturing the offending location (e.g., set parser.token.start or create a temporary token with the current scanner position) before returning and use that when returning the TabIndentation error so diagnostics point at the tab; update the scan() branch that checks count_indentation && additional_parent_indent.is_none() && self.context.get() == Context::BlockIn to record the current position and propagate the ParseError::TabIndentation with that position.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b47c99a5-a167-4337-a710-42ddbb1d5f5b
📒 Files selected for processing (2)
- src/parsers/yaml.rs
- test/js/bun/yaml/yaml.test.ts
…ng-line lookahead

The block-scalar (`|`/`>`) body scanner is the third leading-whitespace consumer (after scan() and fold_lines()). Its phase-2 nested-newlines loop consumes the leading whitespace of each body line, including the line whose dedent terminates the scalar, but did not set tab_after_indent, so `- |\n x\n\t- y` was accepted as `["x\n","y"]`.

Now the `0x20`-then-tab and `other == 0x09` cases taint, mirroring fold_lines().

Also: drop leftover `let r = ...; break r?;` binding from removed debug prints; collapse stray blank lines.
Actionable comments posted: 0
Additional findings (outside current diff — PR may have been updated during review):
- 🔴 `src/parsers/yaml.rs:4043-4048`: This unconditionally applies `node_props.anchor()`/`.tag()` to the e-node key and then resets `node_props`, but unlike the sibling Scalar/SequenceStart/MappingStart arms it does not use `implicit_key_anchors(self.token.line)` to do the line-based [200]/[193] split. For `&a\n: x` the anchor is now applied to the null key instead of the mapping (a regression: `*a` later resolves to `null` instead of `{null:"x"}`), and for `&outer\n&inner\n: x` the `node_props = default()` wipes `has_mapping_anchor` so `&outer` is silently dropped instead of erroring `MultipleAnchors` at line 4171 (same for `has_mapping_tag`). The fix for the same-line case (`&a : x`, FH7J) is correct; it just needs to gate on the property's line matching `self.token.line` like the other arms do.

Extended reasoning...

What changed

Commit c407311 (the FH7J/3GZX/C4HZ fix in this PR) rewrote the `TokenData::MappingValue` arm of `parse_node` so that when a bare `:` opens a block mapping with an e-node first key, any pending `node_props` are applied to that key:

    let first_key = node_props.tag().resolve_null(self.token.start.loc());
    if let Some(anchor) = node_props.anchor() {
        self.anchors.put(Enc::key_bytes(anchor.slice(self.input)), first_key)?;
    }
    node_props = NodeProperties::default();
    break 'node self.parse_block_mapping(first_key, ...)?;

This is correct for the same-line case `&a : x` / `!!str : x` (the e-node key carries the property, per [193] `c-ns-properties` in BLOCK-KEY context), and that is what FH7J and the un-todo'd "anchor as implicit-key e-node" test exercise.

But `node_props.anchor()` (line 3358-3363) returns `has_anchor` with no line check, and `node_props = NodeProperties::default()` then wipes `has_mapping_anchor`/`has_mapping_tag`. The three sibling implicit-key arms (Scalar at 4140, SequenceStart at 3863, MappingStart at 3950) all route through `node_props.implicit_key_anchors(implicit_key_line)` instead, which encodes the spec's line-based split: a property on the same line as the key belongs to the key; a property on a prior line is the [200] `s-l+block-collection` property and belongs to the mapping. The new MappingValue arm has no such split.

Step-by-step proof: `&a\n: x\nb: *a`

1. `parse_node` property loop consumes `Anchor(&a)` on line 1 → `set_anchor` → `node_props.has_anchor = Some(&a@line1)`, `has_mapping_anchor = None`.
2. Next scan → `MappingValue` on line 2.
3. New arm: `node_props.anchor()` returns `&a` (no line check). `self.anchors.put("a", first_key=null)`. Then `node_props = default()`.
4. `parse_block_mapping` builds `{null:"x", b:*a}`. `*a` resolves to `null` (the e-node key).
5. `break 'node` returns the mapping; post-`'node` code at 4186 sees `node_props.anchor() == None` → mapping is not anchored.

Old behavior: `node_props` flowed through untouched; post-`'node` line 4186-4188 anchored `&a → {null:"x"}` (the mapping). `*a` resolved to the mapping.

Codebase's own rule: `implicit_key_anchors(implicit_key_line=2)` at 3407-3422 with `has_anchor@line1`, `has_mapping_anchor=None` → `mystery_anchor.line(1) != 2` → returns `{key_anchor: None, mapping_anchor: &a}`. So per the parser's own splitting logic, `&a` belongs to the mapping, not the key. The spec agrees: [193]/[154] place an implicit key's `c-ns-properties` in BLOCK-KEY context (`s-separate-in-line`, same line only); a prior-line property is the [200] collection property.

Step-by-step proof: `&outer\n&inner\n: x`

1. Property loop consumes `&outer@line1` → `has_anchor=&outer`. Then `&inner@line2` → `set_anchor` (line 3347-3355): `previous.line(1) != 2` so `has_mapping_anchor = &outer`, `has_anchor = &inner`.
2. Scan → `MappingValue` on line 3.
3. New arm: `node_props.anchor()` returns `&inner` only (line 3358 reads `has_anchor`, never `has_mapping_anchor`). Anchors `&inner → null`. Then `node_props = NodeProperties::default()` wipes `has_mapping_anchor=&outer`.
4. `break 'node`. Post-`'node` check at 4171: `node_props.has_mapping_anchor` is now `None` → no error. `&outer` is silently dropped.

Old behavior: `node_props` preserved → line 4171 fired `Err(MultipleAnchors)`.

Codebase's own rule: `implicit_key_anchors(3)` with `has_mapping_anchor=&outer`, `has_anchor=&inner@line2` → `inner.line(2) != 3` → `Err(MultipleAnchors)` (line 3392-3394). So both old behavior and the canonical helper reject this; the new code silently accepts it.

The identical shape applies to tags via `has_mapping_tag`: `!!map\n!!str\n: x` previously errored `MultipleTags` at 4176; now `!!map` vanishes.

Why nothing prevents it

The post-`'node` checks at 4171-4189 are still there but see a blank `node_props`. The only test coverage added is the same-line case (`&a : x`), which is the one shape where `.anchor()` and `implicit_key_anchors()` agree.

Impact

Regression on valid (if esoteric) YAML, introduced by this PR. (1) `&a\n: x`: anchor moves from mapping to null key, observable via any later `*a`. (2) `&o\n&i\n: x` / `!!map\n!!str\n: x`: error → silent drop. The PR's stated goal is full spec conformance (402/402); silently changing anchor semantics on valid input and turning a rejected double-property into a silent over-accept both run counter to that.

Fix

Replicate the line-based split via `node_props.implicit_key_anchors(self.token.line)?` (and the tag analogue) exactly as the Scalar/SequenceStart/MappingStart arms do at 3863/3950/4140: apply `key_anchor`/`key_tag` to `first_key`, hold `mapping_anchor`/`mapping_tag` to apply to the `parse_block_mapping` result, and do not blanket-reset `node_props` (or reset only after extracting `has_mapping_*`). That preserves the FH7J same-line fix while restoring the prior-line semantics.
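The line-based split the finding relies on can be modeled in a few lines. This is a reconstruction from the behavior described in the traces above, not the helper in `yaml.rs`: the types are simplified, anchors are plain names with a line number, and tags are omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Anchor {
    name: &'static str,
    line: u32,
}

/// Simplified model of the pending node properties (anchors only).
#[derive(Debug, Default)]
struct NodeProps {
    /// Most recently seen anchor.
    has_anchor: Option<Anchor>,
    /// Earlier anchor from a previous line, if any.
    has_mapping_anchor: Option<Anchor>,
}

#[derive(Debug, PartialEq)]
struct KeySplit {
    key_anchor: Option<Anchor>,
    mapping_anchor: Option<Anchor>,
}

#[derive(Debug, PartialEq)]
struct MultipleAnchors;

/// Same line as the key: the anchor belongs to the key.
/// Prior line: the anchor belongs to the mapping.
fn implicit_key_anchors(props: &NodeProps, key_line: u32) -> Result<KeySplit, MultipleAnchors> {
    match (props.has_mapping_anchor, props.has_anchor) {
        (Some(outer), Some(inner)) if inner.line == key_line => {
            Ok(KeySplit { key_anchor: Some(inner), mapping_anchor: Some(outer) })
        }
        // Two anchors, neither on the key's line: both would claim the mapping.
        (Some(_), Some(_)) => Err(MultipleAnchors),
        (None, Some(a)) if a.line == key_line => {
            Ok(KeySplit { key_anchor: Some(a), mapping_anchor: None })
        }
        (None, Some(a)) => Ok(KeySplit { key_anchor: None, mapping_anchor: Some(a) }),
        (outer, None) => Ok(KeySplit { key_anchor: None, mapping_anchor: outer }),
    }
}
```

With `&a` on line 1 and the `:` on line 2 the sketch assigns the anchor to the mapping, and with `&outer` on line 1, `&inner` on line 2 and the `:` on line 3 it returns `MultipleAnchors`, matching the two traces.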
Phase-1 (auto-indent detection) consumes the FIRST line after the
`|`/`>` header. When that line's spaces+tab terminates the scalar
(spaces ≤ parent block indent), phase-1's `c =>` arm broke without
tainting, and phase-2 immediately returned done() — leaving the next
scan() to read the tab untainted. Only phase-2 (body-loop, lines after
the first) was instrumented before.
Now `if first == 0x09 { tab_after_indent = true }` between phase-1 and
phase-2, mirroring the phase-2 nested-loop additions.
…93] line split
The FH7J fix unconditionally applied node_props to the e-node key,
ignoring whether the property was on the `:`'s line ([193] key
property, BLOCK-KEY = same-line) or a prior line ([200] block-collection
property). Regression: `- &a\n : x\n- *a` gave `[{null:x}, null]`
instead of `[{null:x}, {null:x}]`; `&outer\n&inner\n: x` accepted
instead of erroring.
Now mirrors the Scalar/SequenceStart/MappingStart arms: the tag is only
the key's when tag_line == colon_line; the anchor split goes through
implicit_key_anchors(colon_line) (key_anchor on first_key, mapping_anchor
on the parse_block_mapping result).
Additional findings (outside current diff — PR may have been updated during review):
- 🟡 `src/parsers/yaml.rs:5016-5028`: The phase-2 inner-loop `0x0D`/`0x0A` arms just above (yaml.rs:4994, 5003), and the phase-1 pair at 4898/4908, still hard-return `ParseError::UnexpectedCharacter` when a column-0 tab follows an empty line, so `"- |\n x\n\t- y\n"` throws the new "Tab characters cannot be used as indentation" but adding a blank line (`"- |\n x\n\n\t- y\n"`) throws "Unexpected character" for the identical [184] violation. Diagnostic-only (both inputs are correctly rejected); change those four early-returns to `ParseError::TabIndentation`, or drop the phase-2 ones since the `other` arm now handles the tab via taint anyway.

Extended reasoning...

What

Commit `5f2fd629` ("taint tab_after_indent in block-scalar body scanner's terminating-line lookahead") added `tab_after_indent` tainting to two arms of the phase-2 inner loop in `scan_block_scalar`: the `0x20` arm at yaml.rs:5019 and the `other` arm at yaml.rs:5028. The two adjacent `0x0D`/`0x0A` arms in the same inner `match` (yaml.rs:4994-4996 and 5003-5005) were left unchanged: they still peek at the next byte after an empty line and, if it is a tab, hard-return `ParseError::UnexpectedCharacter`. The phase-1 leading-empty-line loop has the same pattern at yaml.rs:4898 and 4908.

How it manifests

Two inputs that are spec-invalid for the identical reason (a [184] sibling `-` after a tab in `s-indent` position) produce different diagnostics depending only on whether a blank line sits between the block-scalar body and the tabbed sibling:

Input → Diagnostic
- `YAML.parse("- |\n x\n\t- y\n")` → `Tab characters cannot be used as indentation` ✓
- `YAML.parse("- |\n x\n\n\t- y\n")` → `Unexpected character` ✗

The PR's own new test at yaml.test.ts:1380-1394 ("rejects tab before sibling that immediately follows block-scalar body") covers the first shape but not the empty-line variant, so the gap is untested.

Step-by-step trace

Input A: `"- |\n x\n\t- y\n"` (no blank line):

1. Block-scalar header `|`, phase 1 detects `content_indent = 2` from `x`, phase 2 starts at `__c = 'x'`.
2. Outer `c =>` arm consumes `x`; outer `0x0A` arm (4978) enters the inner loop.
3. Inner loop: `nc = self.next()` is `\t` → falls to `other` arm (5025). PR-added code sets `self.tab_after_indent = true`, `__c = 0x09`, breaks inner.
4. Outer `c =>` arm: `line_indent` is `NONE` (set by `newline()`) `< min_indent (2)` → body terminates gracefully.
5. Back in `parse_block_sequence`, the loop at 2879 sees `SequenceEntry` with `tab_after_indent = true` → `ParseError::TabIndentation`.

Input B: `"- |\n x\n\n\t- y\n"` (blank line before tab):

1-2. Same as above through the outer `0x0A` arm.
3. Inner loop: `nc = self.next()` is `\n` (the blank line) → inner `0x0A` arm (4999). Increments `leading_newlines`, calls `newline()`, `inc(1)`.
4. Line 5003: `if Enc::wide(self.next()) == 0x09 { return Err(ParseError::UnexpectedCharacter) }`. `self.next()` is `\t` → fires before the body can terminate. The `other` arm at 5025 is never reached for this tab.

Why the existing code doesn't prevent it

The `0x0D`/`0x0A` arms' tab lookahead is pre-existing code that predates this PR. Before the PR it was necessary: there was no `tab_after_indent` mechanism, so the only way to reject a column-0 tab after an empty line in block-scalar context was to error inline. After this PR the check is redundant for phase 2: removing it lets the next inner-loop iteration route the tab through the `other` arm, which now taints and lets the body terminate so the parser-side structural check at 2879/3076/etc. fires with the dedicated diagnostic. The pre-existing code was internally consistent (everything used `UnexpectedCharacter`); this PR created the inconsistency by updating two of the four arms in this loop and leaving the other two.

Impact

Diagnostic-message quality only. Both inputs are correctly rejected; there is no accept/reject behavioral change. The PR's stated goal is "the general spec rule, not three test cases" with a dedicated `TabIndentation` diagnostic at every `s-indent` position; leaving four sites in the block-scalar scanner on the old generic error works against that.

Fix

Lowest-risk: change `ParseError::UnexpectedCharacter` → `ParseError::TabIndentation` at the four sites (yaml.rs:4898, 4908, 4994, 5004).

Cleaner (phase 2 only): delete the `if Enc::wide(self.next()) == 0x09 { … }` lookahead at 4994-4996 and 5003-5005 entirely. The next inner-loop iteration sees the tab, the `other` arm taints, and the parser-side check produces `TabIndentation`, exactly matching the no-blank-line path. (Phase 1's pair at 4898/4908 should keep the early return with `TabIndentation`, since the `c =>` arm there feeds `first` into the phase-1-exit taint at 4953 only when content_indent has been determined, and the autodetect interaction is subtler.)

Worth adding `expect(() => YAML.parse("- |\n x\n\n\t- y\n")).toThrow(TAB_ERR)` (and the `\r\n\r\n` variant) alongside the existing block-scalar-body cases at yaml.test.ts:1380.
… for post-loop guards

The blanket node_props reset cleared has_mapping_tag/has_mapping_anchor, masking the post-loop guards at ~4223/4228. `!!str\n!!map\n: x` and `&a\n&b\n&c : x` were accepted; main + 4/4 refs reject.

Now only `has_anchor` is cleared (the inner anchor, consumed by implicit_key_anchors as either key_anchor or sole mapping_anchor). The mapping_anchor and tag fields stay, so the existing overflow guards fire.

This matches main's behavior (which also over-rejects the spec-valid `&a\n&b : x` 1-prior-+-1-same case via the same guard; pre-existing).
…r mid-line ---, ---<tab>- x)
… col-0 tab checks
…xisting over-reject

The post-loop has_mapping_anchor guard catches both overflow (`&a\n&b\n&c : x`) and the spec-valid `&outer\n&inner : x` (the Scalar-key analogue is accepted because that arm `return Ok` bypasses the guard). Pre-existing on main; comment now says so instead of implying it's overflow-only.
What

Per [62]/[63] `s-indent(n)` is spaces only. A tab in indent position is `s-separate-in-line`: valid before [197] flow-in-block content ([69] `s-flow-line-prefix(n) ::= s-indent(n) s-separate-in-line?`), but never before a [184]/[192]/[195] structural sibling (`-`/`?`/`:`/implicit-key), which the grammar places immediately at `s-indent(n)` with no separation production in between.

Bun was accepting tab-tainted structural tokens because `Token.indent` records the count of leading spaces (correctly), but the parser had no way to ask "was there additional whitespace between those spaces and the token?"

How

Scanner: `Parser.tab_after_indent: bool` records "a tab was seen between this line's `s-indent` (or post-indicator `additional_parent_indent` column) and the current token." Set in `scan()`'s tab arm (and `fold_lines()`' tab arm, which is what consumes whitespace for the `DK95/06` shape). Reset on `newline()` and at `scan()` entry when re-entering indent position.

Parser: checked at every structural-sibling recognition site (the complete set per [184]/[192]/[195]):

- `parse_block_sequence` loop: `-` at `s-indent(n)`
- `parse_block_mapping` loop: `s-indent(n)`
- `:` indent check (first + subsequent): `s-indent(n) ":"`
- `parse_node` MappingKey / MappingValue arms
- `parse_node` Scalar arm
- `parse_block_indented` same-line compact

Content paths (flow context, `s-separate` after indicator, plain-scalar fold continuation) correctly do not check it: a tab is valid there.

This is the general spec rule, not three test cases: it covers every grammar position where `s-indent(n)` precedes a structural token, not just the three official-suite inputs.

yaml-test-suite

`DK95/06`, `Y79Y/005`, `Y79Y/008` activated. 402/402, 0 todos.

Validation

#31203 `test.todo("tab before block construct")` (3/3 asserts)