[parser] Fix indentation tracking after line continuations#23299
[parser] Fix indentation tracking after line continuations#23299kar-ganap wants to merge 1 commit intoastral-sh:mainfrom
Conversation
ntBre
left a comment
There was a problem hiding this comment.
Can you add a regression test for this showing that the issue is now resolved?
|
f65948a to
2457be3
Compare
Can you provide a reference to where this is specified? It'd be useful to add a link to it in the comments. |
This example does not raise an error on latest |
|
@dhruvmanila Thanks for the review! Re: Python spec reference — The reference is in the Python Language Reference — Indentation. The original issue also links to the related CPython discussion: python/cpython#90249. I've updated the PR description to include the spec link. Re: reproduction not erroring on main — Good catch, the reproduction in the original PR description was wrong (it showed a mid-expression if True:
pass
\
print("1")This still errors on latest main: I've updated the PR description with the correct reproduction. The regression test ( |
2457be3 to
0917d5f
Compare
| ))); | ||
| } | ||
| indentation = Indentation::root(); | ||
| // test_ok backslash_continuation_indentation |
There was a problem hiding this comment.
Lexer tests are written using Rust code under the #[cfg(test)] section in this file which would create snapshots of the lexed tokens. Can you move this down below and add a few more edge cases?
There was a problem hiding this comment.
Done — added 12 lexer snapshot tests in the #[cfg(test)] section (4 scenarios × 3 EOL variants), following the existing helper + per-EOL pattern. Kept the inline tests as documentation for the fixture infrastructure.
0917d5f to
4dcc24a
Compare
| fn backslash_continuation_indentation_eol(eol: &str) -> LexerOutput { | ||
| let source = format!("if True:{eol} pass{eol} \\{eol} print(\"1\"){eol}"); | ||
| lex_source(&source) | ||
| } |
There was a problem hiding this comment.
Sorry but this is not what I meant, we don't need to test for various EOL cases as that's already tested, the test cases should be specific to the behavior that's being changed.
I've updated the test cases locally and I've found that the following isn't raising a syntax error when it should:
if True:
1
\
2The indentation level at 1 is of 4 spaces while at 2 is of 6 spaces. So, I think there's more that needs to be done. I need to take a closer look at the solution and what CPython is doing.
There was a problem hiding this comment.
I'll look at it a later today unless you can figure out what's wrong, I've put the PR into draft for the time being.
There was a problem hiding this comment.
I investigated the overindented case you raised:
if True:
1
\
2This does produce an error with our fix. The 6 spaces before \ freeze the indentation at 6, so the lexer emits an Indent from 4→6, and the parser rejects it as "Unexpected indentation":
invalid-syntax: Unexpected indentation
--> test.py:3:1
|
1 | if True:
2 | 1
3 | / \
4 | | 2
| |____^
The issue exists on main (without this fix), where indentation = Indentation::root() resets indentation to 0, then re-accumulates 4 spaces from the continuation line — matching the block level and silently accepting the overindent. Our fix corrects this.
I've added a backslash_continuation_overindented snapshot test that captures the token stream for this exact case (showing the two Indent tokens). Also simplified the tests per your feedback — removed the 12 EOL-variant tests and replaced them with 5 targeted tests:
backslash_continuation_in_block— basic valid casebackslash_continuation_at_column_zero— column 0 special casebackslash_continuation_multiple— consecutive continuationsbackslash_continuation_overindented— your case (indentation frozen at 6, Indent 4→6 emitted)backslash_continuation_mismatch— indentation doesn't match stack, lexer error
Verified all cases match CPython behavior.
There was a problem hiding this comment.
I'm sure you didn't mean to do that but you force pushed some of the changes which removed my changes in 879255d. Please avoid making any further changes to avoid any conflict.
879255d to
1efc281
Compare
|
Closing in favor of #23417 |
Summary
Fixes #19301.
Per the Python spec, the whitespace up to the first backslash determines the line's indentation, and continuation lines should not affect the indentation level. The lexer's `eat_indentation` loop was not handling this: after a `\` continuation, it accumulated the continuation line's whitespace into the indentation, causing spurious `IndentationError` and "Expected a statement" errors.
Reproduction
```python
if True:
pass
\
print("1")
```
Before: `SyntaxError: Unexpected indentation` + `Expected a statement`
After: Parses correctly (matches CPython behavior)
Fix
After consuming `\` during indentation tracking in `eat_indentation`, skip the continuation line's whitespace with `eat_while(is_python_whitespace)` without accumulating it into `indentation`. Guard: only skip when `indentation != Indentation::root()` — when `\` is at column 0, let the loop continue so the next line's whitespace is accumulated normally (needed for the `else: \` pattern in RET505 fixtures).
Reference
Test plan
Add lexer test cases, a couple of inline parser tests.