Split `SourceLocation` into `LineColumn` and `SourceLocation` by MichaReiser · Pull Request #17587 · astral-sh/ruff

MichaReiser · 2025-04-23T17:24:50Z

Summary

This PR splits SourceLocation into LineColumn and SourceLocation and moves the logic for converting a TextSize to an LSP position into LineIndex.

Before this PR, LineIndex had two methods:

source_location: Converts a TextSize to a SourceLocation
offset: Converts a SourceLocation back to a TextSize

The problem with the current implementation is that source_location trims a leading BOM offset, whereas offset doesn't have any custom BOM handling.
This means that mapping the first character right after a BOM with the old source_location gives row: 0, column: 0, but mapping that position back with offset points before the BOM instead of after it.
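
To make the asymmetry concrete, here is a minimal sketch. `old_source_location` and `old_offset` are hypothetical stand-ins for the pre-PR `LineIndex::source_location` and `LineIndex::offset`, not ruff's real signatures: they take `&str` and `usize` byte offsets instead of `TextSize`, and they assume zero-based rows and columns counted in `char`s, matching the `row: 0, column: 0` wording above.

```rust
// Hypothetical stand-ins for the pre-PR LineIndex methods (not ruff's API).
// Rows and columns are zero-based; columns count chars (Unicode scalar values).
const BOM: char = '\u{feff}';

/// Byte offset -> (row, column), skipping a leading BOM.
fn old_source_location(text: &str, offset: usize) -> (usize, usize) {
    let before = &text[..offset];
    let row = before.matches('\n').count();
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let mut column = text[line_start..offset].chars().count();
    // Only the first line can start with a BOM; don't count it as a column.
    if row == 0 && text.starts_with(BOM) && offset >= BOM.len_utf8() {
        column -= 1;
    }
    (row, column)
}

/// (row, column) -> byte offset, with no BOM handling at all.
fn old_offset(text: &str, row: usize, column: usize) -> usize {
    // Sum the byte lengths of the lines before `row` (newlines included).
    let line_start: usize = text.split_inclusive('\n').take(row).map(str::len).sum();
    let line = &text[line_start..];
    line.char_indices()
        .nth(column)
        .map_or(text.len(), |(i, _)| line_start + i)
}

fn main() {
    let text = "\u{feff}x = 1\n";
    let after_bom = BOM.len_utf8(); // byte offset 3, where `x` starts
    let (row, column) = old_source_location(text, after_bom);
    // Round-tripping lands on byte 0, before the BOM, not back on `x`.
    println!("{after_bom} -> ({row}, {column}) -> {}", old_offset(text, row, column));
}
```

Because the BOM is dropped in one direction only, byte offsets 0 and 3 both map to `(0, 0)` in this sketch, so no inverse can recover the original offset.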

This PR fixes this inconsistency by removing the offset method for the old SourceLocation (now LineColumn): the only place where we need to map a column back to an offset is the formatter, where the special BOM handling doesn't matter.

However, we don't want to skip the BOM for LSP operations because LSP operations don't return line/column information; instead, they map a position to a line and the nth character on that line.
This is why this PR introduces a new pair of source_location and offset methods that map between a TextSize and a line plus character_offset, where character_offset is a UTF-8, UTF-16, or UTF-32 offset (in bytes, UTF-16 code units, or Unicode scalar values, respectively).
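
A rough sketch of the encoding-aware mapping described above, under assumptions: `PositionEncoding`, `source_location`, and `offset` here are hypothetical names operating on `&str` and `usize` byte offsets, not the exact signatures this PR adds to `LineIndex`. Unlike the `LineColumn` mapping, the BOM is not skipped; it is simply the first character on line 0.

```rust
// Hypothetical sketch; names and signatures are illustrative, not ruff's API.
#[derive(Clone, Copy, Debug)]
enum PositionEncoding {
    Utf8,  // character = bytes
    Utf16, // character = UTF-16 code units
    Utf32, // character = Unicode scalar values
}

/// Width of one char in the chosen encoding's units.
fn width(c: char, encoding: PositionEncoding) -> usize {
    match encoding {
        PositionEncoding::Utf8 => c.len_utf8(),
        PositionEncoding::Utf16 => c.len_utf16(),
        PositionEncoding::Utf32 => 1,
    }
}

/// Byte offset -> (line, character). Assumes `offset` is on a char boundary.
fn source_location(text: &str, offset: usize, encoding: PositionEncoding) -> (usize, usize) {
    let before = &text[..offset];
    let line = before.matches('\n').count();
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    // No BOM special-casing: a leading BOM counts like any other char.
    let character: usize = text[line_start..offset].chars().map(|c| width(c, encoding)).sum();
    (line, character)
}

/// (line, character) -> byte offset, clamped to the end of the line.
/// A `character` inside a multi-unit char rounds up to the next char boundary.
fn offset(text: &str, line: usize, character: usize, encoding: PositionEncoding) -> usize {
    let line_start: usize = text.split_inclusive('\n').take(line).map(str::len).sum();
    let mut remaining = character;
    for (i, c) in text[line_start..].char_indices() {
        if c == '\n' || remaining == 0 {
            return line_start + i;
        }
        remaining = remaining.saturating_sub(width(c, encoding));
    }
    text.len()
}

fn main() {
    let text = "\u{feff}a\u{1F600}b\n";
    let b = text.find('b').unwrap(); // byte offset 8
    for encoding in [PositionEncoding::Utf8, PositionEncoding::Utf16, PositionEncoding::Utf32] {
        let (line, character) = source_location(text, b, encoding);
        assert_eq!(offset(text, line, character, encoding), b);
        println!("{encoding:?}: line {line}, character {character}");
    }
}
```

For the same byte offset the three encodings report `character` values of 8, 4, and 3, because the BOM is 3 UTF-8 bytes but one UTF-16 code unit, and the emoji is 4 bytes, 2 UTF-16 code units, or 1 scalar value. Rounding up inside a multi-unit character is a choice made for this sketch only.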

I dove into all this because the playground needs to convert ranges to UTF-16, and I wanted to avoid copying the whole conversion logic a third time (ruff server, red knot server, wasm).

Test Plan

Tested the Ruff VS Code extension with Unicode content
Tested that the line numbers in the CLI are correct
Tested notebooks
cargo test

MichaReiser · 2025-04-23T17:27:17Z

crates/red_knot_wasm/src/lib.rs


```diff
-impl From<(SourceLocation, SourceLocation)> for Range {
-    fn from((start, end): (SourceLocation, SourceLocation)) -> Self {
+impl From<(LineColumn, LineColumn)> for Range {
```


I'll update the playground to use LineCharacter in a separate PR.

github-actions · 2025-04-23T17:41:59Z

`mypy_primer` results

No ecosystem changes detected ✅

github-actions · 2025-04-23T17:47:55Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

dhruvmanila

This is great, much better API, thanks for taking a look at this!

I have a few comments, but otherwise this looks like a pretty straightforward and logical change to me.

crates/ruff_source_file/src/line_index.rs

crates/ruff_wasm/src/lib.rs

dhruvmanila · 2025-04-25T18:04:32Z

Also, apologies for the late review, falling a bit behind on notifications.

MichaReiser · 2025-04-27T10:01:02Z

Also, apologies for the late review, falling a bit behind on notifications.

No worries. I think you prioritized correctly. This PR isn't urgent.

…-sh#17587)

Summary: I'm trying to pull in some of the latest changes from the upstream `ruff_python` libraries to get a sense of what an upgrade would look like (and potentially to pick up more upstream utilities). The version I'm pulling from is 0.11.12 (released last week). There's a new release, 0.11.13, this week that contains T-string support for Python 3.14 (astral-sh/ruff#17851), but those changes introduce nontrivial downstream breakage (i.e. the expr structure gets shuffled around), so I think it makes more sense to do a separate upgrade just for that one feature.

The main backward-incompatible change in this upgrade comes from this PR: astral-sh/ruff#17587. The main consequence is that `SourceLocation` no longer directly contains line and column information for user-visible text. A new structure, `LineColumn`, is now used for that purpose, and `SourceLocation` now represents the "raw" line and character offset data in the original string. The difference between the "raw" numbers and the "user-visible" numbers seems to come from the Unicode byte order mark (BOM) character (I'm not super familiar with it; read the original PR if you are interested in the details). For us, I think the main response should be to rely on `LineColumn` instead of `SourceLocation` now. That's mostly trivial, except in cases where we want to convert a line and column number back to a byte offset in the string; there we have to use `SourceLocation`. Technically speaking, that conversion can't be made lossless, so we need to be careful about where it happens. I think we perform that kind of conversion mostly in tests, so we are fine, but I'll mark the places where we do it in prod to raise awareness that in certain cases it might be an issue.

There are also big changes in how the "semantic syntax checker" behaves. The good news is that a bunch of new checks were added, so we can reliably detect more issues. The bad news is that many of the added checks require us to implement an AST visitor to track context, and I don't think that's a trivial thing to do. Right now I'm just returning some dummy values to get the very basic checks working, but in the future we could come back and do more of the visit properly.

Reviewed By: ndmitchell

Differential Revision: D76156394

fbshipit-source-id: 0f55b5888259948d67400389a5efdff69c727dab

MichaReiser force-pushed the micha/line-character-column branch from fccdd82 to 714a2bc April 23, 2025 17:25

MichaReiser added the internal label (An internal refactor or improvement) Apr 23, 2025

MichaReiser commented Apr 23, 2025


MichaReiser force-pushed the micha/line-character-column branch 2 times, most recently from d82d4c5 to c2ef93f April 23, 2025 17:37

MichaReiser changed the title from "Split SourceLocation into LineColumn and LineCharacter" to "Split SourceLocation into LineColumn and SourceLocation" Apr 23, 2025

MichaReiser force-pushed the micha/line-character-column branch 5 times, most recently from a2cc634 to 94c6b83 April 24, 2025 07:31

MichaReiser marked this pull request as ready for review April 24, 2025 07:31

MichaReiser requested review from AlexWaygood, carljm, dcreager, dhruvmanila and sharkdp as code owners April 24, 2025 07:31

MichaReiser removed request for AlexWaygood, carljm, dcreager and sharkdp April 24, 2025 07:31

dhruvmanila approved these changes Apr 25, 2025


crates/ruff_source_file/src/line_index.rs (outdated, resolved)

crates/ruff_source_file/src/line_index.rs (outdated, resolved)

crates/ruff_source_file/src/line_index.rs (resolved)

crates/ruff_wasm/src/lib.rs (resolved)

MichaReiser added 3 commits April 27, 2025 12:02

Split SourceLocation into LineColumn and LineCharacter

7bbb933

Simplify line_column

460ac61

Review comments

b868451

MichaReiser force-pushed the micha/line-character-column branch from 94c6b83 to b868451 April 27, 2025 10:21

MichaReiser merged commit 1c65e0a into main Apr 27, 2025
34 checks passed

MichaReiser deleted the micha/line-character-column branch April 27, 2025 10:27

dylwil3 pushed a commit to dylwil3/ruff that referenced this pull request Apr 27, 2025

Split SourceLocation into LineColumn and SourceLocation (astral…

87b9027

…-sh#17587)

MichaReiser merged 3 commits into main from micha/line-character-column


2 participants
