Split SourceLocation into LineColumn and SourceLocation#17587
Merged
MichaReiser merged 3 commits intomainfrom Apr 27, 2025
Merged
Split SourceLocation into LineColumn and SourceLocation#17587MichaReiser merged 3 commits intomainfrom
SourceLocation into LineColumn and SourceLocation#17587MichaReiser merged 3 commits intomainfrom
Conversation
fccdd82 to
714a2bc
Compare
MichaReiser
commented
Apr 23, 2025
|
|
||
| impl From<(SourceLocation, SourceLocation)> for Range { | ||
| fn from((start, end): (SourceLocation, SourceLocation)) -> Self { | ||
| impl From<(LineColumn, LineColumn)> for Range { |
Member
Author
There was a problem hiding this comment.
I'll update the playground to use LineCharacter in a seprate PR
d82d4c5 to
c2ef93f
Compare
Contributor
|
Contributor
|
SourceLocation into LineColumn and LineCharacterSourceLocation into LineColumn and SourceLocation
a2cc634 to
94c6b83
Compare
dhruvmanila
approved these changes
Apr 25, 2025
Member
dhruvmanila
left a comment
There was a problem hiding this comment.
This is great, much better API, thanks for taking a look at this!
I've a few comments but otherwise this looks like a pretty straightforward and logical change to me.
Member
|
Also, apologies for the late review, falling a bit behind on notifications. |
Member
Author
No worries. I think you prioritized correctly. This PR isn't urgent. |
94c6b83 to
b868451
Compare
dylwil3
pushed a commit
to dylwil3/ruff
that referenced
this pull request
Apr 27, 2025
facebook-github-bot
pushed a commit
to facebook/pyrefly
that referenced
this pull request
Jun 6, 2025
Summary: I'm trying to pull in some latest changes from upstream `ruff_python` libraries to get a sense of what an upgrade would look like (potentially getting more upstream utilities that can be used). The change I'm pulling from is 0.11.12 (released last week). There's a new release 0.11.13 this week which contains T-string support for Python 3.14 (astral-sh/ruff#17851), but the changes there introduced nontrivial downstream breakage (i.e. expr structure gets shuffled around) so I think it makes more sense to do a separate upgrade just for that one feature. The main backward-incompatible change in this upgrade comes from this PR: astral-sh/ruff#17587. The main consequence is that `SourceLocation` now no longer directly contains line and column info for user-visible texts-- a new structure `LineColumn` is now used for that purpose, and `SourceLocation` now represents the "raw" line and character offset data in the original string. The reason why the "raw" numbers and "user-visible" numbers are different seem to come from Unicode's byte-offset-mark (BOM) character (I'm not super familiar with those -- read the original PR if you are interested in the details). For us, I think the main response there should be to rely on `LineColumn` instead of `SourceLocation` now. That's mostly a trivial thing to do, except there are cases where we want to convert line+column number back to a byte offset in the string and there we have to use `SourceLocation` -- technically speaking that conversion can't be made loseless so we need to be careful about where it happens. I think we perform that kind of conversion mostly in tests so we are fine. But I'll mark the place where we do it in prod to raise awareness that in certain cases it might be an issue. There are also big changes in how the "semantic syntax checker" behaves. Good news is that a bunch of new checks were added so we can reliably detect more stuffs. Bad news is that many of the added checks require us to implement an AST visitor to track context and I don't think it's a trivial thing to do. Right now I'm just returning some dummy values to get the very basic checks working. But in the future we could come back and do more of the visit properly. Reviewed By: ndmitchell Differential Revision: D76156394 fbshipit-source-id: 0f55b5888259948d67400389a5efdff69c727dab
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR splits
SourceLocationintoLineColumnandSourceLocationand moves theTextSizeto LSP position conversion logic intoLineIndex.LineIndexas it is before this PR had two methods:source_location: Converts aTextSizeto aSourceLocationoffset:Converts aSourceLocationback to aTextSizeThe problem with the current implementation is that
source_locationtrims a leading BOM offset, whereasoffsetdoesn't have any custom BOM handling.That means, mapping the first character right after a BOM to the old
source_locationwould giverow: 0,column: 0, but mapping that position back to an offset would point before instead of after the BOM.This PR fixes this inconsistency by removing the
offsetfor the oldSourceLocation(nowLineColumn)because the only case where we need to map back a column is in the formatter but the special BOM handling doesn't matter.However, we don't want to skip the BOM for LSP operations because LSP operations don't return line/column information; instead, they map a position to a line and the
nthcharacter on that line.This is why this PR introduces a new pair of
source_locationandoffsetmethods to map betweenTextSizeand alineandcharacter_offsetwherecharacter_offsetis anUTF8,UTF16orUTF32offset (bytes, code units, Unicode scalar values).The reason I dove into all this is because the playground needs to convert the ranges to UTF16 and I wanted to avoid copying the whole conversion logic a third time (ruff server, red knot server, wasm)
Test Plan
cargo test