[ty] Improve completions by leveraging scopes by BurntSushi · Pull Request #18281 · astral-sh/ruff

BurntSushi · 2025-05-23T16:27:18Z

This PR improves completions by leveraging scopes. That is, we try to
identify an expression corresponding to the cursor's current position.
We then use that expression to find it scope. Our completions are then
the names in that scope's (and its ancestors') symbol tables.

The most straight-forward way this improves on the status quo can be
seen from a simple example:

def foo():
    def foofoo(): ...
<CURSOR>

Today, this would include foofoo in the completions. But with this
PR, foofoo is not included.

I had originally started off with a bit more shenanigans in terms of
identifying the expression under the cursor. But in the end, at least
for scopes, a very simple solution seems to work well.

The major caveat here is that this doesn't work well when requesting
completions before having typed anything. While this doesn't fail in
every case, it is somewhat common. For example:

def foo():
    def foofoo(): ...
    <CURSOR>

In this instance, the AST nodes for foo and foofoo actually end
at exactly the same byte offset, so there is nothing for us to easily
latch on to that comes after foofoo but before the end of foo.
In particular, we really want to enumerate the symbols for the scope
contained in the function body of foo. But it's unclear how to
(easily) actually get the scope given the cursor's position. One
approach might be to tweak the ranges of the AST nodes to make finding
overlapping AST nodes easier. Either way, in this case currently, we
fall back to the global scope. So we don't end up with a complete
failure, but I'm guessing this will prove to be frustrating for users.

(I do not think we need to fix this in this PR.)

Note that this doesn't apply when something has been typed. For example:

def foo():
    def foofoo(): ...
    f<CURSOR>

In this case, we see that f is part of the body of foo and
correctly provide completions for both foo and foofoo.

There are some other cases where a <CURSOR> being in leading/trailing
whitespace results in sub-optimal failures. The commented out test
cases in the second commit cover a smattering of them.

Ref astral-sh/ty#86

github-actions · 2025-05-23T16:30:16Z

`mypy_primer` results

No ecosystem changes detected ✅

github-actions · 2025-05-23T16:35:34Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser

Nice. I think this is a great starting point! We can look into those tricky cases in a separate PR.

I recommend uncommenting the tests and instead add a TODO to each test with the expected snapshot

MichaReiser · 2025-05-23T17:50:13Z

crates/ty_ide/src/completion.rs

+
+        assert_snapshot!(test.completions(), @r"
+        foo
+        f


This is funny but makes sense :)

MichaReiser · 2025-05-23T17:50:55Z

crates/ty_ide/src/completion.rs

+        );
+
+        assert_snapshot!(test.completions(), @r"
+        foo


I forgot that our completion is indeed naive... I was like, wait, why does foo show up but makes sense

It looks like some other part of the LSP stack filters this out, though? Like if I try this in the editor, it won't actually show "foo" when "g" is already typed.

Yeah, filtering is usually done on the client side so that it's consistent across other language servers but the server can do it's own filtering as well.

MichaReiser · 2025-05-23T17:54:08Z

crates/ty_ide/src/completion.rs

+    // If we didn't want to change the ranges on the AST nodes, another
+    // approach here would be to get the inner most scope, and explore
+    // its ancestors until we get to a level that matches the current
+    // cursor's indentation. This seems fraught however. It's not clear
+    // to me that we can always assume a correspondence between scopes
+    // and indentation level.


Instead of assuming any relationship between node and scope. I think I'd prefer if the IDE integration traverses downwards to find a node who's indentation matches the indentation at the current position, then lookup the scope for that node. This may require passing the scope to model.completions because our AST, unfortunately, doesn't support traversing upwards once you have an AnyNodeRef

I don't think scopes are a public API, but I think I get what you're saying. I'm not sure that will work, but I think it will work in some cases and certainly seems worth exploring!

crates/ty_ide/src/completion.rs

sharkdp · 2025-05-23T19:54:46Z

crates/ty_ide/src/completion.rs

+
+    let model = ty_python_semantic::SemanticModel::new(db.upcast(), file);
+    model
+        .completions(target.node())


Just something I randomly noticed: there's no deduplication here. If you have something like the following, foo will show up twice in the suggested completions:

def foo(): pass class C: def foo(self): pass def bar(self): f<CURSOR>

Yeah, we would need to model shadowing here somehow because the completions should understand the kind of "foo" symbol.

I guess a naive way to solve this would be to keep a running list of completions as we traverse the scope upwards and avoid adding a symbol if it already exists.

I think another important issue revealed by this example is that we aren't modeling the quirks of Python name resolution, e.g. that names in a class scope are not visible within nested function scopes (so only the global foo should even be visible here.) Doesn't necessarily need to be in this PR, but I think this will be important to fix; having all methods of a class wrongly show up as autocomplete suggestions within other methods of the class will be pretty annoying.

For now, I've "fixed" this by just sorting and de-duplicating the completions returned by the server. I had assumed the client would de-duplicate since it was already doing prefix matching, but that does not appear to be the case. I've also added tests covering these cases (along with an appropriate FIXME for the false positives that @carljm points out).

I agree that what we're doing right now is very dumb. I'm working my way up to fancier things. :-) I agree that the false positives here should be fixed and that completions should be more than just name strings.

crates/ty_ide/src/completion.rs

sharkdp · 2025-05-23T20:07:23Z

crates/ty_python_semantic/src/semantic_model.rs

+        let mut symbols = vec![];
+        for (file_scope, _) in index.ancestor_scopes(file_scope) {
+            for symbol in index.symbol_table(file_scope).symbols() {
+                symbols.push(symbol.name().clone());
+            }
+        }
+        symbols


This does not take type information into account. For example, it will include symbols from unreachable branches:

import sys if sys.version_info < (3, 0): legacy_symbol = 1 l<CURSOR> # will include legacy_symbol

The extend_with_declarations_and_bindings function in #18251 does something similar to this, but excludes definitely-unbound symbols.

I'm not sure what the desired functionality here is (or if there is a performance tradeoff). Maybe it's fine to error on the side of providing too many completions.

I think we would at least want to make sure that unreachable bindings are included in the autocomplete suggestions if the user is editing a branch of code that is itself understood by ty to be unreachable

Yeah, I think it would be useful to exclude symbols that are unreachable although that doesn't necessarily need to be done in this PR itself.

Note that I'm actually saying the opposite to Dhruv and David here. I agree that excluding unreachable bindings makes sense if the user is editing reachable code. But what if we've got a situation like this?

Config: --python-version=3.10

import sys if sys.version_info >= (3, 11): long_variable_name = [] lo<CURSOR>

I think it'll be pretty annoying for the user there if we don't include long_variable_name in the list of autocomplete suggestions when the cursor is at that location. But the binding is in unreachable code!

That's why I was wondering if we should lean towards providing too many completions.

But note that other LSPs also have degraded experience in unreachable code sections:

That's not a great reason for not trying to do better here. But I also don't think it's high priority to provide the best-possible LSP experience inside unreachable sections of code. For other languages, I often deliberately change the config when I'm editing otherwise unreachable code.

Yeah I agree we can do better here, but I think I generally prefer offering too many completions in a case like this until we can implement what @AlexWaygood suggests.

dhruvmanila

Nice!

dhruvmanila · 2025-05-26T10:48:21Z

crates/ty_python_semantic/src/semantic_model.rs

+        let mut symbols = vec![];
+        for (file_scope, _) in index.ancestor_scopes(file_scope) {
+            for symbol in index.symbol_table(file_scope).symbols() {
+                symbols.push(symbol.name().clone());
+            }
+        }
+        symbols


Yeah, I think it would be useful to exclude symbols that are unreachable although that doesn't necessarily need to be done in this PR itself.

dhruvmanila · 2025-05-26T11:08:12Z

crates/ty_ide/src/completion.rs

+        );
+
+        assert_snapshot!(test.completions(), @r"
+        foo


Yeah, filtering is usually done on the client side so that it's consistent across other language servers but the server can do it's own filtering as well.

dhruvmanila · 2025-05-26T11:15:30Z

crates/ty_ide/src/completion.rs

+        // It's not totally clear why `for` shows up in the
+        // symbol tables of the detected scopes here. My guess
+        // is that there's perhaps some sub-optimal behavior
+        // here because the list comprehension as written is not
+        // valid.


This is due to error recovery. The source code contains a syntax error because the identifier is absent ([ for bar in [1, 2, 3]) and the parser recovers by converting for into an identifier. The fix is to improve error recovery where the parser is converting keywords into an identifier. There are benefits for doing this but maybe not doing this conversion might be more beneficial. Point 12 in #10653.

dhruvmanila · 2025-05-26T11:19:23Z

crates/ty_ide/src/completion.rs

+        // <CURSOR> is inside leading or trailing whitespace. (Even
+        // when enclosed in parentheses.)


Yeah, parentheses aren't part of the node's range unless it's relevant e.g., tuples.

dhruvmanila · 2025-05-26T11:27:11Z

crates/ty_ide/src/completion.rs

+
+    let model = ty_python_semantic::SemanticModel::new(db.upcast(), file);
+    model
+        .completions(target.node())


Yeah, we would need to model shadowing here somehow because the completions should understand the kind of "foo" symbol.

I guess a naive way to solve this would be to keep a running list of completions as we traverse the scope upwards and avoid adding a symbol if it already exists.

This is analogous to the existing `Tokens::after` method. Its implementation is almost identical. We plan to use this for looking at the tokens immediately before the cursor when fetching completions.

Previously, completions were based on just returning every identifier parsed in the current Python file. In this commit, we change it to identify an expression under the cursor and then return all symbols available to the scope containing that expression. This is still returning too much, and also, in some cases, not enough. Namely, it doesn't really take the specific context into account other than scope. But this does improve on the status quo. For example: def foo(): ... def bar(): def fast(): ... def foofoo(): ... f<CURSOR> When asking for completions here, the LSP will no longer include `fast` as a possible completion in this context. Ref astral-sh/ty#86

BurntSushi · 2025-05-29T14:15:27Z

OK, I've addressed what feedback I think is plausible to do in this PR. I've recorded the notes about where completions can be improved (which I'll try to summarize in GitHub issues when I get a chance) for future work. I don't have a good sense of how hard some of these things are yet. e.g., Detecting whether the cursor is in an unreachable scope and ensuring that instance methods aren't treated as if they were normal functions.

BurntSushi requested review from AlexWaygood, MichaReiser, carljm, dcreager, dhruvmanila and sharkdp as code owners May 23, 2025 16:27

BurntSushi removed request for carljm, dcreager and sharkdp May 23, 2025 16:51

MichaReiser added server Related to the LSP server ty Multi-file analysis & type inference labels May 23, 2025

MichaReiser approved these changes May 23, 2025

View reviewed changes

BurntSushi force-pushed the ag/completion-scopes branch from 850c3e3 to 43b4337 Compare May 23, 2025 18:27

sharkdp reviewed May 23, 2025

View reviewed changes

crates/ty_ide/src/completion.rs Show resolved Hide resolved

sharkdp reviewed May 23, 2025

View reviewed changes

BurntSushi force-pushed the ag/completion-scopes branch 2 times, most recently from 83b4eb8 to da80c2a Compare May 23, 2025 20:33

dhruvmanila approved these changes May 26, 2025

View reviewed changes

ruff_python_parser: add Tokens::before method

a2ed5fd

This is analogous to the existing `Tokens::after` method. Its implementation is almost identical. We plan to use this for looking at the tokens immediately before the cursor when fetching completions.

BurntSushi force-pushed the ag/completion-scopes branch 2 times, most recently from 669fc64 to 9acee50 Compare May 29, 2025 14:09

BurntSushi force-pushed the ag/completion-scopes branch from 9acee50 to 9134bc2 Compare May 29, 2025 14:13

BurntSushi merged commit 33ed502 into main May 29, 2025
35 checks passed

BurntSushi deleted the ag/completion-scopes branch May 29, 2025 14:31

MichaReiser mentioned this pull request Oct 20, 2025

Missing completions if cursor is at the end of the file astral-sh/ty#1392

Closed

		// <CURSOR> is inside leading or trailing whitespace. (Even
		// when enclosed in parentheses.)

Conversation

BurntSushi commented May 23, 2025

Uh oh!

github-actions bot commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

mypy_primer results

Uh oh!

github-actions bot commented May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

Uh oh!

MichaReiser left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sharkdp May 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BurntSushi May 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AlexWaygood May 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dhruvmanila left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BurntSushi commented May 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

github-actions bot commented May 23, 2025 •

edited

Loading

`mypy_primer` results

github-actions bot commented May 23, 2025 •

edited

Loading

`ruff-ecosystem` results

sharkdp May 23, 2025 •

edited

Loading

BurntSushi May 29, 2025 •

edited

Loading

AlexWaygood May 26, 2025 •

edited

Loading