Skip to content

[ty] Improve completions by leveraging scopes#18281

Merged
BurntSushi merged 2 commits intomainfrom
ag/completion-scopes
May 29, 2025
Merged

[ty] Improve completions by leveraging scopes#18281
BurntSushi merged 2 commits intomainfrom
ag/completion-scopes

Conversation

@BurntSushi
Copy link
Member

This PR improves completions by leveraging scopes. That is, we try to
identify an expression corresponding to the cursor's current position.
We then use that expression to find it scope. Our completions are then
the names in that scope's (and its ancestors') symbol tables.

The most straight-forward way this improves on the status quo can be
seen from a simple example:

def foo():
    def foofoo(): ...
<CURSOR>

Today, this would include foofoo in the completions. But with this
PR, foofoo is not included.

I had originally started off with a bit more shenanigans in terms of
identifying the expression under the cursor. But in the end, at least
for scopes, a very simple solution seems to work well.

The major caveat here is that this doesn't work well when requesting
completions before having typed anything. While this doesn't fail in
every case, it is somewhat common. For example:

def foo():
    def foofoo(): ...
    <CURSOR>

In this instance, the AST nodes for foo and foofoo actually end
at exactly the same byte offset, so there is nothing for us to easily
latch on to that comes after foofoo but before the end of foo.
In particular, we really want to enumerate the symbols for the scope
contained in the function body of foo. But it's unclear how to
(easily) actually get the scope given the cursor's position. One
approach might be to tweak the ranges of the AST nodes to make finding
overlapping AST nodes easier. Either way, in this case currently, we
fall back to the global scope. So we don't end up with a complete
failure, but I'm guessing this will prove to be frustrating for users.

(I do not think we need to fix this in this PR.)

Note that this doesn't apply when something has been typed. For example:

def foo():
    def foofoo(): ...
    f<CURSOR>

In this case, we see that f is part of the body of foo and
correctly provide completions for both foo and foofoo.

There are some other cases where a <CURSOR> being in leading/trailing
whitespace results in sub-optimal failures. The commented out test
cases in the second commit cover a smattering of them.

Ref astral-sh/ty#86

@github-actions
Copy link
Contributor

github-actions bot commented May 23, 2025

mypy_primer results

No ecosystem changes detected ✅

@github-actions
Copy link
Contributor

github-actions bot commented May 23, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser added server Related to the LSP server ty Multi-file analysis & type inference labels May 23, 2025
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. I think this is a great starting point! We can look into those tricky cases in a separate PR.

I recommend uncommenting the tests and instead add a TODO to each test with the expected snapshot


assert_snapshot!(test.completions(), @r"
foo
f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is funny but makes sense :)

);

assert_snapshot!(test.completions(), @r"
foo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that our completion is indeed naive... I was like, wait, why does foo show up but makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like some other part of the LSP stack filters this out, though? Like if I try this in the editor, it won't actually show "foo" when "g" is already typed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, filtering is usually done on the client side so that it's consistent across other language servers but the server can do it's own filtering as well.

Comment on lines 225 to 230
// If we didn't want to change the ranges on the AST nodes, another
// approach here would be to get the inner most scope, and explore
// its ancestors until we get to a level that matches the current
// cursor's indentation. This seems fraught however. It's not clear
// to me that we can always assume a correspondence between scopes
// and indentation level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of assuming any relationship between node and scope. I think I'd prefer if the IDE integration traverses downwards to find a node who's indentation matches the indentation at the current position, then lookup the scope for that node. This may require passing the scope to model.completions because our AST, unfortunately, doesn't support traversing upwards once you have an AnyNodeRef

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think scopes are a public API, but I think I get what you're saying. I'm not sure that will work, but I think it will work in some cases and certainly seems worth exploring!

@BurntSushi BurntSushi force-pushed the ag/completion-scopes branch from 850c3e3 to 43b4337 Compare May 23, 2025 18:27

let model = ty_python_semantic::SemanticModel::new(db.upcast(), file);
model
.completions(target.node())
Copy link
Contributor

@sharkdp sharkdp May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just something I randomly noticed: there's no deduplication here. If you have something like the following, foo will show up twice in the suggested completions:

def foo():
    pass

class C:
    def foo(self):
        pass

    def bar(self):
        f<CURSOR>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we would need to model shadowing here somehow because the completions should understand the kind of "foo" symbol.

I guess a naive way to solve this would be to keep a running list of completions as we traverse the scope upwards and avoid adding a symbol if it already exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think another important issue revealed by this example is that we aren't modeling the quirks of Python name resolution, e.g. that names in a class scope are not visible within nested function scopes (so only the global foo should even be visible here.) Doesn't necessarily need to be in this PR, but I think this will be important to fix; having all methods of a class wrongly show up as autocomplete suggestions within other methods of the class will be pretty annoying.

Copy link
Member Author

@BurntSushi BurntSushi May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, I've "fixed" this by just sorting and de-duplicating the completions returned by the server. I had assumed the client would de-duplicate since it was already doing prefix matching, but that does not appear to be the case. I've also added tests covering these cases (along with an appropriate FIXME for the false positives that @carljm points out).

I agree that what we're doing right now is very dumb. I'm working my way up to fancier things. :-) I agree that the false positives here should be fixed and that completions should be more than just name strings.

Comment on lines +60 to +66
let mut symbols = vec![];
for (file_scope, _) in index.ancestor_scopes(file_scope) {
for symbol in index.symbol_table(file_scope).symbols() {
symbols.push(symbol.name().clone());
}
}
symbols
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not take type information into account. For example, it will include symbols from unreachable branches:

import sys

if sys.version_info < (3, 0):
    legacy_symbol = 1

l<CURSOR>  # will include legacy_symbol

The extend_with_declarations_and_bindings function in #18251 does something similar to this, but excludes definitely-unbound symbols.

I'm not sure what the desired functionality here is (or if there is a performance tradeoff). Maybe it's fine to error on the side of providing too many completions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would at least want to make sure that unreachable bindings are included in the autocomplete suggestions if the user is editing a branch of code that is itself understood by ty to be unreachable

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be useful to exclude symbols that are unreachable although that doesn't necessarily need to be done in this PR itself.

Copy link
Member

@AlexWaygood AlexWaygood May 26, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I'm actually saying the opposite to Dhruv and David here. I agree that excluding unreachable bindings makes sense if the user is editing reachable code. But what if we've got a situation like this?

Config: --python-version=3.10

import sys

if sys.version_info >= (3, 11):
    long_variable_name = []
    lo<CURSOR>

I think it'll be pretty annoying for the user there if we don't include long_variable_name in the list of autocomplete suggestions when the cursor is at that location. But the binding is in unreachable code!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why I was wondering if we should lean towards providing too many completions.

But note that other LSPs also have degraded experience in unreachable code sections:

image

That's not a great reason for not trying to do better here. But I also don't think it's high priority to provide the best-possible LSP experience inside unreachable sections of code. For other languages, I often deliberately change the config when I'm editing otherwise unreachable code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree we can do better here, but I think I generally prefer offering too many completions in a case like this until we can implement what @AlexWaygood suggests.

@BurntSushi BurntSushi force-pushed the ag/completion-scopes branch 2 times, most recently from 83b4eb8 to da80c2a Compare May 23, 2025 20:33
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Comment on lines +60 to +66
let mut symbols = vec![];
for (file_scope, _) in index.ancestor_scopes(file_scope) {
for symbol in index.symbol_table(file_scope).symbols() {
symbols.push(symbol.name().clone());
}
}
symbols
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it would be useful to exclude symbols that are unreachable although that doesn't necessarily need to be done in this PR itself.

);

assert_snapshot!(test.completions(), @r"
foo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, filtering is usually done on the client side so that it's consistent across other language servers but the server can do it's own filtering as well.

Comment on lines +452 to +458
// It's not totally clear why `for` shows up in the
// symbol tables of the detected scopes here. My guess
// is that there's perhaps some sub-optimal behavior
// here because the list comprehension as written is not
// valid.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is due to error recovery. The source code contains a syntax error because the identifier is absent ([ for bar in [1, 2, 3]) and the parser recovers by converting for into an identifier. The fix is to improve error recovery where the parser is converting keywords into an identifier. There are benefits for doing this but maybe not doing this conversion might be more beneficial. Point 12 in #10653.

Comment on lines 556 to 559
// <CURSOR> is inside leading or trailing whitespace. (Even
// when enclosed in parentheses.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, parentheses aren't part of the node's range unless it's relevant e.g., tuples.


let model = ty_python_semantic::SemanticModel::new(db.upcast(), file);
model
.completions(target.node())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we would need to model shadowing here somehow because the completions should understand the kind of "foo" symbol.

I guess a naive way to solve this would be to keep a running list of completions as we traverse the scope upwards and avoid adding a symbol if it already exists.

This is analogous to the existing `Tokens::after` method. Its
implementation is almost identical.

We plan to use this for looking at the tokens immediately before the
cursor when fetching completions.
@BurntSushi BurntSushi force-pushed the ag/completion-scopes branch 2 times, most recently from 669fc64 to 9acee50 Compare May 29, 2025 14:09
Previously, completions were based on just returning every identifier
parsed in the current Python file. In this commit, we change it to
identify an expression under the cursor and then return all symbols
available to the scope containing that expression.

This is still returning too much, and also, in some cases, not enough.
Namely, it doesn't really take the specific context into account other
than scope. But this does improve on the status quo. For example:

    def foo(): ...
    def bar():
        def fast(): ...
    def foofoo(): ...

    f<CURSOR>

When asking for completions here, the LSP will no longer include `fast`
as a possible completion in this context.

Ref astral-sh/ty#86
@BurntSushi BurntSushi force-pushed the ag/completion-scopes branch from 9acee50 to 9134bc2 Compare May 29, 2025 14:13
@BurntSushi
Copy link
Member Author

OK, I've addressed what feedback I think is plausible to do in this PR. I've recorded the notes about where completions can be improved (which I'll try to summarize in GitHub issues when I get a chance) for future work. I don't have a good sense of how hard some of these things are yet. e.g., Detecting whether the cursor is in an unreachable scope and ensuring that instance methods aren't treated as if they were normal functions.

@BurntSushi BurntSushi merged commit 33ed502 into main May 29, 2025
35 checks passed
@BurntSushi BurntSushi deleted the ag/completion-scopes branch May 29, 2025 14:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

server Related to the LSP server ty Multi-file analysis & type inference

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants