[ty] Improve completions by leveraging scopes#18281
Conversation
|
|
MichaReiser
left a comment
There was a problem hiding this comment.
Nice. I think this is a great starting point! We can look into those tricky cases in a separate PR.
I recommend uncommenting the tests and instead add a TODO to each test with the expected snapshot
|
|
||
| assert_snapshot!(test.completions(), @r" | ||
| foo | ||
| f |
There was a problem hiding this comment.
This is funny but makes sense :)
| ); | ||
|
|
||
| assert_snapshot!(test.completions(), @r" | ||
| foo |
There was a problem hiding this comment.
I forgot that our completion is indeed naive... I was like, wait, why does foo show up but makes sense
There was a problem hiding this comment.
It looks like some other part of the LSP stack filters this out, though? Like if I try this in the editor, it won't actually show "foo" when "g" is already typed.
There was a problem hiding this comment.
Yeah, filtering is usually done on the client side so that it's consistent across other language servers but the server can do it's own filtering as well.
crates/ty_ide/src/completion.rs
Outdated
| // If we didn't want to change the ranges on the AST nodes, another | ||
| // approach here would be to get the inner most scope, and explore | ||
| // its ancestors until we get to a level that matches the current | ||
| // cursor's indentation. This seems fraught however. It's not clear | ||
| // to me that we can always assume a correspondence between scopes | ||
| // and indentation level. |
There was a problem hiding this comment.
Instead of assuming any relationship between node and scope. I think I'd prefer if the IDE integration traverses downwards to find a node who's indentation matches the indentation at the current position, then lookup the scope for that node. This may require passing the scope to model.completions because our AST, unfortunately, doesn't support traversing upwards once you have an AnyNodeRef
There was a problem hiding this comment.
I don't think scopes are a public API, but I think I get what you're saying. I'm not sure that will work, but I think it will work in some cases and certainly seems worth exploring!
850c3e3 to
43b4337
Compare
crates/ty_ide/src/completion.rs
Outdated
|
|
||
| let model = ty_python_semantic::SemanticModel::new(db.upcast(), file); | ||
| model | ||
| .completions(target.node()) |
There was a problem hiding this comment.
Just something I randomly noticed: there's no deduplication here. If you have something like the following, foo will show up twice in the suggested completions:
def foo():
pass
class C:
def foo(self):
pass
def bar(self):
f<CURSOR>There was a problem hiding this comment.
Yeah, we would need to model shadowing here somehow because the completions should understand the kind of "foo" symbol.
I guess a naive way to solve this would be to keep a running list of completions as we traverse the scope upwards and avoid adding a symbol if it already exists.
There was a problem hiding this comment.
I think another important issue revealed by this example is that we aren't modeling the quirks of Python name resolution, e.g. that names in a class scope are not visible within nested function scopes (so only the global foo should even be visible here.) Doesn't necessarily need to be in this PR, but I think this will be important to fix; having all methods of a class wrongly show up as autocomplete suggestions within other methods of the class will be pretty annoying.
There was a problem hiding this comment.
For now, I've "fixed" this by just sorting and de-duplicating the completions returned by the server. I had assumed the client would de-duplicate since it was already doing prefix matching, but that does not appear to be the case. I've also added tests covering these cases (along with an appropriate FIXME for the false positives that @carljm points out).
I agree that what we're doing right now is very dumb. I'm working my way up to fancier things. :-) I agree that the false positives here should be fixed and that completions should be more than just name strings.
| let mut symbols = vec![]; | ||
| for (file_scope, _) in index.ancestor_scopes(file_scope) { | ||
| for symbol in index.symbol_table(file_scope).symbols() { | ||
| symbols.push(symbol.name().clone()); | ||
| } | ||
| } | ||
| symbols |
There was a problem hiding this comment.
This does not take type information into account. For example, it will include symbols from unreachable branches:
import sys
if sys.version_info < (3, 0):
legacy_symbol = 1
l<CURSOR> # will include legacy_symbolThe extend_with_declarations_and_bindings function in #18251 does something similar to this, but excludes definitely-unbound symbols.
I'm not sure what the desired functionality here is (or if there is a performance tradeoff). Maybe it's fine to error on the side of providing too many completions.
There was a problem hiding this comment.
I think we would at least want to make sure that unreachable bindings are included in the autocomplete suggestions if the user is editing a branch of code that is itself understood by ty to be unreachable
There was a problem hiding this comment.
Yeah, I think it would be useful to exclude symbols that are unreachable although that doesn't necessarily need to be done in this PR itself.
There was a problem hiding this comment.
Note that I'm actually saying the opposite to Dhruv and David here. I agree that excluding unreachable bindings makes sense if the user is editing reachable code. But what if we've got a situation like this?
Config: --python-version=3.10
import sys
if sys.version_info >= (3, 11):
long_variable_name = []
lo<CURSOR>I think it'll be pretty annoying for the user there if we don't include long_variable_name in the list of autocomplete suggestions when the cursor is at that location. But the binding is in unreachable code!
There was a problem hiding this comment.
That's why I was wondering if we should lean towards providing too many completions.
But note that other LSPs also have degraded experience in unreachable code sections:
That's not a great reason for not trying to do better here. But I also don't think it's high priority to provide the best-possible LSP experience inside unreachable sections of code. For other languages, I often deliberately change the config when I'm editing otherwise unreachable code.
There was a problem hiding this comment.
Yeah I agree we can do better here, but I think I generally prefer offering too many completions in a case like this until we can implement what @AlexWaygood suggests.
83b4eb8 to
da80c2a
Compare
| let mut symbols = vec![]; | ||
| for (file_scope, _) in index.ancestor_scopes(file_scope) { | ||
| for symbol in index.symbol_table(file_scope).symbols() { | ||
| symbols.push(symbol.name().clone()); | ||
| } | ||
| } | ||
| symbols |
There was a problem hiding this comment.
Yeah, I think it would be useful to exclude symbols that are unreachable although that doesn't necessarily need to be done in this PR itself.
| ); | ||
|
|
||
| assert_snapshot!(test.completions(), @r" | ||
| foo |
There was a problem hiding this comment.
Yeah, filtering is usually done on the client side so that it's consistent across other language servers but the server can do it's own filtering as well.
| // It's not totally clear why `for` shows up in the | ||
| // symbol tables of the detected scopes here. My guess | ||
| // is that there's perhaps some sub-optimal behavior | ||
| // here because the list comprehension as written is not | ||
| // valid. |
There was a problem hiding this comment.
This is due to error recovery. The source code contains a syntax error because the identifier is absent ([ for bar in [1, 2, 3]) and the parser recovers by converting for into an identifier. The fix is to improve error recovery where the parser is converting keywords into an identifier. There are benefits for doing this but maybe not doing this conversion might be more beneficial. Point 12 in #10653.
crates/ty_ide/src/completion.rs
Outdated
| // <CURSOR> is inside leading or trailing whitespace. (Even | ||
| // when enclosed in parentheses.) |
There was a problem hiding this comment.
Yeah, parentheses aren't part of the node's range unless it's relevant e.g., tuples.
crates/ty_ide/src/completion.rs
Outdated
|
|
||
| let model = ty_python_semantic::SemanticModel::new(db.upcast(), file); | ||
| model | ||
| .completions(target.node()) |
There was a problem hiding this comment.
Yeah, we would need to model shadowing here somehow because the completions should understand the kind of "foo" symbol.
I guess a naive way to solve this would be to keep a running list of completions as we traverse the scope upwards and avoid adding a symbol if it already exists.
This is analogous to the existing `Tokens::after` method. Its implementation is almost identical. We plan to use this for looking at the tokens immediately before the cursor when fetching completions.
669fc64 to
9acee50
Compare
Previously, completions were based on just returning every identifier
parsed in the current Python file. In this commit, we change it to
identify an expression under the cursor and then return all symbols
available to the scope containing that expression.
This is still returning too much, and also, in some cases, not enough.
Namely, it doesn't really take the specific context into account other
than scope. But this does improve on the status quo. For example:
def foo(): ...
def bar():
def fast(): ...
def foofoo(): ...
f<CURSOR>
When asking for completions here, the LSP will no longer include `fast`
as a possible completion in this context.
Ref astral-sh/ty#86
9acee50 to
9134bc2
Compare
|
OK, I've addressed what feedback I think is plausible to do in this PR. I've recorded the notes about where completions can be improved (which I'll try to summarize in GitHub issues when I get a chance) for future work. I don't have a good sense of how hard some of these things are yet. e.g., Detecting whether the cursor is in an unreachable scope and ensuring that instance methods aren't treated as if they were normal functions. |

This PR improves completions by leveraging scopes. That is, we try to
identify an expression corresponding to the cursor's current position.
We then use that expression to find it scope. Our completions are then
the names in that scope's (and its ancestors') symbol tables.
The most straight-forward way this improves on the status quo can be
seen from a simple example:
Today, this would include
foofooin the completions. But with thisPR,
foofoois not included.I had originally started off with a bit more shenanigans in terms of
identifying the expression under the cursor. But in the end, at least
for scopes, a very simple solution seems to work well.
The major caveat here is that this doesn't work well when requesting
completions before having typed anything. While this doesn't fail in
every case, it is somewhat common. For example:
In this instance, the AST nodes for
fooandfoofooactually endat exactly the same byte offset, so there is nothing for us to easily
latch on to that comes after
foofoobut before the end offoo.In particular, we really want to enumerate the symbols for the scope
contained in the function body of
foo. But it's unclear how to(easily) actually get the scope given the cursor's position. One
approach might be to tweak the ranges of the AST nodes to make finding
overlapping AST nodes easier. Either way, in this case currently, we
fall back to the global scope. So we don't end up with a complete
failure, but I'm guessing this will prove to be frustrating for users.
(I do not think we need to fix this in this PR.)
Note that this doesn't apply when something has been typed. For example:
In this case, we see that
fis part of the body offooandcorrectly provide completions for both
fooandfoofoo.There are some other cases where a
<CURSOR>being in leading/trailingwhitespace results in sub-optimal failures. The commented out test
cases in the second commit cover a smattering of them.
Ref astral-sh/ty#86