-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Code fragment context building functionality #12566
Comments
José, great start here. I'm going to answer from the perspective of Lexical, which is slightly different than that of To answer your questions: Variables:
I'd prefer
Definitely, we currently complete both, thanks to
Is that legal elixir though? My perspective with the code above is that there would be no completions, as you haven't given any hint yet, but even if you had, I'm ok with not returning a completion. Module Information
Lexical is totally fine with relying on previously compiled modules for its completion of functions, submodules, etc. I know that Additional thoughts:I would like to have blessed access to the tokenizer, or better yet, I would like to have access to an API that tells me where I am in code. Today, I've been playing around with calling calling Imagine I have the following: %Foo.Bar.Baz{ | My Something I've explored by using the tokenizer directly is creating backtracking expanding contexts (something I thought I could get with alias Foo.Bar.{Baz, Quux|} The cursor is inside of a module alias, inside of a multiple alias, which is itself inside of a call to |
I believe the long term goal is for
If you have this:
For this use case, we use To give more context: the goal of
|
defmodule Test do
some_var = 3
def foo do
some_var + 1
end
end
I know who I'm speaking to here ;), but what am I missing?
Ah, I see. I read the As a refinement, it would be easier for me, personally, to be able to pass in a position to each of the functions rather than assuming the end of the document is the cursor position. The most common behavior of a language server is someone editing a document somewhere in the middle rather than appending to the end. Right now, it's very easy for me to have a position and a document's text rather than the substring of the document from the beginning to where the cursor is inserting text. If this isn't in the cards, not all is lost, I can easily make my document abstraction return what's needed, it'd just be so much easier for me to do: document
|> Document.to_string()
|> Code.Fragment.container_cursor_to_quoted(cursor_position: {env.position.line, env.position.character}) but I could easily make this happen too: document
|> Document.fragment(end: env.position)
|> Code.Fragment.container_cursor_to_quoted() I'll give the above things a shot and see if I can use this newfound knowledge of |
As mentioned above, you can unquote it:
I completely understand that but we have been building from the bottom-up. The immediate goal is to provide a bunch of low-level functions so you no longer need to rely on private APIs. The existing functions solve specific concerns instead of the whole thing. You want a Ferrari, but first we have to build the wheels, the engine... :) |
Container means we return information about the parent container of the current node where the cursor is rather than the cursor itself.
That's natural. The |
Well yes, it's just surprising to see an exhaust, half a frame and seventeen cylinders over in the corner ;). Right now, I'd take a gas powered pogo stick.
Yes, but I'm going to quibble here (and naming is subjective), since most nodes are containers. This feels more like you're building the AST until you get to the cursor, which is then marked. This isn't the biggest deal in the world, and can be improved with documentation.
Yep, this module definitely needs some help there. I'll use my ignorance of it to improve the docs, though it's really hard when you don't know the intention of the functions. I'm still coming to grips with them, and how to use them. That said, I'll give it a shot. Just as I was going to submit this, I stumbled upon the following. I thought that "alias MyThing.Other" |> Code.Fragment.container_cursor_to_quoted
{:ok, {:__cursor__, [line: 1], []}} Where's the rest? I was expecting to see the alias call, the module name, but I only have the cursor. My goal is to determine if I'm inside of an alias call, in its many forms. What is the algorithm I should use to do that? |
Or alternatively you could have nothing, not even a half frame, and only use private APIs which may break on every new Elixir release. :) Anyway, my main point was not the metaphor. But to acknowledge that, although those functions are low-level, they do have their uses and they have been helpful to IEx, Livebook, and parts of Elixir LS. Now let's continue moving forward.
There is no container in this case. As the docs explain: “A container is any Elixir expression starting with (, {, and [. This includes function calls, tuples, lists, maps, and so on”. But I would need to understand a bit more what you want to achieve. Why do you want to know if you are inside an alias call? |
Jose, I'm being silly here, that was meant to amuse, not insult :)
Because I would like to determine which completions are relevant to the user. If I'm inside of an alias call, I will not suggest functions, for example. This really improves the user experience quite a bit. Lexical does this now, but it uses the tokenizer and backtracks, which isn't the easiest thing to understand and code. |
Right. You don't need to know if you are in an
Or this tells you are at the beginning of an alias call:
Check IEx.Autocomplete for more examples. The general goal for autocompletion is: use cursor_context to find what you need to complete. If ambiguous, use container_cursor_to_quoted to find more surrounding information. Exactly how you complete it we can't help right now. |
In general, I'm trying to build language server that's easy for people to contribute to, and to that end, I've made some abstractions around completion, visible in the I'll also point out that including parenthesis makes the definition of container fairly arbitrary in a language like elixir, as they can be left out with a formatter configuration, which would mean that it's possible that a container differs from project to project. |
I don't really like that though. There are cases where it's telling me I don't like or use the pattern below, but I've seen it a bunch and the above logic will fail. alias :et|
with the inevitable goal of alias :ets, as: Ets I get Currently, iex will show me functions if I type |
Ah, that was the example I was missing. Let me think a bit more about this. :) |
Completions are only one of the 3 main aspects. The second one is ability to identify what the symbol under cursor is given the context. Is this a variable or attribute? Can we statically evaluate it? Third one is identifying the context itself. Are we inside a function body/header/list/map/module/type.
I know this is tricky to achieve, especially with elixir being lisp like. It would have to be smarter than container_cursor_to_quited which as I understand it blindly inserts closing tokens. It would need to take into account indentation and common formatting style. elixir_sense tries to do that with regex but the code is nasty and only works in certain cases.
I'd add attributes, types, structs, records, sigils. elixir_sense extract that information from AST only but the code is pretty complex and tends to break every release. Having a stable interface would be nice. It has another shortcoming - it only understands a handful of builtin macros as it's not doing expansion. Being able to leverage complier here would also increase accuracy and make understanding of ecto/phoenix/my_fancy_one_use_dsl work out of the box in more cases
Context or scope is important
Most elixirLS completions already uses it and I plan to migrate some more after a bump to min 1.14. It works pretty well besides occasional
Like above elixir_sense has its own engine extracting those from AST
elixir_sense can expand
Sure, vars are tricky but I'd say aliases come close. Simply collecting references is trivial doesn't help much. Actually understanding where variable is defined and where referenced is much better and elixir_sense does that. It allows for navigation to variable definition, finding usages, better contextual completions. elixir_sense also tries to infer the type of the variable (only a handful of types is supported). I envision it should be able to access type info when types land in elixir. The simple approach may be enough for IEx but it's definitely not sufficient for IDEs.
No problem with false positives - they will be filtered out by the user or will trigger compile diagnostics later. It's false negatives that are more frustrating.
elixir_sense extracts local functions, typespecs and structs from AST.
Yep, global dictionary for funs/types/structs/modules vs code position based scope for alias/import/require/vars/attributes. I know there are a few exceptions here (e.g. private macros are only available after the definition in module body)
That was the approach in original alchemist server (based on IEx). The main problem was it only was aware of public funs. I integrated AST based completions in elixir_sense. I plan to integrate tracer info here but it's not trivial when there are 2 sources of truth.
Arity is tricky. Macros are one thing,
Sounds good but it might be difficult |
Regarding alias expressions there are irritating cases where
|
@lukaszsamson Yep, that's exactly the same thing you get if you're inside of a struct reference. The nested context matters to us. |
I've been playing around with the Sourceror library, and what would be cool to me would be a function like: Really want to second everything Lukasz said three comments up. He has a lot better handle on the domain than I do, but everything he said tracks. |
So if I get it right, in order to understand structs, you parse the AST explicitly looking for
Can you expand on how ElixirSense does it? AST traversal? Do you expand macros as you do it?
Good call on default args.
This can be solved with |
Exactly
No macro expansion here, only AST traversal with scoping rules. The first occurrence is treated as variable definition, all later as usages. elixir_sense also traverses |
This is now supported on Elixir main branch. A combination of iex(4)> demo = fn arg -> arg |> Code.Fragment.container_cursor_to_quoted |> elem(1) |> Macro.path(&match?({:__cursor__, _, []}, &1)) end
#Function<42.125776118/1 in :erl_eval.expr/6>
iex(5)> demo.("alias ")
[
{:__cursor__, [line: 1], []},
{:alias, [line: 1], [{:__cursor__, [line: 1], []}]}
]
iex(6)> demo.("alias Foo.Bar.")
[
{:__cursor__, [line: 1], []},
{:alias, [line: 1], [{:__cursor__, [line: 1], []}]}
]
iex(7)> demo.("alias Foo.Bar.Bar, as: DAS")
[
{:__cursor__, [line: 1], []},
{:as, {:__cursor__, [line: 1], []}},
[as: {:__cursor__, [line: 1], []}],
{:alias, [line: 1],
[
{:__aliases__, [line: 1], [:Foo, :Bar, :Bar]},
[as: {:__cursor__, [line: 1], []}]
]}
]
iex(8)> demo.("alias Foo.Bar.{Baz,")
[
{:__cursor__, [line: 1], []},
{{:., [line: 1], [{:__aliases__, [line: 1], [:Foo, :Bar]}, :{}]}, [line: 1],
[{:__aliases__, [line: 1], [:Baz]}, {:__cursor__, [line: 1], []}]},
{:alias, [line: 1],
[
{{:., [line: 1], [{:__aliases__, [line: 1], [:Foo, :Bar]}, :{}]},
[line: 1],
[{:__aliases__, [line: 1], [:Baz]}, {:__cursor__, [line: 1], []}]}
]}
]
iex(9)> demo.("alias Foo.Bar.{")
[
{:__cursor__, [line: 1], []},
{{:., [line: 1], [{:__aliases__, [line: 1], [:Foo, :Bar]}, :{}]}, [line: 1],
[{:__cursor__, [line: 1], []}]},
{:alias, [line: 1],
[
{{:., [line: 1], [{:__aliases__, [line: 1], [:Foo, :Bar]}, :{}]},
[line: 1], [{:__cursor__, [line: 1], []}]}
]}
] You can match on those code snippets to answer your questions. You can even provide this as a general framework I think where you tell users if they are on the first argument, second, inside a keyword, etc.
I plan to implement this for So in a nutshell, for code completion, we need:
@lukaszsamson now that |
Here is how aliases work: b59907e - For IEx, we only need to look at the container when there is a dot (which is ambiguous) and when nothing was written. |
Great conversation! I only have a small note to add on all this: There might be a slight difference in alignment between the goals of an
These two are very similar but not equal:
|
Hi @scohen and @scottming! (cc @lukaszsamson)
I would discuss some of the context building functionality that you mentioned. I have specific questions in this document that I would like your input on.
When we are working on a file, we need to provide completion on variable names, imports and so on. For example, take this buffer:
Where
<CURSOR>
is where the mouse cursor is. In order to provide completion, we need three things:A parser that is robust to syntax failures
Build a local context with imports, variables, aliases, local functions, etc
Understand which kind of completion we should provide (alias? function call? variable name?).
We only provide step 3 at the moment via
cursor_context
. My goal for now is to focus on step 2 (and then we can discuss step 1). To build a local context, we have 3 main sources of data: lexical, variables, and module data. Let's discuss them one by one.import, aliases, requires
This is the easiest information to gather. It can be done by 1. traversing the AST and then 2. defensively expanding all AST node, collecting all imports, aliases, and requires along the way.
Building the AST can be done with
container_cursor_to_quoted
. We don't have at the moment a safe way to expand its nodes. This could be provided.Variables
Variables are trickier because they have more complex scoping rules than imports, aliases, requires. There are two ways we could attempt to collect variables:
Simply traverse the AST and collect all
{atom, meta, atom}
nodes.Try to use the same approach as import, aliases, requires, but that would be tricky and I suspect it would also be brittle.
QUESTION: Which approach do you prefer? 1 definitely sounds simpler.
QUESTION: The biggest issue with collecting variables in both cases is deciding when a function starts. If we have this code:
some_var
is not immediately visible insidebat
. It is easy to detect those cases fordef
,defp
and friends, but remember anyone can define their own "def"-like function, likeNx
definesdefn
. So how to handle this? One option is to ignore the problem and show "some_var" anyway, for two reasons: they are not common, so the rate of false positive is low, and they can be accessed inside a function if we useunquote
.Module information
Something else we may want to complete is local function names and module attributes. Local function names is the trickier one, so let's focus on that.
All information so far (imports, aliases, variables) have to be defined before the cursor. Local functions are trickier because, even if
local_function
is defined after the cursor, we may still want to suggest it. How to gather this information?One option is to only suggest local functions defined in a previously compiled version of this module. So if we have
defmodule Foo do
previously compiled and it haddef local_function
in it, we know it exists and we can suggest it.Another option is to provide a heuristic. Similar to variables, we can look at the AST and collect all nodes
{atom, meta, list}
. If those nodes exist, it means they were either: previously imported, which we can check on the imports information, or they were locally defined. This has two additional complexities:Given we want to traverse the whole AST, it is not enough to build a parser that builds the AST up until cursor. We need to always attempt to build the AST for the whole file (see Add string_to_quoted in Code.Fragment #11967)
Because of macros such as
|>
, we cannot look at{atom, meta, list}
to fetch the arity because the macro will add one additional argument. So we need a heuristic to expand certain AST nodes in order to fetch the real arity (a simple heuristic is to expand all operators)QUESTION: I believe the best solution in this case is actually a mixture of both 1 and 2. Use the heuristic to build as much information as possible and then refine the heuristic with information from a compiled version of the module (for example, the compiled version can have docs, types, metadata, etc). Do you agree? Anything to add?
Any additional thoughts?
The text was updated successfully, but these errors were encountered: