add tags.scm queries #18

the-mikedavis · 2021-12-17T15:23:03Z

tags.scm powers GitHub code navigation for languages that use tree-sitters.

For example with ruby...

You can see class/method declarations:

And see the references (where those classes/methods end up being used):

These queries ended up taking inspiration from other tags.scm in the @tree-sitter org repos, mostly python and ruby.

I don't think there's a framework in place for testing these queries, but I think it wouldn't be too much trouble to write a small shell script that diffs the output of tree-sitter tags fixture.ex against a known solution for some fixtures. There is the ability to test queries like so: tree-sitter query --test query/tags.scm test/query/fixture.ex (added in tree-sitter#775) but the precedence rules seem to be backward between querying and determining tags (e.g. module/function definitions end up as @reference.calls in the query output but are correctly identified by tree-sitter tags). I can look at making a PR to tree-sitter that adds a flag for testing tree-sitter tags or maybe somehow reverses precedence in tree-sitter query. We should be able to use tree-sitter#1547 to test these queries if/when merged.

A sample output of tree-sitter tags path/to/mint/mix.exs with this Mint mix.exs:

Mint.MixProject	| module  	def (0, 10) - (0, 25) `defmodule Mint.MixProject do`
Mix.Project	| call    	ref (1, 6) - (1, 17) `use Mix.Project`
project   	| function	def (6, 6) - (6, 13) `def project do`
Mix       	| call    	ref (11, 23) - (11, 26) `start_permanent: Mix.env() == :prod,`
env       	| call    	ref (11, 27) - (11, 30) `start_permanent: Mix.env() == :prod,`
elixirc_paths	| call    	ref (12, 21) - (12, 34) `elixirc_paths: elixirc_paths(Mix.env()),`
Mix       	| call    	ref (12, 35) - (12, 38) `elixirc_paths: elixirc_paths(Mix.env()),`
env       	| call    	ref (12, 39) - (12, 42) `elixirc_paths: elixirc_paths(Mix.env()),`
deps      	| call    	ref (13, 12) - (13, 16) `deps: deps(),`
package   	| call    	ref (29, 15) - (29, 22) `package: package(),`
application	| function	def (46, 6) - (46, 17) `def application do`
Mint.Application	| call    	ref (49, 12) - (49, 28) `mod: {Mint.Application, []}`
package   	| function	def (53, 7) - (53, 14) `defp package do`
elixirc_paths	| function	def (60, 7) - (60, 20) `defp elixirc_paths(:test), do: ["lib", "test/support"]`
elixirc_paths	| function	def (61, 7) - (61, 20) `defp elixirc_paths(_env), do: ["lib"]`
deps      	| function	def (64, 7) - (64, 11) `defp deps do`

So the queries tag function/module definitions and usages of functions/modules. It's notable that modules are tagged as @reference.call when not being defmoduled: this lines up with the behavior of the ruby queries and allows you to jump to definitions of modules, it's not saying that those modules are function invocations. These queries aren't smart enough to figure out some language semantics like resolving aliased modules. For "precise navigation" which respects language semantics, we'd need to write some stack-graph construction queries (see this blog post), but stack-graphs are very new and it may be wise to wait until they're matured. Also the python queries aren't open-sourced afaict so I haven't really figured out the tricks yet 😅. Once I figure all that out, I'll try sending in some stack-graphs queries :)

@patrickt is this something I can trouble you with setting up if/once it's merged in (and maybe lend some review eyes)? I appreciate all your help with the integration stuff :)

the-mikedavis · 2021-12-17T15:23:37Z

oh whoops I didn't see #17, I'll try these queries against that branch and see if it changes anything

edit: looks good, these queries are producing the same results on both branches

josevalim · 2021-12-17T15:42:43Z

This is awesome @the-mikedavis! Quick question: do arities play a role in those queries? Should they?

If they do, then we need to consider foo |> bar() is calling bar() with one argument. If they don't, then we are good!

the-mikedavis · 2021-12-17T15:47:10Z

I believe arities are not considered, I think that wouldn't play well with some languages like javascript where arities can be flexible. It seems to me like it's very similar to ctags: the tokens on the left-hand side of the table (albeit parsed out by tree-sitter instead of ctags) become the search candidates.

I haven't dug far enough into stack-graphs and precise navigation to be sure, but that's probably something we'll have to keep in mind for writing those stack-graph queries 👍

josevalim · 2021-12-17T15:50:12Z

@the-mikedavis perhaps one option is to include the arity in the token name:foo/3. But I am not sure if this is a good or bad idea.

the-mikedavis · 2021-12-17T16:04:33Z

Ah I like that idea. Afaik you don't have any control over modifying that token with how tree-sitter tags curently works, but that would make navigation a breeze if it did work, especially in big projects where you have a bunch of functions with the same names but different arities

jonatanklosko · 2021-12-17T16:24:12Z

Hey @the-mikedavis, fantastic!

the precedence rules seem to be backward between querying and determining tags

This is definitely a tree-sitter bug, it's easy to verify using the highlight queries.

# Running highlighting test passes
npx tree-sitter test --filter test/highlight

# Running the query tests with the corresponding queries doesn't work
npx tree-sitter query --test queries/highlights.scm test/highlight/**/*

cc @patrickt @maxbrunsfeld

Do you know how github/semantic fits into the picture? It seems necessary to add Elixir bindings there, but having another look I see that tag matching does the same as the corresponding tag query, so I'm wondering how these two relate.

jonatanklosko · 2021-12-17T16:27:52Z

queries/tags.scm

+  right: (identifier) @name) @reference.call
+
+; * modules
+(alias) @name @reference.call


Do we intentionally use @reference.call for all references or could we use @reference.module?

Interestingly the Python queries have neither of those for classes/modules and yet it works 🤔

Good question! Yep I tried out @reference.module initially but I thought it was not working. I just gave it another try and the above output changes to:

Mint.MixProject | module def (0, 10) - (0, 25) `defmodule Mint.MixProject do` Mix.Project | module ref (1, 6) - (1, 17) `use Mix.Project` project | function def (6, 6) - (6, 13) `def project do` Mix | module ref (11, 23) - (11, 26) `start_permanent: Mix.env() == :prod,` env | call ref (11, 27) - (11, 30) `start_permanent: Mix.env() == :prod,` elixirc_paths | call ref (12, 21) - (12, 34) `elixirc_paths: elixirc_paths(Mix.env()),` Mix | module ref (12, 35) - (12, 38) `elixirc_paths: elixirc_paths(Mix.env()),` env | call ref (12, 39) - (12, 42) `elixirc_paths: elixirc_paths(Mix.env()),` deps | call ref (13, 12) - (13, 16) `deps: deps(),` package | call ref (29, 15) - (29, 22) `package: package(),` application | function def (46, 6) - (46, 17) `def application do` Mint.Application | module ref (49, 12) - (49, 28) `mod: {Mint.Application, []}` package | function def (53, 7) - (53, 14) `defp package do` elixirc_paths | function def (60, 7) - (60, 20) `defp elixirc_paths(:test), do: ["lib", "test/support"]` elixirc_paths | function def (61, 7) - (61, 20) `defp elixirc_paths(_env), do: ["lib"]` deps | function def (64, 7) - (64, 11) `defp deps do`

which looks happy.

I see in some other queries now that there is a little variance like in these c-sharp tags.scm: https://github.com/tree-sitter/tree-sitter-c-sharp/blob/3104df21065af0f3d51e05a96cd0e2ff16a6f982/queries/tags.scm, which also makes me think that what is currently being tagged as @definition.implementation should actually be @reference.interface for code like

defimpl Jason.Encoder, for: List do

On the other hand, the ruby query has this line:

[(identifier) (constant)] @name @reference.call

which tags modules/classes as @reference.calls though and I've seen module references work in ruby code (for example Appsignal), so I'm not sure what the right move is. Maybe all definitions/references are pooled together when performing the search? Or maybe it just helps with ranking?

I think I'll update this line and the @reference.interface as well 👍

Ah, (constant) is the inner node for both class and module, I missed initially!

If the Jason.Encoder alias appears anywhere else we won't know if it's a module or a protocol. I think we should just use module for protocols as well (which is actually true), unless I'm missing some benefits of tagging them separately.

ah yeah I think that makes sense, I could see things being ambiguous if you do a @derive {Jason.Encoder, ..} and search doesn't match up that @reference.module with the @definition.interface

I'll cut out all that interface stuff in preference of modules 👍

Also, definition.function is not consistent with reference.call (as opposed to reference.function), but we're probably missing some information here, so let's wait ^^

the-mikedavis · 2021-12-17T16:28:39Z

Based on this documentation, I think semantic is not in the navigation picture, it seems to all be driven by tree-sitter and tags.scm, but we'll need a GitHubber to weigh in to be sure

jonatanklosko · 2021-12-17T16:33:33Z

Oh wow, maybe the "Used for code navigation on github.com" in github/semantic is no longer accurate then. I believe the GitHub docs were updated since I last checked too :D

Let's wait for some insights then :)

patrickt · 2022-01-03T15:11:01Z

@the-mikedavis @jonatanklosko Correct: semantic is no longer part of the tags pipeline (we switched to tree-sitter native tags support so as to avoid the overhead of pulling in a whole program analysis framework to do a simple analysis).

I’ll take this patch for a spin locally with our tagging backend. I can’t make guarantees about if or when it’ll land—we have to figure out, product/documentation-wise, how to pull in this functionality without making it part of our core supported languages. But I’m super excited about this work; thank you all!

patrickt · 2022-01-07T20:44:06Z

I can confirm that this patch works for our internal code-nav systems. Again, I can’t make promises as to when it will land—but this patch should be merged, in my opinion!

jonatanklosko · 2022-01-08T12:53:46Z

@patrickt fantastic news, thanks a lot! Fingers crossed for the integration to happen :D

@the-mikedavis awesome job!

patrickt · 2022-02-10T18:02:51Z

Elixir code navigation is now active on GitHub!

jonatanklosko · 2022-02-10T21:40:43Z

@patrickt fantastic, thanks again for all the integration work!

@the-mikedavis beautiful job with the tags 🐱

add tags.scm queries

2de043d

jonatanklosko reviewed Dec 17, 2021

View reviewed changes

the-mikedavis added 3 commits December 17, 2021 10:39

mark defimpl modules as references to the interface

4f4db7b

mark modules as module references instead of calls

9cf1a61

define protocol modules as modules, remove interface references

93d17fa

the-mikedavis mentioned this pull request Dec 18, 2021

test tags queries in 'tree-sitter test' tree-sitter/tree-sitter#1547

Merged

the-mikedavis mentioned this pull request Jan 4, 2022

cli: deduplicate query results in 'tree-sitter query --test' tree-sitter/tree-sitter#1573

Closed

jonatanklosko merged commit 1516959 into elixir-lang:main Jan 8, 2022

the-mikedavis deleted the md-tags-query branch January 8, 2022 13:50

the-mikedavis mentioned this pull request Jan 12, 2022

Feature Request: Support stack-graphs to provide code nav in the absence of LSP, building on tree-sitter helix-editor/helix#1252

Open

patrickt mentioned this pull request Feb 11, 2022

Describe tagging and associated naming conventions for syntax captures. tree-sitter/tree-sitter#1649

Merged

the-mikedavis mentioned this pull request Feb 24, 2022

Add to GitHub gleam-lang/tree-sitter-gleam#14

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add tags.scm queries #18

add tags.scm queries #18

the-mikedavis commented Dec 17, 2021 •

edited

Loading

the-mikedavis commented Dec 17, 2021 •

edited

Loading

josevalim commented Dec 17, 2021

the-mikedavis commented Dec 17, 2021 •

edited

Loading

josevalim commented Dec 17, 2021

the-mikedavis commented Dec 17, 2021

jonatanklosko commented Dec 17, 2021 •

edited

Loading

jonatanklosko Dec 17, 2021

jonatanklosko Dec 17, 2021

the-mikedavis Dec 17, 2021

jonatanklosko Dec 17, 2021

the-mikedavis Dec 17, 2021

jonatanklosko Dec 17, 2021

the-mikedavis commented Dec 17, 2021 •

edited

Loading

jonatanklosko commented Dec 17, 2021

patrickt commented Jan 3, 2022

patrickt commented Jan 7, 2022

jonatanklosko commented Jan 8, 2022

patrickt commented Feb 10, 2022

jonatanklosko commented Feb 10, 2022

add tags.scm queries #18

add tags.scm queries #18

Conversation

the-mikedavis commented Dec 17, 2021 • edited Loading

the-mikedavis commented Dec 17, 2021 • edited Loading

josevalim commented Dec 17, 2021

the-mikedavis commented Dec 17, 2021 • edited Loading

josevalim commented Dec 17, 2021

the-mikedavis commented Dec 17, 2021

jonatanklosko commented Dec 17, 2021 • edited Loading

jonatanklosko Dec 17, 2021

Choose a reason for hiding this comment

jonatanklosko Dec 17, 2021

Choose a reason for hiding this comment

the-mikedavis Dec 17, 2021

Choose a reason for hiding this comment

jonatanklosko Dec 17, 2021

Choose a reason for hiding this comment

the-mikedavis Dec 17, 2021

Choose a reason for hiding this comment

jonatanklosko Dec 17, 2021

Choose a reason for hiding this comment

the-mikedavis commented Dec 17, 2021 • edited Loading

jonatanklosko commented Dec 17, 2021

patrickt commented Jan 3, 2022

patrickt commented Jan 7, 2022

jonatanklosko commented Jan 8, 2022

patrickt commented Feb 10, 2022

jonatanklosko commented Feb 10, 2022

the-mikedavis commented Dec 17, 2021 •

edited

Loading

the-mikedavis commented Dec 17, 2021 •

edited

Loading

the-mikedavis commented Dec 17, 2021 •

edited

Loading

jonatanklosko commented Dec 17, 2021 •

edited

Loading

the-mikedavis commented Dec 17, 2021 •

edited

Loading