Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure get_range works for all syntax nodes parsed using Sourceror.parse_string #104

Merged
merged 2 commits into from
Sep 10, 2023

Conversation

zachallaun
Copy link
Contributor

@zachallaun zachallaun commented Sep 9, 2023

Fixes #103 and some other issues I found.

The goal here is that any syntax node parsed by Sourceror.parse_string should be passable to get_range without raising. I had the idea that we could walk the Sourceror codebase, call get_range on every syntax node, and ensure that it doesn't raise -- worked great and found a handful of bugs!

Summary of fixes:

  • Single-quote charlists and interpolated charlists: 'foo', 'foo#{bar}'
  • Stabs without args, such as in: fn -> :ok end
  • Arguments inside of captures: &1
  • "Raw" dot-calls, e.g. {:., _, args}

@zachallaun
Copy link
Contributor Author

The Elixir tree-sitter implementation's corpus is likely a better representation of "all valid syntax" than Sourceror's source.

It's licensed under Apache 2.0 and MIT (some parts of corpus are MIT according to the NOTICE in the repo root), so it would certainly be possible to vendor it in.

@zachallaun zachallaun marked this pull request as ready for review September 10, 2023 18:28
@zachallaun
Copy link
Contributor Author

Ready for review!

Copy link
Owner

@doorgan doorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the fixes! The PR looks good to merge to me 👍

Comment on lines +692 to +701
# This range currently ends on column 5, though it should be column 6,
# and appears to be a limitation of the parser, which does not include
# any metadata about the parens. That is, this currently holds:
#
# Sourceror.parse_string!("& &1") == Sourceror.parse_string!("&(&1)")
#
# assert to_range(~S"&(&1)") == %{
# start: [line: 1, column: 1],
# end: [line: 1, column: 6]
# }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can propose a change to the elixir parser? Something similar used to happen for qualified module aliases like

Foo
.
   Bar

Which is a super contrived example but still valid, we added the last metadata to allow range calculation on them

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I can post an issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may actually just be a bug. If you look at the underlying AST, & is just a call and should have a :closing metadata when parens are present:

iex(1)> Code.string_to_quoted!("foo(bar)", token_metadata: true)
{:foo, [closing: [line: 1], line: 1], [{:bar, [line: 1], nil}]}

iex(2)> Code.string_to_quoted!("foo bar", token_metadata: true)
{:foo, [line: 1], [{:bar, [line: 1], nil}]}

iex(3)> Code.string_to_quoted!("&(&1)", token_metadata: true)
{:&, [line: 1], [{:&, [line: 1], [1]}]}

iex(4)> Code.string_to_quoted!("& &1", token_metadata: true)
{:&, [line: 1], [{:&, [line: 1], [1]}]}

Copy link
Contributor Author

@zachallaun zachallaun Sep 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +727 to +731
test "should not raise on any three-element tuple parsed by parse_string" do
for relative_path <- Path.wildcard("lib/*/**.ex") do
assert_can_get_ranges(relative_path)
end
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a cool trick, and while for now it works it might be annoying for future PRs that unintendedly add code that braks range calculation.

I think your idea of bringing the examples from the elixir tree-sitter corpus sounds good, so we can add that in another PR 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your idea of bringing the examples from the elixir tree-sitter corpus sounds good, so we can add that in another PR 👍

Sounds good! So just to be clear: leave this for now, and then we can strip it out when we pull in the better corpus of syntax examples?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup!

Sourceror.get_range(quoted)
rescue
e ->
flunk("""
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about ExUnit.Assertions.flunk/1, this is cool!

@doorgan doorgan merged commit 2eba98f into doorgan:main Sep 10, 2023
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

get_range raises on dot-call nodes
2 participants