Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the AST to make it easier to pattern match and remove ambiguities #24

Open
doorgan opened this issue Jul 11, 2021 · 7 comments
Open
Labels
enhancement New feature or request

Comments

@doorgan
Copy link
Owner

doorgan commented Jul 11, 2021

This is derived from a discussion at #23. The tl;dr is that wrapping literals in :__block__ works, but makes it cumbersome to pattern match against. Some other nodes could be changed as well, like sigils.

The changes to be made would be such that the following syntaxes would be mapped as follows:
foo -> {:var, metadata, :foo}
400_000 -> {:int, [token: "400_000" | metadata], 400000}
42.0 -> {:float, [token: "42.0" | metadata], 42.0}
:foo -> {:atom, metadata, :foo}
[1, 2] -> {[], metadata, [1, 2]}
{1, 2} -> {:{}, metadata, [1. 2]}
~w[a b c] -> {{:sigil, "w"}, metadata, [segments, modifiers]}

{:sigil, meta, ["w", segments, modifiers]} for sigils would be ambiguous with function calls, for example. ~w/foo #{:bar}/a and foo("w", "foo #{:bar}", 'a') already produce the very same AST, with the only exception that the sigil has a :delimiter metadata that lets us recognize it as an actual sigil. Because AST nodes metadata order is not guaranteed, we can't just pattern match the metadata to assert it's a sigil. Hence the idea of using a tuple as the node type instead.

The same goes for lists. If the list node is {:list, meta, elements}, then [1, 2, 3] and list(1, 2, 3) would produce exactly the same AST, so I was thinking on using {[], meta, elements} instead. I'm not very happy with the loss in readability there, but it would work.

There are some opportunities to enrich the AST further, for example, the AST for A.B.C.fun is this:

{{:., _, [{:__aliases__, [], [:A, :B, :C], :fun]}, _, []}

By leveraging the :static_atoms_encoder parser option, we could transform it to this:

{{:., _, [{:__aliases__, [], [
  {:atom, _, :A}, {:atom, _, :B}, {:atom, _, :C}
], {:atom, _, :fun}]}, _, []}

Which, of course, it a lot more verbose, but on the other hand it allows us to:
a) Have a more consistent AST, meaning we no longer have some places were atoms are wrapped and others were they're not
b) We can have more precise information about where exactly each alias segment is happening

For b we would still miss information about where exactly the dots are located, but at least we could accurately replace the inner segments.

The final change I had thought of is to make the parsing safe by default, meaning that parsing a file wont create new atoms. The downside is that the Formatter only works with atoms, so Sourceror.to_string would create new atoms anyways and we're back to square one.

@doorgan doorgan added the enhancement New feature or request label Jul 11, 2021
@doorgan
Copy link
Owner Author

doorgan commented Jul 11, 2021

Some side effects of this:

  • We need a custom traversal function instead of relying on Macro.traverse
  • The Zipper API may need to be updated to handle this new AST, meaning it may no longer work as expected with the regular AST
  • We need an extra step before passing the AST to the formatter, so we give it something it can understand

All of these are easily addressable.

@doorgan
Copy link
Owner Author

doorgan commented Jul 11, 2021

You can try these changes(with the exception of the static_atoms_encoder one) in the rich_ast branch:

{:sourceror, github: "doorgan/sourceror", branch: "rich_ast"}

CC @marcandre

@doorgan
Copy link
Owner Author

doorgan commented Jul 11, 2021

Sigils were changed to {:"~", metadata, [letter, contents, modifiers]}

@msaraiva
Copy link
Contributor

@doorgan would it be wiser to wait for these changes to be applied before using sourceror? I have a couple of mix tasks I want to add to Surface that could make good use of sourceror but I believe I may have to change the implementation a lot when the new AST comes out.

@doorgan
Copy link
Owner Author

doorgan commented Jul 27, 2021

Hi @msaraiva, I don't see these changes applied to the main branch anytime soon, as it's still too experimental and I realize it will take quite some time to find something that works fine for Sourceror use cases.

When the time to merge them comes closer I'll let you know and I'll be happy to help you migrate the Surface scripts :)

@msaraiva
Copy link
Contributor

@doorgan sounds good!

I didn't know these changes were that far away. I'll give v0.8.1 a try then.

Cheers.

@doorgan
Copy link
Owner Author

doorgan commented Jul 27, 2021

@msaraiva I initially intended to apply them in 0.9, but I started to deviate too much from the original goal and I want to have a clearer spec first. There's lot of nuances when it comes to having completely unambiguous node types and keeping the 3-tuple shape that I need to resolve 🙂. So maybe they will land in 0.10 or 0.11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants