Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add functions to patch a string #13

Closed
doorgan opened this issue Jun 17, 2021 · 16 comments
Closed

Add functions to patch a string #13

doorgan opened this issue Jun 17, 2021 · 16 comments
Labels
enhancement New feature or request help wanted Extra attention is needed
Milestone

Comments

@doorgan
Copy link
Owner

doorgan commented Jun 17, 2021

Sourceror.to_string/2 allows one to convert AST to string ensuring the comments land where they should. The issue is that even if after a manipulation only a little portion of the code is changed, the whole string will be formatted. This is not an issue if the user is already using the elixir formatter, but it is a problem otherwise.

It is possible to replace only subsets of a string by using Sourceror.get_range/1 in combination with Sourceror.to_string/2 to find the range where a modification would occur and replace it with the new text, but it would be nice for that functionality to be included in Sourceror itself.

A Sourceror.patch_string/2 could be introduced, taking the string to be changed as a first argument and a list of patches to be applied to it. A patch is a map holding the range that would be affected, and the text that would replace the old content.
The main issue is that the fixes should not have overlapping ranges. For example, in this code:

unless not allowed? do
  String.to_atom(foo)
end

One of the patches may want to replace String.to_atom(foo) with String.to_existing_atom(foo), while other patch may want to replace the whole unless with an if allowed? do ... end. There is an obvious conflict there, which limits the usefulness of this approach.

Another alternative is to add a function that instead of patching the string based on a list of patches, it accepts a function that traverses and updates the ast, and then does some tree-diffing magic to figure out the final set of changes to be applied, in a way that the two conflicting patches of the previous example would become a single patch that changes all at once.

I have already explored the first idea and found those limitations, which is why I will now start exploring the second one. The other aspect to keep in mind is the indentation of the patches if they span multiple lines, but we should be able to infer it from the surrounding lines.

@doorgan doorgan added enhancement New feature or request help wanted Extra attention is needed labels Jun 17, 2021
@doorgan doorgan added this to the 0.8 milestone Jun 17, 2021
@doorgan
Copy link
Owner Author

doorgan commented Jun 17, 2021

The second approach could be used to generate the list of patches, by picking the outermost changed node, and then feeding those to the function from the first approach?

@msaraiva
Copy link
Contributor

Hi @doorgan!

I have the feeling that trying to apply multiple patches at once might be very hard or even not possible due to the possible conflicts you already mentioned. A change in the inner node might completely invalidate the pattern used to match the outer node in the first place.

Since a change in a node can also impose an update in the metadata of any of the parent nodes as well as in all the nodes after that change, I believe the safest approach could be to apply one by one, like:

  1. Look for the first matching node to replace
  2. If found, replace it and recalculate the metadata of affected nodes. Otherwise, we're done.
  3. Repeat

One thing that needs to be treated with this approach would be a scenario where the user accidentally introduces an infinite loop by adding two opposite replacements, for instance, replace :a -> :b and later replace :b -> :a.

@doorgan
Copy link
Owner Author

doorgan commented Jun 18, 2021

Hi @msaraiva! Thanks for your thoughts :)

I have the same feeling, the more I dig into it the more I see there's not really a way to apply multiple patches without getting in trouble. Also, if the point of the function is to avoid messing up existing code formatting, even applying changes one by one is troublesome, for example if you want to change the name of a function and it's written like this:

defp decode(addr_t(:ind), _dest, data) do
  case data do
    <<0::6, data::bits>>            -> {:t_data_ind, nil, data}
    <<0b01::2, seq::4, data::bits>> -> {:t_data_con, seq, data}
    <<0b1000_0000::8>>              -> {:t_connect, nil, <<>>}
    <<0b1000_0001::8>>              -> {:t_discon, nil, <<>>}
    <<0b11::2, seq::4, 0b10::2>>    -> {:t_ack, seq, <<>>}
    <<0b11::2, seq::4, 0b11::2>>    -> {:t_nak, seq, <<>>}
    _                               -> {:error, :invalid_tpci}
  end
end

By replacing the range occupied by the function node the tabular formatting of the code in the body would be completely overrided.

I've also been digging into tools for other languages, and they don't have this kind of issues because they implement their own printer(recast for instance uses a custom "non-destructive" printer), while Sourceror is built on top of the Elixir formatter.

So I'm tempted to think a better and less hacky way to handle this use case is to make Sourceror.to_string/2 to accept a formatter module as an option, and to implement a formatter that prints code that is as close as possible to the original source code. Of course, this is a lot more work to do, but maybe it's the best approach?

@msaraiva
Copy link
Contributor

Not sure. I mean, by just loading the consolidated configuration of the .formatter files to keep the formatting applied by Sourceror.to_string/2 consistent would already be a very good solution for anyone using the formatter.

For other cases, would it be possible to keep the original related source of each node somewhere? Then have some kind of metadata to mark the node as "changed", allowing the tool to generate the code only for those changed nodes, keeping the others untouched?

I have no idea if this is feasible or not. Probably not, I guess :)

Also, even if we can't go this direction, I believe a Sourceror.patch_string/2 receiving a single range to patch could still be useful. Specially for tools that only need to apply simple small patches. It would also allow those tools to implement their own solutions, even if it requires to do multiples read file -> patch -> read file again -> patch. Depending on the changes required, that might work. For instance, it would have worked for all the changes we had for the Surface converter.

@doorgan
Copy link
Owner Author

doorgan commented Jun 20, 2021

Also, even if we can't go this direction, I believe a Sourceror.patch_string/2 receiving a single range to patch could still be useful. Specially for tools that only need to apply simple small patches. It would also allow those tools to implement their own solutions, even if it requires to do multiples read file -> patch -> read file again -> patch. Depending on the changes required, that might work. For instance, it would have worked for all the changes we had for the Surface converter.

This sounds like a reasonable step for now. I guess what I'm worried about is messing up the content inside the node as exemplified above, but producing code that matches the original source as much as possible could be thought of as a middle-long term goal.

For other cases, would it be possible to keep the original related source of each node somewhere? Then have some kind of metadata to mark the node as "changed", allowing the tool to generate the code only for those changed nodes, keeping the others untouched?
I have no idea if this is feasible or not. Probably not, I guess :)

If think having metadata to mark the node as "changed" is feasible. It's even one approach I've been playing with in my mind. The Clojure's zipper api does this to keep track of the nodes that changed and so reconstruct the tree when returning to the top level node, and it's something I didn't implement in Sourceror's Zipper api because I didn't find a use for it, but this would be one use case. The changed mark could also store the original range so we can map them back to the range they would replace. Then all we need to do is to perform an additional traversal to find which nodes changed, grab their ranges and build a patch from them. If a node changed then we don't care if their children changed too, we only care about the outermost node, so we avoid the "child invalidating parent change and viceversa" issue.

If one uses the Zipper api, then functions like replace/2 or insert_child/2 would already be able to add the "changed" marker. We would need to figure out a way to tell that the node at X range was deleted, but if the zipper metadata holds the list of patches itself, then it wouldn't be an issue. The zipper is a {node, metadata} tuple where the metadata holds the parent tree and siblings, but it could also hold additional metadata. When the zipper is at the top the metadata is nil or :end, but it would be good to change it to a map so it can hold this additional metadata

As for regular pre/postwalk traversals, we could make Sourceror.pre/postwalk to make a simple equality check comparing the node before and after applying the functions over them, and store a patch in the traversal state if the node changed.

Sourceror.patch_string/2 could still accept a list of changes, and if a patch contains or is contained by another patch's range, then that's up to the user to fix, and it would still enable the "apply changes one by one" approach. I already have working code for this, I just need to iron out the indentation handling.

So maybe I can add this, and then figure out ways to generate lists of patches in another issue :)

@msaraiva
Copy link
Contributor

Sounds like very good plan to me! 👍

@doorgan
Copy link
Owner Author

doorgan commented Jun 21, 2021

I just implemented this in 245952a, I'm quite happy with it.
I will now start looking for ways to generate the patch lists with other functions.

I'll let this issue open for a bit just in case before 0.8 is released, but it's ready to be closed.

@msaraiva
Copy link
Contributor

Hi @doorgan. That's fantastic!

I'm tried the following snippet in order to collect the ranges to be used by the new patch_string for one of the changes I needed for the converter. However, I wasn't able to use Sourceror.get_range/1.

The end goal is to rename the slot's option props: ... to args: ....

Not sure if that would be possible already.

Mix.install([{:sourceror, github: "doorgan/sourceror"}])

code = """
defmodule Card do
  use Surface.Component

  slot footer
  slot header, props: [:item]
  slot default, required: true, props: [:item]
end
"""

ranges =
  code
  |> Sourceror.parse_string!()
  |> Sourceror.postwalk([], fn
    {:slot, _, args} = quoted, state ->
      opts_node = Enum.at(args, 1, [])
      props_node = Enum.find(opts_node, &match?({{:__block__, _, [:props]}, _}, &1))

      if props_node do
        range = Sourceror.get_range(props_node)
        {quoted, %{state | acc: [range | state.acc]}}
      else
        {quoted, state}
      end

    quoted, state ->
      {quoted, state}
  end)

IO.inspect(ranges)

The error:

** (FunctionClauseError) no function clause matching in Sourceror.Range.get_range/1    
    
    The following arguments were given to Sourceror.Range.get_range/1:
    
        # 1
        {{:__block__, [trailing_comments: [], leading_comments: [], format: :keyword, line: 5, column: 16], [:props]}, {:__block__, [trailing_comments: [], leading_comments: [], closing: [line: 5, column: 29], line: 5, column: 23], [[{:__block__, [trailing_comments: [], leading_comments: [], line: 5, column: 24], [:item]}]]}}
    
    Attempted function clauses (showing 10 out of 17):
    
        def get_range({:__aliases__, meta, segments})
        def get_range({:__block__, meta, [string]}) when is_binary(string)
        def get_range({:__block__, meta, [number]}) when is_integer(number) or is_float(number)
        def get_range({:__block__, meta, [atom]}) when is_atom(atom)
        def get_range({:__block__, _, args} = quoted)
        def get_range({form, meta, context}) when is_atom(form) and is_atom(context)
        def get_range({{:., _, [Access, :get]}, _, _} = quoted)
        def get_range({{:., _, [_, :{}]}, _, _} = quoted)
        def get_range({{:., _, [:erlang, :binary_to_atom]}, meta, [interpolation, :utf8]})
        def get_range({{:., _, [left, right]}, meta, []} = quoted) when is_atom(right)
        ...
        (7 clauses not shown)
    
    (sourceror 0.7.0) lib/sourceror/range.ex:14: Sourceror.Range.get_range/1
    replace_props_wiht_args.exs:22: anonymous fn/2 in :elixir_compiler_0.__FILE__/1
    (stdlib 3.15) lists.erl:1358: :lists.mapfoldl/3
    (stdlib 3.15) lists.erl:1359: :lists.mapfoldl/3
    (elixir 1.12.1) lib/macro.ex:448: Macro.do_traverse/4
    (elixir 1.12.1) lib/macro.ex:463: Macro.do_traverse/4
    (stdlib 3.15) lists.erl:1358: :lists.mapfoldl/3
    (elixir 1.12.1) lib/macro.ex:468: Macro.do_traverse/4

I'm sure I'm doing something wrong :)

@doorgan
Copy link
Owner Author

doorgan commented Jun 21, 2021

It's actually a bug in get_range, I'm not accounting for the 2-tuples in keyword lists. I just pushed a fix, the script works now :)

@msaraiva
Copy link
Contributor

Nice! 🔥

I just tried the latest Sourceror.patch_string/2.

With this code:

Mix.install([{:sourceror, github: "doorgan/sourceror", tag: "14e58a3caa2cb61fc42dfeeabcc3982fd83c3591"}])

code = """
defmodule Card do
  use Surface.Component

  slot footer
  slot header, props: [:item]
  slot default, required: true, props: [:item]
end
"""

patches =
  code
  |> Sourceror.parse_string!()
  |> Sourceror.postwalk([], fn
    {:slot, _, args} = quoted, state ->
      opts_node = Enum.at(args, 1, [])
      props_node = Enum.find(opts_node, &match?({{:__block__, _, [:props]}, _}, &1))

      if props_node do
        range = Sourceror.get_range(props_node)
        {{:__block__, meta, [:props]}, body} = props_node
        args_node = {{:__block__, meta, [:args]}, body}
        new_code = Sourceror.to_string(args_node)
        patch = %{text: new_code, range: range}

        {quoted, %{state | acc: [patch | state.acc]}}
      else
        {quoted, state}
      end

    quoted, state ->
      {quoted, state}
  end)
  |> elem(1)
  |> Enum.reverse()

code
|> Sourceror.patch_string(patches)
|> IO.puts

I got the following result, which is not valid yet but it's almost what I needed :)

defmodule Card do
  use Surface.Component

  slot footer
  slot header, {:args, [:item]}
  slot default, required: true, {:args, [:item]}
end

I guess, since I'm using Sourceror.to_string(args_node) to generate the replacement text, there's no way for the formatter to know how to properly format the option.

What would be the best way to achieve this? Should I take another path?

Would alternatively accepting patches with an updater instead of a string do the trick?, Like:

%{
  update: &rename_props/1,
  range: %{start: [line: 1, column: 1], end: [line: 3, column: 4]}
}

where rename_props/1 could be something as simple as:

defp update_props(original_code) do
  String.replace_leading(original_code, "props:", "args:")
end

In case you're already satisfied with the current Sourceror.patch_string/2 implementation and you think this is a rare edge case, no worries. I'll try to find an alternative way to handle it, maybe by updating the whole slot call.

Thanks for the quick replies and for the great work! ❤️

@doorgan
Copy link
Owner Author

doorgan commented Jun 22, 2021

I guess, since I'm using Sourceror.to_string(args_node) to generate the replacement text, there's no way for the formatter to know how to properly format the option.

Yes, this is correct. Even if the key node has the format: :keyword metadata, the formatter doesn't know that this is happening in the context of a keyword list and will print it as a tuple. I'm not sure of how to address this in a generic way, but you could wrap the 2-tuple in a list, print it, and remove the wrapping braces with String.slice(1..-2)?

For example:

iex> {_, _, [[tuple]]} = Sourceror.parse_string!("[foo: :bar]")
iex> Sourceror.to_string([tuple]) |> String.slice(1..-2)
"foo: :bar"

In case you're already satisfied with the current Sourceror.patch_string/2 implementation and you think this is a rare edge case, no worries.

I'm open to supporting other updating modes as well, maybe we can use a generic :change field that can be:

  • A string(current behavior)
  • A function of arity 1 that takes the original code in the patch range as argument and outputs a string.

The function patch has the other benefit of allowing you to apply fine grained changes to the code without messing up the surrounding formatting, but doesn't have the issue of making a change to the wrong ranges that regexes have.

What do you think?

@msaraiva
Copy link
Contributor

you could wrap the 2-tuple in a list, print it, and remove the wrapping braces with String.slice(1..-2)?

Yeah, that would work for sure.

The function patch has the other benefit of allowing you to apply fine grained changes to the code without messing up the surrounding formatting, but doesn't have the issue of making a change to the wrong ranges that regexes have.

What do you think?

I think it would be extremely useful, especially because we could still use the AST to locate the minimum spot to be changed and use regex just in that small place, which would be a good middle ground.

@doorgan
Copy link
Owner Author

doorgan commented Jun 22, 2021

Great, I will change that then!

As for printing a keyword list fragment, it seems that's the only special case of that kind, and it's common enough to be an annoyance. I don't think it deserves its own function, but maybe I can make Sourceror.to_string handle that special case?
I'm thinking of adding a :format option that automatically performs the wrapping and slicing when it's set to :keyword? It would mimic the metadata of the keyword tuple.

@msaraiva
Copy link
Contributor

@doorgan the solution using Sourceror.patch_string/1 works perfectly for the cases I tested. Fantastic work! ❤️

I'm thinking of adding a :format option that automatically performs the wrapping and slicing when it's set to :keyword? It would mimic the metadata of the keyword tuple.

So you mean something like:

iex> quoted = Sourceror.parse_string!("[{:a, 1}, {:b, 2}]")
iex> Sourceror.to_string(quoted, format: :keyword)
"a: 1, b: 2"

@doorgan
Copy link
Owner Author

doorgan commented Jun 23, 2021

So you mean something like:

iex> quoted = Sourceror.parse_string!("[{:a, 1}, {:b, 2}]")
iex> Sourceror.to_string(quoted, format: :keyword)
"a: 1, b: 2"

Yes, that's the idea, but there are some caveats when it comes to a list of tuples instead of a single tuple, I opened #18 with more details :)

@doorgan
Copy link
Owner Author

doorgan commented Jun 23, 2021

Closing as the main point of this issue is resolved :)

@doorgan doorgan closed this as completed Jun 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants