Add functions to patch a string #13
Comments
The second approach could be used to generate the list of patches, by picking the outermost changed node, and then feeding those to the function from the first approach? |
Hi @doorgan! I have the feeling that trying to apply multiple patches at once might be very hard or even not possible due to the possible conflicts you already mentioned. A change in the inner node might completely invalidate the pattern used to match the outer node in the first place. Since a change in a node can also impose an update in the metadata of any of the parent nodes as well as in all the nodes after that change, I believe the safest approach could be to apply one by one, like:
One thing that needs to be treated with this approach would be a scenario where the user accidentally introduces an infinite loop by adding two opposite replacements, for instance, replace |
Hi @msaraiva! Thanks for your thoughts :) I have the same feeling: the more I dig into it, the more I see there's not really a way to apply multiple patches without getting in trouble. Also, if the point of the function is to avoid messing up existing code formatting, even applying changes one by one is troublesome, for example if you want to change the name of a function and it's written like this:

defp decode(addr_t(:ind), _dest, data) do
  case data do
    <<0::6, data::bits>>            -> {:t_data_ind, nil, data}
    <<0b01::2, seq::4, data::bits>> -> {:t_data_con, seq, data}
    <<0b1000_0000::8>>              -> {:t_connect, nil, <<>>}
    <<0b1000_0001::8>>              -> {:t_discon, nil, <<>>}
    <<0b11::2, seq::4, 0b10::2>>    -> {:t_ack, seq, <<>>}
    <<0b11::2, seq::4, 0b11::2>>    -> {:t_nak, seq, <<>>}
    _                               -> {:error, :invalid_tpci}
  end
end

By replacing the range occupied by the function node, the tabular formatting of the code in the body would be completely overridden. I've also been digging into tools for other languages, and they don't have these kinds of issues because they implement their own printer (recast, for instance, uses a custom "non-destructive" printer), while Sourceror is built on top of the Elixir formatter. So I'm tempted to think a better and less hacky way to handle this use case is to make |
Not sure. I mean, by just loading the consolidated configuration of the For other cases, would it be possible to keep the original related source of each node somewhere? Then have some kind of metadata to mark the node as "changed", allowing the tool to generate the code only for those changed nodes, keeping the others untouched? I have no idea if this is feasible or not. Probably not, I guess :) Also, even if we can't go this direction, I believe a |
This sounds like a reasonable step for now. I guess what I'm worried about is messing up the content inside the node as exemplified above, but producing code that matches the original source as much as possible could be thought of as a middle-long term goal.
I think having metadata to mark the node as "changed" is feasible. It's even one approach I've been playing with in my mind. Clojure's zipper API does this to keep track of the nodes that changed and so reconstruct the tree when returning to the top-level node. It's something I didn't implement in Sourceror's Zipper API because I didn't find a use for it, but this would be one use case. The changed mark could also store the original range so we can map each changed node back to the range it would replace. Then all we need to do is perform an additional traversal to find which nodes changed, grab their ranges and build a patch from them. If a node changed then we don't care whether its children changed too; we only care about the outermost node, so we avoid the "child invalidating parent change and vice versa" issue. If one uses the Zipper api, then functions like As for regular pre/postwalk traversals, we could make
So maybe I can add this, and then figure out ways to generate lists of patches in another issue :) |
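To make the "only the outermost changed node matters" idea concrete, here is a minimal sketch in plain Elixir. It assumes changed nodes carry a `changed: true` entry in their metadata; that key and the `ChangedSketch` module are hypothetical and are not part of Sourceror.

```elixir
defmodule ChangedSketch do
  # Returns the outermost nodes flagged with the hypothetical `changed: true`
  # metadata entry. Once a flagged node is found, its children are not visited.
  def outermost_changed({form, meta, args} = node) when is_list(meta) do
    if Keyword.get(meta, :changed, false) do
      [node]
    else
      # `form` can itself be a node (for example in remote calls).
      outermost_changed(form) ++ outermost_changed(args)
    end
  end

  # Two-element tuples appear as-is in the AST (keyword list entries, pairs).
  def outermost_changed({left, right}) do
    outermost_changed(left) ++ outermost_changed(right)
  end

  def outermost_changed(list) when is_list(list) do
    Enum.flat_map(list, &outermost_changed/1)
  end

  # Atoms, numbers, strings and other literals carry no metadata.
  def outermost_changed(_other), do: []
end

inner = {:bar, [changed: true], []}
outer = {:foo, [changed: true], [inner]}
ast = {:__block__, [], [outer, {:baz, [], []}]}

# Only `outer` is reported, even though `inner` is flagged too.
IO.inspect(ChangedSketch.outermost_changed(ast))
```

If the flag also stored the node's original range, each reported node could be turned into one patch without having to reconcile changes nested inside it.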
Sounds like very good plan to me! 👍 |
I just implemented this in 245952a, and I'm quite happy with it. I'll leave this issue open for a bit just in case before 0.8 is released, but it's ready to be closed. |
Hi @doorgan. That's fantastic! I tried the following snippet in order to collect the ranges to be used by the new The end goal is to rename the Not sure if that would be possible already.

Mix.install([{:sourceror, github: "doorgan/sourceror"}])

code = """
defmodule Card do
  use Surface.Component
  slot footer
  slot header, props: [:item]
  slot default, required: true, props: [:item]
end
"""

ranges =
  code
  |> Sourceror.parse_string!()
  |> Sourceror.postwalk([], fn
    {:slot, _, args} = quoted, state ->
      opts_node = Enum.at(args, 1, [])
      props_node = Enum.find(opts_node, &match?({{:__block__, _, [:props]}, _}, &1))

      if props_node do
        range = Sourceror.get_range(props_node)
        {quoted, %{state | acc: [range | state.acc]}}
      else
        {quoted, state}
      end

    quoted, state ->
      {quoted, state}
  end)

IO.inspect(ranges)

The error:
I'm sure I'm doing something wrong :) |
It's actually a bug in get_range, I'm not accounting for the 2-tuples in keyword lists. I just pushed a fix, the script works now :) |
Nice! 🔥 I just tried the latest With this code:

Mix.install([{:sourceror, github: "doorgan/sourceror", tag: "14e58a3caa2cb61fc42dfeeabcc3982fd83c3591"}])

code = """
defmodule Card do
  use Surface.Component
  slot footer
  slot header, props: [:item]
  slot default, required: true, props: [:item]
end
"""

patches =
  code
  |> Sourceror.parse_string!()
  |> Sourceror.postwalk([], fn
    {:slot, _, args} = quoted, state ->
      opts_node = Enum.at(args, 1, [])
      props_node = Enum.find(opts_node, &match?({{:__block__, _, [:props]}, _}, &1))

      if props_node do
        range = Sourceror.get_range(props_node)
        {{:__block__, meta, [:props]}, body} = props_node
        args_node = {{:__block__, meta, [:args]}, body}
        new_code = Sourceror.to_string(args_node)
        patch = %{text: new_code, range: range}
        {quoted, %{state | acc: [patch | state.acc]}}
      else
        {quoted, state}
      end

    quoted, state ->
      {quoted, state}
  end)
  |> elem(1)
  |> Enum.reverse()

code
|> Sourceror.patch_string(patches)
|> IO.puts

I got the following result, which is not valid yet but it's almost what I needed :)
I guess, since I'm using What would be the best way to achieve this? Should I take another path? Alternatively, would accepting patches with an updater function instead of a string do the trick? Like:

%{
  update: &update_props/1,
  range: %{start: [line: 1, column: 1], end: [line: 3, column: 4]}
}

where

defp update_props(original_code) do
  String.replace_leading(original_code, "props:", "args:")
end

In case you're already satisfied with the current Thanks for the quick replies and for the great work! ❤️ |
Yes, this is correct. Even if the key node has the For example:

iex> {_, _, [[tuple]]} = Sourceror.parse_string!("[foo: :bar]")
iex> Sourceror.to_string([tuple]) |> String.slice(1..-2)
"foo: :bar"
I'm open to supporting other updating modes as well, maybe we can use a generic
The function patch has the other benefit of allowing you to apply fine grained changes to the code without messing up the surrounding formatting, but doesn't have the issue of making a change to the wrong ranges that regexes have. What do you think? |
Yeah, that would work for sure.
I think it would be extremely useful, especially because we could still use the AST to locate the minimum spot to be changed and use regex just in that small place, which would be a good middle ground. |
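As a rough illustration of that middle ground, the sketch below applies an updater function only to the text inside a range, leaving everything around it untouched. It is plain Elixir and does not call Sourceror; the `UpdaterSketch` module and `update_range/3` are hypothetical names, and ranges are assumed to be 1-based with an exclusive end column. In practice the range would come from the AST rather than being written by hand.

```elixir
defmodule UpdaterSketch do
  # Hypothetical helper: replaces the text inside `range` with the result of
  # calling `fun` on the original text found in that range.
  def update_range(string, %{start: [line: sl, column: sc], end: [line: el, column: ec]}, fun) do
    lines = String.split(string, "\n")
    from = offset(lines, sl, sc)
    to = offset(lines, el, ec)

    original = String.slice(string, from, to - from)

    String.slice(string, 0, from) <>
      fun.(original) <>
      String.slice(string, to, String.length(string))
  end

  # Converts a 1-based line/column pair into a 0-based character offset,
  # counting one character for each newline that precedes the line.
  defp offset(lines, line, column) do
    lines
    |> Enum.take(line - 1)
    |> Enum.reduce(column - 1, fn l, acc -> acc + String.length(l) + 1 end)
  end
end

code = "slot default, required: true, props: [:item]"

# Hand-written range covering `props: [:item]`.
range = %{start: [line: 1, column: 31], end: [line: 1, column: 45]}

code
|> UpdaterSketch.update_range(range, &String.replace_leading(&1, "props:", "args:"))
|> IO.puts()
# prints: slot default, required: true, args: [:item]
```

Because the replacement only ever sees the text inside the range, a pattern like `"props:"` cannot accidentally match elsewhere in the file.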
Great, I will change that then! As for printing a keyword list fragment, it seems that's the only special case of that kind, and it's common enough to be an annoyance. I don't think it deserves its own function, but maybe I can make |
@doorgan the solution using
So you mean something like:

iex> quoted = Sourceror.parse_string!("[{:a, 1}, {:b, 2}]")
iex> Sourceror.to_string(quoted, format: :keyword)
"a: 1, b: 2" |
Yes, that's the idea, but there are some caveats when it comes to a list of tuples instead of a single tuple, I opened #18 with more details :) |
Closing as the main point of this issue is resolved :) |
Sourceror.to_string/2 allows one to convert AST to a string while ensuring the comments land where they should. The issue is that even if only a small portion of the code changes after a manipulation, the whole string will be formatted. This is not an issue if the user is already using the Elixir formatter, but it is a problem otherwise.

It is possible to replace only subsets of a string by using Sourceror.get_range/1 in combination with Sourceror.to_string/2 to find the range where a modification would occur and replace it with the new text, but it would be nice for that functionality to be included in Sourceror itself.

A Sourceror.patch_string/2 function could be introduced, taking the string to be changed as its first argument and a list of patches to be applied to it as its second. A patch is a map holding the range that would be affected and the text that would replace the old content.

The main issue is that the patches should not have overlapping ranges. For example, in this code:
One of the patches may want to replace String.to_atom(foo) with String.to_existing_atom(foo), while another patch may want to replace the whole unless with an if allowed? do ... end. There is an obvious conflict there, which limits the usefulness of this approach.

Another alternative is to add a function that, instead of patching the string based on a list of patches, accepts a function that traverses and updates the AST, and then does some tree-diffing magic to figure out the final set of changes to be applied, so that the two conflicting patches of the previous example would become a single patch that changes everything at once.
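The kind of conflict described above can at least be detected before anything is applied. The following is a minimal sketch in plain Elixir; the `OverlapSketch` module is hypothetical, the ranges are made-up coordinates for an outer `unless` block and an inner call, and end positions are assumed to be exclusive.

```elixir
defmodule OverlapSketch do
  # Two ranges overlap when each one starts before the other ends.
  # Positions are compared as {line, column} tuples.
  def overlapping?(%{start: s1, end: e1}, %{start: s2, end: e2}) do
    pos(s1) < pos(e2) and pos(s2) < pos(e1)
  end

  # Returns every pair of patches whose ranges overlap.
  def conflicts(patches) do
    indexed = Enum.with_index(patches)

    for {a, i} <- indexed,
        {b, j} <- indexed,
        i < j,
        overlapping?(a.range, b.range),
        do: {a, b}
  end

  defp pos(line: line, column: column), do: {line, column}
end

# Hypothetical coordinates: the outer patch spans lines 1 to 3,
# the inner patch sits entirely on line 2.
outer = %{
  text: "if allowed? do ... end",
  range: %{start: [line: 1, column: 1], end: [line: 3, column: 4]}
}

inner = %{
  text: "String.to_existing_atom(foo)",
  range: %{start: [line: 2, column: 3], end: [line: 2, column: 22]}
}

IO.inspect(OverlapSketch.conflicts([outer, inner]))
```

Detection alone does not say which patch should win; it only makes it possible to reject or report the conflicting pair instead of producing broken output.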
I have already explored the first idea and found those limitations, which is why I will now start exploring the second one. The other aspect to keep in mind is the indentation of the patches if they span multiple lines, but we should be able to infer it from the surrounding lines.
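For reference, the first idea (a list of range-and-text patches applied to a string) can be sketched in a few lines of plain Elixir. This is not Sourceror's implementation: the `PatchSketch` module is hypothetical, it assumes the patches do not overlap, it assumes 1-based positions with an exclusive end column, and it does nothing about the indentation of multi-line replacements.

```elixir
defmodule PatchSketch do
  # Applies non-overlapping patches of the form
  # %{range: %{start: [line: l, column: c], end: [line: l, column: c]}, text: "..."}.
  def patch_string(string, patches) do
    lines = String.split(string, "\n")

    patches
    # Apply the last patch first so the offsets of earlier patches stay valid.
    |> Enum.sort_by(&offset(lines, &1.range.start), :desc)
    |> Enum.reduce(string, fn %{range: %{start: start_pos, end: end_pos}, text: text}, acc ->
      from = offset(lines, start_pos)
      to = offset(lines, end_pos)

      String.slice(acc, 0, from) <> text <> String.slice(acc, to, String.length(acc))
    end)
  end

  # Converts a 1-based [line: l, column: c] position into a 0-based character
  # offset in the original string, counting one character per newline.
  defp offset(lines, line: line, column: column) do
    preceding =
      lines
      |> Enum.take(line - 1)
      |> Enum.map(&(String.length(&1) + 1))
      |> Enum.sum()

    preceding + column - 1
  end
end

patches = [
  %{text: "qux", range: %{start: [line: 1, column: 5], end: [line: 1, column: 8]}},
  %{text: "hello", range: %{start: [line: 2, column: 1], end: [line: 2, column: 4]}}
]

IO.puts(PatchSketch.patch_string("foo(bar)\nbaz", patches))
# prints:
# foo(qux)
# hello
```

Applying the patches from the end of the string backwards means every offset can be computed once against the original text, which is only safe because the ranges are assumed not to overlap.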