Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewriter::Action rewritten #401

Closed
marcandre opened this issue Nov 20, 2017 · 12 comments
Closed

Rewriter::Action rewritten #401

marcandre opened this issue Nov 20, 2017 · 12 comments

Comments

@marcandre
Copy link
Contributor

marcandre commented Nov 20, 2017

Just fresh out of the 🚿 I thought I had a great idea. Here it is:

Rewriter::Action is currently either:

  • remove range
  • insert text at position, or
  • or both (replace range with text)

I think that Rewriter::Action should actually be slightly different:

  • replace range with text / delete it
  • wrap range with before and after texts
  • or both (replace and wrap)

This way, we can order theses actions in a sensical way, by placing them in a tree such that an action strictly containing another is an ancestor. (strict inclusion is a semi-order here; empty ranges are not considered contained if they are at either end of another range).

Actions on the same range are always merged.

Clobbering happens iff two ranges cross (i.e. if they have a different non-trivial intersection, like 1...4 and 2..5).

Example operations on 'Hello':

  • a wrap 'Hell' with '<', '>'
  • b remove 'el'
  • c remove 'ell'
  • d wrap 'e' with '[', ']'
  • e wrap 'H' with nil, '*'

These commands can be given in any order; the resulting tree (left-to-right) is always:

a --c -- d -- b
  \
   e

In this case, c is a removal and swallows d and b. In which ever order, the result should be '<H*>o':

insert_before(range, text) is actually wrap(before, text, nil), while insert_after(range, text) is wrap(range, nil, text).
replace(range, text) is always equivalent to wrap(range, text, nil) + remove(range)
insert_{before|after}_multi have no reason to exist IMO.

Pros

Rewriting operations would always well defined. Order is dependent only when acting on the exact same range: wrappings gets combined, later replacements takeover previous ones.
When rewriting an AST, care is no longer needed to deal with children first, the rewriting will order things correctly.
In particular, insert_after_multi(1...4, '>') and insert_after_multi(2..4, ')') are currently order dependent (even after #399), while the _before versions are order independent. With this tree proposal, the resulting order would always be the same no matter what, since 1...4 contains 2...4.
This supersedes #399, fixes #400, and removes the need for _multi versions.

Incompatibilities

Going deeper into the code, it's time for a cold shower as many of my expectations of the rewriter seem wrong.

First, it allows some intersecting replacements (see #402).

Less importantly it is allowed to removing intersecting ranges (say 1...4 and 2...5). I don't see the use for this, feel it's a bug and not than a feature, but it would be trivial to implement if there's a good usecase.

OTOH, current behavior doesn't allow some scenarios I don't see a problem with:

      @rewriter.
        insert_after(range(3, 1), '*').
        remove(range(2, 7)).

I can imagine a rewriter wanting to do a local change (inserting the '*' here), only to realize somewhere else in the code that a whole section containing that local change should be removed. Why not swallow the insert in this case? Anyways, that's also easy to implement with my "tree" version by not allowing deleting nodes to have inserting children, but I don't see why.

(I hope I'm not abusing the time of contributors to this awesome gem with this idea).

@marcandre
Copy link
Contributor Author

Fixed a few things when testing with seeing_is_believing (which passes). Any other user of Rewriter I should test with?

@whitequark
Copy link
Owner

I really like your new Rewriter, but am unsure of how compatible it actually is. Can you run the RuboCop test suite?

@marcandre
Copy link
Contributor Author

Oh, awesome, makes sense that rubocop is using the Rewriter.

I get 18 failures. 9 look like a single small bug in my code. None are cases where the new code raises a Clobering error (like the intersecting replacements of #402).

I'm looking forward to check them out in detail tomorrow, really time to sleep for me...

As I stated, it wouldn't be too surprising that compatibility would be an issue, I'm proposing to have a switch to use the new rules.

@marcandre
Copy link
Contributor Author

There's apparently a check that the end_pos given is within bounds of the source only for insert_after. The other methods all work:

              insert_before(range(4, 3000), 'quux ').
              replace(range(4, 3000), 'quux ').
              remove(range(4, 3000)).

My implementation checks that, so that explains 9 of the 18 failures I'm getting with rubocop. I opened an issue as it's a sign that there might be something else going on in this particular case.
More to come...

@alexdowad
Copy link
Collaborator

Hmm! Hmm. This is very interesting.

Some background: in 2016, I did a complete overhaul of Source::Rewriter. When I got to it, there were a number of cases where the semantics of how it should behave were not clearly defined. I started by coming up with a minimal set of constraints (or rules) which summarized my intuition about how the class should behave, and thinking them through carefully. I tried to make sure they were consistent -- in other words, there was no case in which it was impossible for all the constraints to be met -- and that they fully specified all possible cases. After writing a number of tests, the implementation came last.

If Rewriter is to be overhauled again, that is fine -- but I would encourage a similarly disciplined approach. Make sure you think through all the cases which can occur. Define the rules which your Rewriter will follow clearly and succinctly, and make sure that they are both consistent and complete. For every possible input, there should be one and only one output which satisfies all your rules.

Your new Rewriter is based on an entirely different concept than the old one. The old Rewriter views Ruby code as a sequence of characters and focuses on insertions and deletions of characters. Your Rewriter is based on a view of Ruby code as having structure, and of code edits as being tied to particular nodes in that nested tree structure.

For example, as you noted, your Rewriter treats insert_after(1..4, '.') and insert_after(2..4, '.') as meaningfully different operations. My Rewriter does not, because as stated, it is only interested in changes to characters and ignores the fact that code has structure.

Since Rewriter is a tool for modifying Ruby code and not just general text, your concept is better.

The fact that insert_after_multi and insert_before_multi will no longer be relevant is good. Those were a distasteful and odious hack.

Other issues:

  • Overlapping deletions: I would encourage you to take a wider view. You may not see a use case for this right now, but the question is: If a client programmer asks for overlapping deletions, is it clear what they want? Is it possible to do what they want in a consistent and predictable way?

The fact is that Rewriter is a general-purpose tool, and it is not possible to foresee what people may use it for. (I never imagined that things like seeing_is_believing would be made with it.) So it is not wise to arbitrarily bar the user from doing things simply because you don't think they will be useful.

  • Overlapping replacements: Again, this feature was not motivated by a particular use case, but by the desire to consistently follow a set of simple rules, without arbitrarily limiting what can be done.

Rewriter has clobbering errors for cases where it's impossible to tell what the user wants, or when what the user says he/she wants is self-contradictory.

So what if the user tries to replace the same range twice, with the exact same replacement text? In that case, it's clear what is wanted. We can handle that. What if one range is a proper subset of the other, but both replacements exactly match the length of their replaced ranges, and they agree where they overlap? That is also not a problem.

But what if the replacement texts do not match their replaced texts in length, but they can still be aligned in such a way that they agree? Worse still, what if neither range is a subset of the other?

In those tricky cases, my Rewriter follows a simple rule for matching the overlapping replacement texts to see if they agree. It does not attempt to look at every possible way the overlapping texts could be aligned against each other. I never imagined that this behavior was the "one right way". The idea was just to follow a simple, understandable rule in a consistent way. Since these are corner cases, just what rule is used did not seem so important, but it did need to be something simple and unambiguous.

Now, you could say that a better rule is that "overlapping replacements always conflict, unless they are replacing the exact same range with the exact same text". That is also a perfectly good rule. It does make sense and I have no objection to it.

  • Insertion falling inside a deleted region: If you want to say that insertions which fall completely inside a deleted region should be "swallowed up" by the deletion, that also makes sense and I have no objection. I think it is less clear what to do if the insertion falls right at the edge of the deleted region.

If you want to make that one of your rules, it is a good idea. Just make sure that it never conflicts with any of your other rules, in any situation, and it is always applied consistently.

  • Issue with end_pos for insert_after: This was not intended and should be fixed.

@marcandre
Copy link
Contributor Author

@alexdowad Wow, thanks for the detailed feedback.

Overlapping deletions: I would encourage you to take a wider view. You may not see a use case for this right now, but the question is: If a client programmer asks for overlapping deletions, is it clear what they want? Is it possible to do what they want in a consistent and predictable way?

I agree 👍 and came to the same conclusion when looking at Rubocop's failures. I hope to have a revised proposal and detailed report ready soon. I'll also have more comments on the rest of you post 😄

@marcandre
Copy link
Contributor Author

marcandre commented Nov 25, 2017

@alexdowad I agree with most of what you wrote 😄

Let me discuss where there is matter for discussion:

So what if the user tries to replace the same range twice [...] What if one range is a proper subset of the other, but both replacements exactly match the length of their replaced ranges, and they agree where they overlap? That is also not a problem.

I see a Rewriter as sequence of commands, with some that are more local than others. I see no issue if local ones are superseded by more global ones. If the ranges are not exactly the same, I believe that trying to decide if they "are compatible" can not be well defined and can create false positives and false negatives. Or maybe I tell myself that because it would be too hard and I'm not as courageous as you are 😸 ! In any case, I have made no attempt to detect that.

Now, you could say that a better rule is that "overlapping replacements always conflict, unless they are replacing the exact same range with the exact same text".

I probably wasn't clear enough, but what I call overlapping ranges are strictly overlapping, i.e. ranges that have non-trivial intersections and non trivial differences. Neither contain the other completely. 1...10 and 4...20 are such ranges, but 1...10 and 1...10 are not, neither are 1...10 and 10...10.

So after your comments and checking out Rubocop's failures, I've revised my rule to say that "strictly overlapping ranges that are pure deletions are fused; other cases always raise errors".

Insertion falling inside a deleted region: If you want to say that insertions which fall completely inside a deleted region should be "swallowed up" by the deletion, that also makes sense and I have no objection. I think it is less clear what to do if the insertion falls right at the edge of the deleted region.

I you think of an insertion "at an end point P", then you are thinking of an insertion at P...P, which would not be swallowed, as it is disjoint. If you think of an insert on a range, then the answer depends on if the range is bigger or smaller than the deleted region. If it is smaller, it gets swallowed. Otherwise it won't.

That's definitely the biggest mind shift with my proposal: seeing some insertions as wrapping a particular range.

@marcandre
Copy link
Contributor Author

marcandre commented Nov 25, 2017

Define the rules which your Rewriter will follow clearly and succinctly, and make sure that they are both consistent and complete. For every possible input, there should be one and only one output which satisfies all your rules.

I like that approach. I tweaked the doc to be more "rule-like", hopefully you'll like it and it remains short enough and understandable.

@marcandre
Copy link
Contributor Author

I'll need to find a different term than overlap though, especially given Range#overlaps?. Mathematically it would be "symmetrically different". Maybe "crossing"?

@alexdowad
Copy link
Collaborator

@marcandre What you say makes a lot of sense and I support it. Trying to merge "compatible" modifications was probably a bad idea.

It seems that "overlapping" is the term used in set theory to describe sets with a non-empty intersection. If you want to borrow mathematical terminology, maybe your test for clobbering is:

range1.overlaps?(range2) && !range1.subset?(range2) && !range2.subset?(range1)

@MaxLap
Copy link

MaxLap commented Nov 27, 2017

"crossing" seems to make sense, "crosses_bounds" if you want to be as explicit as possible.

@marcandre
Copy link
Contributor Author

TreeRewriter has been merge 🎆

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants