Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the visitor pattern #61

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

camertron
Copy link

@camertron camertron commented Nov 22, 2019

Hey @ammar, thanks for the great gem! This PR is an attempt to implement the visitor pattern in regexp_parser such that clients can create their own visitors capable of walking the parse tree. It's a fairly common pattern that I've found useful when working with various parser tools.

I think the functionality should be opt-in, so there are two classes: Visitor is the base visitor class that clients would inherit from to implement their own visitors. The visitor_nodes.rb file contains extensions that add the accept method to all the various parser nodes. By requireing regexp_parser/visitor you also require regexp_parser/visitor_nodes. To opt out, simply don't require the visitor.

There's also a new rake task that can generate the visitor and node extensions programmatically, since doing so by hand can be pretty tedious.

Let me know if this is something you'd be interested in merging :)

@ammar
Copy link
Owner

ammar commented Nov 27, 2019

Hi @camertron. Sorry for the late response, and thanks for the PR. It is very interesting.

I understand the pattern in general, but I'm curious if there was a specific use case that prompted this.

Despite the quite impressive use of reflection and ERB (:heart:), I wonder, and please excuse any ignorance on my part here, if the following visitor would be equivalent.

# lib/regexp_parser/visitor.rb
require_relative 'visitor/node'

class Regexp::Visitor
  def visit(node)
    node.accept(self)
  end

  def method_missing(m, *args, &block)
    node = args.first

    if node.respond_to?(:expressions)
      node.expressions.each do |exp|
        exp.accept(self)
      end
    end
  end
end

And similarly for nodes, using your method from the generator, would the following satisfy the pattern?

# lib/regexp_parser/visitor/node.rb
module Regexp::Expression
  class Base
    def accept(visitor)
      visitor_method = self.class.name.split('::')[2..-1]
        .join('_')
        .gsub(/([a-z])([A-Z])/) { "#{$1}_#{$2}" }
        .downcase

      visitor.send(:"visit_#{visitor_method}", self)
    end
  end
end

I don't like generating the visitor method name on every call. I think that can be memoized, if I'm not missing something obvious and what I coded even works, that is.

Making the functionality opt-in is a good idea.

@jaynetics
Copy link
Collaborator

Hi!

I like that the VisitorGenerator gives you a file that lets you know what you need to handle.

On the other hand, I'm also wondering what the use case is, and in which way the visitor pattern helps it?

In my purely personal opinion, the visitor pattern mostly adds unnecessary overhead in a language like Ruby.

To me, code like the following seems to achieve the same in a much more straightforward way:

class MyExpProcessor
  def process_literal_literal(exp)
    # do something
  end

  def process_group_capture(exp)
    # do something
  end

  # ...
end

root = Regexp::Parser.parse(/foo(bar)/)
processor = MyExpProcessor.new

root.each_expression do |exp|
  processor.send("process_#{exp.type}_#{exp.token}", exp)
end

I've used type and token here, but of course,the same could be done with expression classes. type and token offer a bit more fine-grained differentiation than the classes, which might or might not be desired.

A generator for such processors (or whatever you'd want to call it) could look somewhat like this (not tested):

class Regexp::ProcessorGenerator
  def self.call
    Regexp::Syntax::Token::Map.flat_map do |type, tokens|
      tokens.map { |token| "def #{type}_#{token}(exp); end" }
    end.join("\n\n")
  end
end

The downside with this generator (like the one in the PR), is that it won't help you find "new" missing things that get added to Ruby's regular expression engine in the future, or changes to regexp_parsers API, unless you use the generator to build a parent class and then validate your child classes with something like sorbet.

So perhaps the most helpful thing would be an easy way to iterate over all expression classes, or all type-token combinations, to validate code dealing with the parser's output.

@camertron
Copy link
Author

Hey @ammar,

I understand the pattern in general, but I'm curious if there was a specific use case that prompted this.

It's funny, I had a use-case (Unicode regexes as defined by UTS #18) but decided to go in a different direction for the project. I had already written the generator and produced the visitor files, so I thought I'd contribute them back instead of just throwing them away. Correspondingly, I wouldn't be offended at all if you chose to not merge 😉

I wonder, and please excuse any ignorance on my part here, if the following visitor would be equivalent.

Yes, using method_missing is an entirely reasonable approach. I thought of using it myself, but it suffers from a number of drawbacks. First of all, method_missing will always be slower than good 'ol method dispatch. Second of all - and this is the more important reason in my opinion - defining all the methods explicitly leads to much better discoverability. You don't have to understand the internals of regexp_parser, you just have to look at the methods inside the visitor class to be able to start using it.

And similarly for nodes, using your method from the generator, would the following satisfy the pattern?

Hey, that looks great 😄 I don't think there's much of performance hit with send, if any? Might be worth some research, but my hunch is it should be fine.

Hey @jaynetics,

In my purely personal opinion, the visitor pattern mostly adds unnecessary overhead in a language like Ruby.

You're right in thinking that the visitor pattern isn't strictly necessary in a language like Ruby. It's a lot more necessary in languages like Go and Java, which need the accept + visit combo. They call it "double dispatch."

That said, anyone who works with parser technologies (yacc, bison, antlr, etc) will be intimately familiar with the visitor pattern, so I thought it might make regexp_parser a little more approachable from the outside to have an implementation of its own. It doesn't hurt at all that Visitor is also one of the Gang of Four patterns, which further increases its notoriety.

To me, code like the following seems to achieve the same in a much more straightforward way.

Right, your example looks really similar to Nokogiri's SAX listener where the library calls methods on some delegate you pass into it. The biggest downside of that approach in my opinion is that individual process_ methods don't get to decide whether the parser should also iterate over their children. Furthermore, the return values of the process_ methods are ignored, meaning any transformations you'd like to produce along the way have to be stored somewhere (maybe in instance variables?) The visitor pattern by comparison lets each visit_ method decide if/when to visit children and if the return value should be used for intermediate results. For example, here's the difference between the listener and visitor approaches for an imaginary equation solver:

class Listener < BaseListener
  def initialize
    @stack = []
  end

  def process_literal(node)
    @stack << node.value
  end

  def process_addition(node)
    second, first = @stack.pop(2)
    @stack << first + second
  end

  def result
    @stack.first
  end
end

listener = Listener.new
# or maybe this is done during parsing?
parse_tree.traverse(listener)
puts listener.result

# and for comparison, the visitor equivalent

class Visitor < BaseVisitor
  def visit_addition(node)
    visit(node.left) + visit(node.right)
  end
end

visitor = Visitor.new
puts visitor.visit(parse_tree)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants