Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Liquid::ParseTreeVisitor #1025

Merged
merged 6 commits into from
Oct 18, 2018
Merged

Liquid::ParseTreeVisitor #1025

merged 6 commits into from
Oct 18, 2018

Conversation

singpolyma-shopify
Copy link
Contributor

@singpolyma-shopify singpolyma-shopify commented Sep 10, 2018

Currently, there is no way to traverse the parse tree of a liquid document without using instance_variable_get and a lot of knowledge of library internals, which is quite brittle.

This change enables writing code to traverse the tree for reasons other than rendering (such as static analysis).

Copy link
Contributor

@pushrax pushrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially thinking something more declarative, specifically the visitor pattern implemented similarly to graphql-ruby's visitor. Static analysis can be expressed very elegantly in such a form.

@fw42
Copy link
Contributor

fw42 commented Sep 11, 2018

What is this needed for?

@singpolyma-shopify
Copy link
Contributor Author

@singpolyma-shopify
Copy link
Contributor Author

@pushrax Wouldn't such a visitor still need a way to actually get at the nodes, however? Or would you suggest building one like that that knows about the instance variable names, etc?

@singpolyma-shopify
Copy link
Contributor Author

https://github.com/rmosolgo/graphql-ruby/blob/132e274c04e45c9fd17dbd4abb992cd0115ea22a/lib/graphql/language/visitor.rb#L47

The example here uses a node.children method, which is basically what I'm trying to add here.

@pushrax
Copy link
Contributor

pushrax commented Sep 20, 2018

Wouldn't such a visitor still need a way to actually get at the nodes, however?

Yeah, we definitely need a way to get at the children. It doesn't necessarily need to be a public API though, and it could be better-defined. In the context of nodelist, a node is a String or a thing with a render method (Tag, BlockBody, Variable). The thing that makes me uncomfortable about having a public holistic children API as proposed is type ambiguity. If we introduce this, it needs to not collide in terminology, and/or we need to redefine "node" in a specific way. For the sake of argument, wlog, let's try to figure out the definition of "node" we need.

Perusing some types:

  • Tag: base class for tags. Has an empty nodelist method, has a render method.
    • Block: inherits Tag, these are tags that have a start and an end and some stuff in between. They contain >0 BlockBody instances and delegate nodelist to them.
      • If, For, etc. inherit Block
  • BlockBody: contains the contents between the start and end of a Block. Has a render method.
  • Variable: contains a name of a type Expression.parse can return and >=0 filters, which each have >=0 arguments of a type Expression.parse can return. Variable does not inherit from Tag, it's a special thing, but it does still have a render method.
  • VariableLookup: some access to a drop, which can have recursive references to other VariableLookups and other types returned from Expression.parse. This is the closest thing to an expression node we have, but is not currently considered a node at all.
  • Condition: compares a "left" value of any type to a "right" value of any type, so long as those types respond to the comparator method. In practise, these types come from Expression.parse. Contains a BlockBody "attachment" that should be rendered when the condition is met. However, a condition doesn't itself have a nodelist, and it's not a node at the moment; it doesn't have a render method. The parent of the condition (an If, Case, or Unless) maps the attachments for its own nodelist and for rendering. If Condition becomes a node, this probably wouldn't make sense.

So, what is a node in this new world? I guess it's a sum type of all these classes and their members. Expression.parse can return strings, integers, ranges, floats, and VariableLookups, so these must all be node types as well. I suppose there is nothing intrinsically wrong with that, but it's hard to reason about what you'll get when you call children with such a complicated array of possibilities.

Your choice to reject strings in the implementation of BlockBody#each_node is a good example of a how this type system is confusing. A string is probably a valid node type, but it's not obvious at all why without going through how Expression.parse is used. I can't even say it's completely clear to me if you got it wrong or right, because a string in that context might really be a different type than a string in the Expression.parse context.

The code that depends on such a children method is likely to be strongly coupled to the Liquid version, perhaps brittle. This is why I suggested the visitor approach: you declare you want to know about e.g. VariableLookups and that's all you are given. This somewhat hides the complexity of the node sum type from the caller.

A refactor to create a proper Node class that is inherited globally and provides a consistent API is a potential way to alleviate some complexity.

@singpolyma-shopify
Copy link
Contributor Author

@pushrax How would you suggest making it non-public? Have the Liquid::Traversal (or Liquid::Visitor or whoever we call it) use send to bypass access control and get the children? Or giving the knowledge of what instance variables to pull, etc, to the visitor class?

@singpolyma-shopify
Copy link
Contributor Author

Hmm, so, probably having all the subnodes be readable anyway (via public attr_readers) is desirable -- I've run into a few I want in my code outside of the traversal. So maybe the right thing to do is expose all of these, and then the logic of which things are children can live in the visitor but we don't need to use instance_variable_get or similar hacks anymore in that case.

Thoughts?

@pushrax
Copy link
Contributor

pushrax commented Sep 27, 2018

That sounds quite reasonable to me. Having a Ruby-visible public children method that is documented as internal is okay with me as well.

@singpolyma-shopify singpolyma-shopify changed the title WIP: each_node WIP: Liquid::Traversal Sep 27, 2018
@singpolyma-shopify singpolyma-shopify changed the title WIP: Liquid::Traversal Liquid::Traversal Sep 28, 2018
@singpolyma-shopify
Copy link
Contributor Author

Implemented, cleaned up, and added tests

@singpolyma-shopify
Copy link
Contributor Author

Not sure what's up with the policial failures... they don't match local rubocop run. Is policial not using the config from the repo?

@singpolyma-shopify singpolyma-shopify force-pushed the traverse-ast branch 6 times, most recently from 7f69097 to aa5fd4c Compare October 1, 2018 16:32
@singpolyma-shopify
Copy link
Contributor Author

Ok, it passes Policial here now. Local rubocop failures in files I didn't modify persist, but they seem to be present on master

@pushrax
Copy link
Contributor

pushrax commented Oct 1, 2018

Yeah, the rubocop config has been broken in this repo for a few months since there hasn't been much active work. Will address that at some point.

@singpolyma-shopify
Copy link
Contributor Author

@pushrax Ok, cool. So if we're following policial I think this is good for review.

Copy link
Contributor

@pushrax pushrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the bike shedding, I haven't been in the head space of this problem and haven't made my goals for this solution clear.

# frozen_string_literal: true

module Liquid
class Traversal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visitor is a name that carries more useful connotation for me here. Is it the wrong name for a reason I'm missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I felt like since this is just one possible instance of the visitor pattern, taking the name of the whole pattern for this case was over-broad. But happy to call it Visitor if that makes more sense to you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllTypesVisitor? I'm basically looking for the ability to have a decent idea what this does without reading the implementation.

module Liquid
class Traversal
def self.for(node, callbacks = Hash.new(proc {}))
kase = CASES.find { |(klass, _)| node.is_a?(klass) }&.last
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CASES isn't used for anything else, maybe it should be inlined to make the code clearer?

traversal_class = case node
when Liquid::Assign then Assign
when Liquid::Case then Case
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it as an inline case expression fails policial because there are too many cases. Doing it with data also makes it easier to extend this to a registry-style model in the future (maybe for custom liquid tags to add to or similar?)

def children
@node.lookups
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is better than colocating these definitions alongside their corresponding classes. Perhaps the convention could be:

# variable_lookup.rb
module Liquid
  class VariableLookup
    class Traversal < Liquid::Traversal
      def children
        @node.lookups
      end
    end
  end
end

or do away with this abstraction altogether and put a children method on VariableLookup etc directly. If the latter, picking the method name is the hardest problem, given the rant about type consistency I posted earlier. Perhaps something to the effect of children_of_all_types, maybe phrased more eloquently. Then the new public accessors can be removed as well. Reflecting back, I think my dislike of the method added in the original PR was:

  • the method name was ambiguous
  • I'd rather have consumers use the visitor pattern to access the data

Having the method at all wasn't the issue for me.

@callbacks = callbacks
end

def callback_for(*classes, &block)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy with this name, but can't think of any great alternatives. Some options I considered:

  • add_callback
  • add_callback_for
  • add_visitor
  • add_visitor_for
  • on_visit
  • []=

WDYT?

Copy link
Contributor

@pushrax pushrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment about why you think the "subclass Traversal and add missing getters" is better than "private-ish method" would be useful for posterity.

The last concern I have is that there isn't test coverage for the individual Traversal classes. Table tests are a concise way to implement tests for this sort of thing.

# frozen_string_literal: true

module Liquid
class Traversal
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AllTypesVisitor? I'm basically looking for the ability to have a decent idea what this does without reading the implementation.

@singpolyma-shopify
Copy link
Contributor Author

They individual classes do have test coverage from the integration tests -- I did it that way because setting up the tree by parsing a Liquid snippet seemed much cleaner than doing manual object-unit setup.

I chose the full Visitor pattern instead of a private-ish method for a couple reasons: (1) the idea of a private-ish method is very bothersome, nothing but docs to even suggest it is not meant to be used directly (2) no pollution of the core-LIquid namespaces with methods that are only used by a different module

Copy link
Contributor

@pushrax pushrax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the only non-hacky way to make sure the variables visited is equal to the variables that exist would be to force storage in an array, but that's a bit of a major refactor and adds a runtime cost. This is probably an issue we can live with.

@fw42 can you give this a review?

Copy link
Contributor

@fw42 fw42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bunch of small comments. I'm not entirely sure what the purpose and the intended API of this is exactly. Could you update the PR description so that that's more clear (especially to people outside of Shopify who can't read the Slack conversation that you linked)?

lib/liquid/parse_tree_visitor.rb Outdated Show resolved Hide resolved
lib/liquid/parse_tree_visitor.rb Outdated Show resolved Hide resolved

def add_callback_for(*classes, &block)
cb = block
cb = ->(node, _) { block[node] } if block.arity.abs == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the abs needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when there are optional arguments, ruby returns a negative number for arity

lib/liquid/parse_tree_visitor.rb Outdated Show resolved Hide resolved
lib/liquid/parse_tree_visitor.rb Outdated Show resolved Hide resolved
attr_reader :collection_name
attr_reader :variable_name
attr_reader :collection_name, :variable_name, :limit, :from,
:for_block, :else_block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did for_block and else_block need to be accessible? doesn't seem like you are using that anywhere

[
@node.template_name_expr,
@node.variable_name_expr
] + @node.attributes.values
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only the values?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the keys in the hash are the name of attributes to be assigned, not expressions

class ParseTreeVisitorTest < Minitest::Test
include Liquid

def traversal(template)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: I think helpers like this are conventionally put at the end of the test and made private

ParseTreeVisitor
.for(Template.parse(template).root)
.add_callback_for(VariableLookup, &:name)
.visit.flatten.compact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the visit.flatten.compact here? by making this a part of the test, I might accidentally break the API that you wanted without your tests covering it. For example I could change the code to always add a nil to the return value of every visit and that would most likely break consumers of this new API, but none of your tests would cover that. Same for the flatten (I could add some garbage array nesting and your tests wouldn't catch it).

I think you should add at least one or two tests or so to explicitly assert that the array structure is what you wanted. If you always want this to be flat and compact then maybe #visit itself should do that (rather than the caller)?

module Liquid
class ParseTreeVisitor
def self.for(node, callbacks = Hash.new(proc {}))
if defined?(node.class::ParseTreeVisitor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this subclassing pattern seems overkill. Why not just have each class define a children method (then you also wouldn't have to make so many accessors public) and then only have one ParseTreeVisitor class (rather than one subclass per tag)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed above, using a full visitor pattern instead of a method that has to be public but that we don't want anyone to call seemed cleaner, and also allows us to keep traversal-related things out of the main namespace.

If you feel very strongly, I can change this back again.

@singpolyma-shopify singpolyma-shopify changed the title Liquid::Traversal Liquid::ParseTreeVisitor Oct 15, 2018
This enables traversal over whole document tree.
@singpolyma-shopify singpolyma-shopify force-pushed the traverse-ast branch 3 times, most recently from 19e9f0b to cf8dcaf Compare October 15, 2018 14:26
Copy link
Contributor

@fw42 fw42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks correct to me. Don't feel strongly about the extra class comment. Not sure why you marked my other comments as resolved even though you didn't respond to them nor change any of the code I commented on. I think those comments are still valid. But if you disagree then feel free to merge.

@singpolyma-shopify
Copy link
Contributor Author

@fw42 Sorry about that -- somehow didn't push all my changes but thought I had. Made those other changes you requested now.

@singpolyma-shopify singpolyma-shopify merged commit fe4034c into master Oct 18, 2018
@singpolyma-shopify singpolyma-shopify deleted the traverse-ast branch October 18, 2018 13:43
@shopify-shipit shopify-shipit bot temporarily deployed to rubygems-4-0-stable January 11, 2023 15:52 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants