Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ChunkResolver to partially resolve expressions #525

Merged
merged 11 commits into from
Nov 17, 2020

Conversation

jasmith-hs
Copy link
Contributor

@jasmith-hs jasmith-hs commented Nov 5, 2020

Part of #532

This PR introduces what I have called the ChunkResolver as a part of my Eager Execution project (draft PR for context).

Purpose

The purpose of the ChunkResolver is to evaluate as much of an expression as possible, generally when there is something deferred within that expression. This is to minimize the number of values, macros, etc. that actually need to be deferred for a second rendering pass. The ChunkResolver will only be used when pre-rendering and will not be enabled by default. It will only be used when a future eagerExecutionEnabled flag is set to true.

When a value is deferred, or something cannot be resolved because it directly depends on a deferred value (e.g. as range(0, deferred)), all the non-reserved words get added to the ChunkResolver's deferredWords. In that former example, getDeferredWords() would then return a list: ["range", "deferred"]. Clearly, for a second pass this expression would be resolvable once the deferred value gets resolved because the range function would likely exist in the second-pass-jinjava instance, but if it were a function such as hubdb_table_rows, that may prevent running a second pass if the second jinjava instance can't evaluate that function.

Say, however, we're resolving an expression like:

{% if hubdb_table_row(table_id, row_id).company == contact.company %}

and contact is deferred (table_id, row_id are known).

Without the ChunkResolver, the expression within the if tag would be simply reconstructed in the output, and a dependency on hubdb_table_row would remain (but also there wouldn't be any indication that it does remain). Output:

{% if hubdb_table_row(table_id, row_id).company == contact.company %}

Using the ChunkResolver, the expression could be resolved in chunks. The table_id and row_id get resolved to their values and then hubdb_table_row(1234, 56789).company can get resolved to 'HubSpot', for example. contact.company is then deferred, and the output is:

{% if 'HubSpot' == contact.company %}

And chunkResolver.getDeferredWords() is: ["contact.company"].


In the case that a value gets resolved as something like a Map, then it gets output in JSON, which can be read in by a Jinjava interpreter on the second pass. (Dates need their own serialization override so that they are formatted correctly and also don't use the thread-unsafe SimpleDateFormat that's built into Jackson.)

For example, if foobar is a map where foobar.foo='FooSpot', foobar.bar='HubBar' then the ChunkResolver would resolve deferred == foobar as deferred == {'foo':'FooSpot','bar':'HubBar'}

Logic

The ChunkResolver gets its name by evaluating expressions in "chunks". It also uses "mini-chunks" and "tokens".
Tokens are words that are continuous strings made of letters, digits, underscores, and periods. (foo, foo_bar, baz99.zap, etc).
Mini-chunks are multiple tokens concatenated by their splitters ( , |, :, etc.). Mini-chunks are broken up by commas. Some examples of mini-chunks are (0, 19 || deferred, 1 + 2 + 3, 'foo' + 'bar')
Chunks are grouped by (), [], or {} so in range(0, deferred + 1), 0, deferred + 1 is a chunk.

Chunks are resolved recursively because there can be nested chunks with many layers of brackets/parentheses. Tokens are either retraced to variables or resolved as expressions, and chunks/mini-chunks are resolved as expressions. If they cannot be resolved, then they get deferred.


Note that because the ChunkResolver resolves expressions in small pieces, interpreter errors are often encountered as, for example = by itself is not a valid expression. A flag is added to the context when the chunk resolver is resolving the expression to prevent errors from being unnecessarily added to the interpreter.

Feel free to make any suggestions for this class or new test cases to add.

Copy link
Contributor

@boulter boulter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems very specific to HubSpot. Should it live in the public project?

I'm also wondering why we need another parser when jinjava already has a parser that we could configure.

);

private static final Set<String> RESERVED_KEYWORDS = ImmutableSet.of(
"and",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you reference constants from the tags themselves here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a list of reserved keywords from here https://svn.python.org/projects/external/Jinja-1.1/docs/build/designerdoc.html#reserved-keywords though I think it makes sense to not hardcode as many of these as possible. Just don't know where to get a more complete list, but I'll use the constants from tags to remove some of these

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this some more, this doesn't need to include any of the tag names as we only need to look for words that appear within expressions like in and or, etc.

throws JsonProcessingException {
return OBJECT_MAPPER
.writeValueAsString(val)
.replaceAll("(?<!\\\\)(?:\\\\\\\\)*(\\\\n)", "\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do these do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These swap out a couple of characters from the JSON string. Double quotes are replaced with single quotes, and an escaped n is replaced with the newline character, and an escaped " is replaced with a double-quote character.

The n and " replacements are because if they don't get replaced, when the JSON string is output, it would have literal \ns and \"s in it where we actually would want newlines and "s

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, adding a comment here would be helpful for non-obvious code like this.

@jasmith-hs
Copy link
Contributor Author

jasmith-hs commented Nov 5, 2020

@boulter It would be used alongside DeferredValue functionality so it can be useful outside of HubSpot.
As for using the Abstract Syntax Tree parsers to parse this, I think that it might be possible by making ChunkResolver extend ELResolver (or JinjavaInterpreterResolver).

I'm not sure right now how we could handle and flag the deferred values while evaluating AstNodes, but it could probably work. Going with that approach would be a modification of the low-level parsing rather than decorating the parsing and adding functionality on top of it. Modifying the Ast parsing would make the ChunkResolver faster as it would only need to process each piece of an expression a single time, but I'm not too concerned about speed as eager execution wouldn't ever be happening live.

With this project, I am keeping the eager execution functionality as decoupled from the rest of Jinjava as possible. I will think about modifying the resolveELExpression()/ELResolver.getValue() logic and figure out how much work that would be to implement

Edit: The ChunkResolver is similar to the HelperStringTokenizer, but it allows for expressions with deferred values to be partially evaluated. I think that modifying the Parser would be more complicated as it would require all types of AstNode to have a way to handle DeferredValues (currently if a deferred value is hit when evaluating them, an exception is thrown and the evaluation is cancelled. This is what the ChunkResolver decorates, but it resolves/creates ASTs from substrings of the expression rather than creating an AST of the whole expression.

@jasmith-hs jasmith-hs requested a review from boulter November 13, 2020 22:06
@@ -526,4 +527,12 @@ public void addDependencies(SetMultimap<String, String> dependencies) {
public SetMultimap<String, String> getDependencies() {
return this.dependencies;
}

public boolean isHideInterpreterErrors() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

even though this is a boolean, it would probably be more readable with a get instead of is

@jasmith-hs jasmith-hs merged commit 9808515 into HubSpot:master Nov 17, 2020
@jasmith-hs jasmith-hs deleted the chunk-resolver branch November 17, 2020 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants