Add ChunkResolver to partially resolve expressions #525

jasmith-hs · 2020-11-05T18:30:09Z

Part of #532

This PR introduces what I have called the ChunkResolver as a part of my Eager Execution project (draft PR for context).

Purpose

The purpose of the ChunkResolver is to evaluate as much of an expression as possible, generally when there is something deferred within that expression. This is to minimize the number of values, macros, etc. that actually need to be deferred for a second rendering pass. The ChunkResolver will only be used when pre-rendering and will not be enabled by default. It will only be used when a future eagerExecutionEnabled flag is set to true.

When a value is deferred, or something cannot be resolved because it directly depends on a deferred value (e.g. as range(0, deferred)), all the non-reserved words get added to the ChunkResolver's deferredWords. In that former example, getDeferredWords() would then return a list: ["range", "deferred"]. Clearly, for a second pass this expression would be resolvable once the deferred value gets resolved because the range function would likely exist in the second-pass-jinjava instance, but if it were a function such as hubdb_table_rows, that may prevent running a second pass if the second jinjava instance can't evaluate that function.

Say, however, we're resolving an expression like:

{% if hubdb_table_row(table_id, row_id).company == contact.company %}

and contact is deferred (table_id, row_id are known).

Without the ChunkResolver, the expression within the if tag would be simply reconstructed in the output, and a dependency on hubdb_table_row would remain (but also there wouldn't be any indication that it does remain). Output:

{% if hubdb_table_row(table_id, row_id).company == contact.company %}

Using the ChunkResolver, the expression could be resolved in chunks. The table_id and row_id get resolved to their values and then hubdb_table_row(1234, 56789).company can get resolved to 'HubSpot', for example. contact.company is then deferred, and the output is:

{% if 'HubSpot' == contact.company %}

And chunkResolver.getDeferredWords() is: ["contact.company"].

In the case that a value gets resolved as something like a Map, then it gets output in JSON, which can be read in by a Jinjava interpreter on the second pass. (Dates need their own serialization override so that they are formatted correctly and also don't use the thread-unsafe SimpleDateFormat that's built into Jackson.)

For example, if foobar is a map where foobar.foo='FooSpot', foobar.bar='HubBar' then the ChunkResolver would resolve deferred == foobar as deferred == {'foo':'FooSpot','bar':'HubBar'}

Logic

The ChunkResolver gets its name by evaluating expressions in "chunks". It also uses "mini-chunks" and "tokens".
Tokens are words that are continuous strings made of letters, digits, underscores, and periods. (foo, foo_bar, baz99.zap, etc).
Mini-chunks are multiple tokens concatenated by their splitters ( , |, :, etc.). Mini-chunks are broken up by commas. Some examples of mini-chunks are (0, 19 || deferred, 1 + 2 + 3, 'foo' + 'bar')
Chunks are grouped by (), [], or {} so in range(0, deferred + 1), 0, deferred + 1 is a chunk.

Chunks are resolved recursively because there can be nested chunks with many layers of brackets/parentheses. Tokens are either retraced to variables or resolved as expressions, and chunks/mini-chunks are resolved as expressions. If they cannot be resolved, then they get deferred.

Note that because the ChunkResolver resolves expressions in small pieces, interpreter errors are often encountered as, for example = by itself is not a valid expression. A flag is added to the context when the chunk resolver is resolving the expression to prevent errors from being unnecessarily added to the interpreter.

Feel free to make any suggestions for this class or new test cases to add.

boulter

This seems very specific to HubSpot. Should it live in the public project?

I'm also wondering why we need another parser when jinjava already has a parser that we could configure.

boulter · 2020-11-05T20:17:57Z

src/main/java/com/hubspot/jinjava/util/ChunkResolver.java

+    );
+
+  private static final Set<String> RESERVED_KEYWORDS = ImmutableSet.of(
+    "and",


can you reference constants from the tags themselves here?

Got a list of reserved keywords from here https://svn.python.org/projects/external/Jinja-1.1/docs/build/designerdoc.html#reserved-keywords though I think it makes sense to not hardcode as many of these as possible. Just don't know where to get a more complete list, but I'll use the constants from tags to remove some of these

Thinking about this some more, this doesn't need to include any of the tag names as we only need to look for words that appear within expressions like in and or, etc.

boulter · 2020-11-05T20:20:32Z

src/main/java/com/hubspot/jinjava/util/ChunkResolver.java

+    throws JsonProcessingException {
+    return OBJECT_MAPPER
+      .writeValueAsString(val)
+      .replaceAll("(?<!\\\\)(?:\\\\\\\\)*(\\\\n)", "\n")


what do these do?

These swap out a couple of characters from the JSON string. Double quotes are replaced with single quotes, and an escaped n is replaced with the newline character, and an escaped " is replaced with a double-quote character.

The n and " replacements are because if they don't get replaced, when the JSON string is output, it would have literal \ns and \"s in it where we actually would want newlines and "s

OK, adding a comment here would be helpful for non-obvious code like this.

jasmith-hs · 2020-11-05T23:17:56Z

@boulter It would be used alongside DeferredValue functionality so it can be useful outside of HubSpot.
As for using the Abstract Syntax Tree parsers to parse this, I think that it might be possible by making ChunkResolver extend ELResolver (or JinjavaInterpreterResolver).

I'm not sure right now how we could handle and flag the deferred values while evaluating AstNodes, but it could probably work. Going with that approach would be a modification of the low-level parsing rather than decorating the parsing and adding functionality on top of it. Modifying the Ast parsing would make the ChunkResolver faster as it would only need to process each piece of an expression a single time, but I'm not too concerned about speed as eager execution wouldn't ever be happening live.

With this project, I am keeping the eager execution functionality as decoupled from the rest of Jinjava as possible. I will think about modifying the resolveELExpression()/ELResolver.getValue() logic and figure out how much work that would be to implement

Edit: The ChunkResolver is similar to the HelperStringTokenizer, but it allows for expressions with deferred values to be partially evaluated. I think that modifying the Parser would be more complicated as it would require all types of AstNode to have a way to handle DeferredValues (currently if a deferred value is hit when evaluating them, an exception is thrown and the evaluation is cancelled. This is what the ChunkResolver decorates, but it resolves/creates ASTs from substrings of the expression rather than creating an AST of the whole expression.

boulter · 2020-11-17T16:29:16Z

src/main/java/com/hubspot/jinjava/interpret/Context.java

@@ -526,4 +527,12 @@ public void addDependencies(SetMultimap<String, String> dependencies) {
  public SetMultimap<String, String> getDependencies() {
    return this.dependencies;
  }
+
+  public boolean isHideInterpreterErrors() {


even though this is a boolean, it would probably be more readable with a get instead of is

jasmith-hs added 8 commits November 5, 2020 09:50

Checkout ChunkResolver from eager-execution-a branch

d0b1360

Add pyish date serializer to json

09803ac

Add hideInterpreterErrors flag

ac5a8a1

Add dict resolve test

ccd1501

Remove unnecessary useMiniChunks flag

c309fdb

Add date serialization test

560961d

Remove incorrectly dated license

50063a2

Update comment for splitChunks()

3d50717

jasmith-hs requested review from boulter, Joeoh and hs-lsong November 5, 2020 18:30

boulter reviewed Nov 5, 2020

View reviewed changes

Comment regex replaceAll

83a4ab6

This was referenced Nov 6, 2020

Implement ChunkResolver by extending ExtendedParser #527

Closed

Introduce Eager Tokens #529

Merged

Eager Execution #532

Closed

Remove unnecessary reserved words

a7b256e

jasmith-hs requested a review from boulter November 13, 2020 22:06

boulter approved these changes Nov 17, 2020

View reviewed changes

Change isHide to getHide

ad4b922

jasmith-hs merged commit 9808515 into HubSpot:master Nov 17, 2020

jasmith-hs deleted the chunk-resolver branch November 17, 2020 17:10

This was referenced Nov 24, 2020

Eager expression node #548

Merged

Implement EagerPrintTag and EagerDoTag #558

Merged

jasmith-hs added the Eager Execution label Jul 18, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add ChunkResolver to partially resolve expressions #525

Add ChunkResolver to partially resolve expressions #525

jasmith-hs commented Nov 5, 2020 •

edited

Loading

boulter left a comment

boulter Nov 5, 2020

jasmith-hs Nov 13, 2020

jasmith-hs Nov 13, 2020

boulter Nov 5, 2020

jasmith-hs Nov 5, 2020

boulter Nov 5, 2020

jasmith-hs commented Nov 5, 2020 •

edited

Loading

boulter Nov 17, 2020

Add ChunkResolver to partially resolve expressions #525

Add ChunkResolver to partially resolve expressions #525

Conversation

jasmith-hs commented Nov 5, 2020 • edited Loading

Part of #532

Purpose

Logic

boulter left a comment

Choose a reason for hiding this comment

boulter Nov 5, 2020

Choose a reason for hiding this comment

jasmith-hs Nov 13, 2020

Choose a reason for hiding this comment

jasmith-hs Nov 13, 2020

Choose a reason for hiding this comment

boulter Nov 5, 2020

Choose a reason for hiding this comment

jasmith-hs Nov 5, 2020

Choose a reason for hiding this comment

boulter Nov 5, 2020

Choose a reason for hiding this comment

jasmith-hs commented Nov 5, 2020 • edited Loading

boulter Nov 17, 2020

Choose a reason for hiding this comment

jasmith-hs commented Nov 5, 2020 •

edited

Loading

jasmith-hs commented Nov 5, 2020 •

edited

Loading