ZEPPELIN-289: User can now enter custom expressions in notebooks' input fields #320
Actually, with Zeppelin we can use Spark SQL UDFs perfectly fine.
We developed a custom UDF library that parses absolute and relative dates. Feeding this library into Spark SQL using the standard UDF mechanism is suboptimal, since each UDF call is repeated for each row of the queried table.
Example:
This repeats the call to parseDate(...) for every single row of 'my_table'.
Even worse, if we filter for a date range like in:
the call to parseDate(...) is performed twice for each row in the table.
Since Spark's UDFs have no concept of an 'execution context', we were not able to work around the problem.
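To make the cost concrete, here is a minimal Python sketch (not Spark code) that counts how often a date parser runs when it is called per row, as a UDF in a filter would be, versus when both bounds are parsed once up front. The `CountingParser` class, its tiny date grammar (`today`, `N days ago`, ISO dates) and both filter functions are hypothetical stand-ins for illustration only; they are not the UDF library described above.

```python
from datetime import date, timedelta


class CountingParser:
    """Hypothetical stand-in for the date-parsing UDF; counts its own calls."""

    def __init__(self, today):
        self.today = today
        self.calls = 0

    def parse_date(self, expr):
        self.calls += 1
        if expr == "today":
            return self.today
        if expr.endswith(" days ago"):
            return self.today - timedelta(days=int(expr.split()[0]))
        return date.fromisoformat(expr)  # absolute date, e.g. "2015-03-31"


def filter_per_row(rows, parser, start_expr, end_expr):
    """Mimics a UDF used in a range filter: both bounds are re-parsed per row."""
    kept = []
    for day in rows:
        start = parser.parse_date(start_expr)
        end = parser.parse_date(end_expr)
        if start <= day <= end:
            kept.append(day)
    return kept


def filter_pre_evaluated(rows, parser, start_expr, end_expr):
    """Parses both bounds once, then compares plain values for each row."""
    start = parser.parse_date(start_expr)
    end = parser.parse_date(end_expr)
    return [day for day in rows if start <= day <= end]
```

For a table of N rows, the first variant invokes the parser 2N times and the second invokes it twice, while both return the same rows.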
We implemented a mechanism in Zeppelin that evaluates UDFs before the query parameters are sent to the interpreter. With queries parametrized as usual, you can now enter expressions in Zeppelin's input forms, such as:
or:
This is similar to how standard SQL works, where parameters are evaluated before being sent to the execution engine.
You can find more info in the org.apache.zeppelin.display.Evaluator javadoc.
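The idea of evaluating form values before the query reaches the interpreter can be sketched roughly as follows. This is an illustrative Python approximation, not the actual `Evaluator` implementation: the call syntax `name('argument')`, the `${name}` placeholder style, the column name `day`, and the helper names `evaluate_param` and `bind` are all assumptions made for this example.

```python
import re
from datetime import date, timedelta

# Assumed expression syntax for form values: functionName('argument')
CALL = re.compile(r"^\s*(\w+)\(\s*'([^']*)'\s*\)\s*$")
# Assumed placeholder syntax inside the query text: ${name}
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def parse_date(expr, today):
    """Hypothetical date parser supporting 'today', 'N days ago', ISO dates."""
    if expr == "today":
        return today
    if expr.endswith(" days ago"):
        return today - timedelta(days=int(expr.split()[0]))
    return date.fromisoformat(expr)


def evaluate_param(value, functions):
    """Evaluate a form value once if it is a call to a registered function."""
    match = CALL.match(value)
    if match is None or match.group(1) not in functions:
        return value  # plain literal: passed through unchanged
    return str(functions[match.group(1)](match.group(2)))


def bind(query, params, functions):
    """Evaluate every parameter once, then substitute it into the query text.

    Raises KeyError if the query uses a placeholder with no matching parameter.
    """
    evaluated = {name: evaluate_param(v, functions) for name, v in params.items()}
    return PLACEHOLDER.sub(lambda m: evaluated[m.group(1)], query)


if __name__ == "__main__":
    functions = {"parseDate": lambda s: parse_date(s, date.today())}
    query = "SELECT * FROM my_table WHERE day >= '${start}' AND day <= '${end}'"
    params = {"start": "parseDate('10 days ago')", "end": "parseDate('today')"}
    print(bind(query, params, functions))
```

The query text that reaches the execution engine then contains only literal dates, so nothing is re-parsed for each row.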
The above-mentioned query takes about 1 minute over a table of 1 million records. With this PR applied, the execution time is reduced to 15 seconds.