Support JSON_EXISTS, JSON_VALUE and JSON_QUERY functions#9831
Support JSON_EXISTS, JSON_VALUE and JSON_QUERY functions#9831kasiafi merged 9 commits intotrinodb:masterfrom
Conversation
fe236ff to
70e48c9
Compare
0acac64 to
e1f3ccd
Compare
5fa3108 to
1866670
Compare
a53e24a to
5b355b5
Compare
4a4b801 to
a17b750
Compare
core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4
Outdated
Show resolved
Hide resolved
core/trino-parser/src/main/antlr4/io/trino/jsonpath/JsonPath.g4
Outdated
Show resolved
Hide resolved
b1c768f to
b5d5f26
Compare
There was a problem hiding this comment.
In that case, the rewriter should create the JsonExists node with the same location as the original one, and we should remove the location-less constructor. Same for the other ones.
There was a problem hiding this comment.
What's this class for? The name is too vague and the method below is not very clear. I'm thinking of whether it should be renamed or placed elsewhere.
There was a problem hiding this comment.
I added a javadoc, and renamed the class to ParameterUtil.
There was a problem hiding this comment.
When function is not registered, which shouldn't happen. Should I remove the try-catch?
There was a problem hiding this comment.
Call .build() here directly and declare the variable as List<Type>. No need to have an explicit variable to hold the intermediate.
There was a problem hiding this comment.
This should not be in the "planner" package, but in the io.trino.json package.
There was a problem hiding this comment.
This seems backward. The fixed part should be the path (i.e., the "program" or "IR"), the functions, etc. The variable part should be the json over which the path is being evaluated.
That will allow caching resolutions and other similar decisions and attach them to the corresponding IR nodes.
There was a problem hiding this comment.
Where is the operator being cached? Ideally, it should be attached to the node (via a map of node->operator) to avoid having to re-resolve it every time.
There was a problem hiding this comment.
The operators are cached in CachingResolver.
b5341a7 to
04d22e4
Compare
|
@martint I refactored operator caching according to our discussion. Please take a look at the last commit. |
32c0f72 to
4838a88
Compare
There was a problem hiding this comment.
It's not obvious by looking at this why it's safe to skip re-initializing the evaluator if it's already set. Technically, and from the JsonPathInvocationContext's perspective, the path could be different. The fact that it's not, is a side effect of how the JSON constructs in SQL work (the path is constant) and how they are mapped to functions. But that knowledge is too far removed from this site.
I don't think this class is needed. Instead, the functions should initialize and hold on to the JsonPathEvaluator instance upon first invocation (i.e., wherever they are calling initializeIfNecessary)
There was a problem hiding this comment.
I think that JsonPathInvocationContext is needed. It is passed through the instanceFactory and this way an instance of JsonPathInvocationContext persists through multiple rows processed by the JSON-funciton.
We can't skip it and just pass JsonPathEvaluator as the context object, because we can create the JsonPathEvaluator only after the function is invoked with certain arguments (including path). The context object already exists at that moment (it comes as one of the arguments to the function).
There was a problem hiding this comment.
Yes, you're right. This is the state object for the (stateful) function. However, it could be changed to be just that -- a state object, and leave all logic to be done in the caller.
Let `handleSubqueries` find subqueries nested under `Expression` nodes, linked via children not being `Expression`.
`json_extract` and `json_extract_scalar` are old JSON-processing functions which have very limited capabilities, and are not compliant with the spec. However, they have simple lightweight implementation, optimized for the use case. `json_query` and `json_value` are new spec-compliant JSON-processing functions, which support the complete specification for JSON path. Their implementation is much more complicated, resulting both from the scope of the supported feature, and the "streaming" semantics. These benchmarks are to compare both implementations applied to the common use-case. Benchmark json_extract vs json_query: Benchmark (depth) Mode Cnt Score Error Units BenchmarkJsonFunctions.benchmarkJsonExtractFunction 1 avgt 15 813.030 ± 12.263 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractFunction 3 avgt 15 939.202 ± 94.621 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractFunction 6 avgt 15 1004.352 ± 12.945 ns/op BenchmarkJsonFunctions.benchmarkJsonQueryFunction 1 avgt 15 1136.371 ± 17.502 ns/op BenchmarkJsonFunctions.benchmarkJsonQueryFunction 3 avgt 15 1399.780 ± 27.967 ns/op BenchmarkJsonFunctions.benchmarkJsonQueryFunction 6 avgt 15 1666.810 ± 29.572 ns/op Benchmark json_extract_scalar vs json_value: Benchmark (depth) Mode Cnt Score Error Units BenchmarkJsonFunctions.benchmarkJsonExtractScalarFunction 1 avgt 15 644.762 ± 9.195 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractScalarFunction 3 avgt 15 720.244 ± 13.288 ns/op BenchmarkJsonFunctions.benchmarkJsonExtractScalarFunction 10 avgt 15 928.069 ± 17.623 ns/op BenchmarkJsonFunctions.benchmarkJsonValueFunction 1 avgt 15 811.158 ± 14.694 ns/op BenchmarkJsonFunctions.benchmarkJsonValueFunction 3 avgt 15 1042.795 ± 40.555 ns/op BenchmarkJsonFunctions.benchmarkJsonValueFunction 10 avgt 15 1663.328 ± 27.320 ns/op
There was a problem hiding this comment.
Why not flatten the caches an use a composite key made of IrPathNode ref, left type and right type?
There was a problem hiding this comment.
Then one node could take too much space.
There was a problem hiding this comment.
That should be ok, especially if there's a case where a single node sees many combinations of types.
It would also make the code simpler. If later we find that this is not sufficient, we can think of better caching and eviction strategies, or JIT-style codegen/execution
There was a problem hiding this comment.
The cache is now flattened.
|
Nice! |
No description provided.