fix(optimizer): Fix PlanRemoteProjections to keep JsonPath arguments inline for local functions#27323
Conversation
…nline for local functions Summary: JsonPath type is not serializable (its `createBlockBuilder` throws). When `PlanRemoteProjections` processes a local function that has a JsonPath-typed argument (e.g., `json_extract_scalar(remote_func(), '$.key')`), it previously extracted the JsonPath argument into a separate ProjectNode output variable. This caused a runtime failure when the engine tried to materialize the JsonPath-typed block. The fix keeps JsonPath-typed arguments inline within the local function call when `argumentProjection` is empty (i.e., the argument is a simple leaf expression that doesn't need further decomposition). This mirrors the existing behavior for ConstantExpression arguments. Differential Revision: D96397113
Reviewer's GuideAdjusts PlanRemoteProjections so that JsonPath-typed arguments to local functions are kept inline rather than being extracted into separate projection variables, and adds a regression test to verify JsonPath arguments are not materialized as ProjectNode outputs. Updated class diagram for PlanRemoteProjections argument handlingclassDiagram
class PlanRemoteProjections {
+List~ProjectionContext~ processArguments(List~RowExpression~ arguments, boolean local)
}
class RowExpression {
<<interface>>
+Type getType()
+List~ProjectionContext~ accept(RowExpressionVisitor visitor, Object context)
}
class ProjectionContext {
+Map~VariableReferenceExpression, RowExpression~ projections
+boolean remote
+ProjectionContext(Map~VariableReferenceExpression, RowExpression~ projections, boolean remote)
}
class VariableReferenceExpression {
+Type getType()
}
class Type {
}
class JsonPathType {
}
class VariableAllocator {
+VariableReferenceExpression newVariable(RowExpression expression)
}
PlanRemoteProjections --> VariableAllocator : uses
PlanRemoteProjections --> ProjectionContext : creates
PlanRemoteProjections --> RowExpression : processes
RowExpression --> ProjectionContext : returns from accept
VariableReferenceExpression --> Type : has
RowExpression --> Type : has
JsonPathType --|> Type
%% Key logic in PlanRemoteProjections.processArguments
class ProcessArgumentsLogic {
+handleArgument(RowExpression argument, boolean local)
}
PlanRemoteProjections ..> ProcessArgumentsLogic : contains logic
%% Relevant conditional branches
class ProcessArgumentsLogic {
+if argumentProjection not empty: useProjection()
+if argumentProjection empty and local and type == JSON_PATH: keepInline()
+if argumentProjection empty and not JSON_PATH: createVariableProjection()
}
Flow diagram for JsonPath handling in PlanRemoteProjections.processArgumentsflowchart TD
A["Start processing function arguments"] --> B["Take next argument"]
B --> C{More arguments?}
C -- No --> Z["Done"]
C -- Yes --> D["Call argument.accept(visitor, null)"]
D --> E{argumentProjection is empty?}
E -- No --> F["Use existing argumentProjection
(newArguments add variable from projection)"]
F --> B
E -- Yes --> G{local function call?}
G -- No --> H["Allocate new VariableReferenceExpression
from variableAllocator"]
H --> I["Wrap in ProjectionContext
and add to projections"]
I --> J["newArguments add variable"]
J --> B
G -- Yes --> K{argument type is JSON_PATH?}
K -- No --> H
K -- Yes --> L["Keep argument inline:
newArguments add argument directly"]
L --> B
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- The special-case check for
local && argument.getType().equals(JSON_PATH)hardcodes a single non-serializable type; consider centralizing this as a predicate (e.g.,isInlineOnlyType(Type)) so other non-serializable types can reuse the behavior without further branching here. - In
testJsonPathArgumentKeptInline, you could tighten the assertion by deriving the JSON path variables from the plannedProjectNodeoutputs (rather than all projection variables) to ensure the test is specifically validating that no JSON path is ever materialized as a project output.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The special-case check for `local && argument.getType().equals(JSON_PATH)` hardcodes a single non-serializable type; consider centralizing this as a predicate (e.g., `isInlineOnlyType(Type)`) so other non-serializable types can reuse the behavior without further branching here.
- In `testJsonPathArgumentKeptInline`, you could tighten the assertion by deriving the JSON path variables from the planned `ProjectNode` outputs (rather than all projection variables) to ensure the test is specifically validating that no JSON path is ever materialized as a project output.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/test/java/com/facebook/presto/sql/planner/iterative/rule/TestPlanRemoteProjections.java" line_range="193-202" />
<code_context>
assertEquals(rewritten.get(3).getProjections().size(), 5);
}
+ @Test
+ void testJsonPathArgumentKeptInline()
+ {
+ PlanBuilder planBuilder = new PlanBuilder(TEST_SESSION, new PlanNodeIdAllocator(), getMetadata());
+
+ PlanRemoteProjections rule = new PlanRemoteProjections(getFunctionAndTypeManager());
+ List<ProjectionContext> rewritten = rule.planRemoteAssignments(Assignments.builder()
+ .put(planBuilder.variable("a"), planBuilder.rowExpression("json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key')"))
+ .build(), new VariableAllocator(planBuilder.getTypes().allVariables()));
+ assertEquals(rewritten.size(), 2);
+ // Verify no projection context extracts a JsonPath-typed variable
+ for (ProjectionContext context : rewritten) {
</code_context>
<issue_to_address>
**suggestion (testing):** Consider asserting the structure of the rewritten projections to prove the JsonPath argument is actually kept inline
The current checks (no `JsonPath`-typed variables and exactly two contexts) are good for guarding against the regression. To more directly validate the behavior, consider also asserting that the rewritten projection still contains the `json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key')` expression inline, e.g. via `PlanMatchPattern` or by inspecting `ProjectionContext.getProjections().values()`. That would detect cases where the rule drops or unexpectedly restructures this expression while still avoiding a JsonPath-typed variable projection.
Suggested implementation:
```java
import static com.facebook.presto.sql.planner.iterative.rule.PlanRemoteProjections.ProjectionContext;
import static com.facebook.presto.type.JsonPathType.JSON_PATH;
import static org.testng.Assert.assertEquals;
import static org.testng.Assert.assertFalse;
import static org.testng.Assert.assertTrue;
```
```java
@Test
void testJsonPathArgumentKeptInline()
{
PlanBuilder planBuilder = new PlanBuilder(TEST_SESSION, new PlanNodeIdAllocator(), getMetadata());
PlanRemoteProjections rule = new PlanRemoteProjections(getFunctionAndTypeManager());
List<ProjectionContext> rewritten = rule.planRemoteAssignments(Assignments.builder()
.put(planBuilder.variable("a"), planBuilder.rowExpression("json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key')"))
.build(), new VariableAllocator(planBuilder.getTypes().allVariables()));
assertEquals(rewritten.size(), 2);
// Verify no projection context extracts a JsonPath-typed variable
for (ProjectionContext context : rewritten) {
context.getProjections().keySet().forEach(variable ->
assertFalse(variable.getType().equals(JSON_PATH),
"JsonPath type should not be extracted as a ProjectNode output"));
}
// Verify the json_extract_scalar(...) expression with the JsonPath argument is still kept inline
boolean foundInlineJsonExtract = false;
for (ProjectionContext context : rewritten) {
for (RowExpression projection : context.getProjections().values()) {
String projectionString = projection.toString();
if (projectionString.contains("json_extract_scalar")
&& projectionString.contains("unittest.memory.remote_foo")
&& projectionString.contains("'$.key'")) {
foundInlineJsonExtract = true;
break;
}
}
if (foundInlineJsonExtract) {
break;
}
}
assertTrue(foundInlineJsonExtract,
"Expected json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key') to be kept inline in rewritten projections");
}
```
1. Ensure `RowExpression` is imported if not already present in the file:
`import com.facebook.presto.sql.relational.RowExpression;`
2. If the existing tests use a different way to render or inspect `RowExpression` (e.g. a utility for formatting row expressions), you may want to replace the `toString()`-based checks with that preferred mechanism to make the assertion more robust.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| @Test | ||
| void testJsonPathArgumentKeptInline() | ||
| { | ||
| PlanBuilder planBuilder = new PlanBuilder(TEST_SESSION, new PlanNodeIdAllocator(), getMetadata()); | ||
|
|
||
| PlanRemoteProjections rule = new PlanRemoteProjections(getFunctionAndTypeManager()); | ||
| List<ProjectionContext> rewritten = rule.planRemoteAssignments(Assignments.builder() | ||
| .put(planBuilder.variable("a"), planBuilder.rowExpression("json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key')")) | ||
| .build(), new VariableAllocator(planBuilder.getTypes().allVariables())); | ||
| assertEquals(rewritten.size(), 2); |
There was a problem hiding this comment.
suggestion (testing): Consider asserting the structure of the rewritten projections to prove the JsonPath argument is actually kept inline
The current checks (no JsonPath-typed variables and exactly two contexts) are good for guarding against the regression. To more directly validate the behavior, consider also asserting that the rewritten projection still contains the json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key') expression inline, e.g. via PlanMatchPattern or by inspecting ProjectionContext.getProjections().values(). That would detect cases where the rule drops or unexpectedly restructures this expression while still avoiding a JsonPath-typed variable projection.
Suggested implementation:
import static com.facebook.presto.sql.planner.iterative.rule.PlanRemoteProjections.ProjectionContext;
import static com.facebook.presto.type.JsonPathType.JSON_PATH;
import static org.testng.Assert.assertEquals;
import static org.testng.Assert.assertFalse;
import static org.testng.Assert.assertTrue; @Test
void testJsonPathArgumentKeptInline()
{
PlanBuilder planBuilder = new PlanBuilder(TEST_SESSION, new PlanNodeIdAllocator(), getMetadata());
PlanRemoteProjections rule = new PlanRemoteProjections(getFunctionAndTypeManager());
List<ProjectionContext> rewritten = rule.planRemoteAssignments(Assignments.builder()
.put(planBuilder.variable("a"), planBuilder.rowExpression("json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key')"))
.build(), new VariableAllocator(planBuilder.getTypes().allVariables()));
assertEquals(rewritten.size(), 2);
// Verify no projection context extracts a JsonPath-typed variable
for (ProjectionContext context : rewritten) {
context.getProjections().keySet().forEach(variable ->
assertFalse(variable.getType().equals(JSON_PATH),
"JsonPath type should not be extracted as a ProjectNode output"));
}
// Verify the json_extract_scalar(...) expression with the JsonPath argument is still kept inline
boolean foundInlineJsonExtract = false;
for (ProjectionContext context : rewritten) {
for (RowExpression projection : context.getProjections().values()) {
String projectionString = projection.toString();
if (projectionString.contains("json_extract_scalar")
&& projectionString.contains("unittest.memory.remote_foo")
&& projectionString.contains("'$.key'")) {
foundInlineJsonExtract = true;
break;
}
}
if (foundInlineJsonExtract) {
break;
}
}
assertTrue(foundInlineJsonExtract,
"Expected json_extract_scalar(CAST(unittest.memory.remote_foo() AS VARCHAR), '$.key') to be kept inline in rewritten projections");
}- Ensure
RowExpressionis imported if not already present in the file:
import com.facebook.presto.sql.relational.RowExpression; - If the existing tests use a different way to render or inspect
RowExpression(e.g. a utility for formatting row expressions), you may want to replace thetoString()-based checks with that preferred mechanism to make the assertion more robust.
|
@arhimondr @jaystarshot @NikhilCollooru Can I get a committer approval for it? Thanks! |
Description
JsonPath type is not serializable (its
createBlockBuilderthrows). WhenPlanRemoteProjectionsprocesses a local function that has a JsonPath-typedargument (e.g.,
json_extract_scalar(remote_func(), '$.key')), it previouslyextracted the JsonPath argument into a separate ProjectNode output variable.
This caused a runtime failure when the engine tried to materialize the
JsonPath-typed block.
The fix keeps JsonPath-typed arguments inline within the local function call
when
argumentProjectionis empty (i.e., the argument is a simple leafexpression that doesn't need further decomposition). This mirrors the existing
behavior for ConstantExpression arguments.
Motivation and Context
Remote function calls involving
json_extract_scalarwith JsonPath argumentswould fail at runtime because JsonPath type cannot be serialized into a block.
Impact
No public API changes. Fixes runtime failures for queries using
json_extract_scalar(or similar JsonPath-consuming functions) on results ofremote functions.
Test Plan
Added
testJsonPathArgumentKeptInlineinTestPlanRemoteProjectionsthatverifies
json_extract_scalar(CAST(remote_foo() AS VARCHAR), '$.key')isrewritten without extracting any JsonPath-typed variable.
Contributor checklist
style and commit standards.
Release Notes
Summary by Sourcery
Adjust PlanRemoteProjections handling of JsonPath-typed arguments for local functions to avoid creating non-serializable project outputs and add coverage for this behavior.
Bug Fixes:
Tests: