-
Notifications
You must be signed in to change notification settings - Fork 5.5k
feat(plugin-native-sidecar): Add native row expression optimizer #24602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
baa80c5 to
5bf64bf
Compare
5bf64bf to
02e81b9
Compare
5c75b74 to
fc9942d
Compare
ccbf3ac to
e7e90f6
Compare
dd590aa to
241f562
Compare
9b1e4f6 to
47fe096
Compare
Reviewer's GuideAdds a native expression optimizer that delegates row expression optimization to the native sidecar via a new ExpressionOptimizer implementation, wires it into the expression manager and plugin infrastructure, and refactors tests and existing optimizers/serde to use the new path and shared RowExpression serialization. Sequence diagram for native expression optimization via sidecarsequenceDiagram
participant Coordinator as Coordinator
participant ExpressionOptimizerManager as ExpressionOptimizerManager
participant NativeExpressionOptimizer as NativeExpressionOptimizer
participant NativeSidecarExpressionInterpreter as NativeSidecarExpressionInterpreter
participant Sidecar as NativeSidecar
Coordinator->>ExpressionOptimizerManager: optimize(rowExpression, level, session)
ExpressionOptimizerManager->>NativeExpressionOptimizer: ExpressionOptimizer.optimize(expression, level, session, variableResolver)
NativeExpressionOptimizer->>NativeExpressionOptimizer: CollectingVisitor.visitExpression
NativeExpressionOptimizer->>NativeExpressionOptimizer: Collect expressionsToOptimize
NativeExpressionOptimizer->>NativeExpressionOptimizer: ReplacingVisitor(variableResolver)
NativeExpressionOptimizer->>NativeSidecarExpressionInterpreter: optimizeBatch(session, expressions, level)
NativeSidecarExpressionInterpreter->>NodeManager: getSidecarNode()
NodeManager-->>NativeSidecarExpressionInterpreter: sidecarNode
NativeSidecarExpressionInterpreter->>NativeSidecarExpressionInterpreter: build Request
NativeSidecarExpressionInterpreter->>Sidecar: HTTP POST /v1/expressions
Sidecar-->>NativeSidecarExpressionInterpreter: List<RowExpressionOptimizationResult>
NativeSidecarExpressionInterpreter-->>NativeExpressionOptimizer: Map<RowExpression,RowExpression> replacements
NativeExpressionOptimizer->>NativeExpressionOptimizer: ReplacingVisitor(replacements)
NativeExpressionOptimizer-->>ExpressionOptimizerManager: optimized RowExpression
ExpressionOptimizerManager-->>Coordinator: optimized RowExpression
Class diagram for native expression optimizer and context wiringclassDiagram
class ExpressionOptimizerManager {
<<service>>
-NodeManager nodeManager
-FunctionAndTypeManager functionAndTypeManager
-RowExpressionSerde rowExpressionSerde
-FunctionResolution functionResolution
-File configurationDirectory
-Map~String,ExpressionOptimizerFactory~ expressionOptimizerFactories
-Map~String,ExpressionOptimizer~ expressionOptimizers
+ExpressionOptimizerManager(PluginNodeManager nodeManager, FunctionAndTypeManager functionAndTypeManager, RowExpressionSerde rowExpressionSerde)
+ExpressionOptimizerManager(PluginNodeManager nodeManager, FunctionAndTypeManager functionAndTypeManager, RowExpressionSerde rowExpressionSerde, File configurationDirectory)
+void loadExpressionOptimizerFactories()
+void loadExpressionOptimizerFactory(File configurationFile)
+void loadExpressionOptimizerFactory(String factoryName, String optimizerName, Map~String,String~ properties)
+void addExpressionOptimizerFactory(ExpressionOptimizerFactory factory)
}
class ExpressionOptimizerContext {
-NodeManager nodeManager
-RowExpressionSerde rowExpressionSerde
-FunctionMetadataManager functionMetadataManager
-StandardFunctionResolution functionResolution
+ExpressionOptimizerContext(NodeManager nodeManager, RowExpressionSerde rowExpressionSerde, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution functionResolution)
+NodeManager getNodeManager()
+RowExpressionSerde getRowExpressionSerde()
+FunctionMetadataManager getFunctionMetadataManager()
+StandardFunctionResolution getFunctionResolution()
}
class RowExpressionSerde {
<<interface>>
+String serialize(RowExpression expression)
+RowExpression deserialize(String value)
}
class JsonCodecRowExpressionSerde {
-JsonCodec~RowExpression~ codec
+JsonCodecRowExpressionSerde(JsonCodec~RowExpression~ codec)
+String serialize(RowExpression expression)
+RowExpression deserialize(String data)
}
RowExpressionSerde <|.. JsonCodecRowExpressionSerde
ExpressionOptimizerManager --> RowExpressionSerde : uses
ExpressionOptimizerManager --> ExpressionOptimizerContext : creates
ExpressionOptimizerContext --> RowExpressionSerde : exposes
class NativeExpressionOptimizerFactory {
<<factory>>
+static String NAME
-ClassLoader classLoader
+NativeExpressionOptimizerFactory(ClassLoader classLoader)
+String getName()
+ExpressionOptimizer createOptimizer(Map~String,String~ config, ExpressionOptimizerContext context)
}
class NativeExpressionsModule {
<<module>>
-NodeManager nodeManager
-RowExpressionSerde rowExpressionSerde
-FunctionMetadataManager functionMetadataManager
-StandardFunctionResolution functionResolution
+NativeExpressionsModule(NodeManager nodeManager, RowExpressionSerde rowExpressionSerde, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution functionResolution)
+void configure(Binder binder)
}
class NativeExpressionOptimizer {
<<service>>
-FunctionMetadataManager functionMetadataManager
-StandardFunctionResolution resolution
-NativeSidecarExpressionInterpreter rowExpressionInterpreterService
+NativeExpressionOptimizer(NativeSidecarExpressionInterpreter rowExpressionInterpreterService, FunctionMetadataManager functionMetadataManager, StandardFunctionResolution resolution)
+RowExpression optimize(RowExpression expression, ExpressionOptimizer.Level level, ConnectorSession session, Function~VariableReferenceExpression,Object~ variableResolver)
}
class CollectingVisitor {
-FunctionMetadataManager functionMetadataManager
-ExpressionOptimizer.Level optimizationLevel
-StandardFunctionResolution resolution
-Set~RowExpression~ expressionsToOptimize
-Set~RowExpression~ hasOptimizedChildren
+CollectingVisitor(FunctionMetadataManager functionMetadataManager, ExpressionOptimizer.Level optimizationLevel, StandardFunctionResolution resolution)
+boolean canBeOptimized(RowExpression rowExpression)
+List~RowExpression~ getExpressionsToOptimize()
}
class ReplacingVisitor {
-Function~RowExpression,RowExpression~ resolver
+ReplacingVisitor(Map~RowExpression,RowExpression~ replacements)
+ReplacingVisitor(Function~VariableReferenceExpression,RowExpression~ variableResolver)
}
class NativeSidecarExpressionInterpreter {
<<service>>
+static String PRESTO_TIME_ZONE_HEADER
+static String PRESTO_USER_HEADER
+static String PRESTO_EXPRESSION_OPTIMIZER_LEVEL_HEADER
-NodeManager nodeManager
-HttpClient httpClient
-JsonCodec~List~RowExpression~~ rowExpressionCodec
-JsonCodec~List~RowExpressionOptimizationResult~~ rowExpressionOptimizationResultJsonCodec
+NativeSidecarExpressionInterpreter(HttpClient httpClient, NodeManager nodeManager, JsonCodec~List~RowExpressionOptimizationResult~~ rowExpressionOptimizationResultJsonCodec, JsonCodec~List~RowExpression~~ rowExpressionCodec)
+Map~RowExpression,RowExpression~ optimizeBatch(ConnectorSession session, Map~RowExpression,RowExpression~ expressions, ExpressionOptimizer.Level level)
+List~RowExpressionOptimizationResult~ optimize(ConnectorSession session, ExpressionOptimizer.Level level, List~RowExpression~ resolvedExpressions)
}
class RowExpressionSerializer {
-RowExpressionSerde rowExpressionSerde
+RowExpressionSerializer(RowExpressionSerde rowExpressionSerde)
+void serialize(RowExpression rowExpression, JsonGenerator jsonGenerator, SerializerProvider serializerProvider)
}
class RowExpressionDeserializer {
-RowExpressionSerde rowExpressionSerde
+RowExpressionDeserializer(RowExpressionSerde rowExpressionSerde)
+RowExpression deserialize(JsonParser jsonParser, DeserializationContext context)
}
NativeExpressionOptimizerFactory --> NativeExpressionsModule : installs
NativeExpressionsModule --> NativeExpressionOptimizer : binds
NativeExpressionsModule --> NativeSidecarExpressionInterpreter : binds
NativeExpressionsModule --> RowExpressionSerializer : binds
NativeExpressionsModule --> RowExpressionDeserializer : binds
NativeExpressionOptimizer ..> CollectingVisitor : uses
NativeExpressionOptimizer ..> ReplacingVisitor : uses
NativeExpressionOptimizer --> NativeSidecarExpressionInterpreter : uses
RowExpressionSerializer --> RowExpressionSerde : uses
RowExpressionDeserializer --> RowExpressionSerde : uses
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In TestNativeExpressionInterpreter.optimizeRowExpression you use a bare Java
assertto checkgetExpressionFailureInfo().getMessage() != null; consider replacing this with a TestNG assertion (e.g.,assertNotNull) so the check is always enforced regardless of JVM-eaflags. - NativeExpressionsModule sets up its own RowExpression (de)serialization and codecs, while ServerMainModule/PrestoSparkModule also bind RowExpression JSON codecs; it may be worth consolidating these bindings or documenting their intended separation to avoid subtle mismatches between server and sidecar serialization.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In TestNativeExpressionInterpreter.optimizeRowExpression you use a bare Java `assert` to check `getExpressionFailureInfo().getMessage() != null`; consider replacing this with a TestNG assertion (e.g., `assertNotNull`) so the check is always enforced regardless of JVM `-ea` flags.
- NativeExpressionsModule sets up its own RowExpression (de)serialization and codecs, while ServerMainModule/PrestoSparkModule also bind RowExpression JSON codecs; it may be worth consolidating these bindings or documenting their intended separation to avoid subtle mismatches between server and sidecar serialization.
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:218-227` </location>
<code_context>
+ if (node.getForm() == COALESCE) {
</code_context>
<issue_to_address>
**suggestion (bug_risk):** The COALESCE optimization condition is always true for the `size <= arguments.size()` part, so the logic doesn’t match the comment and may over-mark expressions as optimizable.
In the COALESCE branch, `builder` only contains arguments where `canBeOptimized(argument)` is true, and this block only executes when `anyArgumentsOptimizable` is false, so `builder.build().size()` is always 0. As a result, `builder.build().size() <= node.getArguments().size()` is always true for any non-empty COALESCE, making `canBeOptimized` true regardless of duplicates. This contradicts the de‑duplication comment and likely sends many more COALESCEs to the sidecar than intended. If you’re trying to detect duplicates, you probably need a strict `<` and/or to compare the original argument count to the size of a set containing all arguments, not just the optimizable ones.
Suggested implementation:
```java
// If the special form is COALESCE, then we can optimize it if there are any duplicate arguments
if (node.getForm() == COALESCE) {
// Detect duplicates across all arguments, not just the optimizable ones
int originalArgumentCount = node.getArguments().size();
ImmutableSet<RowExpression> uniqueArguments = ImmutableSet.copyOf(node.getArguments());
boolean hasDuplicates = uniqueArguments.size() < originalArgumentCount;
// If there were any duplicates, or if there's no arguments (cancel out), or if there's only one argument (just return it),
// then it's also constant foldable
```
```java
// If there were any duplicates, or if there's no arguments (cancel out), or if there's only one argument (just return it),
// then it's also constant foldable
if (hasDuplicates || originalArgumentCount == 0 || originalArgumentCount == 1) {
visitNode(node, true);
return null;
}
```
If there is an existing `if` block immediately after this COALESCE section that calls `visitNode(node, true)` based on `builder.build().size() <= node.getArguments().size()`, it should be removed, because the new `hasDuplicates` logic fully replaces that condition. Adjust imports if `ImmutableSet` is not already statically imported in this file (it appears to be, given the existing code).
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:397-401` </location>
<code_context>
+ RowExpressionOptimizationResult response = optimize(expression, level);
</code_context>
<issue_to_address>
**issue (testing):** Avoid using Java `assert` in tests; use TestNG assertions so the checks always run
This uses a plain Java `assert` to check `response.getExpressionFailureInfo().getMessage()` for non-null, which may be disabled in CI or production runs and therefore never execute. Replace it with a TestNG assertion, e.g. `assertNotNull(response.getExpressionFailureInfo().getMessage());`, before the `assertTrue` call so the check always runs.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...n/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java
Show resolved
Hide resolved
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- In
CollectingVisitor.visitSpecialForm, the COALESCE optimization checkbuilder.build().size() <= node.getArguments().size()is always true (since a set cannot be larger than the list), so any COALESCE will be treated as optimizable regardless of duplicates; consider fixing this logic to actually detect and rely on duplicate arguments. - In
NativeExpressionOptimizer.optimize, the mapping from genericLevelto the sidecar-supported levels is done viaordinal()comparison; using an explicit switch or mapping on the enum values (e.g., SERIALIZABLE → OPTIMIZED, OPTIMIZED/EVALUATED → themselves) would avoid brittle ordinal-based coupling.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `CollectingVisitor.visitSpecialForm`, the COALESCE optimization check `builder.build().size() <= node.getArguments().size()` is always true (since a set cannot be larger than the list), so any COALESCE will be treated as optimizable regardless of duplicates; consider fixing this logic to actually detect and rely on duplicate arguments.
- In `NativeExpressionOptimizer.optimize`, the mapping from generic `Level` to the sidecar-supported levels is done via `ordinal()` comparison; using an explicit switch or mapping on the enum values (e.g., SERIALIZABLE → OPTIMIZED, OPTIMIZED/EVALUATED → themselves) would avoid brittle ordinal-based coupling.
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:360-362` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+ assertNotNull(response.getExpressionFailureInfo().getMessage());
+ assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert that `optimizedExpression` is null when a failure is returned
In `assertEvaluateFails`, the failure contract for `RowExpressionOptimizationResult` is that `expressionFailureInfo` is set and `optimizedExpression` is null. Please also assert that `response.getOptimizedExpression()` is null so tests will catch any regression where a non-null optimized expression is returned alongside a failure.
```suggestion
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertNull(response.getOptimizedExpression());
assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
tdcmeehan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One comment, otherwise LGTM
...-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java
Outdated
Show resolved
Hide resolved
|
@sourcery-ai review |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- NativeSidecarExpressionInterpreter.optimizeBatch assumes every RowExpressionOptimizationResult contains a non-null optimizedExpression and ignores expressionFailureInfo; consider validating the response and propagating or surfacing failures explicitly instead of blindly dereferencing getOptimizedExpression().
- NativeExpressionOptimizer maps arbitrary levels via ordinal comparisons (level.ordinal() < OPTIMIZED.ordinal() ? OPTIMIZED : level); to avoid subtle bugs if new levels are added or the enum order changes, it would be safer to handle supported levels explicitly (e.g., switch on Level and reject or map only known cases like SERIALIZABLE).
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- NativeSidecarExpressionInterpreter.optimizeBatch assumes every RowExpressionOptimizationResult contains a non-null optimizedExpression and ignores expressionFailureInfo; consider validating the response and propagating or surfacing failures explicitly instead of blindly dereferencing getOptimizedExpression().
- NativeExpressionOptimizer maps arbitrary levels via ordinal comparisons (level.ordinal() < OPTIMIZED.ordinal() ? OPTIMIZED : level); to avoid subtle bugs if new levels are added or the enum order changes, it would be safer to handle supported levels explicitly (e.g., switch on Level and reject or map only known cases like SERIALIZABLE).
## Individual Comments
### Comment 1
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:181-183` </location>
<code_context>
+ optimizationLevel.ordinal() == EVALUATED.ordinal();
+
+ // All arguments must be optimizable in order to evaluate the function
+ boolean allConstantFoldable = node.getArguments().stream()
+ .peek(child -> child.accept(this, context))
+ .allMatch(this::canBeOptimized);
+ if (canBeEvaluated && allConstantFoldable) {
+ visitNode(node, true);
</code_context>
<issue_to_address>
**issue (performance):** Child arguments of a CallExpression may not be fully visited, which can miss optimizable sub-expressions.
In `visitCall`, `allConstantFoldable` is computed via `node.getArguments().stream().peek(child -> child.accept(this, context)).allMatch(this::canBeOptimized);`. Because `allMatch` short-circuits, once an argument is not optimizable, later arguments (and their subtrees) are never visited, so their optimizable sub-expressions never reach `expressionsToOptimize`. To avoid this, visit all arguments unconditionally first (e.g., a simple loop calling `accept` on each child), then in a separate pass compute `allConstantFoldable` with `allMatch(this::canBeOptimized)` over the already-visited children.
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:360-362` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+ assertNotNull(response.getExpressionFailureInfo().getMessage());
+ assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
</code_context>
<issue_to_address>
**suggestion (testing):** The failure-path assertion no longer checks that the optimized expression is null, weakening the contract being tested.
The previous `assertEvaluateFails` also verified that `optimizedExpression` was absent (null) on failure, while the new test only checks the error message. To preserve the endpoint contract, please also assert that `optimizedExpression` is null, for example:
```java
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNull(response.getOptimizedExpression(), "Expected optimizedExpression to be null on failure");
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage),
format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
```
This ensures we catch cases where a failure incorrectly includes a non-null optimized expression.
Suggested implementation:
```java
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNull(response.getOptimizedExpression(), "Expected optimizedExpression to be null on failure");
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
```
If this test class does not already have a static import for `assertNull` (for example, via `import static org.testng.Assert.*;` or `import static org.testng.Assert.assertNull;`), add the appropriate static import alongside the existing TestNG assertions.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java
Show resolved
Hide resolved
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/relational/RowExpressionOptimizer.java:57-58` </location>
<code_context>
- return null;
- }
- }
+ RowExpressionInterpreter interpreter = new RowExpressionInterpreter(expression, functionAndTypeManager, session, level);
+ return toRowExpression(expression.getSourceLocation(), interpreter.optimize(variableResolver::apply), expression.getType());
}
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against a null `variableResolver` to avoid potential NPEs.
The earlier version safely handled a nullable `variableResolver`, but this change always calls `variableResolver::apply`. If `variableResolver` can be null, this will throw an NPE. Either guarantee non-null at all call sites (and document it) or add a null check here and fall back to `interpreter.optimize()` when no resolver is provided.
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:73-74` </location>
<code_context>
+ public RowExpression optimize(RowExpression expression, Level level, ConnectorSession session, Function<VariableReferenceExpression, Object> variableResolver)
+ {
+ // Collect expressions to optimize
+ CollectingVisitor collectingVisitor = new CollectingVisitor(functionMetadataManager, level, resolution);
+ expression.accept(collectingVisitor, variableResolver);
+ List<RowExpression> expressionsToOptimize = collectingVisitor.getExpressionsToOptimize();
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Handling of `variableResolver` in `CollectingVisitor` assumes it is non-null, which may not always hold.
`CollectingVisitor.visitVariableReference` casts `context` to `Function<VariableReferenceExpression, Object>` and calls it directly. If `variableResolver` is null or a different context type is passed in recursive calls, this will cause a runtime failure. Please either require a non-null resolver at the `optimize` entry point (e.g., via a precondition) or make `CollectingVisitor` handle a null context safely (for example, treating it as `variable -> null`).
</issue_to_address>
### Comment 3
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeSidecarExpressionInterpreter.java:110-116` </location>
<code_context>
-
- // Replace the namespace with the Java built-in namespace
- FunctionHandle javaNamespaceFunctionHandle;
- try {
- javaNamespaceFunctionHandle = functionAndTypeManager.lookupFunction(
- QualifiedObjectName.valueOf(JAVA_BUILTIN_NAMESPACE, call.getDisplayName()),
</code_context>
<issue_to_address>
**suggestion:** The error code `INVALID_ARGUMENTS` for failed sidecar communication may be misleading.
Treating `httpClient.execute` failures (network/serialization/5xx) as `INVALID_ARGUMENTS` misrepresents an internal/system error as a user input issue. Consider using a more suitable code (e.g., generic internal error or sidecar-specific error) so operational failures aren’t surfaced as argument validation problems.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
...-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java
Show resolved
Hide resolved
Co-authored-by: Pratik Joseph Dabre <[email protected]>
Co-authored-by: Pratik Joseph Dabre <[email protected]>
…mizer Co-authored-by: Pratik Joseph Dabre <[email protected]>
69f9864 to
f99b31d
Compare
|
@sourcery-ai review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- In NativeExpressionOptimizer.CollectingVisitor the
canBeEvaluatedcondition based onoptimizationLevel.ordinal()meansLevel.SERIALIZABLEwill never allow any function evaluation (neither deterministic nor non-deterministic), effectively disabling optimization at that level even though the sidecar call later maps SERIALIZABLE to OPTIMIZED; consider making the level check explicit rather than ordinal-based so SERIALIZABLE behaves as intended. - In the sidecar flow, success and failure on
RowExpressionOptimizationResultare inferred implicitly fromexpressionFailureInfo/message(e.g.rePackageExceptionstreats any non-null failure info as an exception, and tests check forgetMessage().isEmpty()), which is easy to misinterpret; it would be clearer and safer to treat an empty/absent message as success explicitly (e.g., by short‑circuiting inrePackageExceptionswhen the message is empty) and update the test assertions to checkexpressionFailureInfonullness separately from message content.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In NativeExpressionOptimizer.CollectingVisitor the `canBeEvaluated` condition based on `optimizationLevel.ordinal()` means `Level.SERIALIZABLE` will never allow any function evaluation (neither deterministic nor non-deterministic), effectively disabling optimization at that level even though the sidecar call later maps SERIALIZABLE to OPTIMIZED; consider making the level check explicit rather than ordinal-based so SERIALIZABLE behaves as intended.
- In the sidecar flow, success and failure on `RowExpressionOptimizationResult` are inferred implicitly from `expressionFailureInfo`/`message` (e.g. `rePackageExceptions` treats any non-null failure info as an exception, and tests check for `getMessage().isEmpty()`), which is easy to misinterpret; it would be clearer and safer to treat an empty/absent message as success explicitly (e.g., by short‑circuiting in `rePackageExceptions` when the message is empty) and update the test assertions to check `expressionFailureInfo` nullness separately from message content.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/relational/RowExpressionOptimizer.java:57-58` </location>
<code_context>
- return null;
- }
- }
+ RowExpressionInterpreter interpreter = new RowExpressionInterpreter(expression, functionAndTypeManager, session, level);
+ return toRowExpression(expression.getSourceLocation(), interpreter.optimize(variableResolver::apply), expression.getType());
}
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against null variableResolver to preserve previous behavior and avoid NPEs.
The old code allowed a null `variableResolver` by passing `null` into `RowExpressionInterpreter.optimize`. The new version always calls `variableResolver::apply`, which will NPE if `variableResolver` is null (it’s not marked `@Nullable`, so this is easy to miss). Either preserve the null check and call `optimize(null)` when it’s null, or enforce non-null with `requireNonNull(variableResolver, ...)` and update callers accordingly. The first option avoids a behavior change.
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:70-78` </location>
<code_context>
@Override
public RowExpression optimize(RowExpression expression, Level level, ConnectorSession session, Function<VariableReferenceExpression, Object> variableResolver)
{
- return getRowExpression(expression, level, session, variableResolver);
</code_context>
<issue_to_address>
**issue (bug_risk):** Handle possible null variableResolver consistently when resolving variables.
This now assumes `variableResolver` is non-null and passes it into `ReplacingVisitor` as `variable -> variableResolver.apply(variable)`, which will throw an NPE if callers pass `null` (previously allowed by `RowExpressionOptimizer`). To preserve that behavior, either skip the replacement pass when `variableResolver` is null or wrap it in a null-safe adapter before passing it to `ReplacingVisitor`.
</issue_to_address>
### Comment 3
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeSidecarExpressionInterpreter.java:110-116` </location>
<code_context>
-
- // Replace the namespace with the Java built-in namespace
- FunctionHandle javaNamespaceFunctionHandle;
- try {
- javaNamespaceFunctionHandle = functionAndTypeManager.lookupFunction(
- QualifiedObjectName.valueOf(JAVA_BUILTIN_NAMESPACE, call.getDisplayName()),
</code_context>
<issue_to_address>
**suggestion:** Consider using a more appropriate error code for sidecar communication failures.
Treating all HTTP exceptions as `INVALID_ARGUMENTS` conflates user input issues with connectivity or server-side failures. Prefer an internal/generic error code (e.g., `GENERIC_INTERNAL_ERROR`) for sidecar/infrastructure problems, and reserve `INVALID_ARGUMENTS` for real user errors, potentially by checking the HTTP status or response payload.
</issue_to_address>
### Comment 4
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:75` </location>
<code_context>
-import static org.testng.Assert.assertNull;
import static org.testng.Assert.assertTrue;
public class TestNativeExpressionInterpreter
extends AbstractTestExpressionInterpreter
{
</code_context>
<issue_to_address>
**suggestion (testing):** There are no dedicated tests covering the new `NativeExpressionOptimizer` and sidecar optimizer wiring; key behaviors are currently untested.
This class only covers the native sidecar constant folding endpoint via `NativeSidecarExpressionInterpreter`; it doesn’t directly exercise `NativeExpressionOptimizer` or its interaction with `ExpressionOptimizerManager`.
Given the optimizer’s complexity (expression collection, COALESCE de-duplication, short-circuit handling, lambda optimization, variable resolver usage) and the behavior in `NativeSidecarExpressionInterpreter.optimizeBatch`/`rePackageExceptions` (batch mapping and error aggregation), please add targeted unit tests, for example:
* A `TestNativeExpressionOptimizer` suite that verifies:
* Only eligible subexpressions are sent to the sidecar (deterministic vs non-deterministic, no-op casts, COALESCE with duplicate args, lambda/nested lambda bodies, AND/OR special forms).
* Correct application of the variable resolver (constants vs unresolved variables).
* No double-optimization of nested expressions (subtree de-dup in `CollectingVisitor`).
* Tests for `optimizeBatch` and `rePackageExceptions` that cover:
* Successful optimization of all expressions and correct mapping back to originals.
* Failures in one or more `RowExpressionOptimizationResult` entries, asserting a `PrestoException` with multiple suppressed causes.
These would ensure the optimizer/sidecar wiring can be safely refactored or extended without regressions.
Suggested implementation:
```java
import static com.facebook.presto.metadata.MetadataManager.createTestMetadataManager;
import static com.google.inject.multibindings.Multibinder.newSetBinder;
import static java.lang.String.format;
import static org.testng.Assert.assertEquals;
import static org.testng.Assert.assertNotNull;
import static org.testng.Assert.assertTrue;
import static org.testng.Assert.assertNull;
```
I only see the top of `TestNativeExpressionInterpreter`, so I can’t safely patch in the new tests at the correct location. Below is what you should add to this file (after the existing tests, before the final `}` of the class) to implement the comment:
1. **Add fields / helper wiring for the optimizer + sidecar interpreter**
Inside `TestNativeExpressionInterpreter` (as private fields, likely next to `metadata`, `translator`, `queryRunner`), add:
```java
private final ExpressionOptimizerManager optimizerManager;
private final NativeExpressionOptimizer nativeExpressionOptimizer;
private final NativeSidecarExpressionInterpreter sidecarExpressionInterpreter;
```
Initialize them in the existing constructor / `@BeforeClass` (wherever the current `metadata`, `translator`, `queryRunner` are initialized), roughly like this (adjust ctor signatures to match actual code):
```java
this.optimizerManager = new ExpressionOptimizerManager(
metadata,
/* typeManager */ metadata.getTypeManager(),
/* functionResolution */ new FunctionResolution(metadata.getFunctionAndTypeManager()),
/* deterministicEvaluator */ new DeterminismEvaluator(metadata.getFunctionAndTypeManager()),
/* features/config */ new FeaturesConfig());
this.nativeExpressionOptimizer = new NativeExpressionOptimizer(
optimizerManager,
/* rowExpressionDomainTranslator */ new RowExpressionDomainTranslator(metadata.getFunctionAndTypeManager()),
/* variableResolver */ new VariableResolver(/* if you have a concrete impl, e.g., ConstantSymbolResolver */),
/* metadata */ metadata);
this.sidecarExpressionInterpreter = new NativeSidecarExpressionInterpreter(
nativeExpressionOptimizer,
/* sidecar client/service: */ new TestingNativeSidecarExpressionClient(),
/* batch size / config */ new NativeSidecarConfig().setMaxBatchSize(16));
```
Use the real constructor parameters/types from your codebase; the above is a sketch.
2. **Add tests for `NativeExpressionOptimizer` expression collection & eligibility**
Add a new `@Test` method that verifies only eligible subexpressions are collected and optimized. Example:
```java
@Test
public void testNativeExpressionOptimizerEligibleSubexpressions()
{
// Example: AND with deterministic & non-deterministic expressions, COALESCE with duplicates,
// and a lambda expression.
RowExpression deterministic = translator.translate("1 + 2", metadata);
RowExpression nonDeterministic = translator.translate("rand()", metadata);
RowExpression coalesceWithDuplicates = translator.translate("coalesce(a, a, 3)", metadata);
RowExpression lambdaExpression = translator.translate("transform(array[1,2,3], x -> x + 1)", metadata);
RowExpression compound = translator.translate(
format("(%s AND %s) OR %s", "1 + 2", "rand()", "coalesce(a, a, 3)"),
metadata);
// Run through optimizer
RowExpression optimized = nativeExpressionOptimizer.optimize(compound);
// Assert that:
// - deterministic subexpressions are folded (e.g., 1 + 2 -> 3)
// - non-deterministic ones (rand()) are preserved
// - COALESCE duplicates are de-duplicated
// - lambda body is optimized but not double-optimized or duplicated
assertTrue(containsLiteral(optimized, 3L), "Expected 1 + 2 to be constant folded to 3");
assertTrue(containsCall(optimized, "rand"), "rand() should not be folded away");
assertTrue(coalesceHasNoDuplicateArgs(optimized), "COALESCE duplicate arguments should be removed");
assertTrue(lambdaBodyOptimized(optimized), "Lambda body should be optimized");
assertFalse(hasDuplicateSubtrees(optimized), "Subtrees should be de-duplicated");
}
```
Implement the helper methods (`containsLiteral`, `containsCall`, `coalesceHasNoDuplicateArgs`, `lambdaBodyOptimized`, `hasDuplicateSubtrees`) as private methods in this test class, using the existing row-expression visitor utilities from the test base (`AbstractTestExpressionInterpreter` usually has helpers to walk `RowExpression` trees).
3. **Add tests for variable resolver behavior**
Add a test ensuring constants are resolved and unknown variables are left unresolved:
```java
@Test
public void testNativeExpressionOptimizerVariableResolver()
{
// Assume `x` is bound to a constant and `y` is unbound
Map<String, Object> bindings = ImmutableMap.of("x", 5);
VariableResolver resolver = new MapBackedVariableResolver(bindings);
NativeExpressionOptimizer optimizerWithResolver = new NativeExpressionOptimizer(
optimizerManager,
/* domain translator */ new RowExpressionDomainTranslator(metadata.getFunctionAndTypeManager()),
resolver,
metadata);
RowExpression expression = translator.translate("x + y + 1", metadata);
RowExpression optimized = optimizerWithResolver.optimize(expression);
// Expect `x` and literal `1` to be folded, but `y` stays a variable
assertTrue(containsVariable(optimized, "y"), "Unresolved variable y should remain");
assertFalse(containsVariable(optimized, "x"), "Resolved variable x should be folded away");
assertTrue(containsLiteral(optimized, 6L), "x + 1 should fold to 6");
}
```
Again, implement `MapBackedVariableResolver` and `containsVariable` as private helpers in the test (or re-use existing infrastructure if there is one in the module).
4. **Add tests for no double-optimization / subtree de-duplication**
Add a test that constructs an expression with repeated subtrees and verifies optimizer sees/optimizes them once and doesn’t re-insert duplicates:
```java
@Test
public void testNativeExpressionOptimizerNoDoubleOptimization()
{
// repeated subtree: (1 + 2) appears twice
RowExpression expression = translator.translate("(1 + 2) + (1 + 2)", metadata);
RowExpression optimized = nativeExpressionOptimizer.optimize(expression);
// Optimizer should fold (1 + 2) once and reuse the result instead of re-optimizing
assertTrue(countLiteral(optimized, 3L) <= 2, "3 should not appear excessively due to double-optimization");
assertFalse(hasDuplicateSubtrees(optimized), "Subtrees should be de-duplicated");
}
```
`countLiteral` and `hasDuplicateSubtrees` reuse the visitors from step 2.
5. **Add tests for `NativeSidecarExpressionInterpreter.optimizeBatch`**
Add a test to verify successful batch optimization and mapping back to the original expressions:
```java
@Test
public void testOptimizeBatchSuccess()
{
List<RowExpression> expressions = ImmutableList.of(
translator.translate("1 + 2", metadata),
translator.translate("3 * 4", metadata),
translator.translate("coalesce(1, 2)", metadata));
List<RowExpressionOptimizationResult> results =
sidecarExpressionInterpreter.optimizeBatch(expressions);
assertEquals(results.size(), expressions.size());
// All entries should be successful and match original order
for (int i = 0; i < expressions.size(); i++) {
RowExpressionOptimizationResult result = results.get(i);
assertTrue(result.getException().isEmpty(), "Expected no exception at index " + i);
assertNotNull(result.getOptimizedExpression(), "Optimized expression should be present at index " + i);
assertTrue(isConstant(result.getOptimizedExpression()),
"Expressions should be constant folded at index " + i);
}
}
```
Implement or reuse `isConstant` (true if `RowExpression` is a literal) using existing test utilities.
6. **Add tests for `NativeSidecarExpressionInterpreter.rePackageExceptions`**
Add a test that simulates multiple failures in `RowExpressionOptimizationResult` and verifies they are wrapped into a `PrestoException` with suppressed causes:
```java
@Test
public void testRePackageExceptionsAggregatesFailures()
{
RowExpression expr1 = translator.translate("1 / 0", metadata);
RowExpression expr2 = translator.translate("unknown_function()", metadata);
RowExpression expr3 = translator.translate("1 + 1", metadata);
RuntimeException ex1 = new RuntimeException("division by zero");
RuntimeException ex2 = new RuntimeException("unknown function");
List<RowExpressionOptimizationResult> results = ImmutableList.of(
RowExpressionOptimizationResult.failure(expr1, ex1),
RowExpressionOptimizationResult.failure(expr2, ex2),
RowExpressionOptimizationResult.success(expr3, expr3));
PrestoException aggregated = sidecarExpressionInterpreter.rePackageExceptions(results);
assertNotNull(aggregated);
assertEquals(aggregated.getErrorCode(), SOME_EXPECTED_ERROR_CODE); // use the real error code constant
Throwable[] suppressed = aggregated.getSuppressed();
assertEquals(suppressed.length, 2);
assertEquals(suppressed[0].getMessage(), "division by zero");
assertEquals(suppressed[1].getMessage(), "unknown function");
}
```
If `rePackageExceptions` returns `null`/`Optional`/throws directly instead, adjust assertions accordingly. The important part is: multiple `RowExpressionOptimizationResult` failures lead to a single `PrestoException` that contains the individual failures.
7. **Add a test for the mixed success/failure path in `optimizeBatch` + `rePackageExceptions`**
Add a test that runs `optimizeBatch` with a `TestingNativeSidecarExpressionClient` that intentionally fails on some expressions and succeeds on others, then calls `rePackageExceptions` and asserts:
- successes are still usable,
- failures are aggregated into a single `PrestoException` with multiple suppressed exceptions.
For example:
```java
@Test
public void testOptimizeBatchWithPartialFailures()
{
TestingNativeSidecarExpressionClient client = new TestingNativeSidecarExpressionClient()
.failOnIndex(1)
.failOnIndex(3);
NativeSidecarExpressionInterpreter interpreter = new NativeSidecarExpressionInterpreter(
nativeExpressionOptimizer,
client,
new NativeSidecarConfig().setMaxBatchSize(16));
List<RowExpression> expressions = ImmutableList.of(
translator.translate("1 + 2", metadata),
translator.translate("1 / 0", metadata),
translator.translate("3 * 4", metadata),
translator.translate("unknown_function()", metadata));
List<RowExpressionOptimizationResult> results = interpreter.optimizeBatch(expressions);
// Validate successes / failures by index
assertTrue(results.get(0).getException().isEmpty());
assertTrue(results.get(2).getException().isEmpty());
assertTrue(results.get(1).getException().isPresent());
assertTrue(results.get(3).getException().isPresent());
PrestoException aggregated = interpreter.rePackageExceptions(results);
assertNotNull(aggregated);
assertEquals(aggregated.getSuppressed().length, 2);
}
```
You will need a small `TestingNativeSidecarExpressionClient` in the test code (either in this file as a private static class or in a dedicated test helper file) that implements whatever sidecar client interface `NativeSidecarExpressionInterpreter` uses and allows configuring which indexes fail.
Once you wire these tests into `TestNativeExpressionInterpreter` with the real constructor parameters and helper utilities from your codebase, you will have coverage for:
- `NativeExpressionOptimizer` expression collection and eligibility (deterministic vs non-deterministic, COALESCE de-duplication, logical short-circuit structure, lambdas, nested lambdas).
- Variable resolver behavior.
- Subtree de-duplication (no double-optimization).
- `NativeSidecarExpressionInterpreter.optimizeBatch` success and mixed success/failure batches.
- `NativeSidecarExpressionInterpreter.rePackageExceptions` and the aggregated `PrestoException` with suppressed causes.
Adjust names, constructors, and error-code constants to the actual classes in `com.facebook.presto.sidecar.expressions`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/relational/RowExpressionOptimizer.java
Show resolved
Hide resolved
The check is simple, if level is |
|
@sourcery-ai review |
SourceryAI
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/relational/RowExpressionOptimizer.java:57-58` </location>
<code_context>
- return null;
- }
- }
+ RowExpressionInterpreter interpreter = new RowExpressionInterpreter(expression, functionAndTypeManager, session, level);
+ return toRowExpression(expression.getSourceLocation(), interpreter.optimize(variableResolver::apply), expression.getType());
}
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against null variableResolver in optimize(...) overload to avoid NPEs.
The previous implementation tolerated a nullable `variableResolver` and forwarded `null` to `RowExpressionInterpreter.optimize(...)`. The new code always calls `variableResolver::apply`, which will NPE if any existing caller still passes `null` (likely, given the prior behavior and lack of `@Nonnull`). Please either fail fast with `requireNonNull(variableResolver, ...)` or restore null-safe handling (e.g., conditional lambda vs `null`) to avoid a regression.
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java:113-117` </location>
<code_context>
+ // The native endpoint only supports optimizer levels OPTIMIZED or EVALUATED.
+ // In the sidecar, SERIALIZABLE is effectively the same as OPTIMIZED,
+ // so if SERIALIZABLE is requested, we use OPTIMIZED instead.
+ replacements.putAll(
+ rowExpressionInterpreterService.optimizeBatch(
+ session,
+ expressions,
+ level.ordinal() < OPTIMIZED.ordinal() ? OPTIMIZED : level));
+ }
+
</code_context>
<issue_to_address>
**issue (bug_risk):** The optimizer level mapping to the sidecar doesn’t match the comment and may pass unsupported levels.
The mapping only adjusts levels *below* `OPTIMIZED`, so any level like `SERIALIZABLE` with an ordinal above `OPTIMIZED` will be forwarded unchanged, which contradicts the comment and may violate the sidecar’s supported levels. An explicit mapping would be safer, e.g. treat `SERIALIZABLE` and anything below `OPTIMIZED` as `OPTIMIZED`, allow only `OPTIMIZED`/`EVALUATED` as-is, and reject anything else.
</issue_to_address>
### Comment 3
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:397-400` </location>
<code_context>
+ RowExpressionOptimizationResult response = optimize(expression, level);
</code_context>
<issue_to_address>
**suggestion (testing):** Test should assert `expressionFailureInfo` is non-null instead of its message being non-null
In `optimizeRowExpression`, these assertions are meant to verify the successful optimization path has no failure. Asserting on `response.getExpressionFailureInfo().getMessage()` being non-null and then empty is misleading, and the earlier comment referenced the struct itself as non-null. To better reflect the protocol and match `assertEvaluateFails`, assert that `getExpressionFailureInfo()` is non-null, and then that its message is empty:
```java
assertNotNull(response.getExpressionFailureInfo());
assertTrue(response.getExpressionFailureInfo().getMessage().isEmpty());
```
Suggested implementation:
```java
{
expression = expression.accept(visitor, null);
RowExpressionOptimizationResult response = optimize(expression, level);
assertNotNull(response.getExpressionFailureInfo());
assertTrue(response.getExpressionFailureInfo().getMessage().isEmpty());
```
1. If this test method already contains assertions about `response.getExpressionFailureInfo().getMessage()` being non-null, those should be removed to avoid contradictory expectations.
2. Ensure that `assertNotNull` and `assertTrue` are imported from the same assertion framework used elsewhere in the file (likely `org.testng.Assert.*` for TestNG or `org.junit.jupiter.api.Assertions.*` for JUnit). If they are not already statically imported, add the appropriate static imports at the top of the file.
</issue_to_address>
### Comment 4
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:360-362` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+ assertNotNull(response.getExpressionFailureInfo().getMessage());
+ assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
</code_context>
<issue_to_address>
**suggestion (testing):** `assertEvaluateFails` no longer verifies that `optimizedExpression` is null on failure
After the refactor this helper only checks the failure message and never asserts anything about `getOptimizedExpression()`. To keep validating `NativeSidecarExpressionInterpreter`’s failure contract, please also assert that `response.getOptimizedExpression()` is null (e.g., `assertNull(response.getOptimizedExpression(), ...)`) so tests will catch any regression where a partially optimized expression is returned on failure.
Suggested implementation:
```java
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertNull(response.getOptimizedExpression(), format("Expected no optimized expression on failure, but got: %s", response.getOptimizedExpression()));
assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
```
To compile, ensure that `assertNull` is available:
1. If using TestNG static imports, add `import static org.testng.Assert.assertNull;` at the top of the file (or `import static org.testng.Assert.*;` if that pattern is already used).
2. If JUnit is used instead, add the corresponding static import for `assertNull` from the correct Assert class, consistent with the existing assertions.
</issue_to_address>
### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/expressions/TestExpressionOptimizerManager.java:66-70` </location>
<code_context>
manager = new ExpressionOptimizerManager(
pluginNodeManager,
METADATA.getFunctionAndTypeManager(),
+ new JsonCodecRowExpressionSerde(jsonCodec(RowExpression.class)),
directory);
}
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for wiring and behavior of the new native expression optimizer factory
`ExpressionOptimizerManager` is now constructed with a `RowExpressionSerde`, but there’s no test that exercises `loadExpressionOptimizerFactory(...)` with an actual factory or verifies that this serde is propagated into the `ExpressionOptimizerContext`.
Please add tests that:
- register a test `ExpressionOptimizerFactory` that captures the `ExpressionOptimizerContext`,
- call `manager.loadExpressionOptimizerFactory(...)`, and
- assert that `context.getRowExpressionSerde()` is non-null and is the same instance passed to `ExpressionOptimizerManager`.
It would also be valuable to add an end-to-end test that registers the real `NativeExpressionOptimizerFactory` and confirms it can be instantiated and used to optimize a simple expression.
Suggested implementation:
```java
package com.facebook.presto.sql.expressions;
import com.facebook.airlift.json.JsonCodec;
import com.facebook.presto.Session;
import com.facebook.presto.metadata.FunctionAndTypeManager;
import com.facebook.presto.metadata.InternalNodeManager;
import com.facebook.presto.spi.Plugin;
import com.facebook.presto.spi.relation.ExpressionOptimizer;
import com.facebook.presto.spi.relation.ExpressionOptimizerContext;
import com.facebook.presto.spi.relation.ExpressionOptimizerFactory;
import com.facebook.presto.spi.relation.RowExpression;
import com.facebook.presto.sql.relational.JsonCodecRowExpressionSerde;
import com.facebook.presto.sql.relational.RowExpressionSerde;
import com.facebook.presto.testing.LocalQueryRunner;
import com.facebook.presto.testing.TestingSession;
import org.testng.annotations.AfterClass;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.Test;
import java.nio.file.Path;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicReference;
import static com.facebook.airlift.json.JsonCodec.jsonCodec;
import static com.facebook.presto.spi.relation.ExpressionOptimizer.Level.OPTIMIZED;
import static com.facebook.presto.sql.relational.Expressions.constant;
import static com.facebook.presto.testing.TestingSession.testSessionBuilder;
import static java.nio.file.Files.createTempDirectory;
import static org.testng.Assert.assertNotNull;
import static org.testng.Assert.assertSame;
public class TestExpressionOptimizerManager
{
private LocalQueryRunner queryRunner;
private InternalNodeManager pluginNodeManager;
private FunctionAndTypeManager functionAndTypeManager;
private Path directory;
private RowExpressionSerde rowExpressionSerde;
private ExpressionOptimizerManager manager;
@BeforeClass
public void setUp()
throws Exception
{
Session session = testSessionBuilder()
.setCatalog("tpch")
.setSchema("tiny")
.build();
this.queryRunner = LocalQueryRunner.create(session);
this.pluginNodeManager = queryRunner.getNodeManager();
this.functionAndTypeManager = queryRunner.getMetadata().getFunctionAndTypeManager();
this.directory = createTempDirectory("expression-optimizer-manager-test");
this.rowExpressionSerde = new JsonCodecRowExpressionSerde(jsonCodec(RowExpression.class));
this.manager = new ExpressionOptimizerManager(
pluginNodeManager,
functionAndTypeManager,
rowExpressionSerde,
directory.toFile());
}
@AfterClass(alwaysRun = true)
public void tearDown()
{
if (queryRunner != null) {
queryRunner.close();
}
}
@Test
public void testRowExpressionSerdePropagatedToContext()
{
// Arrange: capture the context passed into the factory
AtomicReference<ExpressionOptimizerContext> capturedContext = new AtomicReference<>();
ExpressionOptimizerFactory capturingFactory = new ExpressionOptimizerFactory()
{
@Override
public String getName()
{
return "capturing";
}
@Override
public ExpressionOptimizer create(Session session, ExpressionOptimizerContext context)
{
capturedContext.set(context);
// return a no-op optimizer; the behavior is not important for this test
return (sess, expression, expectedType, level) -> expression;
}
};
// Act: register the factory and trigger creation of an optimizer
manager.loadExpressionOptimizerFactory(capturingFactory);
Session session = TestingSession.testSessionBuilder()
.setCatalog("tpch")
.setSchema("tiny")
.build();
// This call should cause the manager to create an optimizer using the factory,
// which in turn should capture the ExpressionOptimizerContext.
ExpressionOptimizer optimizer = manager.getExpressionOptimizer("capturing", session);
assertNotNull(optimizer, "ExpressionOptimizer should be created");
ExpressionOptimizerContext context = capturedContext.get();
assertNotNull(context, "ExpressionOptimizerContext should be captured");
assertNotNull(context.getRowExpressionSerde(), "RowExpressionSerde in context should not be null");
assertSame(
context.getRowExpressionSerde(),
rowExpressionSerde,
"RowExpressionSerde in context should be the same instance passed to ExpressionOptimizerManager");
}
@Test
public void testNativeExpressionOptimizerFactoryEndToEnd()
{
// Arrange: register the real native factory
NativeExpressionOptimizerFactory nativeFactory = new NativeExpressionOptimizerFactory();
manager.loadExpressionOptimizerFactory(nativeFactory);
Session session = TestingSession.testSessionBuilder()
.setCatalog("tpch")
.setSchema("tiny")
.build();
// A very simple expression: a constant boolean
RowExpression expression = constant(true, com.facebook.presto.spi.type.BooleanType.BOOLEAN);
// Act: obtain an optimizer from the native factory and optimize the expression
ExpressionOptimizer optimizer = manager.getExpressionOptimizer(nativeFactory.getName(), session);
RowExpression optimized = optimizer.optimize(
session,
expression,
com.facebook.presto.spi.type.BooleanType.BOOLEAN,
OPTIMIZED);
// Assert: we at least get a non-null result that is usable; the specific
// shape is less important than verifying that wiring and instantiation work.
assertNotNull(optimized, "Optimized expression should not be null");
}
}
```
The exact APIs of `ExpressionOptimizerManager`, `ExpressionOptimizerFactory`, `ExpressionOptimizerContext`, and `NativeExpressionOptimizerFactory` may differ slightly in your codebase. To adapt this:
1. Update the `loadExpressionOptimizerFactory(...)` and `getExpressionOptimizer(...)` invocations to match the actual method signatures used in `ExpressionOptimizerManager`.
2. Adjust the `ExpressionOptimizerFactory` anonymous implementation to implement the correct methods and signatures (for example, if `create(...)` has different parameters).
3. Fix imports if the classes live in slightly different packages (e.g., `NativeExpressionOptimizerFactory`, `ExpressionOptimizerManager`, or `ExpressionOptimizerContext`).
4. If `ExpressionOptimizerContext` exposes the row-expression serde via a different accessor than `getRowExpressionSerde()`, update the assertion accordingly.
5. If the manager constructor takes slightly different arguments (e.g., a `File` vs. `Path`, or additional dependencies), adjust the `setUp()` method to pass the correct values while still constructing the `JsonCodecRowExpressionSerde` once and reusing the same instance in the assertions.
</issue_to_address>
### Comment 6
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/NativeSidecarPluginQueryRunnerUtils.java:53-55` </location>
<code_context>
queryRunner.loadTypeManager(NativeTypeManagerFactory.NAME);
queryRunner.loadPlanCheckerProviderManager("native", ImmutableMap.of());
+ queryRunner.getExpressionManager().loadExpressionOptimizerFactory(NativeExpressionOptimizerFactory.NAME, "native", ImmutableMap.of());
queryRunner.installPlugin(new NativeSqlInvokedFunctionsPlugin());
}
</code_context>
<issue_to_address>
**suggestion (testing):** Add an integration-style test that verifies the native expression optimizer is actually used by the engine
The helper now loads the native sidecar expression optimizer, but there’s no test confirming it’s actually used in end-to-end planning/optimization.
Please add an integration-style test (here or in a native-sidecar test) that builds a `QueryRunner` via `NativeSidecarPluginQueryRunner` with the native optimizer loaded, runs a query or planning pipeline on an expression the native sidecar optimizes (e.g., a constant-folding case from `TestNativeExpressionInterpreter`), and asserts that the native optimizer is selected (by name) or that the plan/expression reflects the native-optimized form. This will verify the configuration has the intended effect.
Suggested implementation:
```java
package com.facebook.presto.sidecar;
import com.facebook.presto.scalar.sql.NativeSqlInvokedFunctionsPlugin;
import com.facebook.presto.sidecar.expressions.NativeExpressionOptimizerFactory;
import com.facebook.presto.sidecar.functionNamespace.NativeFunctionNamespaceManagerFactory;
import com.facebook.presto.sidecar.sessionpropertyproviders.NativeSystemSessionPropertyProviderFactory;
import com.facebook.presto.sidecar.typemanager.NativeTypeManagerFactory;
import com.facebook.presto.testing.QueryRunner;
import org.testng.annotations.Test;
import static org.testng.Assert.assertTrue;
```
```java
queryRunner.loadTypeManager(NativeTypeManagerFactory.NAME);
queryRunner.loadPlanCheckerProviderManager("native", ImmutableMap.of());
queryRunner.getExpressionManager().loadExpressionOptimizerFactory(NativeExpressionOptimizerFactory.NAME, "native", ImmutableMap.of());
queryRunner.installPlugin(new NativeSqlInvokedFunctionsPlugin());
}
/**
* Integration-style test that verifies the native expression optimizer is actually
* registered and visible through the engine when the QueryRunner is created via
* NativeSidecarPluginQueryRunnerUtils.
*
* The test intentionally goes through the same helper used by other tests instead
* of constructing its own QueryRunner to ensure we are validating the real wiring.
*/
@Test
public void testNativeExpressionOptimizerIsLoadedAndVisible()
throws Exception
{
// Build a QueryRunner via the helper so we exercise the same configuration
// path that production-style tests use.
try (QueryRunner queryRunner = createNativeSidecarQueryRunner()) {
// Verify that the native expression optimizer factory is registered with the
// ExpressionManager under the expected name. This confirms that the optimizer
// is visible to the engine's planning / optimization pipeline, not just
// that the factory class exists on the classpath.
assertTrue(
queryRunner.getExpressionManager()
.getExpressionOptimizerFactoryNames()
.contains(NativeExpressionOptimizerFactory.NAME),
"Native expression optimizer factory should be registered and visible to the engine");
}
}
}
```
The above edit assumes the following existing structure and APIs, which you may need to adjust to match the actual codebase:
1. **Helper method name and signature**
- The test calls `createNativeSidecarQueryRunner()`. If your helper method has a different name or signature (e.g., `createNativeSidecarPluginQueryRunner()` or it takes parameters), update the call accordingly.
2. **ExpressionManager API**
- The test uses `queryRunner.getExpressionManager().getExpressionOptimizerFactoryNames()` which is a placeholder for a method that exposes the registered optimizer factory names as a `Set<String>` or similar.
- If your `ExpressionManager` exposes this information differently (e.g., `getExpressionOptimizerFactories().keySet()` or another accessor), change the assertion to use the real method while still checking that `NativeExpressionOptimizerFactory.NAME` is present.
3. **Imports**
- Ensure `ImmutableMap` is already imported in this file (it appears to be used in the snippet). If not, add `import com.google.common.collect.ImmutableMap;`.
- If your `QueryRunner` type resides in a different package than `com.facebook.presto.testing.QueryRunner`, adjust the import accordingly.
4. **Test style / framework**
- The test uses TestNG (`@Test`, `assertTrue` from `org.testng.Assert`). If this module uses JUnit instead, switch the annotations and assertions to the appropriate framework.
5. **Stronger behavioral assertion (optional)**
- To more directly assert that the native optimizer is *used* (not just registered), you can extend the test to:
- Run a query or planning pipeline on a constant-folding expression (e.g., `SELECT 1 + 2`).
- Inspect the resulting plan or expression to confirm it is in the optimized form (e.g., `3` instead of `1 + 2`), using the same patterns as in `TestNativeExpressionInterpreter`.
- Implement this by:
- Importing the plan / expression inspection utilities already used in other tests.
- Adding assertions after the registration check to verify the optimized form is present in the plan.
</issue_to_address>Hi @pdabre12! 👋
Thanks for trying out Sourcery by commenting with @sourcery-ai review! 🚀
Install the sourcery-ai bot to get automatic code reviews on every pull request ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
presto-main-base/src/main/java/com/facebook/presto/sql/relational/RowExpressionOptimizer.java
Show resolved
Hide resolved
...-plugin/src/main/java/com/facebook/presto/sidecar/expressions/NativeExpressionOptimizer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes and found some issues that need to be addressed.
- NativeExpressionOptimizer implements ExpressionOptimizer but only overrides the 4-argument optimize method; you should also implement the 3-argument optimize(RowExpression, Level, ConnectorSession) and delegate to the 4-argument version to avoid compilation issues and ensure consistent behavior when callers use the simpler API.
- In TestNativeExpressionInterpreter.optimizeRowExpression you assert that expressionFailureInfo.getMessage() is both non-null and empty; if an empty string is the success indicator, consider dropping the assertNotNull or explicitly documenting this convention to make the intent clearer to future readers.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- NativeExpressionOptimizer implements ExpressionOptimizer but only overrides the 4-argument optimize method; you should also implement the 3-argument optimize(RowExpression, Level, ConnectorSession) and delegate to the 4-argument version to avoid compilation issues and ensure consistent behavior when callers use the simpler API.
- In TestNativeExpressionInterpreter.optimizeRowExpression you assert that expressionFailureInfo.getMessage() is both non-null and empty; if an empty string is the success indicator, consider dropping the assertNotNull or explicitly documenting this convention to make the intent clearer to future readers.
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/relational/RowExpressionOptimizer.java:57-58` </location>
<code_context>
- return null;
- }
- }
+ RowExpressionInterpreter interpreter = new RowExpressionInterpreter(expression, functionAndTypeManager, session, level);
+ return toRowExpression(expression.getSourceLocation(), interpreter.optimize(variableResolver::apply), expression.getType());
}
}
</code_context>
<issue_to_address>
**issue (bug_risk):** Guard against null variableResolver to avoid NullPointerException.
The `ExpressionOptimizer` contract doesn’t ensure `variableResolver` is non-null, and the previous `getRowExpression` handled a nullable resolver. With the new `variableResolver::apply` call, a null `variableResolver` will cause an NPE. Either enforce non-null with a precondition or preserve explicit handling of the null case (e.g., call `interpreter.optimize(null)` when the resolver is null).
</issue_to_address>
### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:360-362` </location>
<code_context>
- log.error(e, "Failed to decode RowExpression from sidecar response: %s.", e.getMessage());
- throw new RuntimeException(e);
- }
+ RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
+ assertNotNull(response.getExpressionFailureInfo().getMessage());
+ assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
}
</code_context>
<issue_to_address>
**suggestion (testing):** Also assert that failed optimizations do not produce an optimized expression
The earlier helper verified both the error and the absence of an optimized expression. The new version only asserts on the error message. To preserve that contract and catch regressions in sidecar error handling, also assert that `response.getOptimizedExpression()` is null (or equivalent) here.
```suggestion
RowExpressionOptimizationResult response = optimize(rowExpression, ExpressionOptimizer.Level.EVALUATED);
assertNull(response.getOptimizedExpression(), format("Expected no optimized expression for failed optimization, but got: %s", response.getOptimizedExpression()));
assertNotNull(response.getExpressionFailureInfo().getMessage());
assertTrue(response.getExpressionFailureInfo().getMessage().contains(errorMessage), format("Sidecar response: %s did not contain expected error message: %s.", response, errorMessage));
```
</issue_to_address>
### Comment 3
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/expressions/TestNativeExpressionInterpreter.java:397-401` </location>
<code_context>
+ RowExpressionOptimizationResult response = optimize(expression, level);
</code_context>
<issue_to_address>
**nitpick (testing):** Assertion on `expressionFailureInfo` looks inverted and may be misleading
You’re asserting `assertNotNull(response.getExpressionFailureInfo().getMessage());` and then immediately asserting that the message is empty. Per the previous comment that an empty message means "no failure", the key condition is that the message is empty, not that it’s non-null. The null check adds little value and can be misleading. Consider removing `assertNotNull`, or instead asserting on the entire `expressionFailureInfo` object (e.g., object is non-null and its `message` is empty) to better express the test’s intent.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
3846462 to
380776d
Compare
| for (int i = 0; i < rowExpressionOptimizationResults.size(); i++) { | ||
| result.put(originalExpressions.get(i), rowExpressionOptimizationResults.get(i).getOptimizedExpression()); | ||
| } | ||
| return result.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the row expressions inside of originalExpressions unique? If not, you can use result.buildKeepingLast()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are unique, these originalExpressions come from a set variable expressionsToOptimize
|
|
||
| queryRunner.loadTypeManager(NativeTypeManagerFactory.NAME); | ||
| queryRunner.loadPlanCheckerProviderManager("native", ImmutableMap.of()); | ||
| queryRunner.getExpressionManager().loadExpressionOptimizerFactory(NativeExpressionOptimizerFactory.NAME, "native", ImmutableMap.of()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hardcoding "native", you can use NativeExpressionOptimizerFactory.NAME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, just went with the above structure, wherein they are hardcoding "native".
The 3-argument optimize method is default, so it automatically delegate to the 4-argument optimize method.
That is a success indicator only for the test, if no message is present we can assume the optimization worked as intended. |
Description
Add a
ExpressionOptimizerwhich delegates to the native sidecar process to evaluate expressions with Velox.Motivation and Context
#26475 added support for an endpoint in the sidecar for constant folding expressions. This follows up on that by adding an expression interpreter to call that endpoint.
For more context: RFC-0006.
Impact
No impact by default as the old in-memory evaluation is the default.
Test Plan
Tests have been added.
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.