Skip to content

feat: Add optimizer to rewrite functions in query plan#26859

Merged
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:export-D89703780
Jan 9, 2026
Merged

feat: Add optimizer to rewrite functions in query plan#26859
feilong-liu merged 1 commit intoprestodb:masterfrom
feilong-liu:export-D89703780

Conversation

@feilong-liu
Copy link
Copy Markdown
Contributor

@feilong-liu feilong-liu commented Dec 24, 2025

Description

We have an internal use case where we want to rewrite functions, where both rewrite from and rewrite to functions are internal functions.
This PR adds a new function rewrite optimizer, which use an instance of expression optimizer to do the expression rewrite, the name of the expression optimizer is specified with session property expression_optimizer_in_row_expression_rewrite which defaults to disabled when it's empty.

Motivation and Context

To do function rewrite with internal functions.

Impact

As in description

Test Plan

Unit tests and local end to end test

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General
* Add an optimizer which can do function rewrite

@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Dec 24, 2025
@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Dec 24, 2025

Reviewer's Guide

Introduces a pluggable function-call rewriting optimization framework, including a new SPI interface, central manager, optimizer, session/config flags, and wiring into the query planner and test/query-runner infrastructure.

Sequence diagram for function call rewriting during query optimization

sequenceDiagram
    actor User
    participant Client
    participant Coordinator
    participant PlanOptimizers
    participant FunctionCallRewritingOptimizer as FuncCallRewritingOpt
    participant PlanRewriter
    participant CallExpressionRewriter as CallExprRewriter
    participant FunctionCallRewriterImpl as FuncCallRewriter

    User->>Client: Submit SQL query
    Client->>Coordinator: Execute query
    Coordinator->>PlanOptimizers: buildOptimizedPlan(session, query)

    Note over Coordinator,PlanOptimizers: PlanOptimizers pipeline includes FunctionCallRewritingOptimizer

    PlanOptimizers->>FuncCallRewritingOpt: optimize(plan, session, types, variableAllocator, idAllocator, warningCollector)
    FuncCallRewritingOpt->>SystemSessionProperties: isFunctionCallRewriterEnabled(session)
    SystemSessionProperties-->>FuncCallRewritingOpt: enabled or disabled

    alt function call rewriter disabled
        FuncCallRewritingOpt-->>PlanOptimizers: PlanOptimizerResult(plan, changed=false)
    else enabled
        FuncCallRewritingOpt->>FunctionCallRewriterManager: getAllRewriters()
        FunctionCallRewriterManager-->>FuncCallRewritingOpt: Set(FunctionCallRewriter)

        alt no registered rewriters
            FuncCallRewritingOpt-->>PlanOptimizers: PlanOptimizerResult(plan, changed=false)
        else rewriters present
            FuncCallRewritingOpt->>PlanRewriter: new PlanRewriter(session.toConnectorSession(), variableAllocator, idAllocator, functionResolution, rewriters)
            FuncCallRewritingOpt->>PlanRewriter: rewrite(plan)

            loop for each Project and Filter node
                PlanRewriter->>CallExprRewriter: rewrite expressions
                loop for each CallExpression
                    CallExprRewriter->>FuncCallRewriter: rewrite(call, session, variableAllocator, idAllocator, functionResolution)
                    alt rewriter returns new expression
                        FuncCallRewriter-->>CallExprRewriter: Optional(RowExpression)
                        CallExprRewriter-->>PlanRewriter: rewritten RowExpression
                    else not handled
                        FuncCallRewriter-->>CallExprRewriter: Optional.empty
                    end
                end
            end

            PlanRewriter-->>FuncCallRewritingOpt: rewrittenPlan
            FuncCallRewritingOpt-->>PlanOptimizers: PlanOptimizerResult(rewrittenPlan, changed=true or false)
        end
    end

    PlanOptimizers-->>Coordinator: final optimized plan
    Coordinator-->>Client: query results
    Client-->>User: results displayed
Loading

Class diagram for new function call rewriting framework

classDiagram
    direction LR

    class PlanOptimizer {
      <<interface>>
      +optimize(plan, session, types, variableAllocator, idAllocator, warningCollector) PlanOptimizerResult
    }

    class RowExpressionRewriter {
      <<abstract>>
    }

    class SimplePlanRewriter {
      <<abstract>>
      +rewriteWith(rewriter, plan) PlanNode
    }

    class StandardFunctionResolution

    class FunctionCallRewriter {
      <<interface>>
      +rewrite(call, session, variableAllocator, idAllocator, functionResolution) Optional~RowExpression~
    }

    class FunctionCallRewriterManager {
      -Set~FunctionCallRewriter~ rewriters
      +FunctionCallRewriterManager()
      +addRewriter(rewriter) void
      +removeRewriter(rewriter) void
      +getAllRewriters() Set~FunctionCallRewriter~
    }

    class FunctionCallRewritingOptimizer {
      -Supplier~Set~FunctionCallRewriter~~ rewritersSupplier
      -StandardFunctionResolution functionResolution
      +FunctionCallRewritingOptimizer(rewritersSupplier, functionResolution)
      +optimize(plan, session, types, variableAllocator, idAllocator, warningCollector) PlanOptimizerResult
    }

    class PlanRewriter {
      -ConnectorSession session
      -VariableAllocator variableAllocator
      -PlanNodeIdAllocator idAllocator
      -StandardFunctionResolution functionResolution
      -Set~FunctionCallRewriter~ rewriters
      -CallExpressionRewriter callRewriter
      -boolean planChanged
      +PlanRewriter(session, variableAllocator, idAllocator, functionResolution, rewriters)
      +isPlanChanged() boolean
      +visitProject(node, context) PlanNode
      +visitFilter(node, context) PlanNode
    }

    class CallExpressionRewriter {
      -ConnectorSession session
      -Set~FunctionCallRewriter~ rewriters
      -VariableAllocator variableAllocator
      -PlanNodeIdAllocator idAllocator
      -StandardFunctionResolution functionResolution
      -boolean hasRewritten
      +CallExpressionRewriter(session, rewriters, variableAllocator, idAllocator, functionResolution)
      +hasRewritten() boolean
      +rewriteCall(node, unused, treeRewriter) RowExpression
    }

    class FeaturesConfig {
      -boolean enableFunctionCallRewriter
      +setEnableFunctionCallRewriter(enableFunctionCallRewriter) FeaturesConfig
      +isEnableFunctionCallRewriter() boolean
    }

    class SystemSessionProperties {
      +String ENABLE_FUNCTION_CALL_REWRITER
      +isFunctionCallRewriterEnabled(session) boolean
    }

    class PlanOptimizers {
      +PlanOptimizers(..., FunctionCallRewriterManager functionCallRewriterManager, ...)
    }

    class CoordinatorModule {
      +setup(binder) void
    }

    class LocalQueryRunner {
      +getPlanOptimizers(noExchange, nativeExecutionEnabled) List~PlanOptimizer~
    }

    FunctionCallRewritingOptimizer ..|> PlanOptimizer
    PlanRewriter ..|> SimplePlanRewriter
    CallExpressionRewriter ..|> RowExpressionRewriter

    FunctionCallRewritingOptimizer o-- PlanRewriter
    PlanRewriter o-- CallExpressionRewriter
    PlanRewriter o-- FunctionCallRewriter
    CallExpressionRewriter o-- FunctionCallRewriter

    FunctionCallRewriterManager o-- FunctionCallRewriter
    FunctionCallRewritingOptimizer --> FunctionCallRewriterManager : uses via Supplier

    FeaturesConfig --> SystemSessionProperties : provides default
    SystemSessionProperties --> FunctionCallRewritingOptimizer : enables via session flag

    PlanOptimizers --> FunctionCallRewritingOptimizer : constructs and registers
    CoordinatorModule --> FunctionCallRewriterManager : bind singleton
    LocalQueryRunner --> FunctionCallRewriterManager : create for tests
Loading

File-Level Changes

Change Details Files
Add session/config flag and plumbing to enable or disable the function call rewriter optimizer.
  • Introduce enableFunctionCallRewriter field, config key optimizer.enable-function-rewriter, and getter/setter on FeaturesConfig with default false and tests for defaults and explicit mappings.
  • Add ENABLE_FUNCTION_CALL_REWRITER system property, session-exposed boolean property with description, and isFunctionCallRewriterEnabled(Session) helper in SystemSessionProperties.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/FeaturesConfig.java
presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestFeaturesConfig.java
Introduce a pluggable SPI for function-call rewrites and a central registry to manage rewriters.
  • Define the FunctionCallRewriter SPI interface with a rewrite(CallExpression, ...) method returning an Optional.
  • Implement FunctionCallRewriterManager as a thread-safe registry with add/remove and getAllRewriters methods, designed for injection.
presto-spi/src/main/java/com/facebook/presto/spi/function/FunctionCallRewriter.java
presto-main-base/src/main/java/com/facebook/presto/sql/planner/FunctionCallRewriterManager.java
Add a new plan optimizer that applies registered function call rewriters to project and filter expressions when enabled by session property.
  • Implement FunctionCallRewritingOptimizer that gates on isFunctionCallRewriterEnabled, pulls rewriters from FunctionCallRewriterManager, and traverses the plan to rewrite CallExpression nodes using RowExpressionTreeRewriter.
  • Implement nested PlanRewriter and CallExpressionRewriter to rewrite ProjectNode assignments and FilterNode predicates, tracking whether any rewrites occurred to report plan changes.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/FunctionCallRewritingOptimizer.java
Wire the function call rewriter manager and optimizer into the planner, coordinator, and test/query-runner infrastructure.
  • Extend PlanOptimizers constructors to accept a FunctionCallRewriterManager, pass it through to the internal constructor, and register a new FunctionCallRewritingOptimizer using the manager and FunctionResolution based on the FunctionAndTypeManager.
  • Register FunctionCallRewriterManager as a singleton in CoordinatorModule so it can be injected into PlanOptimizers.
  • Update LocalQueryRunner, TestPrestoNativeBuiltInFunctions, and AbstractTestQueryFramework to construct PlanOptimizers with a new FunctionCallRewriterManager instance.
presto-main-base/src/main/java/com/facebook/presto/sql/planner/PlanOptimizers.java
presto-main/src/main/java/com/facebook/presto/server/CoordinatorModule.java
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeBuiltInFunctions.java
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueryFramework.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@feilong-liu feilong-liu changed the title Add code for function rewrite feature: Add optimizer to rewrite functions in query plan Dec 24, 2025
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • In CallExpressionRewriter.rewriteCall, returning null when no rewriter applies is risky; consider delegating to treeRewriter.defaultRewrite(node, unused) or returning the original node so the tree rewriter doesn't see unexpected nulls and still rewrites child expressions.
  • Because CallExpressionRewriter.hasRewritten is never reset, visitProject/visitFilter may rebuild nodes and mark the plan as changed for later nodes even when no rewrite occurred there; consider resetting this flag per node (or tracking rewrite at a finer granularity) to avoid unnecessary plan node recreation.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `CallExpressionRewriter.rewriteCall`, returning `null` when no rewriter applies is risky; consider delegating to `treeRewriter.defaultRewrite(node, unused)` or returning the original node so the tree rewriter doesn't see unexpected nulls and still rewrites child expressions.
- Because `CallExpressionRewriter.hasRewritten` is never reset, `visitProject`/`visitFilter` may rebuild nodes and mark the plan as changed for later nodes even when no rewrite occurred there; consider resetting this flag per node (or tracking rewrite at a finer granularity) to avoid unnecessary plan node recreation.

## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/planner/optimizations/FunctionCallRewritingOptimizer.java:136-140` </location>
<code_context>
+                            Map.Entry::getKey,
+                            e -> RowExpressionTreeRewriter.rewriteWith(callRewriter, e.getValue())));
+
+            if (!callRewriter.hasRewritten() && newSource == node.getSource()) {
+                return node;
+            }
+
+            planChanged = true;
+
+            return new ProjectNode(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** `hasRewritten` is global across the whole plan, so later nodes may be treated as changed even when they are not.

Because the same `CallExpressionRewriter` instance (and its `hasRewritten` flag) is reused across the whole plan, once any call is rewritten `hasRewritten()` remains `true`. That means this check:

```java
if (!callRewriter.hasRewritten() && newSource == node.getSource()) {
    return node;
}
planChanged = true;
```
will treat all subsequent `ProjectNode`/`FilterNode` instances as changed, even when their subtree is identical (`newSource == node.getSource()`). This makes `planChanged` inaccurate and causes unnecessary node recreation.

Consider either tracking rewrites per node (snapshot `hasRewritten` at method entry and compare after rewriting), or using a fresh `CallExpressionRewriter` per top-level rewrite plus a local per-node "changed" flag.

Suggested implementation:

```java
        public PlanNode visitProject(ProjectNode node, RewriteContext<Void> context)
        {
            PlanNode newSource = context.rewrite(node.getSource(), context.get());
            boolean sourceChanged = newSource != node.getSource();

            Assignments.Builder assignmentsBuilder = Assignments.builder();
            boolean assignmentsChanged = false;
            for (Map.Entry<VariableReferenceExpression, RowExpression> assignment : node.getAssignments().entrySet()) {
                RowExpression original = assignment.getValue();
                RowExpression rewritten = RowExpressionTreeRewriter.rewriteWith(callRewriter, original);

                if (rewritten != original) {
                    assignmentsChanged = true;
                }
                assignmentsBuilder.put(assignment.getKey(), rewritten);
            }
            Assignments newAssignments = assignmentsBuilder.build();

            if (!sourceChanged && !assignmentsChanged) {
                return node;
            }

            planChanged = true;

            return new ProjectNode(
                    node.getSourceLocation(),
                    idAllocator.getNextId(),
                    newSource,
                    newAssignments,
                    node.getLocality());
        }

```

If there is an analogous use of `callRewriter.hasRewritten()` in `visitFilter` (or other visitors) to decide whether a node changed, the same pattern should be applied there: compute a local `*Changed` flag by comparing the rewritten expressions to the originals and combine it with any `sourceChanged` check, instead of relying on the global `hasRewritten()` state.
</issue_to_address>

### Comment 2
<location> `presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestQueryFramework.java:594` </location>
<code_context>
                 new MBeanExporter(new TestingMBeanServer()),
                 splitManager,
                 planOptimizerManager,
+                new FunctionCallRewriterManager(),
                 pageSourceManager,
                 statsCalculator,
</code_context>

<issue_to_address>
**suggestion (testing):** Add an end-to-end test that exercises a custom `FunctionCallRewriter` via the new optimizer and session property

The test harness now wires `PlanOptimizers` with a `FunctionCallRewriterManager`, but there’s no end-to-end coverage that a connector/module `FunctionCallRewriter` actually changes the plan when `enable_function_call_rewriter` is enabled.

Please add an integration test (e.g., in an existing query-framework suite) that:
- Registers a test `FunctionCallRewriter` with `FunctionCallRewriterManager`.
- Runs a query with a function the test rewriter handles.
- Verifies via `EXPLAIN`/plan inspection that the call is rewritten when `enable_function_call_rewriter = true` and unchanged when it is `false`.

Suggested implementation:

```java
import com.facebook.presto.sql.planner.PlanFragmenter;

```

To fully implement the requested end-to-end test, you will need to:

1. **Add a test-only `FunctionCallRewriter` implementation**, e.g. `TestingFunctionCallRewriter`, under `presto-tests`:
   - Implement the `FunctionCallRewriter` interface from `com.facebook.presto.sql.planner`.
   - Have it recognize a test-only function (for example `test_rewriter_func`) and rewrite it to a different expression (e.g. rewrite `test_rewriter_func(x)` to `abs(x)` or some other deterministic function).
   - Ensure the rewrite is easily visible in a plan (e.g., function name changes or argument is modified).

2. **Expose or access the `FunctionCallRewriterManager` from the test harness**:
   - If `AbstractTestQueryFramework` (or the concrete `QueryRunner` implementation used in tests) does not currently expose the `FunctionCallRewriterManager`, add a getter or a hook in the concrete test class that creates the query runner.
   - After creating the `QueryRunner`, register the test rewriter:
     ```java
     queryRunner.getFunctionCallRewriterManager().addRewriter(new TestingFunctionCallRewriter());
     ```
   - Adjust `addRewriter` / registration call to match the actual API of `FunctionCallRewriterManager`.

3. **Create a concrete test class using `AbstractTestQueryFramework`**, for example `TestFunctionCallRewriter`:
   - Extend `AbstractTestQueryFramework`.
   - In `createQueryRunner()`, create a `QueryRunner`, then register the `TestingFunctionCallRewriter` on its `FunctionCallRewriterManager` as described above.
   - Add a test method, e.g.:
     ```java
     @Test
     public void testFunctionCallRewriterEnabledAndDisabled()
     {
         // Session with rewriter disabled
         Session disabled = Session.builder(getSession())
                 .setSystemProperty("enable_function_call_rewriter", "false")
                 .build();

         // Session with rewriter enabled
         Session enabled = Session.builder(getSession())
                 .setSystemProperty("enable_function_call_rewriter", "true")
                 .build();

         String query = "EXPLAIN SELECT test_rewriter_func(42)";

         // Plan when disabled
         MaterializedResult disabledPlan = computeActual(disabled, query);
         String disabledExplain = (String) disabledPlan.getOnlyValue();

         // Plan when enabled
         MaterializedResult enabledPlan = computeActual(enabled, query);
         String enabledExplain = (String) enabledPlan.getOnlyValue();

         // Verify that when disabled, the original function name appears
         assertTrue(disabledExplain.contains("test_rewriter_func"));

         // Verify that when enabled, the function call was rewritten
         // (e.g., to ABS or whatever TestingFunctionCallRewriter produces)
         assertFalse(enabledExplain.contains("test_rewriter_func"));
         assertTrue(enabledExplain.contains("abs")); // adjust to actual rewrite
     }
     ```
   - Use the actual session property name used for this feature (you referenced `enable_function_call_rewriter`, confirm its exact spelling and namespace in the existing session property definitions).

4. **Wire the test function into the execution engine**:
   - If `test_rewriter_func` is not already registered as a function, add a test-only scalar function in the test module (or reuse an existing one) so that parsing succeeds.
   - Ensure the function type signature and rewriter logic are consistent with the metadata and type system.

These additional changes will provide a true end-to-end test that verifies:
- The query harness uses `FunctionCallRewriterManager`.
- A custom `FunctionCallRewriter` actually rewrites a function call in the plan.
- The behavior is gated by the `enable_function_call_rewriter` session property.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +136 to +140
if (!callRewriter.hasRewritten() && newSource == node.getSource()) {
return node;
}

planChanged = true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): hasRewritten is global across the whole plan, so later nodes may be treated as changed even when they are not.

Because the same CallExpressionRewriter instance (and its hasRewritten flag) is reused across the whole plan, once any call is rewritten hasRewritten() remains true. That means this check:

if (!callRewriter.hasRewritten() && newSource == node.getSource()) {
    return node;
}
planChanged = true;

will treat all subsequent ProjectNode/FilterNode instances as changed, even when their subtree is identical (newSource == node.getSource()). This makes planChanged inaccurate and causes unnecessary node recreation.

Consider either tracking rewrites per node (snapshot hasRewritten at method entry and compare after rewriting), or using a fresh CallExpressionRewriter per top-level rewrite plus a local per-node "changed" flag.

Suggested implementation:

        public PlanNode visitProject(ProjectNode node, RewriteContext<Void> context)
        {
            PlanNode newSource = context.rewrite(node.getSource(), context.get());
            boolean sourceChanged = newSource != node.getSource();

            Assignments.Builder assignmentsBuilder = Assignments.builder();
            boolean assignmentsChanged = false;
            for (Map.Entry<VariableReferenceExpression, RowExpression> assignment : node.getAssignments().entrySet()) {
                RowExpression original = assignment.getValue();
                RowExpression rewritten = RowExpressionTreeRewriter.rewriteWith(callRewriter, original);

                if (rewritten != original) {
                    assignmentsChanged = true;
                }
                assignmentsBuilder.put(assignment.getKey(), rewritten);
            }
            Assignments newAssignments = assignmentsBuilder.build();

            if (!sourceChanged && !assignmentsChanged) {
                return node;
            }

            planChanged = true;

            return new ProjectNode(
                    node.getSourceLocation(),
                    idAllocator.getNextId(),
                    newSource,
                    newAssignments,
                    node.getLocality());
        }

If there is an analogous use of callRewriter.hasRewritten() in visitFilter (or other visitors) to decide whether a node changed, the same pattern should be applied there: compute a local *Changed flag by comparing the rewritten expressions to the originals and combine it with any sourceChanged check, instead of relying on the global hasRewritten() state.

@feilong-liu feilong-liu changed the title feature: Add optimizer to rewrite functions in query plan feat: Add optimizer to rewrite functions in query plan Dec 24, 2025
feilong-liu added a commit to feilong-liu/presto that referenced this pull request Dec 24, 2025
* This allows connectors to transform specific function calls into alternative implementations.
*/
public class FunctionCallRewritingOptimizer
implements PlanOptimizer
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just do iterative rule for project and filter?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the code to extend RowExpressionRewriteRuleSet

feilong-liu added a commit to feilong-liu/presto that referenced this pull request Dec 24, 2025
@feilong-liu feilong-liu requested a review from kaikalur December 24, 2025 07:16
Copy link
Copy Markdown
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of creating a very specific FunctionCallRewriter SPI here, can you use the existing and more general ExpressionOptimizer SPI?

feilong-liu added a commit to feilong-liu/presto that referenced this pull request Jan 6, 2026
@feilong-liu
Copy link
Copy Markdown
Contributor Author

Instead of creating a very specific FunctionCallRewriter SPI here, can you use the existing and more general ExpressionOptimizer SPI?

Updated the code to use the ExpressionOptimizer SPI instead of adding new SPI

kaikalur
kaikalur previously approved these changes Jan 6, 2026
/**
* A rule set that rewrites row expressions using a custom expression optimizer.
*/
public class RowExpressionsRewriteOptimizer
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this cleanup @feilong-liu, however I'm wondering if we can simplify this further.

To me this seems identical to SimplifyRowExpressions. There's new session properties which are also almost identical to expression_optimizer_name.

What I'm wondering is, can you make expression_optimizer_name accept a cascading list of expression optimizer names, and these optimizers are evaluated in order. That way, if you configure it such that SET SESSION expression_optimizer_name=default,function_rewrite, when we hit existing rules such as SimplifyRowExpressions, they will automatically call your custom optimizer and perform the rewrite. No new session properties needed, and no new optimizer rules needed, just a small change to evaluate the cascading list of expression rewrites.

We could have ExpressionOptimizerManager#getExpressionOptimizer return a simple composite expression optimizer if there's a comma separated list.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan Thanks for the review.

SimplifyRowExpressions is used multiple times in the plan optimizer, and for my case here, it only needs to run once. Also SimplifyRowExpressions has an additional logicalExpressionRewriter rewrite step, which is not needed either. I'd prefer to avoid this additional cost.

Also SimplifyRowExpressions has different purpose than the newly added optimizer. SimplifyRowExpressions is to evaluate and simplify row expressions, while the new optimizer added here is to do function call rewrite. Mixing them together makes is harder to maintain and may complex future usage (for example if future usage case wants to call simplify row expression optimizer, it will also pay the cost of this additional function rewrite which is not needed).

That way, if you configure it such that SET SESSION expression_optimizer_name=default,function_rewrite

I see that expression_optimizer_name is not only used in SimplifyRowExpressions but also used in all usage cases where ExpressionOptimizer is used. If we want to apply this change, we either need to change SimplifyRowExpressions to use a different session property, or run multiple expression optimizers for all other optimizers too, which both are not ideal.

I prefer to keep adding one new optimizer here instead of modifying existing use cases.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your thoughts @feilong-liu. I just want to take a minute to talk this through fully to get aligned on the direction, and I appreciate your patient elaboration of your perspective.

I understand that SimplifyRowExpressions is called multiple times. In fact, as you point out, it's used almost pervasively throughout the code when we simplify expressions. The thinking behind this pervasive use of ExressionOptimizerManager to retrieve the expression optimizer is we want just a single entry point to influence how expressions get simplified, so we don't need to think about complex interactions between optimizers rules or when exactly the expression optimizer gets invoked.

Expression optimizers are written similarly as Rules, the expectation is that they're idempotent. Meaning, it doesn't matter if they're called multiple times, they return the same output.

For your use case, where you simply map from one function to another, I imagine that even if this mapping were making some sort of external service call, you only pay this cost once, and any additional cost is just walking the expression tree. Is your concern primarily the cost of walking the in-memory expression tree? My intuition is we don't need to optimize for this cost, as I expect most of what we want to conserve in the coordinator is more on the fanout between queries to tasks and task management (where we can go from 1 query to 10's of thousands of tasks, which update per second and require serialization and deserialization of nested structures), and connection management between clients and the coordinator (where likewise even queries which are not running are regularly heartbeated by clients, which can cause a large fanout on resource constrained clusters), not the fixed per query cost of plan and expression tree traversal.

Regarding your point on the purpose of SimplifyRowExpressions, it's true that this was originally intended to do as it's named, however with pluggable ExpressionOptimizers it's now a bit more generic and we could actually call it something more generic like OptimizerRowExpressions. It does still do logical expression reordering and simplification, but this doesn't really change the logic of the expression, and like all ExpressionOptimizers, it's idempotent.

tl;dr, instead of adding yet another session property and yet another very specific optimizer rule, I'm just wondering if it would be better to converge on generic functionality if it's cheap to do (which I think it is). If you'd like, I can raise a PR for this cascading approach in ExpressionOptimizerManager - it's fairly minimal and keeps your use case working without the additional session property.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan Thanks for explaining the rational behind the proposal. I get your point on desire of having one single point for row expression evaluation/simplification. However I am still concerned on running expression optimizers tens of times unnecessarily.

My intuition is we don't need to optimize for this cost

I had the same intuition a few years ago, which I realized to be not quite true. People tends to have the assumption that traversing query plans and expression trees are cheap and Presto does not need to worry about it much. It's true when Presto is running batch job, however as we started looking into interactive use cases and extends Presto to serve time sensitive use cases, every millisecond counts and running an additional expression optimizer which traverses every expression tree tens of times unnecessarily is really not an acceptable case.

The thinking behind this pervasive use of ExressionOptimizerManager to retrieve the expression optimizer is we want just a single entry point to influence how expressions get simplified, so we don't need to think about complex interactions between optimizers rules or when exactly the expression optimizer gets invoked.

On the contrary, imo having optimizer rules to pick its own expression optimizer to use sounds a cleaner and more intuitive way. People know which expression optimizer is used and have control on it. Having all optimization rules using the same set of expression optimizers complicates the problem (especially considering that some optimizer rules are not purely expression optimization, it's using the expression optimizer to do constant folding which is used as a util function).

My case is simple, which is to do function rewrite and not an "optimization" of expression. It is a one-time rewrite rather than multiple passes of expressions. It does not necessarily need to use the ExressionOptimizerManager. If this is your concern, I can move away from using ExpressionOptimizerManager and revert back to the previous version which does not use ExpressionOptimizer.

The gap this PR tries to address is the lack of a way to do expression rewrite with internal functions, which is to add an optimizer rule that can hook with internal rewriters. If interaction between optimizer rules and expression rewrites is the concern, I can also move it to analyzer instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gentle ping @tdcmeehan

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your detailed reply @feilong-liu. While I still have reservations that this type of change would really affect the low latency use cases you mention, I trust that your production experience is telling you something important.

I think given you constraint you've laid out, this is the simplest approach which doesn't introduce a new SPI footprint, so I'm in favor of this approach. My only remaining question is don't we want this to be the first or among the first rules to be evaluated?

Copy link
Copy Markdown
Contributor Author

@feilong-liu feilong-liu Jan 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan The current rules which are above this new rules are mainly rules which transform the query plan to a query plan which Presto can execute, for example, subquery flattening, lambda function desugar, set operation flattening etc. IMO, rule transformations like above is OK to have higher priority, and I also prefer the new rule to operate on a plan which is executable by Presto and deal with the node and expressions which the execution engine recognize.

@feilong-liu feilong-liu requested a review from kaikalur January 6, 2026 05:38
kaikalur
kaikalur previously approved these changes Jan 6, 2026
@steveburnett
Copy link
Copy Markdown
Contributor

Please add documentation for the new session property expression_optimizer_in_row_expression_rewrite.

Then please update the release note to link to the new session property documentation. There's an example of such a doc link in Formatting in the Release Notes Guidelines.

return session.getSystemProperty(EXPRESSION_OPTIMIZER_NAME, String.class);
}

public static String getExpressionOptimizerInRowExpressionRewrite(Session session)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also add a config property (that way in normal deployments we can set a cluster default)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added config property

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added config property

Please add documentation for the new configuration property and also the new session property.

CONTRIBUTING.md states

"All new language features, new functions, session and config properties, and major features have documentation added."

/**
* A rule set that rewrites row expressions using a custom expression optimizer.
*/
public class RowExpressionsRewriteOptimizer
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think calling this an optimizer is confusing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename it to RewriteRowExpressions, I am really not good at naming, so asked AI and this is the suggestion it gave to me lol.

@jaystarshot
Copy link
Copy Markdown
Member

LGTM

@feilong-liu feilong-liu merged commit 943bfd3 into prestodb:master Jan 9, 2026
81 of 82 checks passed
@feilong-liu feilong-liu deleted the export-D89703780 branch January 9, 2026 06:44
tdcmeehan pushed a commit to rdtr/presto that referenced this pull request Jan 14, 2026
## Description
We have an internal use case where we want to rewrite functions, where
both rewrite from and rewrite to functions are internal functions.
This PR adds a new function rewrite optimizer, which use an instance of
expression optimizer to do the expression rewrite, the name of the
expression optimizer is specified with session property
`expression_optimizer_in_row_expression_rewrite` which defaults to
disabled when it's empty.

## Motivation and Context
To do function rewrite with internal functions.

## Impact
As in description

## Test Plan
Unit tests and local end to end test

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.
- [ ] If adding new dependencies, verified they have an [OpenSSF
Scorecard](https://securityscorecards.dev/#the-checks) score of 5.0 or
higher (or obtained explicit TSC approval for lower scores).

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== RELEASE NOTES ==

General
* Add an optimizer which can do function rewrite
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants