fix(native): Fix Velox to Presto IN expression conversion #26951

Merged
pramodsatya merged 3 commits into prestodb:master from pramodsatya:fix_in_vtop_expr
Feb 5, 2026

Conversation

Contributor

@pramodsatya pramodsatya commented Jan 13, 2026

Description

Fixes the Velox to Presto IN expression conversion. When the IN-list is constant, the Velox expression representation uses a constant expression with an array vector to store the list (see conversion here). The Presto IN expression expects the values of a constant IN-list to be distinct arguments to the SpecialFormExpression. VeloxToPrestoExpr is modified accordingly.
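
For illustration only, the expansion can be sketched with toy types. `ToyArrayVector`, `ToyConstantInList`, and `expandInArgs` are hypothetical names, not Velox or Presto APIs. The sketch assumes the constant wraps a single row of an array vector and that the row's values occupy `offset .. offset + size` in a shared elements buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an array vector: all rows share one flat
// elements buffer, and row i covers elements[offsets[i]] up to (but not
// including) elements[offsets[i] + sizes[i]].
struct ToyArrayVector {
  std::vector<std::string> elements;
  std::vector<std::size_t> offsets;
  std::vector<std::size_t> sizes;
};

// Hypothetical stand-in for a constant expression that wraps one row of
// an array vector (the constant IN-list).
struct ToyConstantInList {
  ToyArrayVector array;
  std::size_t wrappedIndex;
};

// Builds a Presto-style argument list: the left-hand side followed by one
// distinct argument per IN-list value.
std::vector<std::string> expandInArgs(
    const std::string& lhs,
    const ToyConstantInList& inList) {
  const auto& array = inList.array;
  assert(inList.wrappedIndex < array.offsets.size());
  assert(inList.wrappedIndex < array.sizes.size());
  const std::size_t offset = array.offsets[inList.wrappedIndex];
  const std::size_t size = array.sizes[inList.wrappedIndex];
  assert(offset + size <= array.elements.size());

  std::vector<std::string> args;
  args.reserve(size + 1);
  args.push_back(lhs);
  for (std::size_t i = 0; i < size; ++i) {
    args.push_back(array.elements[offset + i]);
  }
  return args;
}
```

With an elements buffer of `{"x", "nation", "region"}` and a wrapped row at offset 1 with size 2, `expandInArgs("table_name", inList)` yields three arguments instead of a column plus one array-typed constant.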

Motivation and Context

Resolves #26921.

Impact

Fixes bug with IN expression in native expression optimizer.

Test Plan

Added e2e test.

== NO RELEASE NOTE ==

Summary by Sourcery

Fix Velox-to-Presto conversion of IN expressions to correctly construct Presto special form arguments and add coverage for the native expression optimizer.

Bug Fixes:

  • Correct Velox IN expression conversion when the IN-list is represented as a constant array so Presto receives individual arguments instead of a single array-typed constant.

Tests:

  • Add an end-to-end test ensuring IN expressions are handled correctly by the native expression optimizer in the sidecar plugin test suite.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 13, 2026
Contributor

sourcery-ai bot commented Jan 13, 2026

Reviewer's Guide

Adjusts Velox-to-Presto IN expression conversion so constant array-backed IN-lists are expanded into individual arguments, and adds an end-to-end test using the native expression optimizer to validate correct handling of IN expressions.

Sequence diagram for Velox constant IN expression conversion

sequenceDiagram
    actor QueryEngine
    participant VeloxToPrestoExprConverter as Converter
    participant CallTypedExpr as InCallExpr
    participant ConstantTypedExpr as InListExpr
    participant ConstantVector as ConstVector
    participant ArrayVector as InListArray
    participant PrestoSpecialFormExpression as PrestoInExpr

    QueryEngine->>Converter: getSpecialFormExpression(InCallExpr)
    Converter->>InCallExpr: inputs()
    InCallExpr-->>Converter: lhsExpr, inListExpr
    Converter->>Converter: getInSpecialFormExpressionArgs(InCallExpr)

    Converter->>InListExpr: isConstantKind()
    InListExpr-->>Converter: true
    Converter->>InListExpr: toConstantVector(pool_)
    InListExpr-->>Converter: ConstVector
    Converter->>ConstVector: wrappedVector()
    ConstVector-->>Converter: InListArray

    Converter->>InListArray: sizeAt(wrappedIdx), offsetAt(wrappedIdx)
    InListArray-->>Converter: size, offset
    Converter->>InListArray: elements()
    InListArray-->>Converter: elementsVector

    loop for each element in IN list
        Converter->>ConstVector: wrappedIndex(0)
        Converter->>Converter: elementIndex = offset + i
        Converter->>ConstantVector: wrapInConstant(size, elementIndex, elementsVector)
        ConstantVector-->>Converter: elementConstant
        Converter->>Converter: create ConstantTypedExpr(elementConstant)
        Converter->>Converter: getRowExpression(elementConstantExpr)
        Converter-->>Converter: append argument
    end

    Converter-->>Converter: result.arguments = [lhsExpr, expandedArgs]
    Converter-->>QueryEngine: PrestoInExpr

Updated class diagram for VeloxToPrestoExprConverter IN handling

classDiagram
    class VeloxToPrestoExprConverter {
        +velox::memory::MemoryPool* pool_
        +RowExpressionPtr getRowExpression(velox::core::ITypedExpr* expr)
        +std::vector~RowExpressionPtr~ getSwitchSpecialFormExpressionArgs(const velox::core::CallTypedExpr* switchExpr)
        +std::vector~RowExpressionPtr~ getInSpecialFormExpressionArgs(const velox::core::CallTypedExpr* inExpr)
        +protocol::SpecialFormExpressionPtr getSpecialFormExpression(const velox::core::CallTypedExpr* expr)
    }

    class CallTypedExpr {
        +std::vector~velox::core::ITypedExprPtr~ inputs()
        +std::shared_ptr~velox::Type~ type()
    }

    class ConstantTypedExpr {
        +bool isConstantKind()
        +std::shared_ptr~velox::Vector~ toConstantVector(velox::memory::MemoryPool* pool)
    }

    class ConstantVector {
        +velox::vector_size_t size()
        +velox::vector_size_t wrappedIndex(velox::vector_size_t index)
        +std::shared_ptr~velox::BaseVector~ wrappedVector()
        +static std::shared_ptr~velox::BaseVector~ wrapInConstant(velox::vector_size_t size, velox::vector_size_t index, std::shared_ptr~velox::BaseVector~ elementsVector)
    }

    class ArrayVector {
        +velox::vector_size_t sizeAt(velox::vector_size_t index)
        +velox::vector_size_t offsetAt(velox::vector_size_t index)
        +std::shared_ptr~velox::BaseVector~ elements()
    }

    class SpecialFormExpression {
        +std::string form
        +std::vector~RowExpressionPtr~ arguments
    }

    VeloxToPrestoExprConverter --> CallTypedExpr : converts inputs
    VeloxToPrestoExprConverter --> ConstantTypedExpr : casts IN list input
    ConstantTypedExpr --> ConstantVector : toConstantVector
    ConstantVector --> ArrayVector : wrappedVector as ArrayVector
    VeloxToPrestoExprConverter --> ConstantVector : wrapInConstant
    VeloxToPrestoExprConverter --> SpecialFormExpression : build IN special form
    ArrayVector --> ConstantVector : provides element indices

File-Level Changes

Change: Handle Velox constant array IN-lists by expanding them into distinct Presto IN arguments during Velox-to-Presto conversion.
Details:
  • Introduce helper getInSpecialFormExpressionArgs to construct argument list for Presto IN SpecialFormExpression from a Velox CallTypedExpr.
  • Detect when the second IN input is a constant array, materialize its ConstantVector and underlying ArrayVector, and iterate elements to wrap each as its own ConstantTypedExpr row expression argument.
  • Fall back to previous behavior (passing inputs[1..] directly) when the IN-list is not a constant array.
  • Wire the new IN-specific argument handling into getSpecialFormExpression by recognizing the "IN" special form name and updating comments about which forms are handled generically.
Files:
  • presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp
  • presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.h

Change: Extend native sidecar plugin tests to cover IN expression behavior with the native expression optimizer.
Details:
  • Update TODO comment to reflect that multiple tests will eventually be removable once the native expression optimizer is universally enabled.
  • Add testInExpression which enables the native expression optimizer and runs a query with a WHERE ... IN ('nation','region') predicate on information_schema.columns to ensure successful execution.
Files:
  • presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java

Assessment against linked issues

| Issue | Objective | Addressed | Explanation |
| --- | --- | --- | --- |
| #26921 | Fix native expression optimizer handling of IN predicates with constant IN-lists so that they no longer produce type errors like "= cannot be applied to varchar, array(varchar)". | | |
| #26921 | Add regression coverage to ensure an IN predicate query using the native expression optimizer (e.g., on information_schema.columns with a constant IN-list) executes successfully. | | |

@pramodsatya pramodsatya marked this pull request as ready for review January 22, 2026 22:34
@pramodsatya pramodsatya requested review from a team and pdabre12 as code owners January 22, 2026 22:34
Copilot AI review requested due to automatic review settings January 22, 2026 22:34
@prestodb-ci prestodb-ci requested review from a team and namya28 and removed request for a team and Copilot January 22, 2026 22:34
@pramodsatya
Contributor Author

@aditi-pandit, could you please help review this change?

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 2 issues, and left some high level feedback:

  • In getInSpecialFormExpressionArgs, the constant-array branch assumes a non-null ArrayVector and elements; consider explicitly handling or guarding against null wrappedVector() / elements() / null array elements to avoid unexpected crashes when the IN-list or its entries contain nulls.
  • The error message in VELOX_CHECK_GE(numInputs, 2, ...) has a small typo (atleast); consider correcting it to at least to keep diagnostics clear and polished.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp:171` </location>
<code_context>
+  std::vector<RowExpressionPtr> result;
+  const auto& inputs = inExpr->inputs();
+  const auto numInputs = inputs.size();
+  VELOX_CHECK_GE(numInputs, 2, "IN expression should have atleast 2 inputs");
+  result.push_back(getRowExpression(inputs.at(0)));
+
</code_context>

<issue_to_address>
**nitpick (typo):** Fix minor typo in the IN-arity check message.

The error message uses "atleast"; please change this to "at least" for consistency with other messages.

```suggestion
  VELOX_CHECK_GE(numInputs, 2, "IN expression should have at least 2 inputs");
```
</issue_to_address>

### Comment 2
<location> `presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java:671-678` </location>
<code_context>
         assertQuerySucceeds(session, "SELECT array_sort_desc(ARRAY[-25, 20000, -17, 3672], x -> IF(x = 5, NULL, abs(x)))");
     }

+    // This test case verifies the IN expression is handled correctly by the native expression optimizer.
+    @Test
+    public void testInExpression()
+    {
+        Session session = Session.builder(getQueryRunner().getDefaultSession())
+                .setSystemProperty(EXPRESSION_OPTIMIZER_NAME, "native")
+                .build();
+        assertQuerySucceeds(session, "SELECT table_name, COALESCE(abs(ordinal_position), 0) as abs_pos FROM information_schema.columns WHERE table_catalog = 'hive' AND table_name IN ('nation', 'region') ORDER BY table_name, ordinal_position");
+    }
+
</code_context>

<issue_to_address>
**issue (testing):** This test only verifies that the query does not fail, but not that the optimized IN expression produces correct results.

The current assertion only checks for absence of errors. Since the original bug was about incorrect IN conversion, this test should validate results, not just success.

Recommend comparing outputs with and without the native optimizer, e.g.:
- Run the same query with the default optimizer, and use `assertQuery` or `assertQuery(sessionWithNative, sql)` to ensure both result sets match.

This will catch regressions where the query returns wrong rows but still succeeds.
</issue_to_address>

Copilot AI left a comment

Pull request overview

Fixes native expression optimizer failures for IN predicates by correctly converting Velox constant IN lists (stored as a constant array vector) into Presto IN special-form arguments.

Changes:

  • Add VeloxToPrestoExprConverter::getInSpecialFormExpressionArgs to expand constant array IN-lists into distinct constant arguments.
  • Route IN special-form conversion through the new helper in getSpecialFormExpression.
  • Add an e2e test covering IN ('nation','region') under the native expression optimizer.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| presto-native-sidecar-plugin/src/test/java/com/facebook/presto/sidecar/TestNativeSidecarPlugin.java | Adds an e2e regression test for IN predicate handling under the native optimizer. |
| presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.h | Declares a new helper to construct Presto IN special-form arguments from Velox IN. |
| presto-native-execution/presto_cpp/main/types/VeloxToPrestoExpr.cpp | Implements constant-array IN-list expansion and wires it into special-form conversion. |

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.


Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya. Your code looks okay overall, but it would be great to explain, with an example, the structural transformations you are doing.

Contributor Author

@pramodsatya pramodsatya left a comment

Thanks @aditi-pandit, this conversion unwraps the array vector for a constant IN-list, performing the inverse operation of convertInExpr.

Added comments to elaborate on the rationale behind this conversion; could you please take another look?
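
As a rough sketch of the inverse relationship mentioned here, the two directions can be modeled with toy types. `ToyArg`, `packInList`, and `unpackInList` are hypothetical names; the packing direction is only an assumed simplification of what convertInExpr does, based on the PR description, and is not its actual implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Toy argument: either a scalar expression (as text) or a constant array
// of literal values. Not a Velox or Presto type.
using ToyArg = std::variant<std::string, std::vector<std::string>>;

// Presto -> Velox direction (assumed shape): fold [value, v1, ..., vn]
// into [value, ARRAY[v1, ..., vn]].
std::vector<ToyArg> packInList(const std::vector<std::string>& prestoArgs) {
  assert(prestoArgs.size() >= 2);
  std::vector<std::string> list(prestoArgs.begin() + 1, prestoArgs.end());
  return {ToyArg{prestoArgs.front()}, ToyArg{std::move(list)}};
}

// Velox -> Presto direction: unfold the constant array back into distinct
// arguments, which is the shape the Presto IN special form expects.
std::vector<std::string> unpackInList(const std::vector<ToyArg>& veloxInputs) {
  assert(veloxInputs.size() == 2);
  std::vector<std::string> args{std::get<std::string>(veloxInputs[0])};
  const auto& list = std::get<std::vector<std::string>>(veloxInputs[1]);
  args.insert(args.end(), list.begin(), list.end());
  return args;
}
```

Under these assumptions, `unpackInList(packInList(args))` returns the original argument list, which is the round-trip property the conversion needs to preserve.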

Contributor

@pdabre12 pdabre12 left a comment

Thanks @pramodsatya.
One comment: can we add the failing test queries from the issue as test cases here?

Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya

      result.push_back(getRowExpression(constantExpr));
    }
  } else {
    for (auto i = 1; i < numInputs; i++) {
Contributor

This is the non-constant IN expression. Please add a comment describing the conversion (with an example)
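
One possible illustration of the non-constant path, using a hypothetical predicate such as `x IN (y, z + 1)`: the Velox call already carries `[value, e1, ..., en]` as separate inputs, so each input converts one-to-one. `toyConvert` and `convertNonConstantIn` are made-up names; `toyConvert` merely stands in for getRowExpression.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for getRowExpression: tags a Velox expression
// string as converted so the one-to-one mapping is visible.
std::string toyConvert(const std::string& veloxExpr) {
  return "presto(" + veloxExpr + ")";
}

// Non-constant IN: inputs are [value, e1, ..., en]; every input is
// converted separately and keeps its position in the argument list.
std::vector<std::string> convertNonConstantIn(
    const std::vector<std::string>& inputs) {
  assert(inputs.size() >= 2);
  std::vector<std::string> result;
  result.reserve(inputs.size());
  result.push_back(toyConvert(inputs.at(0)));
  for (std::size_t i = 1; i < inputs.size(); ++i) {
    result.push_back(toyConvert(inputs.at(i)));
  }
  return result;
}
```

For inputs `{"x", "y", "plus(z, 1)"}` the result has the same three positions, so no unwrapping step is involved in this branch.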

  const auto& inList = inputs.at(1);
  // Check if IN-list is a constant expression with values represented by a
  // constant array vector.
  if (numInputs == 2 && inList->isConstantKind() && inList->type()->isArray()) {
Contributor

Can we extract the condition in this if statement into a separate function?
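
A minimal sketch of what such a helper could look like, using a toy input type. `ToyInput` and `isConstantArrayInList` are hypothetical names and do not reflect the helper that was eventually merged; the three checks mirror the condition quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of the two properties the condition inspects on
// each input expression.
struct ToyInput {
  bool isConstant;  // mirrors isConstantKind()
  bool isArray;     // mirrors type()->isArray()
};

// True when the IN call has exactly [value, list] as inputs and the list
// is a constant whose type is an array.
bool isConstantArrayInList(const std::vector<ToyInput>& inputs) {
  return inputs.size() == 2 && inputs[1].isConstant && inputs[1].isArray;
}
```

Naming the condition keeps the branch in the caller readable and gives the explanatory comment a single home.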

// expression in both of these forms.
std::vector<RowExpressionPtr>
VeloxToPrestoExprConverter::getInSpecialFormExpressionArgs(
    const velox::core::CallTypedExpr* inExpr) const {
Contributor

Why is this a const raw pointer? Why not use a const ref parameter?

Contributor Author

The input expression here is immutable, so a raw pointer is passed; the caller, getSpecialFormExpression, also receives the input expression as a raw pointer. Is it fine to retain this as a raw pointer?

Contributor Author

@pramodsatya pramodsatya left a comment

Thanks for the feedback @aditi-pandit, addressed the review comments. Could you please take another look?

Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @pramodsatya

@pramodsatya pramodsatya merged commit 6c7dd64 into prestodb:master Feb 5, 2026
87 of 88 checks passed
@pramodsatya pramodsatya deleted the fix_in_vtop_expr branch February 5, 2026 03:04

Labels

from:IBM PR from IBM

Development

Successfully merging this pull request may close these issues.

[native] Constant folding using native expression optimizer fails for IN predicate queries

5 participants