feat: Support vector search in LogicalPlanner (#27169) #27169

Merged: skyelves merged 4 commits into prestodb:master from skyelves:export-D93690255 on Mar 19, 2026

Conversation


@skyelves skyelves commented Feb 19, 2026

Summary:

Support vector search in LogicalPlanner

High level design

The process for executing a CREATE VECTOR INDEX SQL statement is as follows:

  1. SQL Input & Parsing:
     • SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
     • The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
  2. Statement Analysis:
     • StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
     • This results in a structured CreateVectorIndexAnalysis object.
  3. Logical Planning & Query Generation:
     • LogicalPlanner.createVectorIndexPlan() builds the core execution query:
       CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
     • The resulting plan tree includes:

       TableFinishNode(target = CreateVectorIndexReference)
       └── TableWriterNode(target = CreateVectorIndexReference)
           └── query plan
  4. Connector Plan Optimization (Rewriting):
     • PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
     • ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
  5. Execution and Metadata Handling (for connectors that don't rewrite):
     • TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
     • Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
  6. ConnectorMetadata SPI:
     • Default: The standard implementation throws NOT_SUPPORTED.
     • Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
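
The default-versus-override behavior of the ConnectorMetadata SPI step can be sketched roughly as follows. This is a minimal illustration and not the actual Presto SPI: the interface `VectorIndexMetadataSketch`, the class `IcebergLikeMetadataSketch`, the String-based handle, and the use of `UnsupportedOperationException` in place of a NOT_SUPPORTED error are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the ConnectorMetadata SPI.
// Names and signatures are assumptions made for this sketch.
interface VectorIndexMetadataSketch
{
    // Default: connectors that do not override this reject the statement.
    default String beginCreateVectorIndex(String indexTableName)
    {
        throw new UnsupportedOperationException("NOT_SUPPORTED: connector does not support CREATE VECTOR INDEX");
    }

    // Default: same rejection for the finish call.
    default void finishCreateVectorIndex(String handle)
    {
        throw new UnsupportedOperationException("NOT_SUPPORTED: connector does not support CREATE VECTOR INDEX");
    }
}

// Hypothetical connector that opts in by overriding both calls.
final class IcebergLikeMetadataSketch
        implements VectorIndexMetadataSketch
{
    private final List<String> pendingTables = new ArrayList<>();
    private final List<String> committedTables = new ArrayList<>();

    @Override
    public String beginCreateVectorIndex(String indexTableName)
    {
        // Record the table as in-flight and hand back a handle (here, just the name).
        pendingTables.add(indexTableName);
        return indexTableName;
    }

    @Override
    public void finishCreateVectorIndex(String handle)
    {
        // Only tables that went through begin can be finished.
        if (!pendingTables.remove(handle)) {
            throw new IllegalStateException("No pending vector index for handle: " + handle);
        }
        committedTables.add(handle);
    }

    List<String> getCommittedTables()
    {
        return committedTables;
    }
}
```

With this shape, a connector that does nothing fails fast at the begin call, while a connector that overrides both methods controls how the index table is created and committed.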

Release Notes

== NO RELEASE NOTE ==

Differential Revision: D93690255

@skyelves skyelves requested review from a team, feilong-liu and jaystarshot as code owners February 19, 2026 04:46

sourcery-ai bot commented Feb 19, 2026

Reviewer's Guide

Adds full parsing, analysis, and planning support for a new CREATE VECTOR INDEX statement by introducing a corresponding AST node, extending the SQL grammar and formatter, wiring it through analysis as a CTAS-style synthetic query that calls a create_vector_index UDF, and teaching the LogicalPlanner to build plans from that synthetic query, plus associated tests and minor grammar tweaks.

Sequence diagram for CREATE VECTOR INDEX planning flow

sequenceDiagram
    actor User
    participant Client
    participant SqlParser
    participant AstBuilder
    participant Analyzer
    participant Analysis
    participant LogicalPlanner
    participant Optimizer
    participant ExecutionEngine
    participant PythonScript

    User->>Client: Submit CREATE VECTOR INDEX statement
    Client->>SqlParser: parse(sql)
    SqlParser->>AstBuilder: build AST
    AstBuilder-->>SqlParser: CreateVectorIndex node
    SqlParser-->>Analyzer: CreateVectorIndex statement

    Analyzer->>Analysis: setCreateVectorIndexTableName(sourceTable)
    Analyzer->>Analyzer: mapFromProperties(node.properties)
    Analyzer->>Analyzer: extractStringProperty(index_type, distance_metric, index_options)
    Analyzer->>Analyzer: build partitioned_by JSON
    Analyzer->>Analyzer: build propertiesJson
    Analyzer->>Analyzer: build synthetic Query
    Analyzer->>Analysis: setVectorIndexQuery(query)
    Analyzer->>Analysis: setCreateTableDestination(targetIndexTable)
    Analyzer->>Analysis: setCreateTableProperties(empty)
    Analyzer->>Analysis: setCreateTableAsSelectWithData(true)
    Analyzer->>Analysis: addAccessControlCheckForTable(TABLE_CREATE,...)
    Analyzer-->>Client: Analysis complete

    Client->>LogicalPlanner: plan(CreateVectorIndex)
    LogicalPlanner->>Analysis: getVectorIndexQuery()
    Analysis-->>LogicalPlanner: synthetic Query with create_vector_index UDF
    LogicalPlanner->>LogicalPlanner: createTableCreationPlan(analysis, query)
    LogicalPlanner->>Optimizer: optimize CTAS plan
    Optimizer-->>LogicalPlanner: optimized plan (no table scan)
    LogicalPlanner-->>ExecutionEngine: submit execution plan

    ExecutionEngine->>PythonScript: invoke create_vector_index(sourceTable, columns, indexType, propertiesJson, targetIndexTable)
    PythonScript-->>ExecutionEngine: create index table populated
    ExecutionEngine-->>Client: statement completed
    Client-->>User: CREATE VECTOR INDEX succeeded
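
The property-extraction steps in the diagram can be illustrated with a small sketch. It assumes plain string values rather than the `Expression` values used by the real analyzer, and the default values (`default_type`, `default_metric`, and an empty string) are placeholders invented for this example; only the property names `index_type`, `distance_metric`, and `index_options` come from the diagram.

```java
import java.util.HashMap;
import java.util.Map;

final class VectorIndexPropertiesSketch
{
    private VectorIndexPropertiesSketch() {}

    // Returns the value for key, or defaultValue when the property was not given.
    static String extractStringProperty(Map<String, String> properties, String key, String defaultValue)
    {
        String value = properties.get(key);
        return value != null ? value : defaultValue;
    }

    // Resolves the three string properties named in the diagram into one map.
    // The default values are placeholders chosen for this sketch only.
    static Map<String, String> resolve(Map<String, String> withProperties)
    {
        Map<String, String> resolved = new HashMap<>();
        resolved.put("index_type", extractStringProperty(withProperties, "index_type", "default_type"));
        resolved.put("distance_metric", extractStringProperty(withProperties, "distance_metric", "default_metric"));
        resolved.put("index_options", extractStringProperty(withProperties, "index_options", ""));
        return resolved;
    }
}
```

Resolving every property through one helper keeps the fallback behavior in a single place, which also makes it easier to validate property names consistently.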

Class diagram for new CreateVectorIndex and related planning changes

classDiagram

    class Statement

    class CreateVectorIndex {
      +CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
      +CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
      +Identifier getIndexName()
      +QualifiedName getTableName()
      +List~Identifier~ getColumns()
      +Optional~Expression~ getWhere()
      +List~Property~ getProperties()
      +<R,C> R accept(AstVisitor~R,C~ visitor, C context)
      +List~Node~ getChildren()
    }

    CreateVectorIndex --|> Statement

    class AstVisitor {
      +<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
    }

    AstVisitor <|.. DefaultTraversalVisitor

    class DefaultTraversalVisitor {
      +Object visitCreateVectorIndex(CreateVectorIndex node, Object context)
    }

    class AstBuilder {
      +Node visitCreateVectorIndex(SqlBaseParser.CreateVectorIndexContext context)
    }

    class SqlFormatter_Formatter {
      +Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
    }

    class Analysis {
      -Optional~QualifiedObjectName~ createVectorIndexTableName
      -Optional~Query~ vectorIndexQuery
      +void setCreateVectorIndexTableName(QualifiedObjectName tableName)
      +Optional~QualifiedObjectName~ getCreateVectorIndexTableName()
      +void setVectorIndexQuery(Query query)
      +Optional~Query~ getVectorIndexQuery()
      +void setCreateTableDestination(QualifiedObjectName table)
      +Optional~QualifiedObjectName~ getCreateTableDestination()
    }

    class StatementAnalyzer {
      +Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
      -String extractStringProperty(Map~String,Expression~ properties, String key, String defaultValue)
    }

    class LogicalPlanner {
      +PlanNode plan(Statement statement)
    }

    class StatementUtils {
      +QueryType getQueryType(Class~? extends Statement~ statementClass)
    }

    class Query {}
    class QuerySpecification {}
    class FunctionCall {}
    class Expression {}
    class Property {}
    class Identifier {}
    class QualifiedName {}
    class QualifiedObjectName {}

    CreateVectorIndex --> QualifiedName : uses
    CreateVectorIndex --> Identifier : uses
    CreateVectorIndex --> Expression : optional where
    CreateVectorIndex --> Property : properties

    AstBuilder --> CreateVectorIndex : constructs
    SqlFormatter_Formatter --> CreateVectorIndex : formats
    DefaultTraversalVisitor --> CreateVectorIndex : traverses

    StatementAnalyzer --> CreateVectorIndex : analyzes
    StatementAnalyzer --> Analysis : populates
    Analysis --> Query : holds synthetic

    LogicalPlanner --> Analysis : reads vectorIndexQuery
    LogicalPlanner --> Query : plans CTAS-style

    StatementUtils --> CreateVectorIndex : maps_to_INSERT_QueryType

    AstVisitor --> CreateVectorIndex
    DefaultTraversalVisitor --> Analysis
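
The relationship between the new node and its visitors can be sketched as a small double-dispatch example. Everything here is a simplified assumption: strings stand in for `Identifier` and `QualifiedName`, the WHERE predicate and properties are omitted, and the nested `Visitor` and `Formatter` classes are hypothetical stand-ins for `AstVisitor` and the SQL formatter.

```java
import java.util.ArrayList;
import java.util.List;

final class CreateVectorIndexAstSketch
{
    private CreateVectorIndexAstSketch() {}

    // Visitor with a fallback: node types without a dedicated override land in visitStatement.
    static class Visitor<R, C>
    {
        R visitStatement(Statement node, C context)
        {
            return null;
        }

        R visitCreateVectorIndex(CreateVectorIndex node, C context)
        {
            return visitStatement(node, context);
        }
    }

    abstract static class Statement
    {
        <R, C> R accept(Visitor<R, C> visitor, C context)
        {
            return visitor.visitStatement(this, context);
        }
    }

    // Simplified node: strings stand in for Identifier and QualifiedName.
    static final class CreateVectorIndex
            extends Statement
    {
        private final String indexName;
        private final String tableName;
        private final List<String> columns;

        CreateVectorIndex(String indexName, String tableName, List<String> columns)
        {
            this.indexName = indexName;
            this.tableName = tableName;
            this.columns = new ArrayList<>(columns);
        }

        String getIndexName()
        {
            return indexName;
        }

        String getTableName()
        {
            return tableName;
        }

        List<String> getColumns()
        {
            return columns;
        }

        @Override
        <R, C> R accept(Visitor<R, C> visitor, C context)
        {
            // Double dispatch: the node picks the visitor method for its own type.
            return visitor.visitCreateVectorIndex(this, context);
        }
    }

    // Formatter-style visitor that renders the node back to SQL text.
    static final class Formatter
            extends Visitor<String, Void>
    {
        @Override
        String visitCreateVectorIndex(CreateVectorIndex node, Void context)
        {
            return "CREATE VECTOR INDEX " + node.getIndexName()
                    + " ON " + node.getTableName()
                    + "(" + String.join(", ", node.getColumns()) + ")";
        }
    }
}
```

A visitor that does not override `visitCreateVectorIndex` falls back to `visitStatement`, which is why each visitor that needs statement-specific behavior gets its own override.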

File-Level Changes

Change: Add a CreateVectorIndex AST node and wire it into the parser, visitors, and SQL formatter.
Details:
  • Introduce a CreateVectorIndex Statement subclass holding index name, source table, columns, optional WHERE predicate, and WITH properties.
  • Extend SqlBase.g4 to parse CREATE VECTOR INDEX syntax, including VECTOR and INDEX keywords and allowing trailing commas in properties.
  • Implement AstBuilder.visitCreateVectorIndex to build the AST from the parse tree.
  • Add visitCreateVectorIndex overrides in AstVisitor, DefaultTraversalVisitor, and SqlFormatter to support tree walking and pretty-printing of the new statement.
  • Update TestSqlParser and TestSqlParserErrorHandling to cover valid CREATE VECTOR INDEX forms and reserved-word expectations.
Files:
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/CreateVectorIndex.java
  • presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
  • presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/SqlFormatter.java
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java

Change: Extend analysis to support CREATE VECTOR INDEX by synthesizing a CTAS-like query that calls a create_vector_index function and tracking related metadata.
Details:
  • Add createVectorIndexTableName and vectorIndexQuery fields with setters/getters to Analysis to store the source table and synthetic query.
  • Implement StatementAnalyzer.visitCreateVectorIndex to validate source/target tables, extract and normalize properties (including partitioned_by array to JSON), construct the create_vector_index FunctionCall and wrapping Query, set CTAS-related analysis state, register access-control checks, and capture updated source columns and result scope.
  • Classify CreateVectorIndex as an INSERT-type query in StatementUtils.
Files:
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/utils/StatementUtils.java

Change: Teach the LogicalPlanner to plan CREATE VECTOR INDEX statements using the synthesized CTAS-style query from analysis.
Details:
  • Extend LogicalPlanner.createPlan dispatch to detect CreateVectorIndex statements.
  • On CreateVectorIndex, retrieve the synthetic vectorIndexQuery from Analysis and delegate to createTableCreationPlan, failing with NOT_SUPPORTED if the query is missing.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/sql/planner/LogicalPlanner.java
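
The planner dispatch described in the last change can be sketched as follows. This is a hedged illustration, not the code from the PR: the nested `Statement`, `CreateVectorIndex`, and `Analysis` types are hypothetical stand-ins, plans and queries are represented as plain strings, and `UnsupportedOperationException` stands in for a NOT_SUPPORTED error.

```java
import java.util.Optional;

final class PlannerDispatchSketch
{
    private PlannerDispatchSketch() {}

    // Hypothetical minimal stand-ins for the real AST and analysis types.
    interface Statement {}

    static final class CreateVectorIndex
            implements Statement {}

    static final class Analysis
    {
        private Optional<String> vectorIndexQuery = Optional.empty();

        void setVectorIndexQuery(String query)
        {
            vectorIndexQuery = Optional.of(query);
        }

        Optional<String> getVectorIndexQuery()
        {
            return vectorIndexQuery;
        }
    }

    // Dispatches on statement type; CreateVectorIndex reuses the CTAS-style planning path.
    static String createPlan(Statement statement, Analysis analysis)
    {
        if (statement instanceof CreateVectorIndex) {
            String query = analysis.getVectorIndexQuery()
                    .orElseThrow(() -> new UnsupportedOperationException(
                            "NOT_SUPPORTED: no synthetic query recorded for CREATE VECTOR INDEX"));
            return createTableCreationPlan(query);
        }
        throw new UnsupportedOperationException(
                "Unsupported statement type: " + statement.getClass().getSimpleName());
    }

    // Stand-in for the CTAS planning path: wraps the query plan in writer and finish stages.
    static String createTableCreationPlan(String query)
    {
        return "TableFinish <- TableWriter <- " + query;
    }
}
```

Failing when the synthetic query is absent keeps a planner/analyzer mismatch visible instead of producing an empty plan.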


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 3 issues, and left some high level feedback:

  • The WHERE clause on CreateVectorIndex is analyzed but never incorporated into the synthetic create_vector_index query in visitCreateVectorIndex, so any filtering the user specifies will be silently ignored; either plumb it into the function call/plan or reject unsupported WHERE usage explicitly.
  • The hardcoded property keys in visitCreateVectorIndex (e.g., distance_metric) don't match the ones used in the parser tests (e.g., metric), which will confuse users and make configuration brittle; consider aligning on a single set of property names and enforcing/validating them consistently.
  • The JSON string for propertiesJson is constructed via StringBuilder/String.format with raw string literals, which will break for values containing quotes or backslashes; use a proper JSON encoder or existing JSON utilities to safely build this payload.
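
To illustrate the escaping concern in the last point, here is a minimal sketch of centralized escaping for a flat string-to-string property map. The class `PropertiesJsonSketch` and its methods are hypothetical, and the hand-written escaper is only a self-contained stand-in: in the actual codebase a JSON library would likely be the better choice, as the review suggests.

```java
import java.util.Map;
import java.util.StringJoiner;

final class PropertiesJsonSketch
{
    private PropertiesJsonSketch() {}

    // Quotes a string as a JSON string literal, escaping quotes, backslashes,
    // and control characters below U+0020.
    static String quote(String value)
    {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':
                    out.append("\\\"");
                    break;
                case '\\':
                    out.append("\\\\");
                    break;
                case '\n':
                    out.append("\\n");
                    break;
                case '\r':
                    out.append("\\r");
                    break;
                case '\t':
                    out.append("\\t");
                    break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    }
                    else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Serializes a flat map as a JSON object, escaping both keys and values.
    // Entries are emitted in the map's iteration order.
    static String toJson(Map<String, String> properties)
    {
        StringJoiner joiner = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            joiner.add(quote(entry.getKey()) + ":" + quote(entry.getValue()));
        }
        return joiner.toString();
    }
}
```

Unlike raw `String.format` interpolation, a value such as `cos"ine` is emitted here as `"cos\"ine"`, so the resulting payload stays parseable.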
Prompt for AI Agents
Please address the comments from this code review:


## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java:1184-1185` </location>
<code_context>
+            partitionedByJson.append("]");
+
+            // Pack all config into a JSON properties string for the Python script
+            String propertiesJson = String.format(
+                    "{\"distance_metric\":\"%s\",\"index_options\":\"%s\",\"partitioned_by\":%s}",
+                    distanceMetric, indexOptions, partitionedByJson.toString());
+
</code_context>

<issue_to_address>
**issue (bug_risk):** Avoid manual JSON string construction and ensure proper escaping of property values.

This JSON is assembled with `String.format` and concatenation, so any quotes, backslashes, or control characters in `distance_metric`, `index_options`, or `partitioned_by` values will produce invalid JSON and may enable injection-like behavior in downstream consumers. Prefer constructing this object with a JSON library (e.g., Airlift JSON already used in Presto) so values are properly escaped, or at least centralize robust escaping rather than relying on raw string formatting and `StringBuilder` concatenation.
</issue_to_address>

### Comment 2
<location> `presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java:593-590` </location>
<code_context>
             return createAndAssignScope(node, scope);
         }

+        @Override
+        protected Scope visitCreateVectorIndex(CreateVectorIndex node, Optional<Scope> scope)
+        {
</code_context>

<issue_to_address>
**issue (bug_risk):** DefaultTraversalVisitor should also traverse `tableName` for CreateVectorIndex.

In `visitCreateVectorIndex`, the visitor handles `indexName`, `columns`, `where`, and `properties` but skips `tableName`. This diverges from other statement visitors and can break logic that relies on visiting all table references (e.g., identifier collection, access control, rewriters). Please also call `process(node.getTableName(), context);` so the table reference is traversed consistently.
</issue_to_address>

### Comment 3
<location> `presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java:84` </location>
<code_context>
                  "line 1:21: mismatched input ','. Expecting: <expression>"},
                 {"CREATE TABLE foo () AS (VALUES 1)",
-                 "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VIEW'"},
+                 "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
                 {"CREATE TABLE foo (*) AS (VALUES 1)",
-                 "line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VIEW'"},
</code_context>

<issue_to_address>
**suggestion (testing):** Add negative parser tests for invalid CREATE VECTOR INDEX syntax to complement the positive cases

The error handling table now includes `VECTOR`, but there are no negative parser tests for malformed `CREATE VECTOR INDEX` statements. To exercise the new grammar branch and lock in error behavior, please add a few invalid cases here, e.g.:

- `CREATE VECTOR INDEX idx ON` – missing table and columns
- `CREATE VECTOR INDEX idx ON t` – missing column list
- `CREATE VECTOR INDEX idx ON t()` – empty column list
- `CREATE VECTOR INDEX idx ON t(c) WITH ()` – empty properties

Suggested implementation:

```java
                {"select foo(DISTINCT ,1)",
                 "line 1:21: mismatched input ','. Expecting: <expression>"},
                {"CREATE TABLE foo () AS (VALUES 1)",
                 "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
                {"CREATE TABLE foo (*) AS (VALUES 1)",
                 "line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
                {"CREATE VECTOR INDEX idx ON",
                 "line 1:26: mismatched input '<EOF>'. Expecting: <identifier>"},
                {"CREATE VECTOR INDEX idx ON t",
                 "line 1:28: mismatched input '<EOF>'. Expecting: '('"},
                {"CREATE VECTOR INDEX idx ON t()",
                 "line 1:30: mismatched input ')'. Expecting: <identifier>"},
                {"CREATE VECTOR INDEX idx ON t(c) WITH ()",
                 "line 1:39: mismatched input ')'. Expecting: <identifier>"},
                {"SELECT grouping(a+2) FROM (VALUES (1)) AS t (a) GROUP BY a+2",
                 "line 1:18: mismatched input '+'. Expecting: ')', ','"},
                {"SELECT x() over (ROWS select) FROM t",

```

1. If the actual error messages produced by the parser differ (column numbers or expected token sets), adjust the expected strings to match the real output from `TestSqlParserErrorHandling` by running the test suite and copying the exact messages.
2. If the grammar for `CREATE VECTOR INDEX` uses different expectations (e.g., a different nonterminal instead of `<identifier>` or additional expected tokens after `ON t`), update the expectations accordingly to reflect the concrete grammar.
</issue_to_address>



github-actions bot commented Feb 19, 2026

Codenotify: Notifying subscribers in CODENOTIFY files for diff 5022f6b...4e123b1.

No notifications.


@steveburnett steveburnett left a comment


Do we need documentation for CREATE VECTOR INDEX? Could add a new page in https://github.com/prestodb/presto/tree/master/presto-docs/src/main/sphinx/sql.

@steveburnett

Please add a release note entry for this PR that follows the Release Notes Guidelines. The release-notes CI check is not required, but it is currently failing.

skyelves added a commit to skyelves/presto that referenced this pull request Feb 24, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
skyelves added a commit to skyelves/presto that referenced this pull request Feb 25, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
skyelves added a commit to skyelves/presto that referenced this pull request Mar 5, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255

linux-foundation-easycla bot commented Mar 13, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@gggrace14 gggrace14 self-requested a review March 16, 2026 20:02
gggrace14 previously approved these changes Mar 16, 2026

@gggrace14 gggrace14 left a comment


The 5 commits look good to me after revising.

@meta-codesync meta-codesync bot changed the title feat: Support vector search in LogicalPlanner (#27169) feat: Support vector search in LogicalPlanner (#27169) Mar 17, 2026
skyelves added a commit to skyelves/presto that referenced this pull request Mar 17, 2026
Summary:

Support vector search in  LogicalPlanner

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:

SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
**• LogicalPlanner.createVectorIndexPlan() builds the core execution query:**
**CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...**
**• The resulting plan tree includes:**

**TableFinishNode(target = CreateVectorIndexReference)**
**└── TableWriterNode(target = CreateVectorIndexReference)**
**└── query plan**
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.


## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
  == RELEASE NOTES ==
  General Changes
  * Add support for create-vector-index statement, which creates
    vector search indexes on table columns with configurable index properties
    and partition filtering via an ``UPDATING FOR`` clause.
```

Differential Revision: D93690255
skyelves added a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
skyelves added a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:
Pull Request resolved: prestodb#27169

Support vector search in  LogicalPlanner

Differential Revision: D93690255
skyelves added a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
skyelves added a commit to skyelves/presto that referenced this pull request Mar 19, 2026
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
Summary:
2. Statement Analysis:

**StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.**
**This results in a structured CreateVectorIndexAnalysis object.**

Differential Revision: D91524358

Pulled By: skyelves
…estodb#27261)

Summary:

Add dedicated WriterTarget subclass and ConnectorMetadata SPI for
CREATE VECTOR INDEX, enabling each connector to implement vector index
creation independently.

- CreateVectorIndexReference: plan-time target carrying index metadata
  and source table reference
- beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to
  NOT_SUPPORTED so connectors must opt in
- ClassLoaderSafeConnectorMetadata: delegation wrappers
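
The "delegation wrappers" item can be illustrated with a small, self-contained Java sketch of the general pattern: forward each call to the wrapped object while the thread context class loader is set to the plugin's loader, then restore it. `IndexMetadata` and `ClassLoaderSafeIndexMetadata` are hypothetical names chosen for this example; the actual wrapper in this commit may be structured differently.

```java
// Hypothetical minimal interface for illustration; not the Presto SPI.
interface IndexMetadata {
    String beginCreateVectorIndex(String indexName);
}

// Delegating wrapper that pins the thread context class loader for each call.
final class ClassLoaderSafeIndexMetadata implements IndexMetadata {
    private final IndexMetadata delegate;
    private final ClassLoader classLoader;

    ClassLoaderSafeIndexMetadata(IndexMetadata delegate, ClassLoader classLoader) {
        this.delegate = delegate;
        this.classLoader = classLoader;
    }

    @Override
    public String beginCreateVectorIndex(String indexName) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(classLoader);
        try {
            // The delegate runs with the plugin's class loader in context.
            return delegate.beginCreateVectorIndex(indexName);
        } finally {
            // Always restore the caller's class loader, even on failure.
            thread.setContextClassLoader(original);
        }
    }
}

final class ClassLoaderSafeSketch {
    public static void main(String[] args) {
        // An empty URLClassLoader stands in for a plugin class loader.
        ClassLoader pluginLoader = new java.net.URLClassLoader(new java.net.URL[0]);
        IndexMetadata inner = name ->
                name + " ran with plugin loader: "
                        + (Thread.currentThread().getContextClassLoader() == pluginLoader);
        IndexMetadata safe = new ClassLoaderSafeIndexMetadata(inner, pluginLoader);
        System.out.println(safe.beginCreateVectorIndex("my_index"));
    }
}
```

The `finally` block matters here: without it, an exception thrown by the delegate would leave the plugin's class loader set on the calling thread.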


## Release Notes
```
== NO RELEASE NOTE ==
```

Differential Revision: D95325176
Summary:

Add create_vector_index function signature

Differential Revision: D95341384
Summary:

Support vector search in  LogicalPlanner

Differential Revision: D93690255
@NivinCS
Contributor

NivinCS commented Mar 19, 2026

@skyelves, could you please update the PR to include only the LogicalPlanner changes? The other changes already have separate PRs.

@skyelves
Member Author

> @skyelves, could you please update the PR to include only the LogicalPlanner changes? The other changes already have separate PRs.

Sorry for the confusion. I am planning to land only this PR, which contains four commits. All the PRs are created by the Meta export service, so it's kind of confusing.

@skyelves skyelves merged commit ad97939 into prestodb:master Mar 19, 2026
143 of 146 checks passed

5 participants