feat: Add analysis support for CREATE VECTOR INDEX (#27036)
skyelves wants to merge 1 commit into prestodb:master from
Conversation
Reviewer's Guide
Adds full parser, AST, formatter, analyzer, and query-type support for a new CREATE VECTOR INDEX statement, along with tests and a design doc describing a future plan-optimizer rewrite to a UDF-based SELECT.
Sequence diagram for CREATE VECTOR INDEX statement processing
sequenceDiagram
actor Client
participant SqlParser
participant AstBuilder
participant AstTree as AstVisitor_AstBuilder
participant StatementAnalyzer
participant Analysis
participant StatementUtils
participant QueryDispatcher
Client->>SqlParser: parse("CREATE VECTOR INDEX ...")
SqlParser->>AstBuilder: visitCreateVectorIndexContext
AstBuilder->>AstTree: visitCreateVectorIndex(context)
AstTree-->>AstBuilder: CreateVectorIndex AST node
AstBuilder-->>SqlParser: CreateVectorIndex
SqlParser-->>QueryDispatcher: Statement(CreateVectorIndex)
QueryDispatcher->>StatementUtils: getQueryType(CreateVectorIndex.class)
StatementUtils-->>QueryDispatcher: QueryType.SELECT
QueryDispatcher->>StatementAnalyzer: analyze(CreateVectorIndex)
StatementAnalyzer->>Analysis: setCreateVectorIndexTableName(tableName)
StatementAnalyzer-->>QueryDispatcher: Scope(result BOOLEAN)
QueryDispatcher-->>Client: planned SELECT-style execution path
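The dispatch step at the end of the diagram, where StatementUtils maps the statement class to QueryType.SELECT, can be sketched as follows. This is a minimal stand-in, not Presto's real implementation: the Statement classes here are placeholders for the real AST types in com.facebook.presto.sql.tree.

```java
import java.util.Map;

public class QueryTypeDispatchSketch {
    // Simplified stand-ins for Presto's statement classes.
    static class Statement {}
    static class CreateVectorIndex extends Statement {}
    static class CreateTable extends Statement {}

    enum QueryType { SELECT, DATA_DEFINITION, CONTROL }

    // Per the sequence diagram, CreateVectorIndex is registered under SELECT so the
    // dispatcher routes it down the query execution path rather than plain DDL.
    static final Map<Class<? extends Statement>, QueryType> QUERY_TYPES = Map.of(
            CreateVectorIndex.class, QueryType.SELECT,
            CreateTable.class, QueryType.DATA_DEFINITION);

    static QueryType getQueryType(Class<? extends Statement> statementClass) {
        return QUERY_TYPES.getOrDefault(statementClass, QueryType.CONTROL);
    }

    public static void main(String[] args) {
        System.out.println(getQueryType(CreateVectorIndex.class)); // SELECT
    }
}
```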
Class diagram for the new CreateVectorIndex AST and related components
classDiagram
class Statement {
}
class CreateVectorIndex {
+Identifier indexName
+QualifiedName tableName
+List~Identifier~ columns
+Optional~Expression~ where
+List~Property~ properties
+CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
+CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
+Identifier getIndexName()
+QualifiedName getTableName()
+List~Identifier~ getColumns()
+Optional~Expression~ getWhere()
+List~Property~ getProperties()
+<R,C> R accept(AstVisitor visitor, C context)
+List~Node~ getChildren()
}
class Identifier {
}
class QualifiedName {
}
class Expression {
}
class Property {
+Identifier name
+Expression value
}
class AstVisitor {
+<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
}
class DefaultTraversalVisitor {
+Void visitCreateVectorIndex(CreateVectorIndex node, Object context)
}
class SqlFormatter_Visitor {
+Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
}
class Analysis {
-Optional~QualifiedObjectName~ createVectorIndexTableName
+void setCreateVectorIndexTableName(QualifiedObjectName tableName)
+Optional~QualifiedObjectName~ getCreateVectorIndexTableName()
}
class StatementAnalyzer_Visitor {
+Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
}
class StatementUtils {
-Map~Class, QueryType~ queryTypes
+QueryType getQueryType(Class statementClass)
}
class QueryType {
<<enum>>
SELECT
DATA_DEFINITION
CONTROL
}
Statement <|-- CreateVectorIndex
AstVisitor <|-- DefaultTraversalVisitor
AstVisitor <|-- SqlFormatter_Visitor
AstVisitor <|-- StatementAnalyzer_Visitor
CreateVectorIndex --> Identifier
CreateVectorIndex --> QualifiedName
CreateVectorIndex --> Expression
CreateVectorIndex --> Property
StatementAnalyzer_Visitor --> Analysis
StatementUtils --> QueryType
StatementUtils ..> CreateVectorIndex
Analysis ..> QualifiedObjectName
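The node shape in the class diagram above can be sketched in a self-contained way. Identifier, QualifiedName, and Property are simplified stand-ins for the real Presto AST value types, and Expression is reduced to a String; only the field layout and getChildren() flattening are meant to mirror the diagram.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class CreateVectorIndexNodeSketch {
    // Simplified stand-ins for Presto's AST value types.
    record Identifier(String value) {}
    record QualifiedName(List<String> parts) {}
    record Property(Identifier name, String value) {}

    static final class CreateVectorIndex {
        private final Identifier indexName;
        private final QualifiedName tableName;
        private final List<Identifier> columns;
        private final Optional<String> where; // Expression simplified to String
        private final List<Property> properties;

        CreateVectorIndex(Identifier indexName, QualifiedName tableName,
                List<Identifier> columns, Optional<String> where, List<Property> properties) {
            this.indexName = indexName;
            this.tableName = tableName;
            this.columns = List.copyOf(columns);
            this.where = where;
            this.properties = List.copyOf(properties);
        }

        Identifier getIndexName() { return indexName; }
        QualifiedName getTableName() { return tableName; }
        List<Identifier> getColumns() { return columns; }
        Optional<String> getWhere() { return where; }
        List<Property> getProperties() { return properties; }

        // Children are the column identifiers, the optional WHERE expression,
        // and the WITH properties, mirroring getChildren() in the diagram.
        List<Object> getChildren() {
            List<Object> children = new ArrayList<Object>(columns);
            where.ifPresent(children::add);
            children.addAll(properties);
            return children;
        }
    }

    public static void main(String[] args) {
        CreateVectorIndex node = new CreateVectorIndex(
                new Identifier("my_index"),
                new QualifiedName(List.of("my_table")),
                List.of(new Identifier("id"), new Identifier("embedding")),
                Optional.empty(),
                List.of(new Property(new Identifier("metric"), "cosine")));
        System.out.println(node.getChildren().size()); // 3: two columns + one property
    }
}
```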
Codenotify: Notifying subscribers in CODENOTIFY files for diff 5022f6b...98cf096. No notifications.
Hey - I've found 1 issue, and left some high level feedback:
- The change to `properties` in `SqlBase.g4` to allow a trailing comma applies to all `WITH (...)` property lists, not just `CREATE VECTOR INDEX`; please double-check that this broader grammar relaxation (and the adjusted error expectations) is intentional for all existing statements that use `properties`.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The change to `properties` in `SqlBase.g4` to allow a trailing comma applies to all `WITH (...)` property lists, not just `CREATE VECTOR INDEX`; please double-check that this broader grammar relaxation (and the adjusted error expectations) is intentional for all existing statements that use `properties`.
## Individual Comments
### Comment 1
<location> `presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java:83-86` </location>
<code_context>
+ {"CREATE TABLE foo () AS (VALUES 1)",
</code_context>
<issue_to_address>
**suggestion (testing):** Add negative tests for invalid CREATE VECTOR INDEX syntax to the error-handling suite
With the grammar now supporting `CREATE VECTOR INDEX` (and `VECTOR` in the expected tokens), please add negative error-handling tests for malformed vector index statements in `getStatements()`. For instance:
- `CREATE VECTOR INDEX` (missing index name and rest of statement)
- `CREATE VECTOR INDEX idx ON` (missing table)
- `CREATE VECTOR INDEX idx ON t` (missing column list)
- `CREATE VECTOR INDEX idx ON t()` / `CREATE VECTOR INDEX idx ON t(,)` (invalid column list)
These cases help verify clear, stable error messages for common mistakes and protect against grammar regressions around the new syntax.
Suggested implementation:
```java
{"CREATE TABLE foo () AS (VALUES 1)",
"line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
{"CREATE TABLE foo (*) AS (VALUES 1)",
"line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
{"CREATE VECTOR INDEX",
"line 1:20: mismatched input '<EOF>'. Expecting: <identifier>"},
{"CREATE VECTOR INDEX idx ON",
"line 1:29: mismatched input '<EOF>'. Expecting: <identifier>"},
{"CREATE VECTOR INDEX idx ON t",
"line 1:31: mismatched input '<EOF>'. Expecting: '('"},
{"CREATE VECTOR INDEX idx ON t()",
"line 1:32: mismatched input ')'. Expecting: <identifier>"},
{"CREATE VECTOR INDEX idx ON t(,)",
"line 1:32: mismatched input ','. Expecting: <identifier>"},
{"SELECT grouping(a+2) FROM (VALUES (1)) AS t (a) GROUP BY a+2",
```
The exact error column numbers and messages (especially the expected tokens like <identifier> vs a concrete token name) may differ slightly depending on the current ANTLR grammar and error handler configuration in your version of Presto. If test failures occur:
1. Run the tests to see the actual parser error messages for each of the added SQL snippets.
2. Adjust the `line 1:XX:` column indices and the `Expecting: ...` portions in each of the new test cases to match the real output exactly.
3. If your test suite uses a helper to normalize or format error messages, ensure the new expectations follow that convention (e.g., quoting identifiers or token names consistently with nearby tests).
</issue_to_address>
| {"CREATE TABLE foo () AS (VALUES 1)", | ||
| "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VIEW'"}, | ||
| "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"}, | ||
| {"CREATE TABLE foo (*) AS (VALUES 1)", | ||
| "line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VIEW'"}, | ||
| "line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"}, |
aditi-pandit left a comment:
@skyelves: Thanks for this code. Had a couple of comments.
## Files to Modify/Create
There isn't any need to repeat all the code in the doc.
### 1. UDF Receives Metadata Only

The `create_local_index` UDF does **NOT** receive actual row data. It receives:
What is the need to wrap this in a CREATE VECTOR INDEX statement?
If we create a statement, then it needs to work with all kinds of tables, etc. The code doesn't seem to be that generic.
@skyelves: Thanks for this work. It might be good if you can write a basic RFC for this. It is quite a complex piece of work that is adding new syntax etc, and also we want to work with Iceberg and specific vector indexing libraries on our side as well, so it would be good to clear out that interface.
The .md file appears to be the same file as in PR #27027 .
I feel like I would expect to see CREATE VECTOR INDEX documentation added in this PR, in the form of a .rst file in https://github.com/prestodb/presto/tree/master/presto-docs/src/main/sphinx/sql.
Summary: Pull Request resolved: prestodb#27036. Differential Revision: D91524358
        distanceMetric, indexOptions, partitionedByJson.toString());

// Build synthetic query: SELECT create_vector_index('source_table', 'col1', 'col2', 'type', 'props')
// No FROM clause — the Python script handles all data access, no table scan needed.
Since vector index creation requires scanning table data (e.g., embedding columns), delegating all data access to the Python UDF without a visible table scan prevents the analyzer from registering read dependencies on the indexed table and enforcing column-level SELECT privileges during index build. This effectively indicates that the index build process accesses the underlying data source outside Presto’s planning and execution framework, which may lead to inconsistencies with snapshot isolation, partition pruning, predicate pushdown, resource group enforcement, and access control checks. To maintain governance and consistency guarantees expected in a lakehouse execution model, it would be preferable for index build operations to execute within Presto’s planning and scheduling framework, with the underlying table scan represented in the analysed query similar to CTAS.
Sorry for the noise. This is actually a bug and I just fixed it. The analysed query should be similar to CTAS.
Thanks @skyelves for this code. Could you please add some tests in the TestAnalyzer class for the StatementAnalyzer code?
aditi-pandit left a comment:
Please add unit tests.
Please add a release note - or
Summary:
## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
**StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.**
**This results in a structured CreateVectorIndexAnalysis object.**
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:
TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
== RELEASE NOTES ==
General Changes
* Add support for create-vector-index statement, which creates
vector search indexes on table columns with configurable index properties
and partition filtering via an ``UPDATING FOR`` clause.
```
Differential Revision: D91524358
Thanks, added some tests. Could you take another look?
added
aditi-pandit left a comment:
Thanks for adding the tests.
public static final class CreateVectorIndexAnalysis
{
    private final QualifiedObjectName sourceTableName;
    private final QualifiedObjectName targetTableName;
The index artifact is currently represented using QualifiedObjectName, similar to a table. Since vector indexes may also be implemented as connector-managed artifacts (e.g., external index files or metadata entries), it would be better to treat this as a logical index identifier rather than strictly a physical table. This would allow connectors to map the index name to their own storage model while keeping the engine abstraction consistent across different implementations.
Map<String, ColumnHandle> sourceColumns = metadataResolver.getColumnHandles(sourceTableHandle);
for (Identifier column : node.getColumns()) {
    if (!sourceColumns.containsKey(column.getValue())) {
        throw new SemanticException(MISSING_COLUMN, column, "Column '%s' does not exist in source table '%s'", column.getValue(), sourceTableName);
The current validation ensures that the specified columns exist in the source table, which is good. However, since the syntax allows either (embedding) or (row_id, embedding), it would be helpful to also validate the column structure. If only one column is provided, it should be validated as an embedding column rather than a row identifier. Additionally, when two columns are specified, they should follow the (row_id, embedding) order (optionally). This validation can help prevent invalid cases like (id) from passing analysis and failing later during index creation.
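The column-structure check suggested in the comment above could look roughly like the sketch below. This is an illustration only: the rule that the embedding column must have an array type is an assumption, and a real implementation would validate against Presto's type system rather than type-name strings.

```java
import java.util.List;
import java.util.Map;

public class IndexColumnValidationSketch {
    // Rough sketch of the suggested check: accept (embedding) or (row_id, embedding),
    // and require that the embedding column (the last one) has a vector-like type.
    // Using string type names here is a simplification for illustration.
    static void validateIndexColumns(List<String> columns, Map<String, String> columnTypes) {
        if (columns.isEmpty() || columns.size() > 2) {
            throw new IllegalArgumentException(
                    "Expected (embedding) or (row_id, embedding), got " + columns);
        }
        // With one column it is the embedding; with two, the last one is.
        String embedding = columns.get(columns.size() - 1);
        String type = columnTypes.get(embedding);
        if (type == null || !type.startsWith("array")) {
            throw new IllegalArgumentException(
                    "Column '" + embedding + "' is not a vector (array) column: " + type);
        }
    }

    public static void main(String[] args) {
        Map<String, String> types = Map.of("id", "bigint", "embedding", "array(real)");
        validateIndexColumns(List.of("id", "embedding"), types); // passes
        try {
            validateIndexColumns(List.of("id"), types); // (id) alone is not an embedding
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```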
    throw new SemanticException(MISSING_TABLE, node, "Source table '%s' does not exist", sourceTableName);
}

QualifiedObjectName targetTable = createQualifiedObjectName(session, node, node.getIndexName(), metadata);
Hi @skyelves, here you're creating targetTable of type QualifiedObjectName from node.getIndexName() of type QualifiedName. Are you referencing some example? Is it okay to keep using QualifiedName at the Analyzer layer? Can you help check? A quick look at the Analysis class shows several QualifiedName usages there.
Summary:
High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:
TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
Release Notes
== NO RELEASE NOTE ==
Differential Revision: D91524358
Pulled By: skyelves
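The synthetic CTAS query in step 3 of the design can be illustrated with a small string-assembly sketch. This is not Presto's LogicalPlanner, which builds a plan tree directly rather than SQL text; the method name and argument shapes here are illustrative only.

```java
import java.util.List;
import java.util.Optional;

public class IndexBuildQuerySketch {
    // Approximates the shape of the query described in step 3:
    //   CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ...
    // Renders SQL text only; the real planner emits a TableFinish/TableWriter plan tree.
    static String buildIndexBuildQuery(String indexTable, String sourceTable,
            List<String> columns, Optional<String> predicate) {
        StringBuilder sql = new StringBuilder()
                .append("CREATE TABLE ").append(indexTable)
                .append(" AS SELECT create_vector_index(")
                .append(String.join(", ", columns)).append(")")
                .append(" FROM ").append(sourceTable);
        predicate.ifPresent(where -> sql.append(" WHERE ").append(where));
        return sql.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildIndexBuildQuery("index_table", "my_table",
                List.of("embedding", "id"),
                Optional.of("ds BETWEEN '2025-01-01' AND '2025-01-31'")));
    }
}
```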