feat: Add syntax support for CREATE VECTOR INDEX (#27307) #27307

Merged: skyelves merged 1 commit into prestodb:master from skyelves:export-D91385788 on Mar 17, 2026

@skyelves skyelves commented Mar 10, 2026

Summary:

High level design

The process for executing a CREATE VECTOR INDEX SQL statement is as follows:

  1. SQL Input & Parsing:
     • SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
     • The parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
  2. Statement Analysis:
     • StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
     • This results in a structured CreateVectorIndexAnalysis object.
  3. Logical Planning & Query Generation:
     • LogicalPlanner.createVectorIndexPlan() builds the core execution query:
       CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
     • The resulting plan tree includes:
       TableFinishNode(target = CreateVectorIndexReference)
       └── TableWriterNode(target = CreateVectorIndexReference)
           └── query plan
  4. Connector Plan Optimization (Rewriting):
     • PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
     • ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
  5. Execution and Metadata Handling (for connectors that don't rewrite):
     • TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
     • Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
  6. ConnectorMetadata SPI:
     • Default: The standard implementation throws NOT_SUPPORTED.
     • Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
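
The default-versus-override behaviour described in the last two steps can be sketched as follows. This is only an illustration under stated assumptions: `IndexMetadata`, `PlainConnector`, `TableBackedConnector`, and `tryCreate` are hypothetical names, the handle is reduced to a string, and `UnsupportedOperationException` stands in for the NOT_SUPPORTED error. The real ConnectorMetadata signatures are not shown in this description and may differ.

```java
import java.util.ArrayList;
import java.util.List;

class VectorIndexSpiSketch {
    // Hypothetical, simplified stand-in for a connector metadata SPI.
    // The default methods reject the operation, mirroring "throws NOT_SUPPORTED".
    interface IndexMetadata {
        default String beginCreateVectorIndex(String indexName) {
            throw new UnsupportedOperationException("NOT_SUPPORTED: CREATE VECTOR INDEX");
        }

        default void finishCreateVectorIndex(String handle) {
            throw new UnsupportedOperationException("NOT_SUPPORTED: CREATE VECTOR INDEX");
        }
    }

    // A connector that keeps the defaults, so the statement is rejected.
    static class PlainConnector implements IndexMetadata {}

    // A connector that overrides both calls and records the committed index table.
    static class TableBackedConnector implements IndexMetadata {
        final List<String> committedTables = new ArrayList<>();

        @Override
        public String beginCreateVectorIndex(String indexName) {
            return "handle:" + indexName; // the handle is just a tagged name in this sketch
        }

        @Override
        public void finishCreateVectorIndex(String handle) {
            committedTables.add(handle.substring("handle:".length()));
        }
    }

    // Runs begin, then (conceptually) the writer stage, then finish.
    static boolean tryCreate(IndexMetadata metadata, String indexName) {
        try {
            String handle = metadata.beginCreateVectorIndex(indexName);
            // The table writer would produce the index data here.
            metadata.finishCreateVectorIndex(handle);
            return true;
        }
        catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(new PlainConnector(), "my_index"));       // false
        System.out.println(tryCreate(new TableBackedConnector(), "my_index")); // true
    }
}
```

Keeping the rejection in the interface defaults means a connector opts in only by overriding both calls, which matches the "Default" and "Iceberg Override" split above.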

Differential Revision: D91385788

Release Notes

  == RELEASE NOTES ==
  General Changes
  * Add support for create-vector-index statement, which creates
    vector search indexes on table columns with configurable index properties
    and partition filtering via an ``UPDATING FOR`` clause.

@skyelves skyelves requested a review from a team as a code owner March 10, 2026 22:20
sourcery-ai bot commented Mar 10, 2026

Reviewer's Guide

Adds full parser, AST, traversal, and formatter support for a new CREATE VECTOR INDEX statement, including grammar changes, a new CreateVectorIndex AST node, visitor wiring, formatting logic, and extensive positive/negative SQL parser tests.

Sequence diagram for parsing and formatting CREATE VECTOR INDEX

sequenceDiagram
    actor User
    participant SqlParser
    participant SqlBaseParser
    participant AstBuilder
    participant CreateVectorIndex
    participant SqlFormatter

    User->>SqlParser: submit SQL
    Note over User,SqlFormatter: SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...

    SqlParser->>SqlBaseParser: parse SQL using grammar
    SqlBaseParser-->>SqlParser: CreateVectorIndexContext

    SqlParser->>AstBuilder: visitCreateVectorIndex(context)
    AstBuilder->>AstBuilder: extract indexName, tableName, columns, updatingFor, properties
    AstBuilder->>CreateVectorIndex: new CreateVectorIndex(indexName, tableName, columns, updatingFor, properties)
    CreateVectorIndex-->>AstBuilder: CreateVectorIndex node
    AstBuilder-->>SqlParser: AST CreateVectorIndex

    User->>SqlFormatter: format AST CreateVectorIndex
    SqlFormatter->>CreateVectorIndex: visitCreateVectorIndex(node)
    SqlFormatter->>SqlFormatter: build SQL string
    SqlFormatter-->>User: formatted CREATE VECTOR INDEX SQL

Class diagram for new CreateVectorIndex AST integration

classDiagram
    class Statement {
    }

    class Node {
    }

    Node <|-- Statement

    class Identifier {
    }

    class QualifiedName {
    }

    class Expression {
    }

    class Property {
    }

    class CreateVectorIndex {
        +Identifier indexName
        +QualifiedName tableName
        +List~Identifier~ columns
        +Optional~Expression~ updatingFor
        +List~Property~ properties
        +CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
        +CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
        +Identifier getIndexName()
        +QualifiedName getTableName()
        +List~Identifier~ getColumns()
        +Optional~Expression~ getUpdatingFor()
        +List~Property~ getProperties()
        +<R,C> R accept(AstVisitor visitor, C context)
        +List~Node~ getChildren()
    }

    Statement <|-- CreateVectorIndex
    Identifier <-- CreateVectorIndex : uses
    QualifiedName <-- CreateVectorIndex : uses
    Expression <-- CreateVectorIndex : uses
    Property <-- CreateVectorIndex : uses

    class AstVisitor {
        +<R,C> R visitCreateTable(CreateTable node, C context)
        +<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
        +<R,C> R visitCreateType(CreateType node, C context)
    }

    class DefaultTraversalVisitor {
        +<R,C> R visitCreateTable(CreateTable node, C context)
        +<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
        +<R,C> R visitStartTransaction(StartTransaction node, C context)
    }

    AstVisitor <|-- DefaultTraversalVisitor
    AstVisitor --> CreateVectorIndex : visitCreateVectorIndex
    DefaultTraversalVisitor --> CreateVectorIndex : visitCreateVectorIndex

    class SqlBaseParser {
        <<parser>>
        +CreateVectorIndexContext createVectorIndex()
    }

    class SqlBaseParser_CreateVectorIndexContext {
    }

    SqlBaseParser --> SqlBaseParser_CreateVectorIndexContext : produces

    class AstBuilder {
        +Node visitCreateTable(SqlBaseParser_CreateTableContext context)
        +Node visitCreateVectorIndex(SqlBaseParser_CreateVectorIndexContext context)
        +Node visitCreateType(SqlBaseParser_CreateTypeContext context)
    }

    AstBuilder --> CreateVectorIndex : constructs

    class SqlFormatter_Formatter {
        +Void visitCreateTable(CreateTable node, Integer indent)
        +Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
        +Void visitCreateType(CreateType node, Integer indent)
    }

    SqlFormatter_Formatter --> CreateVectorIndex : formats

File-Level Changes

Change: Introduce CreateVectorIndex AST node and integrate it with the existing AST visitor hierarchy.
Details:
  • Add CreateVectorIndex statement class with index name, target table, column list, optional UPDATING FOR expression, and WITH properties
  • Implement children, equality, hashing, toString, and accept methods for the new node
  • Wire CreateVectorIndex into AstVisitor via a dedicated visitCreateVectorIndex method
  • Extend DefaultTraversalVisitor to traverse the new node’s identifiers, optional updating expression, and properties
Files:
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/CreateVectorIndex.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java

Change: Extend SQL grammar and parser to recognize CREATE VECTOR INDEX statements including optional WITH properties and UPDATING FOR clause, and allow trailing commas in property lists.
Details:
  • Add createVectorIndex alternative to the statement rule with CREATE VECTOR INDEX identifier ON TABLE qualifiedName, column list, optional WITH properties, and optional UPDATING FOR booleanExpression
  • Allow optional trailing comma in properties lists
  • Add INDEX, UPDATING, and VECTOR as tokens and mark INDEX, UPDATING, and VECTOR as non-reserved keywords where appropriate
  • Adjust expected token list in parser error handling tests to include VECTOR
Files:
  • presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java

Change: Implement AST building and SQL formatting for CREATE VECTOR INDEX and add comprehensive parser tests, including negative cases.
Details:
  • Add AstBuilder.visitCreateVectorIndex to construct CreateVectorIndex from the parse tree, including extracting index name, table name, column identifiers, optional UPDATING FOR expression, and WITH properties
  • Extend SqlFormatter to render CREATE VECTOR INDEX statements with column list, optional WITH clause, and optional UPDATING FOR clause across lines
  • Add TestSqlParser test coverage for CREATE VECTOR INDEX with various table qualifications, column counts, property combinations, UPDATING FOR expressions, and invalid syntax cases
Files:
  • presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java
  • presto-parser/src/main/java/com/facebook/presto/sql/SqlFormatter.java
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java
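
To make the node and formatter changes listed above more concrete, here is a rough sketch of an immutable statement node with value equality and a formatter that emits the optional clauses on separate lines. It is an approximation, not the code from this PR: plain strings stand in for Identifier, QualifiedName, Expression, and Property, the class name `CreateVectorIndexSketch` and the `index_type` property are hypothetical, and the exact line layout produced by SqlFormatter is assumed.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

// Simplified stand-in for the CreateVectorIndex AST node (hypothetical; strings replace AST types).
final class CreateVectorIndexSketch {
    final String indexName;
    final String tableName;
    final List<String> columns;
    final Optional<String> updatingFor; // the UPDATING FOR expression, kept as raw text here
    final List<String> properties;      // each entry is already rendered as "name = value"

    CreateVectorIndexSketch(String indexName, String tableName, List<String> columns,
            Optional<String> updatingFor, List<String> properties) {
        this.indexName = Objects.requireNonNull(indexName, "indexName is null");
        this.tableName = Objects.requireNonNull(tableName, "tableName is null");
        this.columns = List.copyOf(columns); // defensive, immutable copies
        this.updatingFor = Objects.requireNonNull(updatingFor, "updatingFor is null");
        this.properties = List.copyOf(properties);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof CreateVectorIndexSketch)) {
            return false;
        }
        CreateVectorIndexSketch other = (CreateVectorIndexSketch) o;
        return indexName.equals(other.indexName)
                && tableName.equals(other.tableName)
                && columns.equals(other.columns)
                && updatingFor.equals(other.updatingFor)
                && properties.equals(other.properties);
    }

    @Override
    public int hashCode() {
        return Objects.hash(indexName, tableName, columns, updatingFor, properties);
    }

    // Renders the statement; WITH and UPDATING FOR appear only when present.
    String format() {
        StringBuilder sql = new StringBuilder("CREATE VECTOR INDEX ")
                .append(indexName).append(" ON ").append(tableName)
                .append(" (").append(String.join(", ", columns)).append(")");
        if (!properties.isEmpty()) {
            sql.append("\nWITH (").append(String.join(", ", properties)).append(")");
        }
        updatingFor.ifPresent(expression -> sql.append("\nUPDATING FOR ").append(expression));
        return sql.toString();
    }

    public static void main(String[] args) {
        CreateVectorIndexSketch node = new CreateVectorIndexSketch(
                "my_index", "my_table", List.of("id", "embedding"),
                Optional.of("ds = '2026-03-10'"), List.of("index_type = 'ivf'"));
        System.out.println(node.format());
    }
}
```

Value equality is what lets parser tests compare a parsed statement against an expected node directly, and a formatter of this shape allows a parse, format, parse round trip to be checked.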

@sourcery-ai sourcery-ai bot left a comment

Hey - I've left some high level feedback:

  • The CREATE VECTOR INDEX grammar currently restricts the column list to at most two identifiers via identifier (',' identifier)?; this should likely be identifier (',' identifier)* to match the AST builder and tests that assume an arbitrary number of columns.
  • In DefaultTraversalVisitor.visitCreateVectorIndex, the tableName field is never processed, which is inconsistent with other statement visitors and could break visitors that expect to traverse the referenced table; consider calling process(node.getTableName(), context).
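
The practical difference between the two quantifiers in the first point can be shown with a small sketch. It uses Java regular expressions rather than ANTLR, with a deliberately simplified identifier pattern, so it only approximates how the grammar rule behaves; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class ColumnListQuantifierSketch {
    // Simplified identifier: a letter or underscore followed by letters, digits, or underscores.
    // Real SQL identifiers (quoted names, for example) are richer than this.
    static final String ID = "[A-Za-z_][A-Za-z0-9_]*";

    // Mirrors '(' identifier (',' identifier)? ')' : one or two columns.
    static final Pattern AT_MOST_TWO =
            Pattern.compile("\\(\\s*" + ID + "(\\s*,\\s*" + ID + ")?\\s*\\)");

    // Mirrors '(' identifier (',' identifier)* ')' : one or more columns.
    static final Pattern ANY_NUMBER =
            Pattern.compile("\\(\\s*" + ID + "(\\s*,\\s*" + ID + ")*\\s*\\)");

    static boolean acceptsAtMostTwo(String columnList) {
        return AT_MOST_TWO.matcher(columnList).matches();
    }

    static boolean acceptsAnyNumber(String columnList) {
        return ANY_NUMBER.matcher(columnList).matches();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"(embedding)", "(id, embedding)", "(a, b, c)"}) {
            System.out.println(input + " -> ?: " + acceptsAtMostTwo(input)
                    + ", *: " + acceptsAnyNumber(input));
        }
    }
}
```

With `?` a three-column list is rejected at parse time, whereas with `*` it would be accepted and any limit on the column count would have to be enforced later, for example during analysis.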

skyelves added a commit to skyelves/presto that referenced this pull request Mar 10, 2026
@skyelves skyelves force-pushed the export-D91385788 branch 2 times, most recently from 05e5dc5 to e2b3e23, March 10, 2026 22:24
skyelves added a commit to skyelves/presto that referenced this pull request Mar 10, 2026
Summary:
Pull Request resolved: prestodb#27307

Pull Request resolved: prestodb#27027


github-actions bot commented Mar 10, 2026

Codenotify: Notifying subscribers in CODENOTIFY files for diff 1ddef8b...28cc426.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

@jja725 jja725 left a comment

Thanks for the contribution! Try to learn from it.

| CREATE VECTOR INDEX identifier ON qualifiedName
'(' identifier (',' identifier)? ')'
(WITH properties)?
(UPDATING FOR booleanExpression)? #createVectorIndex

The RFC defines ON TABLE candidates_table(id, embedding) with two required columns. Should the parser enforce exactly 2 columns? If single-column is valid (using $row_id for id), this should be documented.

Reply from the PR author (Member):

> The RFC defines ON TABLE candidates_table(id, embedding) with two required columns. Should the parser enforce exactly 2 columns? If single-column is valid (using $row_id for id), this should be documented.

Thanks for the comments! After several rounds of discussion, we decided to keep $row_id optional. We will update the RFC accordingly.

We want to support either (embedding) or (embedding, row_id).

@steveburnett steveburnett left a comment

Do we need to add documentation for CREATE VECTOR INDEX to https://prestodb.io/docs/current/sql.html?

skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 13, 2026
@meta-codesync meta-codesync bot changed the title from "feat: Add syntax support for CREATE VECTOR INDEX (#27027)" to "feat: Add syntax support for CREATE VECTOR INDEX (#27307)" on Mar 13, 2026
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 13, 2026
skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
@skyelves
Member Author

Do we need documentation for CREATE VECTOR INDEX added to https://prestodb.io/docs/current/sql.html?

Yes, I created a separate PR: https://github.com/prestodb/presto/pull/27332/changes

Summary:

## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
  == RELEASE NOTES ==
  General Changes
  * Add support for the ``CREATE VECTOR INDEX`` statement, which creates
    vector search indexes on table columns with configurable index properties
    and partition filtering via an ``UPDATING FOR`` clause.
```

Differential Revision: D91385788
@gggrace14 gggrace14 self-requested a review March 16, 2026 23:05
Contributor

@aditi-pandit aditi-pandit left a comment

Thanks @skyelves

Contributor

@NivinCS NivinCS left a comment

Thanks for the changes, @skyelves. They look good to me, except for the CI test suite failure. Please look into it.

@steveburnett
Contributor

Please add a release note (or NO RELEASE NOTE) following the Release Notes Guidelines to pass the failing but not required CI check.

@skyelves skyelves merged commit ba0158b into prestodb:master Mar 17, 2026
114 of 118 checks passed
7 participants