
feat: Add create_vector_index function signature (#27264) #27264

Closed
skyelves wants to merge 3 commits into prestodb:master from skyelves:export-D95341384

Conversation

skyelves (Member) commented Mar 5, 2026

Summary:

Add create_vector_index function signature

High level design

The process for executing a CREATE VECTOR INDEX SQL statement is as follows:

  1. SQL Input & Parsing:
       • SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
       • The parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
  2. Statement Analysis:
       • StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
       • This results in a structured CreateVectorIndexAnalysis object.
  3. Logical Planning & Query Generation:
       • LogicalPlanner.createVectorIndexPlan() builds the core execution query:
         CREATE TABLE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
       • The resulting plan tree includes:

         TableFinishNode(target = CreateVectorIndexReference)
         └── TableWriterNode(target = CreateVectorIndexReference)
             └── query plan
  4. Connector Plan Optimization (Rewriting):
       • PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
       • ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
  5. Execution and Metadata Handling (for connectors that don't rewrite):
       • TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
       • Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
  6. ConnectorMetadata SPI:
       • Default: The standard implementation throws NOT_SUPPORTED.
       • Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
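
The SPI behavior in step 6 can be pictured with a minimal sketch. Everything below is a hypothetical stand-in: `SketchConnectorMetadata`, `NotSupportedException`, and the `String` handles are invented for illustration and are not the Presto SPI types. The sketch only shows the pattern described above, where the interface default rejects the operation and a connector opts in by overriding both calls.

```java
import java.util.Optional;

// Hypothetical stand-in for a NOT_SUPPORTED error; not a Presto class.
final class NotSupportedException extends RuntimeException {
    NotSupportedException(String message) {
        super(message);
    }
}

// Hypothetical, simplified metadata interface. Handles are plain strings here.
interface SketchConnectorMetadata {
    // Default: connectors that do not override this reject the statement.
    default String beginCreateVectorIndex(String indexTable, String sourceTable) {
        throw new NotSupportedException("This connector does not support creating vector indexes");
    }

    default Optional<String> finishCreateVectorIndex(String handle) {
        throw new NotSupportedException("This connector does not support creating vector indexes");
    }
}

// A connector that keeps the defaults.
final class PlainConnectorMetadata implements SketchConnectorMetadata {
}

// A connector that opts in by overriding the begin/finish pair.
final class IndexingConnectorMetadata implements SketchConnectorMetadata {
    @Override
    public String beginCreateVectorIndex(String indexTable, String sourceTable) {
        return "handle:" + indexTable + "<-" + sourceTable;
    }

    @Override
    public Optional<String> finishCreateVectorIndex(String handle) {
        return Optional.of("committed " + handle);
    }
}
```

With this shape, a connector that ignores the feature needs no code changes, while an opting-in connector must implement both halves of the begin/finish pair.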

Release Notes

== NO RELEASE NOTE ==

Differential Revision: D95341384

@skyelves skyelves requested review from a team, feilong-liu and jaystarshot as code owners March 5, 2026 07:52
sourcery-ai bot (Contributor) commented Mar 5, 2026

Reviewer's Guide

Adds end-to-end support for a new CREATE VECTOR INDEX SQL statement, including parser/AST, analyzer, planning hooks, SPI connector APIs, and a dummy aggregation function used only for planning, plus tests and error handling updates.

Sequence diagram for CREATE VECTOR INDEX statement lifecycle

sequenceDiagram
    actor User
    participant Coordinator as PrestoCoordinator
    participant Parser as SqlParser_AstBuilder
    participant Analyzer as StatementAnalyzer
    participant Analysis as Analysis
    participant Planner as Planner_TableWriter
    participant Meta as ConnectorMetadata
    participant Connector as ConnectorImpl
    participant CreateVectorIndexAggregation

    User->>Coordinator: submit SQL
    Coordinator->>Parser: parse CREATE VECTOR INDEX
    Parser-->>Coordinator: CreateVectorIndex AST

    Coordinator->>Analyzer: analyze(CreateVectorIndex)
    Analyzer->>Meta: tableExists(sourceTableName)
    Meta-->>Analyzer: boolean
    Analyzer->>Meta: tableExists(targetTableName)
    Meta-->>Analyzer: boolean
    Analyzer->>Meta: getTableHandle(sourceTableName)
    Meta-->>Analyzer: TableHandle
    Analyzer->>Meta: getColumnHandles(TableHandle)
    Meta-->>Analyzer: Map columnHandles
    Analyzer->>Analyzer: validate columns and properties
    Analyzer->>Analysis: setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis)
    Analyzer-->>Coordinator: Scope with VARCHAR result

    Coordinator->>Planner: plan(CreateVectorIndexAnalysis)
    Planner->>Meta: beginCreateVectorIndex(session, indexMetadata, layout, sourceTableName)
    Meta->>Connector: beginCreateVectorIndex(session, indexMetadata, layout, sourceTableName)
    Connector-->>Meta: ConnectorOutputTableHandle
    Meta-->>Planner: ConnectorOutputTableHandle

    loop execute task writers
        Planner->>CreateVectorIndexAggregation: aggregate embeddings and ids
    end

    Planner->>Meta: finishCreateVectorIndex(session, handle, fragments, stats)
    Meta->>Connector: finishCreateVectorIndex(session, handle, fragments, stats)
    Connector-->>Meta: Optional ConnectorOutputMetadata
    Meta-->>Planner: Optional ConnectorOutputMetadata
    Planner-->>Coordinator: completed plan execution
    Coordinator-->>User: success response
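
The analyzer portion of the sequence above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the catalog is modeled as a plain `Map` from table name to column names, and the class name, method name, and error messages are hypothetical. It assumes the source table must exist, the target index table must not, and every listed column must belong to the source table.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the analyzer's existence and column checks.
final class VectorIndexValidationSketch {
    private VectorIndexValidationSketch() {}

    static void validate(
            Map<String, Set<String>> tables,
            String sourceTable,
            String targetTable,
            List<String> columns) {
        // The source table must already exist.
        Set<String> sourceColumns = tables.get(sourceTable);
        if (sourceColumns == null) {
            throw new IllegalArgumentException("Source table '" + sourceTable + "' does not exist");
        }
        // The target (index) table must not exist yet.
        if (tables.containsKey(targetTable)) {
            throw new IllegalArgumentException("Destination table '" + targetTable + "' already exists");
        }
        // At least one column is required, and each must be a source column.
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        for (String column : columns) {
            if (!sourceColumns.contains(column)) {
                throw new IllegalArgumentException(
                        "Column '" + column + "' does not exist in source table '" + sourceTable + "'");
            }
        }
    }
}
```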

Class diagram for new CreateVectorIndex AST and analysis types

classDiagram
    class Statement {
    }

    class CreateVectorIndex {
        +Identifier indexName
        +QualifiedName tableName
        +List~Identifier~ columns
        +Optional~Expression~ updatingFor
        +List~Property~ properties
        +CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
        +CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
        +Identifier getIndexName()
        +QualifiedName getTableName()
        +List~Identifier~ getColumns()
        +Optional~Expression~ getUpdatingFor()
        +List~Property~ getProperties()
        +<R,C> R accept(AstVisitor~R,C~ visitor, C context)
        +List~Node~ getChildren()
    }

    class AstVisitor~R,C~ {
        +R visitCreateVectorIndex(CreateVectorIndex node, C context)
    }

    class DefaultTraversalVisitor~R,C~ {
        +R visitCreateVectorIndex(CreateVectorIndex node, C context)
    }

    class SqlFormatter {
        +Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
    }

    class AstBuilder {
        +Node visitCreateVectorIndex(SqlBaseParser_CreateVectorIndexContext context)
    }

    class StatementAnalyzer_Visitor {
        +Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
    }

    class Analysis {
        -Optional~CreateVectorIndexAnalysis~ createVectorIndexAnalysis
        +void setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis analysis)
        +Optional~CreateVectorIndexAnalysis~ getCreateVectorIndexAnalysis()
    }

    class CreateVectorIndexAnalysis {
        +QualifiedObjectName sourceTableName
        +QualifiedObjectName targetTableName
        +List~Identifier~ columns
        +Map~String,Expression~ properties
        +Optional~Expression~ updatingFor
        +CreateVectorIndexAnalysis(QualifiedObjectName sourceTableName, QualifiedObjectName targetTableName, List~Identifier~ columns, Map~String,Expression~ properties, Optional~Expression~ updatingFor)
        +QualifiedObjectName getSourceTableName()
        +QualifiedObjectName getTargetTableName()
        +List~Identifier~ getColumns()
        +Map~String,Expression~ getProperties()
        +Optional~Expression~ getUpdatingFor()
    }

    class StatementUtils {
        -Map~Class,QueryType~ QUERY_TYPES
    }

    Statement <|-- CreateVectorIndex
    AstVisitor <|-- DefaultTraversalVisitor

    CreateVectorIndex ..> Identifier
    CreateVectorIndex ..> QualifiedName
    CreateVectorIndex ..> Expression
    CreateVectorIndex ..> Property

    AstBuilder ..> CreateVectorIndex
    SqlFormatter ..> CreateVectorIndex
    DefaultTraversalVisitor ..> CreateVectorIndex
    StatementAnalyzer_Visitor ..> CreateVectorIndex

    Analysis o-- CreateVectorIndexAnalysis

    StatementUtils ..> CreateVectorIndex
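
To make the visitor wiring in the diagram concrete, here is a reduced sketch of an AST node with double dispatch. All `Sketch*` types are hypothetical simplifications: the table name is a plain `String`, properties are omitted, and the choice of which fields count as children is an assumption made for this example rather than a statement about the PR's `getChildren()`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical visitor: unhandled node types fall back to visitNode.
class SketchVisitor<R, C> {
    R visitNode(SketchNode node, C context) {
        return null;
    }

    R visitCreateVectorIndex(SketchCreateVectorIndex node, C context) {
        return visitNode(node, context);
    }
}

abstract class SketchNode {
    abstract List<SketchNode> getChildren();

    <R, C> R accept(SketchVisitor<R, C> visitor, C context) {
        return visitor.visitNode(this, context);
    }
}

final class SketchIdentifier extends SketchNode {
    final String value;

    SketchIdentifier(String value) {
        this.value = value;
    }

    @Override
    List<SketchNode> getChildren() {
        return Collections.emptyList();
    }
}

final class SketchCreateVectorIndex extends SketchNode {
    final SketchIdentifier indexName;
    final String tableName;
    final List<SketchIdentifier> columns;
    final Optional<SketchNode> updatingFor;

    SketchCreateVectorIndex(
            SketchIdentifier indexName,
            String tableName,
            List<SketchIdentifier> columns,
            Optional<SketchNode> updatingFor) {
        this.indexName = indexName;
        this.tableName = tableName;
        this.columns = columns;
        this.updatingFor = updatingFor;
    }

    // Double dispatch: route to the node-specific visitor method.
    @Override
    <R, C> R accept(SketchVisitor<R, C> visitor, C context) {
        return visitor.visitCreateVectorIndex(this, context);
    }

    // Assumption for this sketch: children are the node-typed fields only.
    @Override
    List<SketchNode> getChildren() {
        List<SketchNode> children = new ArrayList<>();
        children.add(indexName);
        children.addAll(columns);
        updatingFor.ifPresent(children::add);
        return Collections.unmodifiableList(children);
    }
}
```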

Class diagram for new vector index writer and connector SPI APIs

classDiagram
    class WriterTarget {
        <<abstract>>
        +ConnectorId getConnectorId()
        +SchemaTableName getSchemaTableName()
        +Optional~List~OutputColumnMetadata~~ getOutputColumns()
    }

    class CreateVectorIndexReference {
        +ConnectorId connectorId
        +ConnectorTableMetadata tableMetadata
        +Optional~NewTableLayout~ layout
        +Optional~List~OutputColumnMetadata~~ columns
        +SchemaTableName sourceTableName
        +CreateVectorIndexReference(ConnectorId connectorId, ConnectorTableMetadata tableMetadata, Optional~NewTableLayout~ layout, Optional~List~OutputColumnMetadata~~ columns, SchemaTableName sourceTableName)
        +ConnectorId getConnectorId()
        +ConnectorTableMetadata getTableMetadata()
        +Optional~NewTableLayout~ getLayout()
        +SchemaTableName getSchemaTableName()
        +Optional~List~OutputColumnMetadata~~ getOutputColumns()
        +SchemaTableName getSourceTableName()
        +String toString()
    }

    class ConnectorMetadata {
        +ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional~ConnectorNewTableLayout~ layout)
        +Optional~ConnectorOutputMetadata~ finishCreateTable(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
        +ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    class ClassLoaderSafeConnectorMetadata {
        -ConnectorMetadata delegate
        +ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional~ConnectorNewTableLayout~ layout)
        +Optional~ConnectorOutputMetadata~ finishCreateTable(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
        +ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
        +Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
    }

    WriterTarget <|-- CreateVectorIndexReference

    ClassLoaderSafeConnectorMetadata ..|> ConnectorMetadata
    ClassLoaderSafeConnectorMetadata o-- ConnectorMetadata

    CreateVectorIndexReference ..> ConnectorId
    CreateVectorIndexReference ..> ConnectorTableMetadata
    CreateVectorIndexReference ..> NewTableLayout
    CreateVectorIndexReference ..> OutputColumnMetadata
    CreateVectorIndexReference ..> SchemaTableName

    ConnectorMetadata ..> ConnectorSession
    ConnectorMetadata ..> ConnectorTableMetadata
    ConnectorMetadata ..> ConnectorNewTableLayout
    ConnectorMetadata ..> SchemaTableName
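
The class-loader-safe delegation shown for ClassLoaderSafeConnectorMetadata follows a common pattern: swap the thread's context class loader for the connector's loader around each delegate call, then restore it. The helper below is a generic sketch of that pattern using only the JDK; the class and method names are hypothetical and it is not the wrapper's actual code.

```java
import java.util.function.Supplier;

// Hypothetical helper illustrating class-loader-safe delegation.
final class ClassLoaderSafeSketch {
    private ClassLoaderSafeSketch() {}

    static <T> T withContextClassLoader(ClassLoader loader, Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(loader);
        try {
            // The delegate call runs with the connector's class loader in place.
            return action.get();
        }
        finally {
            // Always restore the caller's loader, even if the delegate throws.
            thread.setContextClassLoader(original);
        }
    }
}
```

A wrapper method for a call such as beginCreateVectorIndex would then reduce to a single `withContextClassLoader(pluginLoader, () -> delegate.someCall(...))` expression.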

Class diagram for CreateVectorIndexAggregation planning function

classDiagram
    class CreateVectorIndexAggregation {
        <<final>>
        -CreateVectorIndexAggregation()
        +void inputRealArray(SliceState state, Block embedding)
        +void inputDoubleArray(SliceState state, Block embedding)
        +void inputRealArrayIntId(SliceState state, Block embedding, long id)
        +void inputRealArrayBigintId(SliceState state, Block embedding, long id)
        +void inputRealArrayVarcharId(SliceState state, Block embedding, Slice id)
        +void inputDoubleArrayIntId(SliceState state, Block embedding, long id)
        +void inputDoubleArrayBigintId(SliceState state, Block embedding, long id)
        +void inputDoubleArrayVarcharId(SliceState state, Block embedding, Slice id)
        +void combine(SliceState state, SliceState otherState)
        +void output(SliceState state, BlockBuilder out)
    }

    class SliceState {
    }

    class Block {
    }

    class BlockBuilder {
    }

    class Slice {
    }

    class BuiltInTypeAndFunctionNamespaceManager {
        -List~SqlFunction~ getBuiltInFunctions(FunctionsConfig functionsConfig)
    }

    CreateVectorIndexAggregation ..> SliceState
    CreateVectorIndexAggregation ..> Block
    CreateVectorIndexAggregation ..> BlockBuilder
    CreateVectorIndexAggregation ..> Slice

    BuiltInTypeAndFunctionNamespaceManager ..> CreateVectorIndexAggregation
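
The shape of a planning-only aggregation can be sketched without any Presto types. In the hypothetical class below, `State` stands in for SliceState and a `float[]` stands in for the embedding Block; the input and combine steps do nothing and the output is always an empty string, which matches the description of the function as a placeholder whose execution is bypassed.

```java
// Hypothetical, dependency-free sketch of a no-op aggregation.
final class DummyVectorIndexAggregationSketch {
    private DummyVectorIndexAggregationSketch() {}

    // Placeholder accumulator; it never holds data.
    static final class State {
    }

    // Input step: intentionally ignores the embedding and the id.
    static void input(State state, float[] embedding, long id) {
    }

    // Combine step: nothing to merge because no state is accumulated.
    static void combine(State state, State otherState) {
    }

    // Output step: always produces an empty string.
    static String output(State state) {
        return "";
    }
}
```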

File-Level Changes

Change: Introduce CreateVectorIndex AST node and wire it through parsing, visiting, and SQL formatting.
Details:
  • Add CreateVectorIndex statement class with index name, table name, column list, optional updating-for predicate, and properties.
  • Extend AstBuilder to build CreateVectorIndex from the new CREATE VECTOR INDEX grammar rule.
  • Update AstVisitor and DefaultTraversalVisitor to support visiting CreateVectorIndex nodes and their children.
  • Add SqlFormatter support to pretty-print CREATE VECTOR INDEX statements with WITH properties and UPDATING FOR clauses.
Files:
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/CreateVectorIndex.java
  • presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java
  • presto-parser/src/main/java/com/facebook/presto/sql/SqlFormatter.java

Change: Extend SQL grammar and error handling for CREATE VECTOR INDEX and related keywords.
Details:
  • Add CREATE VECTOR INDEX production with column list, optional WITH properties, and optional UPDATING FOR boolean expression.
  • Allow trailing comma in WITH properties list to support vector index syntax.
  • Register INDEX, VECTOR, and UPDATING as non-reserved keywords and define their tokens.
  • Adjust parser error expectation strings to include VECTOR where applicable.
Files:
  • presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java

Change: Add analyzer support to validate CREATE VECTOR INDEX and capture analysis metadata.
Details:
  • Extend Analysis to carry an Optional<CreateVectorIndexAnalysis> with source/target table names, columns, properties, and optional updating-for expression.
  • Implement StatementAnalyzer.visitCreateVectorIndex to resolve source/target tables, enforce existence/non-existence checks, validate referenced columns, validate properties, and register access control checks.
  • Map CreateVectorIndex to QueryType.INSERT for statement classification.
Files:
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/Analysis.java
  • presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
  • presto-analyzer/src/main/java/com/facebook/presto/sql/analyzer/utils/StatementUtils.java

Change: Introduce SPI and planning support for creating vector indexes, including a new WriterTarget and connector metadata hooks.
Details:
  • Add TableWriterNode.CreateVectorIndexReference WriterTarget with connector id, table metadata, optional layout, optional output columns, and source table name accessors.
  • Extend ConnectorMetadata with beginCreateVectorIndex and finishCreateVectorIndex default methods that throw NOT_SUPPORTED.
  • Implement classloader-safe delegation for the new vector index methods in ClassLoaderSafeConnectorMetadata.
Files:
  • presto-spi/src/main/java/com/facebook/presto/spi/plan/TableWriterNode.java
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/ConnectorMetadata.java
  • presto-spi/src/main/java/com/facebook/presto/spi/connector/classloader/ClassLoaderSafeConnectorMetadata.java

Change: Add a dummy create_vector_index aggregation function used only during planning of vector index creation.
Details:
  • Introduce CreateVectorIndexAggregation with multiple @InputFunction overloads for array(real) and array(double) embeddings with optional integer, bigint, or varchar IDs.
  • Provide no-op input/combine methods and a VARCHAR output that always writes an empty string, since execution is bypassed by connector optimizations.
  • Register CreateVectorIndexAggregation as a built-in aggregation function.
Files:
  • presto-main-base/src/main/java/com/facebook/presto/operator/aggregation/CreateVectorIndexAggregation.java
  • presto-main-base/src/main/java/com/facebook/presto/metadata/BuiltInTypeAndFunctionNamespaceManager.java

Change: Add parser tests for CREATE VECTOR INDEX covering positive and negative syntax cases.
Details:
  • Add TestSqlParser.testCreateVectorIndex with cases for basic syntax, single/multiple columns, qualified table names, WITH properties (including trailing comma), UPDATING FOR predicates, and combined clauses.
  • Add negative tests verifying appropriate parse errors for missing keywords, identifiers, columns, and malformed WITH/UPDATING FOR clauses.
Files:
  • presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java


skyelves added a commit to skyelves/presto that referenced this pull request Mar 5, 2026
Summary:

Add create_vector_index function signature

Differential Revision: D95341384
sourcery-ai bot (Contributor) left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The CreateVectorIndex grammar rule only allows at most two columns due to identifier (',' identifier)?; this should likely be identifier (',' identifier)* to support the multiple-column examples covered by the tests.
  • In CreateVectorIndex/DefaultTraversalVisitor.visitCreateVectorIndex, the tableName is never added to getChildren() or traversed, which can break visitors and tooling that rely on full AST traversal; consider including tableName alongside indexName and columns.
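
The difference between the `?` and `*` quantifiers in the first point can be demonstrated with ordinary regular expressions, which share those operators with ANTLR subrules. The patterns below are only an analogy built with `java.util.regex`; they are not the SqlBase.g4 rule, and `\w+` is a hypothetical simplification of `identifier`.

```java
import java.util.regex.Pattern;

// Analogy only: regex quantifiers standing in for grammar subrule operators.
final class ColumnListGrammarSketch {
    private ColumnListGrammarSketch() {}

    // identifier (',' identifier)?  accepts one or two columns.
    private static final Pattern AT_MOST_TWO = Pattern.compile("\\w+(\\s*,\\s*\\w+)?");

    // identifier (',' identifier)*  accepts one or more columns.
    private static final Pattern ONE_OR_MORE = Pattern.compile("\\w+(\\s*,\\s*\\w+)*");

    static boolean acceptsAtMostTwo(String columnList) {
        return AT_MOST_TWO.matcher(columnList).matches();
    }

    static boolean acceptsOneOrMore(String columnList) {
        return ONE_OR_MORE.matcher(columnList).matches();
    }
}
```

Under this analogy, `id, embedding` is accepted by both forms, while a three-column list such as `id, embedding, ds` is accepted only by the `*` form.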
## Individual Comments

### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java" line_range="1148-1157" />
<code_context>
+        protected Scope visitCreateVectorIndex(CreateVectorIndex node, Optional<Scope> scope)
</code_context>
<issue_to_address>
**🚨 issue (security):** Source table is used without a corresponding access-control check

This logic only validates the source table and columns while registering an access-control check (TABLE_CREATE) on the destination index. Because the index is derived from source table data, it should also register a TABLE_SELECT (or equivalent) check on the source table, consistent with CTAS/INSERT-from-select, to prevent users from materializing data they are not allowed to read directly.
</issue_to_address>
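
The access-control point raised in this comment can be sketched as follows. The sketch is hypothetical throughout: privileges are modeled as plain strings in a `Set`, and the class, method names, and message text are invented for illustration. It shows the idea that creating the index should require both a create privilege on the target and a read privilege on the source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: both the target and the source table are checked.
final class VectorIndexAccessChecksSketch {
    private VectorIndexAccessChecksSketch() {}

    static List<String> requiredChecks(String sourceTable, String targetTable) {
        List<String> checks = new ArrayList<>();
        // Creating the index table requires a create privilege on the target.
        checks.add("CREATE_TABLE " + targetTable);
        // The index is derived from source data, so reading it must be allowed too.
        checks.add("SELECT " + sourceTable);
        return checks;
    }

    static void authorize(Set<String> granted, String sourceTable, String targetTable) {
        for (String check : requiredChecks(sourceTable, targetTable)) {
            if (!granted.contains(check)) {
                throw new SecurityException("Access Denied: " + check);
            }
        }
    }
}
```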


github-actions bot commented Mar 5, 2026

Codenotify: Notifying subscribers in CODENOTIFY files for diff 5022f6b...59ebef9.

No notifications.

@steveburnett (Contributor)

Please add a release note entry following the Release Notes Guidelines.

@skyelves skyelves force-pushed the export-D95341384 branch 3 times, most recently from fea4677 to 192dff1 Compare March 11, 2026 02:17
@meta-codesync meta-codesync bot changed the title from "feat: Add create_vector_index function signature" to "feat: Add create_vector_index function signature (#27264)" Mar 13, 2026
skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:

Add create_vector_index function signature

Differential Revision: D95341384
@skyelves skyelves force-pushed the export-D95341384 branch 2 times, most recently from 1c53d20 to a76ce0d Compare March 13, 2026 18:10
skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:
Pull Request resolved: prestodb#27264

Add create_vector_index function signature

Differential Revision: D95341384
linux-foundation-easycla bot commented Mar 13, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:
Pull Request resolved: prestodb#27264

Add create_vector_index function signature

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:

SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D95341384
skyelves added a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:

Add create_vector_index function signature

Differential Revision: D95341384
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:
Pull Request resolved: prestodb#27264

Add create_vector_index function signature

Differential Revision: D95341384
skyelves added a commit to skyelves/presto that referenced this pull request Mar 16, 2026
Summary:

Add create_vector_index function signature

## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
  == RELEASE NOTES ==
  General Changes
  * Add support for the ``CREATE VECTOR INDEX`` statement, which creates
    vector search indexes on table columns with configurable index properties
    and partition filtering via an ``UPDATING FOR`` clause.
```

Differential Revision: D95341384
gggrace14 self-requested a review March 16, 2026 23:05
gggrace14 previously approved these changes Mar 16, 2026
skyelves added a commit to skyelves/presto that referenced this pull request Mar 17, 2026
Summary:

Add create_vector_index function signature

Differential Revision: D95341384
skyelves force-pushed the export-D95341384 branch 3 times, most recently from e0ce58f to 5615745, March 18, 2026 17:24
skyelves added a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:

Add create_vector_index function signature

## Release Notes
```
== NO RELEASE NOTE ==
```

Differential Revision: D95341384
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary:
Pull Request resolved: prestodb#27264

Add create_vector_index function signature

Differential Revision: D95341384
skyelves force-pushed the export-D95341384 branch 2 times, most recently from 390d229 to 1b5a709, March 19, 2026 00:57
Summary:

## Release Notes
```
== NO RELEASE NOTE ==
```



Differential Revision: D91524358

Pulled By: skyelves
…estodb#27261)

Summary:

Add dedicated WriterTarget subclass and ConnectorMetadata SPI for
CREATE VECTOR INDEX, enabling each connector to implement vector index
creation independently.

- CreateVectorIndexReference: plan-time target carrying index metadata
  and source table reference
- beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to
  NOT_SUPPORTED so connectors must opt in
- ClassLoaderSafeConnectorMetadata: delegation wrappers
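
The opt-in pattern described in the list above can be sketched roughly as follows. This is a trimmed-down illustration under stated assumptions, not the actual SPI: the interface name `SketchConnectorMetadata`, the `String` handle type, and the method parameters are invented for the sketch, and `UnsupportedOperationException` stands in for Presto's NOT_SUPPORTED error. The wrapper shows the general idea of a delegation wrapper that runs each call under the connector's class loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, trimmed-down SPI; the method signatures here are assumptions.
interface SketchConnectorMetadata {
    default String beginCreateVectorIndex(String indexName) {
        // Stand-in for the NOT_SUPPORTED default: connectors must override to opt in.
        throw new UnsupportedOperationException("This connector does not support creating vector indexes");
    }

    default void finishCreateVectorIndex(String handle) {
        throw new UnsupportedOperationException("This connector does not support creating vector indexes");
    }
}

// A connector that does not opt in inherits the throwing defaults.
final class DefaultOnlyMetadata implements SketchConnectorMetadata {}

// A connector that opts in by overriding both halves of the begin/finish pair.
final class OptInMetadata implements SketchConnectorMetadata {
    final List<String> finished = new ArrayList<>();

    @Override
    public String beginCreateVectorIndex(String indexName) {
        return "handle:" + indexName;
    }

    @Override
    public void finishCreateVectorIndex(String handle) {
        finished.add(handle);
    }
}

// Delegation wrapper: every call runs with the connector's class loader set as
// the thread context class loader, and the previous loader is always restored.
final class ClassLoaderSafeSketch implements SketchConnectorMetadata {
    private final SketchConnectorMetadata delegate;
    private final ClassLoader classLoader;

    ClassLoaderSafeSketch(SketchConnectorMetadata delegate, ClassLoader classLoader) {
        this.delegate = delegate;
        this.classLoader = classLoader;
    }

    private <T> T withLoader(Supplier<T> action) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(classLoader);
        try {
            return action.get();
        }
        finally {
            thread.setContextClassLoader(original);
        }
    }

    @Override
    public String beginCreateVectorIndex(String indexName) {
        return withLoader(() -> delegate.beginCreateVectorIndex(indexName));
    }

    @Override
    public void finishCreateVectorIndex(String handle) {
        withLoader(() -> {
            delegate.finishCreateVectorIndex(handle);
            return null;
        });
    }
}
```

Because the defaults throw rather than silently succeed, a connector that overrides neither method fails fast when asked to create a vector index, while the wrapper forwards both calls unchanged to whichever implementation it wraps.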


## Release Notes
```
== NO RELEASE NOTE ==
```

Differential Revision: D95325176
Summary:

Add create_vector_index function signature

## Release Notes
```
== NO RELEASE NOTE ==
```

Differential Revision: D95341384
@NivinCS
Contributor

NivinCS commented Mar 19, 2026

@skyelves, could you please update the PR to include only the changes related to the create_vector_index function signature? The other changes already have separate PRs open.

@skyelves skyelves closed this Mar 20, 2026

4 participants