feat: Add create_vector_index function signature (#27264)
Closed
skyelves wants to merge 3 commits into prestodb:master
Reviewer's Guide
Adds end-to-end support for a new CREATE VECTOR INDEX SQL statement, including parser/AST, analyzer, planning hooks, SPI connector APIs, and a dummy aggregation function used only for planning, plus tests and error handling updates.
Sequence diagram for CREATE VECTOR INDEX statement lifecycle
sequenceDiagram
actor User
participant Coordinator as PrestoCoordinator
participant Parser as SqlParser_AstBuilder
participant Analyzer as StatementAnalyzer
participant Analysis as Analysis
participant Planner as Planner_TableWriter
participant Meta as ConnectorMetadata
participant Connector as ConnectorImpl
participant CreateVectorIndexAggregation
User->>Coordinator: submit SQL
Coordinator->>Parser: parse CREATE VECTOR INDEX
Parser-->>Coordinator: CreateVectorIndex AST
Coordinator->>Analyzer: analyze(CreateVectorIndex)
Analyzer->>Meta: tableExists(sourceTableName)
Meta-->>Analyzer: boolean
Analyzer->>Meta: tableExists(targetTableName)
Meta-->>Analyzer: boolean
Analyzer->>Meta: getTableHandle(sourceTableName)
Meta-->>Analyzer: TableHandle
Analyzer->>Meta: getColumnHandles(TableHandle)
Meta-->>Analyzer: Map columnHandles
Analyzer->>Analyzer: validate columns and properties
Analyzer->>Analysis: setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis)
Analyzer-->>Coordinator: Scope with VARCHAR result
Coordinator->>Planner: plan(CreateVectorIndexAnalysis)
Planner->>Meta: beginCreateVectorIndex(session, indexMetadata, layout, sourceTableName)
Meta->>Connector: beginCreateVectorIndex(session, indexMetadata, layout, sourceTableName)
Connector-->>Meta: ConnectorOutputTableHandle
Meta-->>Planner: ConnectorOutputTableHandle
loop execute task writers
Planner->>CreateVectorIndexAggregation: aggregate embeddings and ids
end
Planner->>Meta: finishCreateVectorIndex(session, handle, fragments, stats)
Meta->>Connector: finishCreateVectorIndex(session, handle, fragments, stats)
Connector-->>Meta: Optional ConnectorOutputMetadata
Meta-->>Planner: Optional ConnectorOutputMetadata
Planner-->>Coordinator: completed plan execution
Coordinator-->>User: success response
Class diagram for new CreateVectorIndex AST and analysis types
classDiagram
class Statement {
}
class CreateVectorIndex {
+Identifier indexName
+QualifiedName tableName
+List~Identifier~ columns
+Optional~Expression~ updatingFor
+List~Property~ properties
+CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+Identifier getIndexName()
+QualifiedName getTableName()
+List~Identifier~ getColumns()
+Optional~Expression~ getUpdatingFor()
+List~Property~ getProperties()
+<R,C> R accept(AstVisitor~R,C~ visitor, C context)
+List~Node~ getChildren()
}
class AstVisitor~R,C~ {
+R visitCreateVectorIndex(CreateVectorIndex node, C context)
}
class DefaultTraversalVisitor~R,C~ {
+R visitCreateVectorIndex(CreateVectorIndex node, C context)
}
class SqlFormatter {
+Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
}
class AstBuilder {
+Node visitCreateVectorIndex(SqlBaseParser_CreateVectorIndexContext context)
}
class StatementAnalyzer_Visitor {
+Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
}
class Analysis {
-Optional~CreateVectorIndexAnalysis~ createVectorIndexAnalysis
+void setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis analysis)
+Optional~CreateVectorIndexAnalysis~ getCreateVectorIndexAnalysis()
}
class CreateVectorIndexAnalysis {
+QualifiedObjectName sourceTableName
+QualifiedObjectName targetTableName
+List~Identifier~ columns
+Map~String,Expression~ properties
+Optional~Expression~ updatingFor
+CreateVectorIndexAnalysis(QualifiedObjectName sourceTableName, QualifiedObjectName targetTableName, List~Identifier~ columns, Map~String,Expression~ properties, Optional~Expression~ updatingFor)
+QualifiedObjectName getSourceTableName()
+QualifiedObjectName getTargetTableName()
+List~Identifier~ getColumns()
+Map~String,Expression~ getProperties()
+Optional~Expression~ getUpdatingFor()
}
class StatementUtils {
-Map~Class,QueryType~ QUERY_TYPES
}
Statement <|-- CreateVectorIndex
AstVisitor <|-- DefaultTraversalVisitor
CreateVectorIndex ..> Identifier
CreateVectorIndex ..> QualifiedName
CreateVectorIndex ..> Expression
CreateVectorIndex ..> Property
AstBuilder ..> CreateVectorIndex
SqlFormatter ..> CreateVectorIndex
DefaultTraversalVisitor ..> CreateVectorIndex
StatementAnalyzer_Visitor ..> CreateVectorIndex
Analysis o-- CreateVectorIndexAnalysis
StatementUtils ..> CreateVectorIndex
Class diagram for new vector index writer and connector SPI APIs
classDiagram
class WriterTarget {
<<abstract>>
+ConnectorId getConnectorId()
+SchemaTableName getSchemaTableName()
+Optional~List~OutputColumnMetadata~~ getOutputColumns()
}
class CreateVectorIndexReference {
+ConnectorId connectorId
+ConnectorTableMetadata tableMetadata
+Optional~NewTableLayout~ layout
+Optional~List~OutputColumnMetadata~~ columns
+SchemaTableName sourceTableName
+CreateVectorIndexReference(ConnectorId connectorId, ConnectorTableMetadata tableMetadata, Optional~NewTableLayout~ layout, Optional~List~OutputColumnMetadata~~ columns, SchemaTableName sourceTableName)
+ConnectorId getConnectorId()
+ConnectorTableMetadata getTableMetadata()
+Optional~NewTableLayout~ getLayout()
+SchemaTableName getSchemaTableName()
+Optional~List~OutputColumnMetadata~~ getOutputColumns()
+SchemaTableName getSourceTableName()
+String toString()
}
class ConnectorMetadata {
+ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional~ConnectorNewTableLayout~ layout)
+Optional~ConnectorOutputMetadata~ finishCreateTable(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
+ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
class ClassLoaderSafeConnectorMetadata {
-ConnectorMetadata delegate
+ConnectorOutputTableHandle beginCreateTable(ConnectorSession session, ConnectorTableMetadata tableMetadata, Optional~ConnectorNewTableLayout~ layout)
+Optional~ConnectorOutputMetadata~ finishCreateTable(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
+ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
WriterTarget <|-- CreateVectorIndexReference
ClassLoaderSafeConnectorMetadata ..|> ConnectorMetadata
ClassLoaderSafeConnectorMetadata o-- ConnectorMetadata
CreateVectorIndexReference ..> ConnectorId
CreateVectorIndexReference ..> ConnectorTableMetadata
CreateVectorIndexReference ..> NewTableLayout
CreateVectorIndexReference ..> OutputColumnMetadata
CreateVectorIndexReference ..> SchemaTableName
ConnectorMetadata ..> ConnectorSession
ConnectorMetadata ..> ConnectorTableMetadata
ConnectorMetadata ..> ConnectorNewTableLayout
ConnectorMetadata ..> SchemaTableName
Class diagram for CreateVectorIndexAggregation planning function
classDiagram
class CreateVectorIndexAggregation {
<<final>>
-CreateVectorIndexAggregation()
+void inputRealArray(SliceState state, Block embedding)
+void inputDoubleArray(SliceState state, Block embedding)
+void inputRealArrayIntId(SliceState state, Block embedding, long id)
+void inputRealArrayBigintId(SliceState state, Block embedding, long id)
+void inputRealArrayVarcharId(SliceState state, Block embedding, Slice id)
+void inputDoubleArrayIntId(SliceState state, Block embedding, long id)
+void inputDoubleArrayBigintId(SliceState state, Block embedding, long id)
+void inputDoubleArrayVarcharId(SliceState state, Block embedding, Slice id)
+void combine(SliceState state, SliceState otherState)
+void output(SliceState state, BlockBuilder out)
}
class SliceState {
}
class Block {
}
class BlockBuilder {
}
class Slice {
}
class BuiltInTypeAndFunctionNamespaceManager {
-List~SqlFunction~ getBuiltInFunctions(FunctionsConfig functionsConfig)
}
CreateVectorIndexAggregation ..> SliceState
CreateVectorIndexAggregation ..> Block
CreateVectorIndexAggregation ..> BlockBuilder
CreateVectorIndexAggregation ..> Slice
BuiltInTypeAndFunctionNamespaceManager ..> CreateVectorIndexAggregation
skyelves added a commit to skyelves/presto that referenced this pull request on Mar 5, 2026
Summary: Add create_vector_index function signature
Differential Revision: D95341384
985dd0d to 31446c9
Hey - I've found 1 issue, and left some high level feedback:
- The `CreateVectorIndex` grammar rule only allows at most two columns due to `identifier (',' identifier)?`; this should likely be `identifier (',' identifier)*` to support the multiple-column examples covered by the tests.
- In `CreateVectorIndex`/`DefaultTraversalVisitor.visitCreateVectorIndex`, the `tableName` is never added to `getChildren()` or traversed, which can break visitors and tooling that rely on full AST traversal; consider including `tableName` alongside `indexName` and `columns`.
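The difference between the `?` and `*` quantifiers in the first comment can be illustrated with a small sketch. This is not the ANTLR grammar from the PR: it uses `java.util.regex` patterns as a stand-in for the column-list rule, and the class name `ColumnListRuleSketch` and the simplified identifier shape are assumptions made for illustration.

```java
import java.util.regex.Pattern;

// Illustrative only: regular expressions standing in for the ANTLR column-list rule.
final class ColumnListRuleSketch {
    // Hypothetical identifier shape; the real grammar's identifier rule is richer.
    private static final String IDENT = "[A-Za-z_][A-Za-z0-9_]*";

    // Mirrors identifier (',' identifier)? : one column, optionally followed by a second.
    static final Pattern AT_MOST_TWO =
            Pattern.compile(IDENT + "(\\s*,\\s*" + IDENT + ")?");

    // Mirrors identifier (',' identifier)* : one column followed by any number of others.
    static final Pattern ONE_OR_MORE =
            Pattern.compile(IDENT + "(\\s*,\\s*" + IDENT + ")*");

    // Returns true only when the whole column list matches the rule.
    static boolean accepts(Pattern rule, String columnList) {
        return rule.matcher(columnList).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts(AT_MOST_TWO, "id, embedding"));
        System.out.println(accepts(AT_MOST_TWO, "id, embedding, extra"));
        System.out.println(accepts(ONE_OR_MORE, "id, embedding, extra"));
    }
}
```

With the `?` form a three-column list is rejected, while the `*` form accepts it, which is the behavior the comment asks for.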
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `CreateVectorIndex` grammar rule only allows at most two columns due to `identifier (',' identifier)?`; this should likely be `identifier (',' identifier)*` to support the multiple-column examples covered by the tests.
- In `CreateVectorIndex`/`DefaultTraversalVisitor.visitCreateVectorIndex`, the `tableName` is never added to `getChildren()` or traversed, which can break visitors and tooling that rely on full AST traversal; consider including `tableName` alongside `indexName` and `columns`.
## Individual Comments
### Comment 1
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java" line_range="1148-1157" />
<code_context>
+ protected Scope visitCreateVectorIndex(CreateVectorIndex node, Optional<Scope> scope)
</code_context>
<issue_to_address>
**🚨 issue (security):** Source table is used without a corresponding access-control check
This logic only validates the source table and columns while registering an access-control check (TABLE_CREATE) on the destination index. Because the index is derived from source table data, it should also register a TABLE_SELECT (or equivalent) check on the source table, consistent with CTAS/INSERT-from-select, to prevent users from materializing data they are not allowed to read directly.
</issue_to_address>
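A minimal sketch of the shape of the fix suggested in Comment 1, under stated assumptions: the types below (`VectorIndexAccessSketch`, `Privilege`, `Check`) are hypothetical and are not Presto's analyzer or access-control classes. The privilege names simply reuse the `TABLE_SELECT` and `TABLE_CREATE` labels from the comment. The point is only that the analysis registers a read check on the source table next to the create check on the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model; these types are not Presto's analyzer or access-control classes.
final class VectorIndexAccessSketch {
    // Labels borrowed from the review comment.
    enum Privilege { TABLE_SELECT, TABLE_CREATE }

    // One access-control check to be evaluated before the statement runs.
    static final class Check {
        final Privilege privilege;
        final String table;

        Check(Privilege privilege, String table) {
            this.privilege = privilege;
            this.table = table;
        }

        @Override
        public String toString() {
            return privilege + " on " + table;
        }
    }

    // Checks an analyzer could register for CREATE VECTOR INDEX:
    // read access on the source table plus create access on the index table.
    static List<Check> requiredChecks(String sourceTable, String indexTable) {
        List<Check> checks = new ArrayList<>();
        checks.add(new Check(Privilege.TABLE_SELECT, sourceTable));
        checks.add(new Check(Privilege.TABLE_CREATE, indexTable));
        return checks;
    }

    public static void main(String[] args) {
        for (Check check : requiredChecks("my_table", "my_index")) {
            System.out.println(check);
        }
    }
}
```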
Codenotify: Notifying subscribers in CODENOTIFY files for diff 5022f6b...59ebef9. No notifications.
Please add a release note entry following the Release Notes Guidelines.
31446c9 to 9dac2bd
fea4677 to 192dff1
192dff1 to d12c60f
skyelves added a commit to skyelves/presto that referenced this pull request on Mar 13, 2026
Summary: Add create_vector_index function signature
## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:
TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
    └── query plan
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
Differential Revision: D95341384
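The plan shape from step 3 of the design can be pictured with a tiny tree model. This is a sketch only: `PlanTreeSketch` and its `Node` class are hypothetical and are not Presto's plan node classes. The labels are copied from the commit message, and the "query plan" leaf stands in for whatever subplan the planner builds for the SELECT.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical model of the plan shape; not Presto's PlanNode hierarchy.
final class PlanTreeSketch {
    static final class Node {
        final String label;
        final List<Node> children;

        Node(String label, List<Node> children) {
            this.label = label;
            this.children = children;
        }
    }

    // TableFinishNode wraps TableWriterNode, which wraps the query subplan.
    static Node vectorIndexPlan() {
        Node query = new Node("query plan", Collections.emptyList());
        Node writer = new Node(
                "TableWriterNode(target = CreateVectorIndexReference)",
                Collections.singletonList(query));
        return new Node(
                "TableFinishNode(target = CreateVectorIndexReference)",
                Collections.singletonList(writer));
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node root) {
        StringBuilder out = new StringBuilder();
        render(root, 0, out);
        return out.toString();
    }

    private static void render(Node node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.label).append('\n');
        for (Node child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        System.out.print(render(vectorIndexPlan()));
    }
}
```

Both writer-side nodes carry the same target, which is what lets a connector optimizer or the execution layer recognize the statement as a vector index build rather than a plain table write.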
1c53d20 to a76ce0d
a76ce0d to 66f7daf
66f7daf to a6cc27b
skyelves added a commit to skyelves/presto that referenced this pull request on Mar 16, 2026
Summary: Add create_vector_index function signature
## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
== RELEASE NOTES ==
General Changes
* Add support for the ``CREATE VECTOR INDEX`` statement, which creates
vector search indexes on table columns with configurable index properties
and partition filtering via an ``UPDATING FOR`` clause.
```
Differential Revision: D95341384
a6cc27b to 350dea0
gggrace14 previously approved these changes on Mar 16, 2026
e0ce58f to 5615745
skyelves
added a commit
to skyelves/presto
that referenced
this pull request
Mar 18, 2026
Summary: Add create_vector_index function signature ## High level design The process for executing a CREATE VECTOR INDEX SQL statement is as follows: 1. SQL Input & Parsing: SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ... The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node. 2. Statement Analysis: StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties. This results in a structured CreateVectorIndexAnalysis object. 3. Logical Planning & Query Generation: • LogicalPlanner.createVectorIndexPlan() builds the core execution query: CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ... • The resulting plan tree includes: TableFinishNode(target = CreateVectorIndexReference) └── TableWriterNode(target = CreateVectorIndexReference) └── query plan 4. Connector Plan Optimization (Rewriting): PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization. ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase. 5. Execution and Metadata Handling (For connectors that don't rewrite): TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex(). Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync(). 6. ConnectorMetadata SPI: Default: The standard implementation throws NOT_SUPPORTED. Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls. ## Release Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D95341384
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 18, 2026
Summary: Pull Request resolved: prestodb#27264
Add create_vector_index function signature
Differential Revision: D95341384
390d229 to 1b5a709
Summary: High level design for executing a CREATE VECTOR INDEX SQL statement (parsing, statement analysis, logical planning, connector plan optimization, execution and metadata handling, ConnectorMetadata SPI)
## Release Notes ``` == NO RELEASE NOTE == ```
Differential Revision: D91524358
Pulled By: skyelves
…estodb#27261)
Summary:
Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently.
- CreateVectorIndexReference: plan-time target carrying index metadata and source table reference
- beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in
- ClassLoaderSafeConnectorMetadata: delegation wrappers
## Release Notes
```
== NO RELEASE NOTE ==
```
Differential Revision: D95325176
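The opt-in pattern described in this commit summary can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SketchConnectorMetadata`, `NotSupportedSketchException`, `OptInSketchMetadata`, and `ClassLoaderSafeSketchMetadata` are hypothetical stand-ins, and the string-based signatures are far simpler than the real SPI, which takes session, metadata, and layout arguments. The sketch only shows the shape: default methods that reject the operation, a connector that overrides them, and a wrapper that delegates under the connector's class loader.

```java
import java.util.Objects;

// Hypothetical stand-in for a NOT_SUPPORTED error; not the real Presto exception type.
class NotSupportedSketchException extends RuntimeException {
    NotSupportedSketchException(String message) {
        super(message);
    }
}

// Hypothetical, heavily simplified metadata interface (assumed signatures).
interface SketchConnectorMetadata {
    // Default behavior: connectors that do not override this reject the statement.
    default String beginCreateVectorIndex(String indexName, String sourceTable) {
        throw new NotSupportedSketchException("This connector does not support creating vector indexes");
    }

    default void finishCreateVectorIndex(String handle) {
        throw new NotSupportedSketchException("This connector does not support creating vector indexes");
    }
}

// A connector that opts in by overriding both calls.
class OptInSketchMetadata implements SketchConnectorMetadata {
    @Override
    public String beginCreateVectorIndex(String indexName, String sourceTable) {
        return "handle:" + indexName + ":" + sourceTable;
    }

    @Override
    public void finishCreateVectorIndex(String handle) {
        // A real connector would commit the index table here.
    }
}

// Delegation wrapper: runs the delegate with the connector's class loader
// installed as the thread context class loader, then restores the original.
class ClassLoaderSafeSketchMetadata implements SketchConnectorMetadata {
    private final SketchConnectorMetadata delegate;
    private final ClassLoader classLoader;

    ClassLoaderSafeSketchMetadata(SketchConnectorMetadata delegate, ClassLoader classLoader) {
        this.delegate = Objects.requireNonNull(delegate, "delegate is null");
        this.classLoader = Objects.requireNonNull(classLoader, "classLoader is null");
    }

    @Override
    public String beginCreateVectorIndex(String indexName, String sourceTable) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(classLoader);
        try {
            return delegate.beginCreateVectorIndex(indexName, sourceTable);
        }
        finally {
            thread.setContextClassLoader(original);
        }
    }

    @Override
    public void finishCreateVectorIndex(String handle) {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        thread.setContextClassLoader(classLoader);
        try {
            delegate.finishCreateVectorIndex(handle);
        }
        finally {
            thread.setContextClassLoader(original);
        }
    }
}
```

Because the wrapper only forwards calls, a connector that has not opted in still surfaces the default rejection through it.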
Summary: Add create_vector_index function signature
## Release Notes ``` == NO RELEASE NOTE == ```
Differential Revision: D95341384
1b5a709 to 59ebef9
Contributor
@skyelves, could you please update the PR to include only the changes related to the create_vector_index function signature? The other changes already have separate PRs open.
Summary:
Add create_vector_index function signature
High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:
TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
    └── query plan
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
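As a rough illustration of steps 3 and 5 above, the following sketch nests a writer node under a finish node, both carrying the same target, and then picks the metadata call based on the target's type. The classes `SketchWriterTarget`, `SketchVectorIndexReference`, `SketchPlanNode`, and `VectorIndexPlanSketch` are hypothetical simplifications, not the real planner classes, and the string returned by `beginCallFor` merely names the call that the description says is triggered.

```java
// Hypothetical, simplified stand-ins for planner types (assumed for illustration).
abstract class SketchWriterTarget {
}

// Plan-time target carrying the index name and the source table reference.
class SketchVectorIndexReference extends SketchWriterTarget {
    final String indexName;
    final String sourceTable;

    SketchVectorIndexReference(String indexName, String sourceTable) {
        this.indexName = indexName;
        this.sourceTable = sourceTable;
    }
}

class SketchPlanNode {
    final String name;
    final SketchWriterTarget target; // null for nodes that do not write
    final SketchPlanNode source;     // null for the leaf query plan

    SketchPlanNode(String name, SketchWriterTarget target, SketchPlanNode source) {
        this.name = name;
        this.target = target;
        this.source = source;
    }

    // Renders the chain from this node down to the leaf.
    String render() {
        return source == null ? name : name + " -> " + source.render();
    }
}

class VectorIndexPlanSketch {
    // Wraps the query plan in a writer node and a finish node that share one target.
    static SketchPlanNode createVectorIndexPlan(SketchPlanNode queryPlan, SketchVectorIndexReference target) {
        SketchPlanNode writer = new SketchPlanNode("TableWriterNode", target, queryPlan);
        return new SketchPlanNode("TableFinishNode", target, writer);
    }

    // Routing by target type: a vector index reference selects the vector index begin call.
    static String beginCallFor(SketchWriterTarget target) {
        if (target instanceof SketchVectorIndexReference) {
            return "beginCreateVectorIndex";
        }
        throw new IllegalArgumentException("Unhandled writer target in this sketch");
    }
}
```

Sharing one target object between the writer and finish nodes is what lets the later execution phase route both the begin and the finish calls from the same plan-time reference.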
Release Notes
Differential Revision: D95341384
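The statement analysis step in the design above (validating the source and target tables and extracting index properties) might look roughly like the sketch below. The rules it enforces are assumptions: that the source table must exist, that the index table must not already exist, and that every listed column must belong to the source table. `VectorIndexAnalysisSketch` and `VectorIndexAnalyzerSketch` are hypothetical names, and the in-memory map of table names to column names stands in for real metadata lookups.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical result object holding what the analyzer extracted.
class VectorIndexAnalysisSketch {
    final String indexTable;
    final String sourceTable;
    final List<String> columns;
    final Map<String, String> properties;

    VectorIndexAnalysisSketch(String indexTable, String sourceTable, List<String> columns, Map<String, String> properties) {
        this.indexTable = indexTable;
        this.sourceTable = sourceTable;
        // Defensive copies so the analysis cannot change after creation.
        this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
        this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }
}

class VectorIndexAnalyzerSketch {
    // 'tables' maps table name to its column names (a stand-in for metadata calls).
    static VectorIndexAnalysisSketch analyze(
            Map<String, Set<String>> tables,
            String indexTable,
            String sourceTable,
            List<String> columns,
            Map<String, String> properties) {
        if (!tables.containsKey(sourceTable)) {
            throw new IllegalArgumentException("Source table does not exist: " + sourceTable);
        }
        // Assumed rule: the index table must not already exist.
        if (tables.containsKey(indexTable)) {
            throw new IllegalArgumentException("Index table already exists: " + indexTable);
        }
        Set<String> available = tables.get(sourceTable);
        for (String column : columns) {
            if (!available.contains(column)) {
                throw new IllegalArgumentException("Column not found in source table: " + column);
            }
        }
        return new VectorIndexAnalysisSketch(indexTable, sourceTable, columns, properties);
    }
}
```

Failing early in analysis keeps invalid statements from reaching planning, so the planner can assume the columns and tables named in the analysis object are usable.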