feat: Add CreateVectorIndexReference WriterTarget and SPI methods (#27261)#27261
feat: Add CreateVectorIndexReference WriterTarget and SPI methods (#27261)#27261skyelves wants to merge 2 commits intoprestodb:masterfrom
Conversation
Reviewer's GuideIntroduces a new CREATE VECTOR INDEX SQL statement end-to-end (parser, AST, analyzer, formatter, and tests) plus a new TableWriterNode.CreateVectorIndexReference and connector SPI hooks so connectors can implement vector index creation with their own metadata and layouts. Sequence diagram for CREATE VECTOR INDEX execution pathsequenceDiagram
actor User
participant Frontend as PrestoFrontend
participant Parser as SqlParser
participant Analyzer as StatementAnalyzer
participant Planner as PlanBuilder
participant Exec as ExecutionEngine
participant Meta as ConnectorMetadata
User->>Frontend: submit "CREATE VECTOR INDEX ..."
Frontend->>Parser: parse SQL
Parser-->>Frontend: CreateVectorIndex AST
Frontend->>Analyzer: analyze(CreateVectorIndex)
Analyzer->>Analyzer: validate source and target tables
Analyzer->>Analyzer: validate columns exist in source table
Analyzer->>Analyzer: validate properties
Analyzer->>Frontend: setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis)
Analyzer-->>Frontend: analyzed statement
Frontend->>Planner: build plan
Planner->>Planner: create TableWriterNode
Planner->>Planner: create CreateVectorIndexReference
Planner-->>Frontend: physical plan
Frontend->>Exec: schedule query(plan)
loop for each connector fragment
Exec->>Meta: beginCreateVectorIndex(session, indexMetadata, layout, sourceTableName)
Meta-->>Exec: ConnectorOutputTableHandle
Exec->>Meta: finishCreateVectorIndex(session, tableHandle, fragments, statistics)
Meta-->>Exec: Optional ConnectorOutputMetadata
end
Exec-->>Frontend: index creation completed
Frontend-->>User: success response
Class diagram for new CREATE VECTOR INDEX types and SPI hooksclassDiagram
class Statement
class CreateVectorIndex {
+CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+Identifier getIndexName()
+QualifiedName getTableName()
+List~Identifier~ getColumns()
+Optional~Expression~ getUpdatingFor()
+List~Property~ getProperties()
}
CreateVectorIndex --|> Statement
class Analysis {
-Optional~CreateVectorIndexAnalysis~ createVectorIndexAnalysis
+void setCreateVectorIndexAnalysis(CreateVectorIndexAnalysis analysis)
+Optional~CreateVectorIndexAnalysis~ getCreateVectorIndexAnalysis()
}
class CreateVectorIndexAnalysis {
+CreateVectorIndexAnalysis(QualifiedObjectName sourceTableName, QualifiedObjectName targetTableName, List~Identifier~ columns, Map~String, Expression~ properties, Optional~Expression~ updatingFor)
+QualifiedObjectName getSourceTableName()
+QualifiedObjectName getTargetTableName()
+List~Identifier~ getColumns()
+Map~String, Expression~ getProperties()
+Optional~Expression~ getUpdatingFor()
}
Analysis *-- CreateVectorIndexAnalysis
class WriterTarget {
<<abstract>>
+ConnectorId getConnectorId()
+SchemaTableName getSchemaTableName()
+Optional~List~OutputColumnMetadata~~ getOutputColumns()
}
class CreateVectorIndexReference {
-ConnectorId connectorId
-ConnectorTableMetadata tableMetadata
-Optional~NewTableLayout~ layout
-Optional~List~OutputColumnMetadata~~ columns
-SchemaTableName sourceTableName
+CreateVectorIndexReference(ConnectorId connectorId, ConnectorTableMetadata tableMetadata, Optional~NewTableLayout~ layout, Optional~List~OutputColumnMetadata~~ columns, SchemaTableName sourceTableName)
+ConnectorId getConnectorId()
+ConnectorTableMetadata getTableMetadata()
+Optional~NewTableLayout~ getLayout()
+SchemaTableName getSchemaTableName()
+Optional~List~OutputColumnMetadata~~ getOutputColumns()
+SchemaTableName getSourceTableName()
}
CreateVectorIndexReference --|> WriterTarget
class ConnectorMetadata {
<<interface>>
+ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
class ClassLoaderSafeConnectorMetadata {
-ConnectorMetadata delegate
-ClassLoader classLoader
+ConnectorOutputTableHandle beginCreateVectorIndex(ConnectorSession session, ConnectorTableMetadata indexMetadata, Optional~ConnectorNewTableLayout~ layout, SchemaTableName sourceTableName)
+Optional~ConnectorOutputMetadata~ finishCreateVectorIndex(ConnectorSession session, ConnectorOutputTableHandle tableHandle, Collection~Slice~ fragments, Collection~ComputedStatistics~ computedStatistics)
}
ClassLoaderSafeConnectorMetadata ..|> ConnectorMetadata
ClassLoaderSafeConnectorMetadata o-- ConnectorMetadata
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 5022f6b...37b2e82. No notifications. |
There was a problem hiding this comment.
Hey - I've found 3 issues, and left some high level feedback:
- The
CREATE VECTOR INDEXgrammar rule currently restricts the column list to at most two identifiers (identifier (',' identifier)?), which conflicts with the tests that use three columns and beyond; this should be changed to allow an arbitrary number of columns (e.g.,identifier (',' identifier)*). - The grammar for
CREATE VECTOR INDEXonly supportsON TABLE qualifiedName, but the parser tests expect bothON t(a, b)andON TABLE t(a, b)forms; consider updating the rule to makeTABLEoptional so the parser and tests are aligned. - In
DefaultTraversalVisitor.visitCreateVectorIndex, thetableNameis not traversed (onlyindexName,columns,updatingFor, andpropertiesare processed); if any visitors rely on visiting the table name, you should addprocess(node.getTableName(), context)for consistency with other statement visitors.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The `CREATE VECTOR INDEX` grammar rule currently restricts the column list to at most two identifiers (`identifier (',' identifier)?`), which conflicts with the tests that use three columns and beyond; this should be changed to allow an arbitrary number of columns (e.g., `identifier (',' identifier)*`).
- The grammar for `CREATE VECTOR INDEX` only supports `ON TABLE qualifiedName`, but the parser tests expect both `ON t(a, b)` and `ON TABLE t(a, b)` forms; consider updating the rule to make `TABLE` optional so the parser and tests are aligned.
- In `DefaultTraversalVisitor.visitCreateVectorIndex`, the `tableName` is not traversed (only `indexName`, `columns`, `updatingFor`, and `properties` are processed); if any visitors rely on visiting the table name, you should add `process(node.getTableName(), context)` for consistency with other statement visitors.
## Individual Comments
### Comment 1
<location path="presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4" line_range="172-173" />
<code_context>
| MERGE INTO qualifiedName (AS? identifier)?
- USING relation ON expression mergeCase+ #mergeInto
+ USING relation ON expression mergeCase+ #mergeInto
+ | CREATE VECTOR INDEX identifier ON TABLE qualifiedName
+ '(' identifier (',' identifier)? ')'
+ (WITH properties)?
+ (UPDATING FOR booleanExpression)? #createVectorIndex
</code_context>
<issue_to_address>
**issue (bug_risk):** Column list grammar only allows at most two columns instead of an arbitrary list
This grammar only parses one or two identifiers, but the rest of the pipeline treats `columns` as an arbitrary list. To avoid rejecting valid statements with more than two columns, update it to allow a repeated list, e.g. ` '(' identifier (',' identifier)* ')'`.
</issue_to_address>
### Comment 2
<location path="presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java" line_range="1173-1175" />
<code_context>
+
+ Map<String, Expression> allProperties = mapFromProperties(node.getProperties());
+
+ analysis.addAccessControlCheckForTable(TABLE_CREATE,
+ new AccessControlInfoForTable(accessControl, session.getIdentity(),
+ session.getTransactionId(), session.getAccessControlContext(), targetTable));
+
+ analysis.setCreateVectorIndexAnalysis(new Analysis.CreateVectorIndexAnalysis(
</code_context>
<issue_to_address>
**🚨 suggestion (security):** Access control covers target table creation but not reading from the source table
You’ve added a `TABLE_CREATE` check on the target, but there’s no corresponding check for reading the source table, even though the index build will scan it. To match other DDL+data operations (CTAS, refresh materialized view), consider adding a source-table access control check (e.g., `SELECT` on `sourceTableName`).
Suggested implementation:
```java
Map<String, Expression> allProperties = mapFromProperties(node.getProperties());
analysis.addAccessControlCheckForTable(TABLE_SELECT,
new AccessControlInfoForTable(accessControl, session.getIdentity(),
session.getTransactionId(), session.getAccessControlContext(), sourceTableName));
analysis.addAccessControlCheckForTable(TABLE_CREATE,
new AccessControlInfoForTable(accessControl, session.getIdentity(),
session.getTransactionId(), session.getAccessControlContext(), targetTable));
analysis.setCreateVectorIndexAnalysis(new Analysis.CreateVectorIndexAnalysis(
```
1. Ensure that `TABLE_SELECT` is imported or statically referenced in this file in the same way as `TABLE_CREATE`. If it is not yet imported, add the appropriate import or static import alongside the existing `TABLE_CREATE` import.
2. Confirm that `sourceTableName` is the correct object to use for access control in this context (it should match the type used elsewhere for table access checks, typically the same type as `targetTable` or other `SELECT` checks in this class). If a different variable (e.g., `sourceTable`) is used for table access checks elsewhere in the `StatementAnalyzer`, replace `sourceTableName` with that variable for consistency.
</issue_to_address>
### Comment 3
<location path="presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java" line_range="1384" />
<code_context>
}
+ @Test
+ public void testCreateVectorIndex()
+ {
+ // Basic CREATE VECTOR INDEX
</code_context>
<issue_to_address>
**suggestion (testing):** Add formatter/round‑trip tests for CREATE VECTOR INDEX to ensure AST ↔ SQL consistency, especially around optional TABLE, WITH, and UPDATING FOR clauses
The parser tests cover valid/invalid forms well, but nothing exercises SqlFormatter’s new visitCreateVectorIndex implementation. Please add round‑trip/formatting tests (like the existing CREATE TABLE ones) to verify:
- `CREATE VECTOR INDEX idx ON t(a, b)` is formatted as expected (the formatter always prints `ON TABLE`)
- Properties are emitted with correct commas/spacing
- UPDATING FOR expressions (equality, BETWEEN, complex boolean) format correctly
- Formatting is stable under parse → format → parse cycles
This will catch divergences between parser and formatter or syntactically invalid formatted SQL.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| | CREATE VECTOR INDEX identifier ON TABLE qualifiedName | ||
| '(' identifier (',' identifier)? ')' |
There was a problem hiding this comment.
issue (bug_risk): Column list grammar only allows at most two columns instead of an arbitrary list
This grammar only parses one or two identifiers, but the rest of the pipeline treats columns as an arbitrary list. To avoid rejecting valid statements with more than two columns, update it to allow a repeated list, e.g. '(' identifier (',' identifier)* ')'.
presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java
Show resolved
Hide resolved
| } | ||
|
|
||
| @Test | ||
| public void testCreateVectorIndex() |
There was a problem hiding this comment.
suggestion (testing): Add formatter/round‑trip tests for CREATE VECTOR INDEX to ensure AST ↔ SQL consistency, especially around optional TABLE, WITH, and UPDATING FOR clauses
The parser tests cover valid/invalid forms well, but nothing exercises SqlFormatter’s new visitCreateVectorIndex implementation. Please add round‑trip/formatting tests (like the existing CREATE TABLE ones) to verify:
CREATE VECTOR INDEX idx ON t(a, b)is formatted as expected (the formatter always printsON TABLE)- Properties are emitted with correct commas/spacing
- UPDATING FOR expressions (equality, BETWEEN, complex boolean) format correctly
- Formatting is stable under parse → format → parse cycles
This will catch divergences between parser and formatter or syntactically invalid formatted SQL.
7cf44a9 to
fe33047
Compare
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
1679e01 to
54201c0
Compare
|
Please add a release note entry following the Release Notes Guidelines. |
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
3d1b664 to
4277277
Compare
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
4277277 to
e3f6c2c
Compare
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
e3f6c2c to
406fc6a
Compare
…estodb#27261) Summary: Pull Request resolved: prestodb#27261 Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
406fc6a to
c0a4448
Compare
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers Differential Revision: D95325176
c0a4448 to
9d30102
Compare
|
looks like that this PR has some conflict/overlap with #27307 ? |
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers ## Release Notes Please follow release notes guidelines and fill in the release notes below. ``` == RELEASE NOTES == General Changes * Add support for create-vector-index statement, which creates vector search indexes on table columns with configurable index properties and partition filtering via an ``UPDATING FOR`` clause. ``` Differential Revision: D95325176
9d30102 to
f438830
Compare
Thanks, I just rebased |
ac616af to
77c9555
Compare
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers ## Release Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D95325176
72016b4 to
196aaaf
Compare
Summary: ## High level design The process for executing a CREATE VECTOR INDEX SQL statement is as follows: 1. SQL Input & Parsing: SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ... The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node. 2. Statement Analysis: **StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.** **This results in a structured CreateVectorIndexAnalysis object.** 3. Logical Planning & Query Generation: • LogicalPlanner.createVectorIndexPlan() builds the core execution query: CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ... • The resulting plan tree includes: TableFinishNode(target = CreateVectorIndexReference) └── TableWriterNode(target = CreateVectorIndexReference) └── query plan 4. Connector Plan Optimization (Rewriting): PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization. ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase. 5. Execution and Metadata Handling (For connectors that don't rewrite): TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex(). Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync(). 6. ConnectorMetadata SPI: Default: The standard implementation throws NOT_SUPPORTED. Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls. ## Release Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D91524358 Pulled By: skyelves
…estodb#27261) Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently. - CreateVectorIndexReference: plan-time target carrying index metadata and source table reference - beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in - ClassLoaderSafeConnectorMetadata: delegation wrappers ## Release Notes ``` == NO RELEASE NOTE == ``` Differential Revision: D95325176
196aaaf to
37b2e82
Compare
|
@skyelves , Could you please update the PR to include only the changes related to CreateVectorIndexReference WriterTarget and the SPI methods? It looks like the analysis support code changes are also included in this PR, which seems to be a duplication since we already have a separate PR for those changes. |
Summary:
Add dedicated WriterTarget subclass and ConnectorMetadata SPI for
CREATE VECTOR INDEX, enabling each connector to implement vector index
creation independently.
and source table reference
NOT_SUPPORTED so connectors must opt in
Release Notes
Differential Revision: D95325176