feat: Add syntax support for CREATE VECTOR INDEX (#27307)
skyelves merged 1 commit into prestodb:master
Reviewer's Guide
Adds full parser, AST, traversal, and formatter support for a new CREATE VECTOR INDEX statement, including grammar changes, a new CreateVectorIndex AST node, visitor wiring, formatting logic, and extensive positive/negative SQL parser tests.
Sequence diagram for parsing and formatting CREATE VECTOR INDEX
sequenceDiagram
actor User
participant SqlParser
participant SqlBaseParser
participant AstBuilder
participant CreateVectorIndex
participant SqlFormatter
User->>SqlParser: submit SQL
Note over User,SqlFormatter: SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
SqlParser->>SqlBaseParser: parse SQL using grammar
SqlBaseParser-->>SqlParser: CreateVectorIndexContext
SqlParser->>AstBuilder: visitCreateVectorIndex(context)
AstBuilder->>AstBuilder: extract indexName, tableName, columns, updatingFor, properties
AstBuilder->>CreateVectorIndex: new CreateVectorIndex(indexName, tableName, columns, updatingFor, properties)
CreateVectorIndex-->>AstBuilder: CreateVectorIndex node
AstBuilder-->>SqlParser: AST CreateVectorIndex
User->>SqlFormatter: format AST CreateVectorIndex
SqlFormatter->>CreateVectorIndex: visitCreateVectorIndex(node)
SqlFormatter->>SqlFormatter: build SQL string
SqlFormatter-->>User: formatted CREATE VECTOR INDEX SQL
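The "build SQL string" step in the diagram above can be illustrated with a minimal sketch. This is not the actual `SqlFormatter` code: `VectorIndexSqlSketch`, its `format` method, the use of plain strings instead of AST nodes, and the `index_type` property name are all assumptions made for illustration. The clause order (column list, then `WITH`, then `UPDATING FOR`) follows the grammar rule quoted later in the review.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical stand-in for the formatter step; names are illustrative only.
final class VectorIndexSqlSketch {
    // Renders the statement in grammar order: ON table (columns) [WITH (...)] [UPDATING FOR ...].
    static String format(String indexName, String tableName, List<String> columns,
                         Map<String, String> properties, Optional<String> updatingFor) {
        StringBuilder sql = new StringBuilder("CREATE VECTOR INDEX ")
                .append(indexName)
                .append(" ON ")
                .append(tableName)
                .append(" (")
                .append(String.join(", ", columns))
                .append(")");
        if (!properties.isEmpty()) {
            // Properties are emitted in insertion order as "name = value" pairs.
            String rendered = properties.entrySet().stream()
                    .map(e -> e.getKey() + " = " + e.getValue())
                    .collect(Collectors.joining(", "));
            sql.append(" WITH (").append(rendered).append(")");
        }
        // The optional predicate is appended verbatim after UPDATING FOR.
        updatingFor.ifPresent(predicate -> sql.append(" UPDATING FOR ").append(predicate));
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> properties = new LinkedHashMap<>();
        properties.put("index_type", "'ivf'"); // hypothetical property name and value
        System.out.println(format("my_index", "my_table",
                Arrays.asList("id", "embedding"), properties, Optional.of("ds = '2024-01-01'")));
    }
}
```

Emitting the optional clauses only when present keeps the formatted text re-parseable by the same grammar, which is the property a parser/formatter round-trip test would rely on.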
Class diagram for new CreateVectorIndex AST integration
classDiagram
class Statement {
}
class Node {
}
Node <|-- Statement
class Identifier {
}
class QualifiedName {
}
class Expression {
}
class Property {
}
class CreateVectorIndex {
+Identifier indexName
+QualifiedName tableName
+List~Identifier~ columns
+Optional~Expression~ updatingFor
+List~Property~ properties
+CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ updatingFor, List~Property~ properties)
+Identifier getIndexName()
+QualifiedName getTableName()
+List~Identifier~ getColumns()
+Optional~Expression~ getUpdatingFor()
+List~Property~ getProperties()
+<R,C> R accept(AstVisitor visitor, C context)
+List~Node~ getChildren()
}
Statement <|-- CreateVectorIndex
Identifier <-- CreateVectorIndex : uses
QualifiedName <-- CreateVectorIndex : uses
Expression <-- CreateVectorIndex : uses
Property <-- CreateVectorIndex : uses
class AstVisitor {
+<R,C> R visitCreateTable(CreateTable node, C context)
+<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
+<R,C> R visitCreateType(CreateType node, C context)
}
class DefaultTraversalVisitor {
+<R,C> R visitCreateTable(CreateTable node, C context)
+<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
+<R,C> R visitStartTransaction(StartTransaction node, C context)
}
AstVisitor <|-- DefaultTraversalVisitor
AstVisitor --> CreateVectorIndex : visitCreateVectorIndex
DefaultTraversalVisitor --> CreateVectorIndex : visitCreateVectorIndex
class SqlBaseParser {
<<parser>>
+CreateVectorIndexContext createVectorIndex()
}
class SqlBaseParser_CreateVectorIndexContext {
}
SqlBaseParser --> SqlBaseParser_CreateVectorIndexContext : produces
class AstBuilder {
+Node visitCreateTable(SqlBaseParser_CreateTableContext context)
+Node visitCreateVectorIndex(SqlBaseParser_CreateVectorIndexContext context)
+Node visitCreateType(SqlBaseParser_CreateTypeContext context)
}
AstBuilder --> CreateVectorIndex : constructs
class SqlFormatter_Formatter {
+Void visitCreateTable(CreateTable node, Integer indent)
+Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
+Void visitCreateType(CreateType node, Integer indent)
}
SqlFormatter_Formatter --> CreateVectorIndex : formats
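To make the visitor wiring in the class diagram concrete, here is a self-contained sketch of how such a node could dispatch to `visitCreateVectorIndex` and expose its children. It uses stand-in types (`SketchNode`, `SketchIdentifier`, `SketchVisitor`, `SketchCreateVectorIndex`) rather than the real Presto classes, represents the table name as a plain `String`, and omits the `properties` field for brevity; all of these are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Demo entry point; every type in this file is a hypothetical stand-in.
final class CreateVectorIndexSketchDemo {
    public static void main(String[] args) {
        SketchCreateVectorIndex node = new SketchCreateVectorIndex(
                new SketchIdentifier("my_index"), "my_table",
                Arrays.asList(new SketchIdentifier("id"), new SketchIdentifier("embedding")),
                Optional.empty());
        System.out.println("children: " + node.getChildren().size());
    }
}

abstract class SketchNode {
    abstract List<SketchNode> getChildren();

    // Default dispatch falls back to the generic visitNode hook.
    <R, C> R accept(SketchVisitor<R, C> visitor, C context) {
        return visitor.visitNode(this, context);
    }
}

class SketchIdentifier extends SketchNode {
    final String value;

    SketchIdentifier(String value) {
        this.value = value;
    }

    @Override
    List<SketchNode> getChildren() {
        return Collections.emptyList();
    }
}

class SketchVisitor<R, C> {
    R visitNode(SketchNode node, C context) {
        return null;
    }

    // New hook for the statement; delegates to visitNode unless overridden.
    R visitCreateVectorIndex(SketchCreateVectorIndex node, C context) {
        return visitNode(node, context);
    }
}

final class SketchCreateVectorIndex extends SketchNode {
    private final SketchIdentifier indexName;
    private final String tableName;
    private final List<SketchIdentifier> columns;
    private final Optional<SketchNode> updatingFor;

    SketchCreateVectorIndex(SketchIdentifier indexName, String tableName,
                            List<SketchIdentifier> columns, Optional<SketchNode> updatingFor) {
        this.indexName = indexName;
        this.tableName = tableName;
        this.columns = new ArrayList<>(columns);
        this.updatingFor = updatingFor;
    }

    String getTableName() {
        return tableName;
    }

    @Override
    <R, C> R accept(SketchVisitor<R, C> visitor, C context) {
        // Double dispatch: the node selects the statement-specific visitor method.
        return visitor.visitCreateVectorIndex(this, context);
    }

    @Override
    List<SketchNode> getChildren() {
        // Only node-typed fields are children: index name, columns, optional predicate.
        List<SketchNode> children = new ArrayList<>();
        children.add(indexName);
        children.addAll(columns);
        updatingFor.ifPresent(children::add);
        return children;
    }
}
```

In this sketch the table name is not a child node because it is modeled as a plain value, so a traversal visitor reaches it only through the getter.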
Hey - I've left some high level feedback:
- The `CREATE VECTOR INDEX` grammar currently restricts the column list to at most two identifiers via `identifier (',' identifier)?`; this should likely be `identifier (',' identifier)*` to match the AST builder and tests that assume an arbitrary number of columns.
- In `DefaultTraversalVisitor.visitCreateVectorIndex`, the `tableName` field is never processed, which is inconsistent with other statement visitors and could break visitors that expect to traverse the referenced table; consider calling `process(node.getTableName(), context)`.
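The difference between the `?` and `*` quantifiers in the first point can be seen with a rough analogy in `java.util.regex`. This is only an approximation of the ANTLR rule: the class `ColumnListQuantifierSketch`, the simplified identifier pattern, and the whitespace handling are assumptions, not part of the grammar.

```java
import java.util.regex.Pattern;

// Regex analogy for the two grammar variants; not the ANTLR rule itself.
final class ColumnListQuantifierSketch {
    // Simplified identifier: letter or underscore, then letters, digits, underscores.
    private static final String ID = "[A-Za-z_][A-Za-z0-9_]*";

    // Mirrors identifier (',' identifier)? : one or two columns.
    static final Pattern AT_MOST_TWO =
            Pattern.compile("\\(\\s*" + ID + "(\\s*,\\s*" + ID + ")?\\s*\\)");

    // Mirrors identifier (',' identifier)* : one or more columns.
    static final Pattern ANY_NUMBER =
            Pattern.compile("\\(\\s*" + ID + "(\\s*,\\s*" + ID + ")*\\s*\\)");

    static boolean accepts(Pattern pattern, String columnList) {
        return pattern.matcher(columnList).matches();
    }

    public static void main(String[] args) {
        for (String input : new String[] {"(embedding)", "(id, embedding)", "(a, b, c)"}) {
            System.out.println(input + " ?=" + accepts(AT_MOST_TWO, input)
                    + " *=" + accepts(ANY_NUMBER, input));
        }
    }
}
```

Both variants require at least one column; they differ only in whether a third or later column is accepted.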
jja725 left a comment:
Thanks for the contribution! Try to learn from it.
    | CREATE VECTOR INDEX identifier ON qualifiedName
        '(' identifier (',' identifier)? ')'
        (WITH properties)?
        (UPDATING FOR booleanExpression)?                                #createVectorIndex
The RFC defines ON TABLE candidates_table(id, embedding) with two required columns. Should the parser enforce exactly 2 columns? If single-column is valid (using $row_id for id), this should be documented.
Thanks for the comments! After several rounds of discussion, we decided to keep $row_id optional, and we will update the RFC accordingly.
We want to support either (embedding) or (embedding, row_id).
presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4 (outdated, resolved)
presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java (outdated, resolved)
steveburnett left a comment:
Do we need documentation for CREATE VECTOR INDEX added in https://prestodb.io/docs/current/sql.html ?
Yes, I created a separate PR: https://github.com/prestodb/presto/pull/27332/changes
Summary:
## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:
TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
    └── query plan
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
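Steps 5 and 6 describe an SPI whose default implementation rejects the operation and which a connector can override. The sketch below illustrates that pattern only. The interface `SketchConnectorMetadata`, the class `SketchIcebergLikeMetadata`, the method signatures, and the use of `UnsupportedOperationException` in place of Presto's NOT_SUPPORTED error are all assumptions for illustration, not the real `ConnectorMetadata` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical SPI: the defaults reject the operation, mirroring "throws NOT_SUPPORTED".
interface SketchConnectorMetadata {
    default String beginCreateVectorIndex(String indexName) {
        throw new UnsupportedOperationException(
                "NOT_SUPPORTED: this connector does not support CREATE VECTOR INDEX");
    }

    default void finishCreateVectorIndex(String handle) {
        throw new UnsupportedOperationException(
                "NOT_SUPPORTED: this connector does not support CREATE VECTOR INDEX");
    }
}

// Hypothetical connector that opts in by overriding both calls.
final class SketchIcebergLikeMetadata implements SketchConnectorMetadata {
    final List<String> committedHandles = new ArrayList<>();

    @Override
    public String beginCreateVectorIndex(String indexName) {
        // Returns an opaque handle that the engine later passes back to finish.
        return "handle:" + indexName;
    }

    @Override
    public void finishCreateVectorIndex(String handle) {
        // Records the commit; a real connector would create the underlying table here.
        committedHandles.add(handle);
    }
}

final class VectorIndexSpiSketchDemo {
    public static void main(String[] args) {
        SketchIcebergLikeMetadata metadata = new SketchIcebergLikeMetadata();
        String handle = metadata.beginCreateVectorIndex("my_index");
        metadata.finishCreateVectorIndex(handle);
        System.out.println(metadata.committedHandles);
    }
}
```

Putting the rejection in default methods means existing connectors keep compiling unchanged and fail with a clear error only when the statement is actually used against them.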
## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
== RELEASE NOTES ==
General Changes
* Add support for the ``CREATE VECTOR INDEX`` statement, which creates
  vector search indexes on table columns with configurable index properties
  and partition filtering via an ``UPDATING FOR`` clause.
```
Differential Revision: D91385788