feat: Support vector search in LogicalPlanner (#27169)
skyelves merged 4 commits into prestodb:master
Reviewer's Guide
Adds full parsing, analysis, and planning support for a new CREATE VECTOR INDEX statement by introducing a corresponding AST node, extending the SQL grammar and formatter, wiring it through analysis as a CTAS-style synthetic query that calls a create_vector_index UDF, and teaching the LogicalPlanner to build plans from that synthetic query, plus associated tests and minor grammar tweaks.
Sequence diagram for CREATE VECTOR INDEX planning flow
sequenceDiagram
actor User
participant Client
participant SqlParser
participant AstBuilder
participant Analyzer
participant Analysis
participant LogicalPlanner
participant Optimizer
participant ExecutionEngine
participant PythonScript
User->>Client: Submit CREATE VECTOR INDEX statement
Client->>SqlParser: parse(sql)
SqlParser->>AstBuilder: build AST
AstBuilder-->>SqlParser: CreateVectorIndex node
SqlParser-->>Analyzer: CreateVectorIndex statement
Analyzer->>Analysis: setCreateVectorIndexTableName(sourceTable)
Analyzer->>Analyzer: mapFromProperties(node.properties)
Analyzer->>Analyzer: extractStringProperty(index_type, distance_metric, index_options)
Analyzer->>Analyzer: build partitioned_by JSON
Analyzer->>Analyzer: build propertiesJson
Analyzer->>Analyzer: build synthetic Query
Analyzer->>Analysis: setVectorIndexQuery(query)
Analyzer->>Analysis: setCreateTableDestination(targetIndexTable)
Analyzer->>Analysis: setCreateTableProperties(empty)
Analyzer->>Analysis: setCreateTableAsSelectWithData(true)
Analyzer->>Analysis: addAccessControlCheckForTable(TABLE_CREATE,...)
Analyzer-->>Client: Analysis complete
Client->>LogicalPlanner: plan(CreateVectorIndex)
LogicalPlanner->>Analysis: getVectorIndexQuery()
Analysis-->>LogicalPlanner: synthetic Query with create_vector_index UDF
LogicalPlanner->>LogicalPlanner: createTableCreationPlan(analysis, query)
LogicalPlanner->>Optimizer: optimize CTAS plan
Optimizer-->>LogicalPlanner: optimized plan (no table scan)
LogicalPlanner-->>ExecutionEngine: submit execution plan
ExecutionEngine->>PythonScript: invoke create_vector_index(sourceTable, columns, indexType, propertiesJson, targetIndexTable)
PythonScript-->>ExecutionEngine: create index table populated
ExecutionEngine-->>Client: statement completed
Client-->>User: CREATE VECTOR INDEX succeeded
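The hand-off between analysis and planning in the diagram above can be sketched roughly as follows. The classes are toy stand-ins (a `String` plays the role of the synthetic `Query`), and `planCreateVectorIndex` is a hypothetical name; only the `setVectorIndexQuery`/`getVectorIndexQuery` accessor names come from the Reviewer's Guide.

```java
import java.util.Optional;

// Toy stand-in for Analysis: holds the synthetic query produced during analysis.
final class AnalysisSketch
{
    private Optional<String> vectorIndexQuery = Optional.empty();

    void setVectorIndexQuery(String query)
    {
        this.vectorIndexQuery = Optional.of(query);
    }

    Optional<String> getVectorIndexQuery()
    {
        return vectorIndexQuery;
    }
}

final class PlannerSketch
{
    private PlannerSketch() {}

    // Hypothetical planner entry point: fails fast if analysis did not record a query.
    static String planCreateVectorIndex(AnalysisSketch analysis)
    {
        String query = analysis.getVectorIndexQuery()
                .orElseThrow(() -> new IllegalStateException("vectorIndexQuery was not set during analysis"));
        // The real planner would build a CTAS-style plan; this sketch only tags the query.
        return "CTAS(" + query + ")";
    }
}
```

Keeping the query in an `Optional` makes the "analysis did not run" case explicit at planning time instead of surfacing later as a null dereference.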
Class diagram for new CreateVectorIndex and related planning changes
classDiagram
class Statement
class CreateVectorIndex {
+CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
+CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
+Identifier getIndexName()
+QualifiedName getTableName()
+List~Identifier~ getColumns()
+Optional~Expression~ getWhere()
+List~Property~ getProperties()
+<R,C> R accept(AstVisitor~R,C~ visitor, C context)
+List~Node~ getChildren()
}
CreateVectorIndex --|> Statement
class AstVisitor {
+<R,C> R visitCreateVectorIndex(CreateVectorIndex node, C context)
}
AstVisitor <|.. DefaultTraversalVisitor
class DefaultTraversalVisitor {
+Object visitCreateVectorIndex(CreateVectorIndex node, Object context)
}
class AstBuilder {
+Node visitCreateVectorIndex(SqlBaseParser.CreateVectorIndexContext context)
}
class SqlFormatter_Formatter {
+Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
}
class Analysis {
-Optional~QualifiedObjectName~ createVectorIndexTableName
-Optional~Query~ vectorIndexQuery
+void setCreateVectorIndexTableName(QualifiedObjectName tableName)
+Optional~QualifiedObjectName~ getCreateVectorIndexTableName()
+void setVectorIndexQuery(Query query)
+Optional~Query~ getVectorIndexQuery()
+void setCreateTableDestination(QualifiedObjectName table)
+Optional~QualifiedObjectName~ getCreateTableDestination()
}
class StatementAnalyzer {
+Scope visitCreateVectorIndex(CreateVectorIndex node, Optional~Scope~ scope)
-String extractStringProperty(Map~String,Expression~ properties, String key, String defaultValue)
}
class LogicalPlanner {
+PlanNode plan(Statement statement)
}
class StatementUtils {
+QueryType getQueryType(Class~? extends Statement~ statementClass)
}
class Query {}
class QuerySpecification {}
class FunctionCall {}
class Expression {}
class Property {}
class Identifier {}
class QualifiedName {}
class QualifiedObjectName {}
CreateVectorIndex --> QualifiedName : uses
CreateVectorIndex --> Identifier : uses
CreateVectorIndex --> Expression : optional where
CreateVectorIndex --> Property : properties
AstBuilder --> CreateVectorIndex : constructs
SqlFormatter_Formatter --> CreateVectorIndex : formats
DefaultTraversalVisitor --> CreateVectorIndex : traverses
StatementAnalyzer --> CreateVectorIndex : analyzes
StatementAnalyzer --> Analysis : populates
Analysis --> Query : holds synthetic
LogicalPlanner --> Analysis : reads vectorIndexQuery
LogicalPlanner --> Query : plans CTAS-style
StatementUtils --> CreateVectorIndex : maps_to_INSERT_QueryType
AstVisitor --> CreateVectorIndex
DefaultTraversalVisitor --> Analysis
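The class diagram lists `extractStringProperty(properties, key, defaultValue)` on `StatementAnalyzer`. A rough sketch of what such a helper might do is shown here, with property values simplified to `Object` instead of Presto's `Expression`; the body and the error message are assumptions, not the PR's implementation.

```java
import java.util.Map;

// Hypothetical simplification: values are plain Objects instead of Presto Expressions.
final class PropertyExtraction
{
    private PropertyExtraction() {}

    // Returns the string value for key, the default when the key is absent,
    // and rejects values that are not strings.
    static String extractStringProperty(Map<String, Object> properties, String key, String defaultValue)
    {
        Object value = properties.get(key);
        if (value == null) {
            return defaultValue;
        }
        if (!(value instanceof String)) {
            throw new IllegalArgumentException("Property '" + key + "' must be a string, got: " + value);
        }
        return (String) value;
    }
}
```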
Hey - I've found 3 issues, and left some high level feedback:
- The `WHERE` clause on `CreateVectorIndex` is analyzed but never incorporated into the synthetic `create_vector_index` query in `visitCreateVectorIndex`, so any filtering the user specifies will be silently ignored; either plumb it into the function call/plan or reject unsupported WHERE usage explicitly.
- The hardcoded property keys in `visitCreateVectorIndex` (e.g., `distance_metric`) don't match the ones used in the parser tests (e.g., `metric`), which will confuse users and makes configuration brittle; consider aligning on a single set of property names and enforcing/validating them consistently.
- The JSON string for `propertiesJson` is constructed via `StringBuilder`/`String.format` with raw string literals, which will break for values containing quotes or backslashes; use a proper JSON encoder or existing JSON utilities to safely build this payload.
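One way to act on the point about property keys, as a minimal sketch: validate the `WITH` properties against a single allow-list so that an unexpected key such as `metric` fails fast instead of being silently ignored. The class name `VectorIndexProperties` and the exact allow-list are assumptions for illustration; the key names are the ones mentioned in this review, and the real set of supported keys may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the PR.
final class VectorIndexProperties
{
    // Assumed allow-list; the real set of supported keys may differ.
    static final Set<String> ALLOWED_KEYS = Collections.unmodifiableSet(
            new LinkedHashSet<>(Arrays.asList("index_type", "distance_metric", "index_options")));

    private VectorIndexProperties() {}

    // Returns a copy of the properties, or throws if any key is not in the allow-list.
    static Map<String, String> validate(Map<String, String> properties)
    {
        for (String key : properties.keySet()) {
            if (!ALLOWED_KEYS.contains(key)) {
                throw new IllegalArgumentException(
                        "Unknown vector index property '" + key + "'. Supported: " + ALLOWED_KEYS);
            }
        }
        return new LinkedHashMap<>(properties);
    }
}
```

Listing the supported keys in the error message gives the user the correct name to use when they guess a near-miss such as `metric`.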
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/sql/analyzer/StatementAnalyzer.java:1184-1185` </location>
<code_context>
+ partitionedByJson.append("]");
+
+ // Pack all config into a JSON properties string for the Python script
+ String propertiesJson = String.format(
+ "{\"distance_metric\":\"%s\",\"index_options\":\"%s\",\"partitioned_by\":%s}",
+ distanceMetric, indexOptions, partitionedByJson.toString());
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Avoid manual JSON string construction and ensure proper escaping of property values.
This JSON is assembled with `String.format` and concatenation, so any quotes, backslashes, or control characters in `distance_metric`, `index_options`, or `partitioned_by` values will produce invalid JSON and may enable injection-like behavior in downstream consumers. Prefer constructing this object with a JSON library (e.g., Airlift JSON already used in Presto) so values are properly escaped, or at least centralize robust escaping rather than relying on raw string formatting and `StringBuilder` concatenation.
</issue_to_address>
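A minimal sketch of the "centralize robust escaping" option from Comment 1, using only the JDK so that it stays self-contained. In Presto itself a JSON library would be the better choice, as the comment says; the class `PropertiesJson` and its methods are hypothetical and only illustrate routing every string value through one escaping routine.

```java
import java.util.List;

// Hypothetical helper, not from the PR: builds the properties payload with escaped values.
final class PropertiesJson
{
    private PropertiesJson() {}

    // Wraps a value in double quotes, escaping quotes, backslashes and control characters.
    static String quote(String value)
    {
        StringBuilder out = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"': out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the four-digit hex escape form.
                        out.append(String.format("\\u%04x", (int) c));
                    }
                    else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // Assembles the same three fields as the String.format call quoted in the comment.
    static String build(String distanceMetric, String indexOptions, List<String> partitionedBy)
    {
        StringBuilder partitions = new StringBuilder("[");
        for (int i = 0; i < partitionedBy.size(); i++) {
            if (i > 0) {
                partitions.append(',');
            }
            partitions.append(quote(partitionedBy.get(i)));
        }
        partitions.append(']');
        return "{\"distance_metric\":" + quote(distanceMetric)
                + ",\"index_options\":" + quote(indexOptions)
                + ",\"partitioned_by\":" + partitions + "}";
    }
}
```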
### Comment 2
<location> `presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java:593-590` </location>
<code_context>
return createAndAssignScope(node, scope);
}
+ @Override
+ protected Scope visitCreateVectorIndex(CreateVectorIndex node, Optional<Scope> scope)
+ {
</code_context>
<issue_to_address>
**issue (bug_risk):** DefaultTraversalVisitor should also traverse `tableName` for CreateVectorIndex.
In `visitCreateVectorIndex`, the visitor handles `indexName`, `columns`, `where`, and `properties` but skips `tableName`. This diverges from other statement visitors and can break logic that relies on visiting all table references (e.g., identifier collection, access control, rewriters). Please also call `process(node.getTableName(), context);` so the table reference is traversed consistently.
</issue_to_address>
### Comment 3
<location> `presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParserErrorHandling.java:84` </location>
<code_context>
"line 1:21: mismatched input ','. Expecting: <expression>"},
{"CREATE TABLE foo () AS (VALUES 1)",
- "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VIEW'"},
+ "line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
{"CREATE TABLE foo (*) AS (VALUES 1)",
- "line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VIEW'"},
</code_context>
<issue_to_address>
**suggestion (testing):** Add negative parser tests for invalid CREATE VECTOR INDEX syntax to complement the positive cases
The error handling table now includes `VECTOR`, but there are no negative parser tests for malformed `CREATE VECTOR INDEX` statements. To exercise the new grammar branch and lock in error behavior, please add a few invalid cases here, e.g.:
- `CREATE VECTOR INDEX idx ON` – missing table and columns
- `CREATE VECTOR INDEX idx ON t` – missing column list
- `CREATE VECTOR INDEX idx ON t()` – empty column list
- `CREATE VECTOR INDEX idx ON t(c) WITH ()` – empty properties
Suggested implementation:
```java
{"select foo(DISTINCT ,1)",
"line 1:21: mismatched input ','. Expecting: <expression>"},
{"CREATE TABLE foo () AS (VALUES 1)",
"line 1:19: mismatched input ')'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
{"CREATE TABLE foo (*) AS (VALUES 1)",
"line 1:19: mismatched input '*'. Expecting: 'FUNCTION', 'MATERIALIZED', 'OR', 'ROLE', 'SCHEMA', 'TABLE', 'TEMPORARY', 'TYPE', 'VECTOR', 'VIEW'"},
{"CREATE VECTOR INDEX idx ON",
"line 1:26: mismatched input '<EOF>'. Expecting: <identifier>"},
{"CREATE VECTOR INDEX idx ON t",
"line 1:28: mismatched input '<EOF>'. Expecting: '('"},
{"CREATE VECTOR INDEX idx ON t()",
"line 1:30: mismatched input ')'. Expecting: <identifier>"},
{"CREATE VECTOR INDEX idx ON t(c) WITH ()",
"line 1:39: mismatched input ')'. Expecting: <identifier>"},
{"SELECT grouping(a+2) FROM (VALUES (1)) AS t (a) GROUP BY a+2",
"line 1:18: mismatched input '+'. Expecting: ')', ','"},
{"SELECT x() over (ROWS select) FROM t",
```
1. If the actual error messages produced by the parser differ (column numbers or expected token sets), adjust the expected strings to match the real output from `TestSqlParserErrorHandling` by running the test suite and copying the exact messages.
2. If the grammar for `CREATE VECTOR INDEX` uses different expectations (e.g., a different nonterminal instead of `<identifier>` or additional expected tokens after `ON t`), update the expectations accordingly to reflect the concrete grammar.
</issue_to_address>
steveburnett left a comment
Do we need documentation for CREATE VECTOR INDEX? Could add a new page in https://github.com/prestodb/presto/tree/master/presto-docs/src/main/sphinx/sql.
Please add a release note entry for this PR that follows the Release Notes Guidelines, so that the failing (but not required) CI check passes.
Summary: Support vector search in LogicalPlanner Differential Revision: D93690255
4df9aba to 9f02480
9f02480 to be0a397
be0a397 to 5ce4f49
5ce4f49 to b7fb0f7
b7fb0f7 to 682451f
682451f to 2dc4a82
2dc4a82 to 3564eb4
3564eb4 to c69b80a
c69b80a to 6bb4f61
6bb4f61 to 5fd79ff
71bb1d4 to b165989
gggrace14 left a comment
The 5 commits look good to me after revising.
Summary:
Support vector search in LogicalPlanner
## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
**• LogicalPlanner.createVectorIndexPlan() builds the core execution query:**
**CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...**
**• The resulting plan tree includes:**
**TableFinishNode(target = CreateVectorIndexReference)**
**└── TableWriterNode(target = CreateVectorIndexReference)**
**└── query plan**
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
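The plan shape from step 3 can be pictured with a toy tree. This is a sketch only: `PlanNodeSketch` and its fields are hypothetical stand-ins, not Presto's `PlanNode` classes. It just shows the finish node wrapping the writer node, which wraps the query plan, with both outer nodes carrying the same target.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a plan node; Presto's real PlanNode API is richer.
final class PlanNodeSketch
{
    final String name;
    final String target; // null when the node has no writer target
    final List<PlanNodeSketch> children;

    PlanNodeSketch(String name, String target, List<PlanNodeSketch> children)
    {
        this.name = name;
        this.target = target;
        this.children = children;
    }

    // Builds TableFinish -> TableWriter -> query plan, sharing one target label.
    static PlanNodeSketch vectorIndexPlan(PlanNodeSketch queryPlan)
    {
        String target = "CreateVectorIndexReference";
        PlanNodeSketch writer = new PlanNodeSketch("TableWriterNode", target, Collections.singletonList(queryPlan));
        return new PlanNodeSketch("TableFinishNode", target, Collections.singletonList(writer));
    }

    // Renders the tree one node per line, indenting two spaces per level.
    static List<String> render(PlanNodeSketch node)
    {
        List<String> lines = new ArrayList<>();
        render(node, 0, lines);
        return lines;
    }

    private static void render(PlanNodeSketch node, int depth, List<String> lines)
    {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            line.append("  ");
        }
        line.append(node.name);
        if (node.target != null) {
            line.append("(target = ").append(node.target).append(")");
        }
        lines.add(line.toString());
        for (PlanNodeSketch child : node.children) {
            render(child, depth + 1, lines);
        }
    }
}
```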
## Release Notes
Please follow release notes guidelines and fill in the release notes below.
```
== RELEASE NOTES ==
General Changes
* Add support for create-vector-index statement, which creates
vector search indexes on table columns with configurable index properties
and partition filtering via an ``UPDATING FOR`` clause.
```
Differential Revision: D93690255
b165989 to a3cbda1
a3cbda1 to a5c1df0
Summary: Support vector search in LogicalPlanner
## Release Notes
```
== NO RELEASE NOTE ==
```
Differential Revision: D93690255
a5c1df0 to c58854b
Summary: Pull Request resolved: prestodb#27169 Support vector search in LogicalPlanner Differential Revision: D93690255
c58854b to b4ee8fa
b4ee8fa to 25b72eb
Summary:
## High level design
2. Statement Analysis:
**StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.**
**This results in a structured CreateVectorIndexAnalysis object.**
## Release Notes
```
== NO RELEASE NOTE ==
```
Differential Revision: D91524358
Pulled By: skyelves
…estodb#27261)
Summary: Add dedicated WriterTarget subclass and ConnectorMetadata SPI for CREATE VECTOR INDEX, enabling each connector to implement vector index creation independently.
- CreateVectorIndexReference: plan-time target carrying index metadata and source table reference
- beginCreateVectorIndex/finishCreateVectorIndex: SPI defaults to NOT_SUPPORTED so connectors must opt in
- ClassLoaderSafeConnectorMetadata: delegation wrappers
## Release Notes
```
== NO RELEASE NOTE ==
```
Differential Revision: D95325176
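A minimal sketch of the opt-in SPI pattern described in the commit above, assuming simplified signatures. `MetadataSketch`, `IndexHandle` and the method parameters are hypothetical, and `UnsupportedOperationException` stands in for Presto's NOT_SUPPORTED error; the real SPI methods take different arguments.

```java
// Simplified, hypothetical mirror of the SPI shape; not the real ConnectorMetadata.
interface MetadataSketch
{
    // Default: connectors that do not override this reject the statement.
    default IndexHandle beginCreateVectorIndex(String indexTable)
    {
        throw new UnsupportedOperationException("This connector does not support creating vector indexes");
    }

    default void finishCreateVectorIndex(IndexHandle handle)
    {
        throw new UnsupportedOperationException("This connector does not support creating vector indexes");
    }
}

// Opaque handle passed from begin to finish.
final class IndexHandle
{
    final String indexTable;
    boolean committed;

    IndexHandle(String indexTable)
    {
        this.indexTable = indexTable;
    }
}

// A connector that opts in by overriding both methods.
final class OptInMetadata
        implements MetadataSketch
{
    @Override
    public IndexHandle beginCreateVectorIndex(String indexTable)
    {
        return new IndexHandle(indexTable);
    }

    @Override
    public void finishCreateVectorIndex(IndexHandle handle)
    {
        handle.committed = true;
    }
}

// A connector that leaves the defaults in place.
final class DefaultMetadata
        implements MetadataSketch
{
}
```

Putting the failure in the interface default means a connector that does nothing gets the rejecting behavior, while a supporting connector only needs to override the begin/finish pair.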
Summary: Add create_vector_index function signature

## Release Notes
```
== NO RELEASE NOTE ==
```
Differential Revision: D95341384
Summary: Support vector search in LogicalPlanner

## Release Notes
```
== NO RELEASE NOTE ==
```
Differential Revision: D93690255
25b72eb to 4e123b1
@skyelves, could you please update the PR to include only the LogicalPlanner changes? The other changes already have separate PRs.
Sorry for the confusion. I am planning to land only this PR, which contains four commits. All the PRs are created by the Meta export service, so it's kind of confusing.
Summary:
Support vector search in LogicalPlanner
High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:
SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...
The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.
2. Statement Analysis:
StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:
TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
    └── query plan
4. Connector Plan Optimization (Rewriting):
PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):
TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:
Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.
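The plan shape from step 3 can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `Node`, `createVectorIndexPlan` and `render` are hypothetical helpers written for this example, the target is modelled as a plain string, and none of it reflects the real Presto plan node classes.

```java
class VectorIndexPlanSketch
{
    // Hypothetical, simplified plan node: a label plus at most one child.
    static final class Node
    {
        final String label;
        final Node child;

        Node(String label, Node child)
        {
            this.label = label;
            this.child = child;
        }
    }

    // Wraps a query plan in a writer node and a finish node that share one target,
    // mirroring the tree listed in step 3.
    static Node createVectorIndexPlan(Node queryPlan, String target)
    {
        Node writer = new Node("TableWriterNode(target = " + target + ")", queryPlan);
        return new Node("TableFinishNode(target = " + target + ")", writer);
    }

    // Renders the single-child chain with box-drawing branch markers
    // (written as Unicode escapes so the source stays ASCII).
    static String render(Node root)
    {
        StringBuilder out = new StringBuilder();
        int depth = 0;
        for (Node node = root; node != null; node = node.child) {
            if (depth > 0) {
                for (int i = 1; i < depth; i++) {
                    out.append("    ");
                }
                out.append("\u2514\u2500\u2500 ");
            }
            out.append(node.label).append('\n');
            depth++;
        }
        return out.toString();
    }

    public static void main(String[] args)
    {
        Node queryPlan = new Node("query plan", null);
        System.out.print(render(createVectorIndexPlan(queryPlan, "CreateVectorIndexReference")));
    }
}
```

Because both wrapper nodes carry the same target, a later stage (a connector optimizer in step 4, or the metadata routing in step 5) can recognise a vector index build by inspecting either node.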
Release Notes
== NO RELEASE NOTE ==
Differential Revision: D93690255