Skip to content

feat: Add syntax support for CREATE VECTOR INDEX#27027

Closed
skyelves wants to merge 1 commit intoprestodb:masterfrom
skyelves:export-D91385788
Closed

feat: Add syntax support for CREATE VECTOR INDEX#27027
skyelves wants to merge 1 commit intoprestodb:masterfrom
skyelves:export-D91385788

Conversation

@skyelves
Copy link
Copy Markdown
Member

@skyelves skyelves commented Jan 24, 2026

Differential Revision: D91385788

Summary by Sourcery

Add parser, AST, formatter, and traversal support for a new CREATE VECTOR INDEX SQL statement, including tests.

New Features:

  • Introduce a CreateVectorIndex AST node representing CREATE VECTOR INDEX statements with optional WHERE clause and index properties.
  • Extend the SQL grammar, AST builder, and SQL formatter to parse and render CREATE VECTOR INDEX statements.

Tests:

  • Add parser tests covering CREATE VECTOR INDEX syntax variants including qualified table names, WHERE filters, and WITH properties.

@sourcery-ai
Copy link
Copy Markdown
Contributor

sourcery-ai bot commented Jan 24, 2026

Reviewer's Guide

Introduce full parser, AST, formatter, and visitor support for a new CREATE VECTOR INDEX statement, including grammar changes, AST node definition, traversal/formatting hooks, and unit tests in TestSqlParser.

Class diagram for new CreateVectorIndex AST support

classDiagram
    class Statement
    class Node {
    }
    class NodeLocation
    class Identifier
    class QualifiedName
    class Expression
    class Property
    class AstVisitor {
      +R visitCreateTable(CreateTable node, C context)
      +R visitCreateVectorIndex(CreateVectorIndex node, C context)
      +R visitCreateType(CreateType node, C context)
    }
    class AstBuilder {
      +Node visitCreateVectorIndex(SqlBaseParser_CreateVectorIndexContext context)
    }
    class DefaultTraversalVisitor {
      +Void visitCreateVectorIndex(CreateVectorIndex node, C context)
    }
    class SqlFormatter {
      +Void visitCreateVectorIndex(CreateVectorIndex node, Integer indent)
    }

    class CreateVectorIndex {
      -Identifier indexName
      -QualifiedName tableName
      -List~Identifier~ columns
      -Optional~Expression~ where
      -List~Property~ properties
      +CreateVectorIndex(Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
      +CreateVectorIndex(NodeLocation location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
      -CreateVectorIndex(Optional~NodeLocation~ location, Identifier indexName, QualifiedName tableName, List~Identifier~ columns, Optional~Expression~ where, List~Property~ properties)
      +Identifier getIndexName()
      +QualifiedName getTableName()
      +List~Identifier~ getColumns()
      +Optional~Expression~ getWhere()
      +List~Property~ getProperties()
      +<R,C> R accept(AstVisitor visitor, Object context)
      +List~Node~ getChildren()
      +int hashCode()
      +boolean equals(Object obj)
      +String toString()
    }

    Statement <|-- CreateVectorIndex
    Node <|-- Statement

    CreateVectorIndex o--> Identifier
    CreateVectorIndex o--> QualifiedName
    CreateVectorIndex o--> Expression
    CreateVectorIndex o--> Property
    CreateVectorIndex o--> NodeLocation

    AstVisitor <.. CreateVectorIndex : visited_by
    CreateVectorIndex ..> AstVisitor : accept

    AstBuilder ..> CreateVectorIndex : creates
    DefaultTraversalVisitor ..> CreateVectorIndex : traverses
    SqlFormatter ..> CreateVectorIndex : formats

    AstVisitor <|-- DefaultTraversalVisitor
    AstVisitor <|-- SqlFormatter
Loading

File-Level Changes

Change Details Files
Add CREATE VECTOR INDEX grammar and non-reserved keywords, plus AstBuilder construction logic.
  • Extend SqlBase.g4 statement rule with a createVectorIndex alternative supporting ON table, column list, optional WHERE, and optional WITH properties.
  • Mark INDEX and VECTOR as non-reserved keywords to allow the new syntax.
  • Implement AstBuilder.visitCreateVectorIndex to build a CreateVectorIndex node from parser context, including index name, qualified table name, column list, optional WHERE expression, and properties list.
presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
presto-parser/src/main/java/com/facebook/presto/sql/parser/AstBuilder.java
Introduce CreateVectorIndex AST node and wire it into visitors and traversal.
  • Create new CreateVectorIndex Statement subclass encapsulating index name, table name, column list, optional WHERE expression, and properties, with location-aware constructors and child traversal.
  • Add visitCreateVectorIndex hook to AstVisitor delegating to visitStatement.
  • Add DefaultTraversalVisitor.visitCreateVectorIndex to traverse index name, columns, WHERE expression, and properties.
presto-parser/src/main/java/com/facebook/presto/sql/tree/CreateVectorIndex.java
presto-parser/src/main/java/com/facebook/presto/sql/tree/AstVisitor.java
presto-parser/src/main/java/com/facebook/presto/sql/tree/DefaultTraversalVisitor.java
Support formatting of CREATE VECTOR INDEX statements and add parser tests.
  • Implement SqlFormatter.visitCreateVectorIndex to render CREATE VECTOR INDEX with ON clause, column list, optional WHERE, and WITH properties in a style consistent with other DDL.
  • Add TestSqlParser.testCreateVectorIndex to cover basic syntax, qualified table names, WHERE clause, WITH properties, and combined WHERE+WITH variations.
presto-parser/src/main/java/com/facebook/presto/sql/SqlFormatter.java
presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 24, 2026

Codenotify: Notifying subscribers in CODENOTIFY files for diff b38d89c...4467128.

Notify File(s)
@aditi-pandit presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@elharo presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@kaikalur presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4
@rschlussel presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/SqlBase.g4

Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 1 issue, and left some high level feedback:

  • In CreateVectorIndex, consider including tableName in both getChildren() and DefaultTraversalVisitor.visitCreateVectorIndex so that tree traversals and rewriters consistently see the referenced table alongside the index name, columns, filter, and properties.
  • You may want to enforce that the columns list in CreateVectorIndex is non-empty (similar to other DDL AST nodes) to avoid constructing semantically invalid index definitions at the AST level.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `CreateVectorIndex`, consider including `tableName` in both `getChildren()` and `DefaultTraversalVisitor.visitCreateVectorIndex` so that tree traversals and rewriters consistently see the referenced table alongside the index name, columns, filter, and properties.
- You may want to enforce that the `columns` list in `CreateVectorIndex` is non-empty (similar to other DDL AST nodes) to avoid constructing semantically invalid index definitions at the AST level.

## Individual Comments

### Comment 1
<location> `presto-parser/src/test/java/com/facebook/presto/sql/parser/TestSqlParser.java:1381-1379` </location>
<code_context>
+                                new Property(identifier("index_type"), new StringLiteral("ivf_rabitq4")))));
+    }
+
     @Test
     public void testCreateTableAsSelect()
     {
</code_context>

<issue_to_address>
**suggestion (testing):** Add negative parser tests for invalid CREATE VECTOR INDEX syntax

`testCreateVectorIndex` currently covers only successful parses. Please also add negative tests (e.g., missing column list `CREATE VECTOR INDEX idx ON t`, empty column list `ON t()`, malformed `WITH` clause such as `WITH index_type = 'ivf'` without parentheses or with trailing commas, `WHERE` after `WITH`) using `assertFails`/`assertInvalid` to ensure invalid statements are rejected and protect against future grammar regressions.

Suggested implementation:

```java
    @Test
    public void testCreateVectorIndex()
    {
        // Basic CREATE VECTOR INDEX
        assertStatement("CREATE VECTOR INDEX idx ON t(a, b)",
                new CreateVectorIndex(
                        identifier("idx"),
                        QualifiedName.of("t"),
                        ImmutableList.of(identifier("a"), identifier("b")),
                        Optional.empty(),
                        ImmutableList.of()));

        // With qualified table name

        // Negative tests for invalid CREATE VECTOR INDEX syntax

        // Missing column list
        assertFails("CREATE VECTOR INDEX idx ON t", "Missing column list for vector index");

        // Empty column list
        assertFails("CREATE VECTOR INDEX idx ON t()", "Empty column list for vector index");

        // Malformed WITH clause: missing parentheses
        assertFails("CREATE VECTOR INDEX idx ON t(a) WITH index_type = 'ivf'",
                "Invalid WITH clause for vector index; expected parenthesized list of properties");

        // Malformed WITH clause: trailing comma
        assertFails("CREATE VECTOR INDEX idx ON t(a) WITH (index_type = 'ivf',)",
                "Invalid WITH clause for vector index; trailing comma is not allowed");

        // WHERE clause is not allowed after WITH
        assertFails("CREATE VECTOR INDEX idx ON t(a) WITH (index_type = 'ivf') WHERE a > 1",
                "WHERE clause is not allowed in CREATE VECTOR INDEX")

```

1. Ensure that `assertFails` is available in this test class with the `(String sql, String message)` signature; if not, adapt these calls to the existing assertion helpers (e.g., `assertInvalid`, or `assertFails(String sql, String expectedMessagePattern)`).
2. If the existing helpers require an exception type (e.g., `ParsingException.class`) or use regex/substring matching, adjust the second argument accordingly or split into multiple overload-appropriate calls.
3. Optionally refine the error messages in the second argument to match the actual parser error messages produced for these invalid statements, or switch to a looser match (substring or regex) if the helper supports it, to avoid brittleness.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

skyelves added a commit to skyelves/presto that referenced this pull request Jan 24, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
@skyelves skyelves changed the title Temporary Commit at 1/23/2026, 8:39:48 PM Add syntax support for CREATE VECTOR INDEX Jan 24, 2026
@skyelves skyelves changed the title Add syntax support for CREATE VECTOR INDEX feat: Add syntax support for CREATE VECTOR INDEX Jan 24, 2026
@skyelves skyelves changed the title feat: Add syntax support for CREATE VECTOR INDEX feat: Add syntax support for CREATE VECTOR INDEX [DO NOT MERGE] Jan 24, 2026
@tdcmeehan
Copy link
Copy Markdown
Contributor

If you intend to eventually mark this as ready for review, could you please open an RFC? There's already some discussions related to vector indices and it would be good to align.

skyelves added a commit to skyelves/presto that referenced this pull request Jan 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Jan 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Jan 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
Copy link
Copy Markdown
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc in .md looks okay overall. Commenting only until @tdcmeehan's request for an RFC is addressed.

skyelves added a commit to skyelves/presto that referenced this pull request Jan 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Jan 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Jan 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
@skyelves skyelves force-pushed the export-D91385788 branch 2 times, most recently from 6233813 to c872381 Compare January 26, 2026 23:17
skyelves added a commit to skyelves/presto that referenced this pull request Feb 19, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Feb 26, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
@skyelves skyelves force-pushed the export-D91385788 branch 2 times, most recently from 06070c8 to a7fafbf Compare February 26, 2026 00:39
@NivinCS
Copy link
Copy Markdown
Contributor

NivinCS commented Feb 27, 2026

@skyelves Thanks for sharing the updated syntax changes. From a SQL grammar perspective, this appears to be aligned with the RFC direction (e.g., support for the UPDATING FOR clause).

Since this PR focuses only on parser-level changes and the Analyzer / SPI integration is expected to follow in subsequent PRs, it would be helpful to ensure that the introduced syntax does not implicitly constrain execution ownership or lifecycle management semantics.

In particular, the RFC review comments highlighted the need for connector-managed index lifecycle and incremental maintenance (e.g., snapshot-aware refresh in Iceberg), as well as SPI-based delegation of index build execution. Our expectation is that connectors should be able to internally leverage libraries such as jvector for ANN index materialization and maintenance.

It would be important for the upcoming Analyzer / SPI PRs to clarify how index build and update operations will be delegated to the connector layer, so that execution can remain connector-driven rather than engine-managed. This would help ensure alignment with the RFC feedback and support connector-specific implementations going forward.
cc: @gggrace14 @tdcmeehan @aditi-pandit

USING relation ON expression mergeCase+ #mergeInto
USING relation ON expression mergeCase+ #mergeInto
| CREATE VECTOR INDEX identifier ON TABLE? qualifiedName
'(' identifier (',' identifier)* ')'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in the RFC review, it may be beneficial to keep the identifier column optional rather than mandatory, particularly for table formats such as Iceberg that provide stable row-level identity via hidden columns (e.g., $row_id). This would allow connectors to automatically fall back to lineage-based identifiers when a user-defined id is not provided.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points. Changed it to optional

| MERGE INTO qualifiedName (AS? identifier)?
USING relation ON expression mergeCase+ #mergeInto
USING relation ON expression mergeCase+ #mergeInto
| CREATE VECTOR INDEX identifier ON TABLE? qualifiedName
Copy link
Copy Markdown
Contributor

@NivinCS NivinCS Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the TABLE keyword optional may introduce ambiguity if indexing needs to be performed over derived relations or connector-managed index sources, which are required for certain connector-managed embedding and incremental index maintenance workflows discussed in the RFC.

For example, consider a scenario where a user is ingesting raw text data into a base table (e.g., candidates_table), and embeddings are generated separately for efficient vector indexing. In such cases, connectors may adopt a twin-table approach, where the base table stores the raw text data, and a separate materialized view incrementally maintains the generated embeddings based on data freshness in the base table. The vector index would then be created on this materialized view rather than directly on the base table. In this case, the MV acts as a logical relation derived from the base table, and connector-managed incremental refresh (e.g., snapshot-aware updates in Iceberg) ensures index consistency.

It would be helpful to clarify whether the current syntax is expected to support such connector-managed indexing workflows where the index source may be a derived logical relation rather than a physical base table.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed TABLE keyword to mandatory. The current syntax only aims to support physical base table.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. While making the TABLE keyword mandatory clarifies that the current syntax targets physical base tables, it also restricts vector index creation to those tables only. For embedding pipelines where vectors are maintained in derived relations (e.g., materialized views), it would be helpful to allow indexing on those relations as well. To support this, the syntax could use a generic relation reference such as:

CREATE VECTOR INDEX identifier ON qualifiedName

instead of restricting it to ON TABLE qualifiedName. This would allow indexing both base tables and derived relations.

private final Identifier indexName;
private final QualifiedName tableName;
private final List<Identifier> columns;
private final Optional<Expression> updatingFor;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this predicate may influence incremental index maintenance behavior, it may be useful to clarify whether this expression is expected to be interpreted by the engine or delegated to the connector during index update planning. This would help ensure alignment with the RFC’s expectation of connector-managed lifecycle and refresh semantics.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be interpreted by the engine to a where predicate.

builder.append("CREATE VECTOR INDEX ");
builder.append(formatName(node.getIndexName()));
builder.append(" ON ");
builder.append(formatName(node.getTableName()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatter currently omits the optional TABLE keyword even when specified in the original statement. It may be preferable to preserve the original syntax to avoid confusion when round-tripping statements via EXPLAIN or query logging.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Added TABLE keyword

@skyelves skyelves changed the title feat: Add syntax support for CREATE VECTOR INDEX [DO NOT MERGE] feat: Add syntax support for CREATE VECTOR INDEX Mar 4, 2026
@steveburnett
Copy link
Copy Markdown
Contributor

When you have time, please resolve the file conflicts.

@skyelves skyelves force-pushed the export-D91385788 branch 4 times, most recently from 315047a to 7e6a848 Compare March 5, 2026 06:42
skyelves added a commit to skyelves/presto that referenced this pull request Mar 5, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
@skyelves skyelves force-pushed the export-D91385788 branch 2 times, most recently from ec46e42 to a05da76 Compare March 5, 2026 07:54
skyelves added a commit to skyelves/presto that referenced this pull request Mar 5, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Mar 5, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Mar 5, 2026
Summary: Pull Request resolved: prestodb#27027

Differential Revision: D91385788
USING relation ON expression mergeCase+ #mergeInto
USING relation ON expression mergeCase+ #mergeInto
| CREATE VECTOR INDEX identifier ON TABLE qualifiedName
'(' identifier (',' identifier)? ')'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The grammar currently restricts the column list to at most two identifiers (identifier (',' identifier)?). Is the intent to strictly enforce the (row_id, embedding_column) pattern? If so this makes sense for the current design, but it may limit future extensions where multiple embedding columns or additional metadata columns might need to be indexed.


properties
: '(' property (',' property)* ')'
: '(' property (',' property)* ','? ')'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need it?

| MERGE INTO qualifiedName (AS? identifier)?
USING relation ON expression mergeCase+ #mergeInto
USING relation ON expression mergeCase+ #mergeInto
| CREATE VECTOR INDEX identifier ON TABLE qualifiedName
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be

CREATE VECTOR INDEX qualifiedName

instead of identifier. Like create view or create table, as index is internally stored as a table.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @gggrace14 , In our implementation the index is not stored as a catalog table but as a connector-managed index file (Stored in s3 based object storage). Because of this, the index name behaves more like a logical identifier rather than a fully qualified relation. Using identifier seemed appropriate in this case, whereas qualifiedName would imply the index itself is a catalog object similar to CREATE TABLE or CREATE VIEW.

Comment on lines +27 to +28
{
private final Identifier indexName;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indexName should also be QualifiedName?

Summary:

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:**

**SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...**
**The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.**

2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91385788
@skyelves skyelves closed this Mar 10, 2026
skyelves added a commit to skyelves/presto that referenced this pull request Mar 10, 2026
Summary:

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:**

**SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...**
**The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.**

2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Mar 10, 2026
Summary:

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:**

**SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...**
**The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.**

2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91385788
skyelves added a commit to skyelves/presto that referenced this pull request Mar 10, 2026
Summary:
Pull Request resolved: prestodb#27307

Pull Request resolved: prestodb#27027

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:**

**SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...**
**The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.**

2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91385788
skyelves pushed a commit to skyelves/presto that referenced this pull request Mar 13, 2026
Summary:
Pull Request resolved: prestodb#27307

Pull Request resolved: prestodb#27027

## High level design
The process for executing a CREATE VECTOR INDEX SQL statement is as follows:
1. SQL Input & Parsing:**

**SQL: CREATE VECTOR INDEX my_index ON my_table(id, embedding) WITH (...) UPDATING FOR ...**
**The Parser (SqlBase.g4) generates a CreateVectorIndex Abstract Syntax Tree (AST) node.**

2. Statement Analysis:

StatementAnalyzer.visitCreateVectorIndex() validates the source/target tables and extracts index properties.
This results in a structured CreateVectorIndexAnalysis object.
3. Logical Planning & Query Generation:
• LogicalPlanner.createVectorIndexPlan() builds the core execution query:
CREATE index_table AS SELECT create_vector_index(embedding, id) FROM my_table WHERE ds BETWEEN ...
• The resulting plan tree includes:

TableFinishNode(target = CreateVectorIndexReference)
└── TableWriterNode(target = CreateVectorIndexReference)
└── query plan
4. Connector Plan Optimization (Rewriting):

PRISM: The CreateVectorIndexRewriteOptimizer detects the CreateVectorIndexReference and rewrites the plan for optimization.
ICEBERG/OTHER: Other connector-specific optimizers may fire during this phase.
5. Execution and Metadata Handling (For connectors that don't rewrite):

TableWriteInfo Routing: The CreateVectorIndexReference triggers metadata.beginCreateVectorIndex().
Local Execution & Commit: The finisher and committer use the CreateVectorIndexHandle to call metadata.finishCreateVectorIndex() and metadata.commitPageSinkAsync().
6. ConnectorMetadata SPI:

Default: The standard implementation throws NOT_SUPPORTED.
Iceberg Override: The Iceberg connector implements this SPI to create the underlying table via the begin/finish calls.

Differential Revision: D91385788
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants