feat: Add WHEN MATCHED THEN DELETE support to MERGE INTO statement (#27409)
Reviewer's Guide
Adds full support for the SQL-standard WHEN MATCHED THEN DELETE clause in MERGE INTO by extending the grammar, AST, planner, analyzer, formatter, and tests, including new Iceberg integration tests for delete-only and delete-with-insert MERGE scenarios.
Sequence diagram for MERGE INTO with WHEN MATCHED THEN DELETE
sequenceDiagram
actor User
participant PrestoCoordinator
participant SqlParser
participant AstBuilder
participant StatementAnalyzer
participant QueryPlanner
participant ExecutionEngine
participant Connector
User->>PrestoCoordinator: Submit MERGE INTO ... WHEN MATCHED THEN DELETE
PrestoCoordinator->>SqlParser: Parse SQL text
SqlParser-->>AstBuilder: Parse tree with mergeDelete alternative
AstBuilder->>AstBuilder: visitMergeDelete(context)
AstBuilder-->>PrestoCoordinator: Merge AST containing MergeDelete case
PrestoCoordinator->>StatementAnalyzer: Analyze MERGE statement
StatementAnalyzer->>StatementAnalyzer: Collect merge cases
StatementAnalyzer->>StatementAnalyzer: Detect MergeDelete case
StatementAnalyzer->>StatementAnalyzer: accessControl.checkCanDeleteFromTable(targetTable)
StatementAnalyzer-->>PrestoCoordinator: Analyzed MERGE with delete permission verified
PrestoCoordinator->>QueryPlanner: Plan MERGE execution
QueryPlanner->>QueryPlanner: getMergeCaseOperationNumber(MergeDelete)
QueryPlanner-->>PrestoCoordinator: DELETE_OPERATION_NUMBER
PrestoCoordinator->>ExecutionEngine: Create DeleteAndInsertMergeProcessor
ExecutionEngine->>Connector: Open ConnectorMergeSink with delete support
Connector-->>ExecutionEngine: MergeSink ready
ExecutionEngine->>ExecutionEngine: Apply DELETE_OPERATION_NUMBER for matched rows
ExecutionEngine->>Connector: Execute delete operations for matched rows
Connector-->>ExecutionEngine: Delete operations committed
ExecutionEngine-->>PrestoCoordinator: MERGE with delete completed
PrestoCoordinator-->>User: MERGE statement finished successfully
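The analyzer step in the diagram above can be illustrated with a small, self-contained Java sketch. Everything here (`MergeAccessControlSketch`, `CaseKind`, the one-method `AccessControl` interface) is a hypothetical stand-in and not Presto's real code; the only detail taken from the diagram is that detecting a MergeDelete case triggers `checkCanDeleteFromTable` on the target table.

```java
// Hypothetical sketch: none of these types are Presto's real classes.
final class MergeAccessControlSketch {
    // Kinds of merge case that can appear in a MERGE statement.
    enum CaseKind { UPDATE, INSERT, DELETE }

    // Stand-in for the access-control interface; only the delete check is modeled.
    interface AccessControl {
        void checkCanDeleteFromTable(String table);
    }

    // Runs the delete check once if any merge case is a DELETE; returns whether it ran.
    static boolean analyzeMergeAccessControl(CaseKind[] cases, String targetTable, AccessControl accessControl) {
        for (CaseKind kind : cases) {
            if (kind == CaseKind.DELETE) {
                accessControl.checkCanDeleteFromTable(targetTable);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // An access control that rejects every delete, to make the check visible.
        AccessControl denyAll = table -> {
            throw new SecurityException("Cannot delete from table " + table);
        };
        // No DELETE case: the delete check is never invoked, so nothing is thrown.
        CaseKind[] noDelete = {CaseKind.UPDATE, CaseKind.INSERT};
        System.out.println(analyzeMergeAccessControl(noDelete, "orders", denyAll));
        try {
            analyzeMergeAccessControl(new CaseKind[] {CaseKind.DELETE}, "orders", denyAll);
        }
        catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The design point is that a MERGE without a DELETE case should not require delete permission, so the check is tied to the presence of the case rather than to the statement type.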
Class diagram for MergeDelete integration into MERGE AST and visitors
classDiagram
class Node {
}
class MergeCase {
+List~Identifier~ getSetColumns()
+List~Expression~ getSetExpressions()
}
class MergeUpdate {
+List~Identifier~ getSetColumns()
+List~Expression~ getSetExpressions()
}
class MergeInsert {
+List~Identifier~ getSetColumns()
+List~Expression~ getSetExpressions()
}
class MergeDelete {
+MergeDelete()
+MergeDelete(NodeLocation location)
+MergeDelete(Optional~NodeLocation~ location)
+List~Identifier~ getSetColumns()
+List~Expression~ getSetExpressions()
+List~Node~ getChildren()
+int hashCode()
+boolean equals(Object obj)
+String toString()
}
class AstVisitor {
+R visitMergeCase(MergeCase node, C context)
+R visitMergeUpdate(MergeUpdate node, C context)
+R visitMergeInsert(MergeInsert node, C context)
+R visitMergeDelete(MergeDelete node, C context)
}
class DefaultTraversalVisitor {
+R visitMergeUpdate(MergeUpdate node, C context)
+R visitMergeDelete(MergeDelete node, C context)
}
class SqlFormatter_Formatter {
+Void visitMergeUpdate(MergeUpdate node, Integer indent)
+Void visitMergeDelete(MergeDelete node, Integer indent)
-Void appendMergeCaseWhen(boolean matched)
}
class QueryPlanner {
-static int getMergeCaseOperationNumber(MergeCase mergeCase)
}
class StatementAnalyzer {
-void analyzeMergeAccessControl(Merge node)
}
Node <|-- MergeCase
MergeCase <|-- MergeUpdate
MergeCase <|-- MergeInsert
MergeCase <|-- MergeDelete
MergeCase ..> Identifier
MergeCase ..> Expression
MergeDelete ..> NodeLocation
MergeDelete ..> AstVisitor
AstVisitor <.. MergeDelete : accept(visitor, context)
DefaultTraversalVisitor <|-- StatementAnalyzer
SqlFormatter_Formatter ..> MergeUpdate
SqlFormatter_Formatter ..> MergeDelete
QueryPlanner ..> MergeCase
StatementAnalyzer ..> MergeDelete
StatementAnalyzer ..> MergeInsert
StatementAnalyzer ..> MergeUpdate
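To make the relationships in the class diagram concrete, here is a minimal Java sketch of a stateless `MergeDelete` node, its visitor dispatch, and a formatter that emits the clause text. It is a simplified model under stated assumptions, not the actual Presto sources: the real classes carry node locations, column and expression lists, and more visitor methods, and the wrapper class `MergeAstSketch` and the `format` helper are hypothetical.

```java
// Hypothetical, simplified model of the class diagram; these are not the actual Presto sources.
final class MergeAstSketch {
    // Visitor base: the specific visit method falls back to visitMergeCase by default.
    abstract static class AstVisitor<R, C> {
        R visitMergeCase(MergeCase node, C context) {
            return null;
        }

        R visitMergeDelete(MergeDelete node, C context) {
            return visitMergeCase(node, context);
        }
    }

    abstract static class MergeCase {
        abstract <R, C> R accept(AstVisitor<R, C> visitor, C context);
    }

    // DELETE carries no SET columns or expressions, so this node has no state.
    static final class MergeDelete extends MergeCase {
        @Override
        <R, C> R accept(AstVisitor<R, C> visitor, C context) {
            return visitor.visitMergeDelete(this, context);
        }

        @Override
        public boolean equals(Object obj) {
            // Stateless node: any two MergeDelete instances are equal.
            return obj instanceof MergeDelete;
        }

        @Override
        public int hashCode() {
            return MergeDelete.class.hashCode();
        }
    }

    // Formatter that writes the clause text into a StringBuilder context.
    static final class Formatter extends AstVisitor<Void, StringBuilder> {
        @Override
        Void visitMergeDelete(MergeDelete node, StringBuilder out) {
            out.append("WHEN MATCHED THEN DELETE");
            return null;
        }
    }

    // Formats a single merge case by dispatching through the visitor.
    static String format(MergeCase mergeCase) {
        StringBuilder out = new StringBuilder();
        mergeCase.accept(new Formatter(), out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(new MergeDelete()));
    }
}
```

Because the node is stateless, equality reduces to a type check, which is what lets a parser round-trip test compare a parsed statement against an expected AST.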
Codenotify: Notifying subscribers in CODENOTIFY files for diff fb302b1...bb7fb1b.
Please add a release note - or Please edit the PR title to follow semantic commit style to pass the failing and required CI check. See the failure in the test for advice.
…tement (prestodb#27409)
Summary:
Implements the missing WHEN MATCHED THEN DELETE clause for the SQL-standard MERGE INTO statement in Presto. The existing MERGE implementation supports WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT, but lacked the DELETE case. This completes the SQL:2011 MERGE specification.
== RELEASE NOTES ==
- Grammar (SqlBase.g4): Added mergeDelete alternative to mergeCase rule
- AST (MergeDelete.java): New node extending MergeCase with empty column/expression lists
- Parser (AstBuilder): ANTLR parse tree to MergeDelete AST conversion
- Visitors (AstVisitor, DefaultTraversalVisitor): visitMergeDelete dispatch
- Formatter (SqlFormatter): SQL text generation for WHEN MATCHED THEN DELETE
- Planner (QueryPlanner): MergeDelete maps to DELETE_OPERATION_NUMBER (=2)
- Analyzer (StatementAnalyzer): checkCanDeleteFromTable access control
- Tests (TestSqlParser): Parser round-trip test for MERGE with DELETE clause
The execution engine (DeleteAndInsertMergeProcessor) and SPI (ConnectorMergeSink) already handle DELETE_OPERATION_NUMBER, so no changes were needed there.
Differential Revision: D97683467
…restodb#27409)
Summary:
Implements the missing WHEN MATCHED THEN DELETE clause for the SQL-standard MERGE INTO statement in Presto. The existing MERGE implementation supports WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT, but lacked the DELETE case. This completes the SQL:2011 MERGE specification. This is needed for Iceberg support.
Changes across the full stack:
- Grammar (SqlBase.g4): Added mergeDelete alternative to mergeCase rule
- AST (MergeDelete.java): New node extending MergeCase with empty column/expression lists
- Parser (AstBuilder): ANTLR parse tree to MergeDelete AST conversion
- Visitors (AstVisitor, DefaultTraversalVisitor): visitMergeDelete dispatch
- Formatter (SqlFormatter): SQL text generation for WHEN MATCHED THEN DELETE
- Planner (QueryPlanner): MergeDelete maps to DELETE_OPERATION_NUMBER (=2)
- Analyzer (StatementAnalyzer): checkCanDeleteFromTable access control
- Tests (TestSqlParser): Parser round-trip test for MERGE with DELETE clause
- Tests (IcebergDistributedTestBase): 3 integration tests for MERGE DELETE with Iceberg connector
The execution engine (DeleteAndInsertMergeProcessor) and SPI (ConnectorMergeSink) already handle DELETE_OPERATION_NUMBER, so no changes were needed there.
== Key Benefits of MERGE INTO vs. UPDATE/DELETE in Iceberg ==
- **Atomic Operations**: MERGE INTO performs inserts, updates, and deletes in a single, ACID-compliant transaction, ensuring data consistency. Separate commands require managing multiple transactions.
- **Faster Writes (Merge-on-Read)**: Using Merge-on-Read, MERGE avoids full-file rewrites. Instead, it writes new data and creates small "delete files" that mark old rows as invalid. This is significantly faster for frequent updates and streaming data.
- **Reduced Write Amplification**: Because only the affected rows or new records are written (instead of rewriting an entire partition or data file), MERGE reduces compute costs and overhead.
- **Optimal CDC Handling**: MERGE INTO is the best way to apply CDC data. It efficiently processes updates, inserts, and deletes simultaneously, especially when paired with v3 deletion vectors.
== RELEASE NOTES ==
General Changes
* Add support for `WHEN MATCHED THEN DELETE` clause in `MERGE INTO` statements, completing the SQL:2011 MERGE specification. The existing MERGE implementation supported `WHEN MATCHED THEN UPDATE` and `WHEN NOT MATCHED THEN INSERT` but lacked the DELETE case.
Differential Revision: D97683467
hantangwangd left a comment
Thanks @apurva-meta for adding this feature, overall looks good to me! Would you mind adding documentation on WHEN MATCHED THEN DELETE to merge.rst?
Also, is the change for "Support SQL-standard time travel syntax" related to this PR? Would it be OK to move it to a separate PR?
Thanks @hantangwangd for the review. I have documented MERGE INTO ... DELETE in the merge.rst file. Also removed the time travel commit from this PR. Please have a look.
steveburnett left a comment
LGTM! (docs)
Pull branch, local doc build, looks good. Thank you!
…restodb#27409)
Summary:
Implements the missing WHEN MATCHED THEN DELETE clause for the SQL-standard MERGE INTO statement in Presto. The existing MERGE implementation supports WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT, but lacked the DELETE case. This completes the SQL:2011 MERGE specification.
Changes across the full stack:
- Grammar (SqlBase.g4): Added mergeDelete alternative to mergeCase rule
- AST (MergeDelete.java): New node extending MergeCase with empty column/expression lists
- Parser (AstBuilder): ANTLR parse tree to MergeDelete AST conversion
- Visitors (AstVisitor, DefaultTraversalVisitor): visitMergeDelete dispatch
- Formatter (SqlFormatter): SQL text generation for WHEN MATCHED THEN DELETE
- Planner (QueryPlanner): MergeDelete maps to DELETE_OPERATION_NUMBER (=2)
- Analyzer (StatementAnalyzer): checkCanDeleteFromTable access control
- Tests (TestSqlParser): Parser round-trip test for MERGE with DELETE clause
- Tests (IcebergDistributedTestBase): 3 integration tests for MERGE DELETE with Iceberg connector
The execution engine (DeleteAndInsertMergeProcessor) and SPI (ConnectorMergeSink) already handle DELETE_OPERATION_NUMBER, so no changes were needed there.
== RELEASE NOTES ==
General Changes
* Add support for `WHEN MATCHED THEN DELETE` clause in `MERGE INTO` statements, completing the SQL:2011 MERGE specification. The existing MERGE implementation supported `WHEN MATCHED THEN UPDATE` and `WHEN NOT MATCHED THEN INSERT` but lacked the DELETE case.
Reviewed By: xiaoxmeng
Differential Revision: D97683467
hantangwangd left a comment
Thanks @apurva-meta, lgtm!
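Summary:
Implements the missing WHEN MATCHED THEN DELETE clause for the SQL-standard MERGE INTO statement in Presto. The existing MERGE implementation supports WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT, but lacked the DELETE case. This completes the SQL:2011 MERGE specification.
Changes across the full stack:
- Grammar (SqlBase.g4): Added mergeDelete alternative to mergeCase rule
- AST (MergeDelete.java): New node extending MergeCase with empty column/expression lists
- Parser (AstBuilder): ANTLR parse tree to MergeDelete AST conversion
- Visitors (AstVisitor, DefaultTraversalVisitor): visitMergeDelete dispatch
- Formatter (SqlFormatter): SQL text generation for WHEN MATCHED THEN DELETE
- Planner (QueryPlanner): MergeDelete maps to DELETE_OPERATION_NUMBER (=2)
- Analyzer (StatementAnalyzer): checkCanDeleteFromTable access control
- Tests (TestSqlParser): Parser round-trip test for MERGE with DELETE clause
- Tests (IcebergDistributedTestBase): 3 integration tests for MERGE DELETE with Iceberg connector
The execution engine (DeleteAndInsertMergeProcessor) and SPI (ConnectorMergeSink) already handle DELETE_OPERATION_NUMBER, so no changes were needed there.
== RELEASE NOTES ==
General Changes
* Add support for `WHEN MATCHED THEN DELETE` clause in `MERGE INTO` statements, completing the SQL:2011 MERGE specification. The existing MERGE implementation supported `WHEN MATCHED THEN UPDATE` and `WHEN NOT MATCHED THEN INSERT` but lacked the DELETE case.
Reviewed By: xiaoxmeng
Differential Revision: D97683467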
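As a rough illustration of the planner change named in the summary, the sketch below maps each merge case type to an operation number. Only `DELETE_OPERATION_NUMBER = 2` is stated in this PR; the insert and update values, the wrapper class `MergeOperationNumberSketch`, and the empty marker classes are assumptions made so the example compiles on its own, and the real `QueryPlanner` method may be structured differently.

```java
// Hypothetical sketch of the planner mapping; not the actual QueryPlanner code.
final class MergeOperationNumberSketch {
    static final int INSERT_OPERATION_NUMBER = 1; // assumed value, not stated in the PR
    static final int DELETE_OPERATION_NUMBER = 2; // stated in the PR summary
    static final int UPDATE_OPERATION_NUMBER = 3; // assumed value, not stated in the PR

    // Empty stand-ins for the AST node types.
    interface MergeCase {}

    static final class MergeInsert implements MergeCase {}

    static final class MergeUpdate implements MergeCase {}

    static final class MergeDelete implements MergeCase {}

    // Maps a merge case to the operation number applied to rows it matches.
    static int getMergeCaseOperationNumber(MergeCase mergeCase) {
        if (mergeCase instanceof MergeInsert) {
            return INSERT_OPERATION_NUMBER;
        }
        if (mergeCase instanceof MergeUpdate) {
            return UPDATE_OPERATION_NUMBER;
        }
        if (mergeCase instanceof MergeDelete) {
            return DELETE_OPERATION_NUMBER;
        }
        throw new IllegalArgumentException("Unsupported merge case: " + mergeCase.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(getMergeCaseOperationNumber(new MergeDelete()));
    }
}
```

Since the execution engine and connector sink are described as already handling the delete operation number, a mapping of this kind is the only planner-side piece the new case needs.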