ALTER TABLE EXECUTE + Hive OPTIMIZE (explicit TableExecuteNode) #9665
losipiuk merged 8 commits into trinodb:master
Conversation
findepi left a comment:

lots of editorials, nothing material. good job
core/trino-main/src/main/java/io/trino/sql/planner/optimizations/BeginTableWrite.java
```java
    PlanNode source = rewriteModifyTableScan(((SemiJoinNode) node).getSource(), handle);
    return replaceChildren(node, ImmutableList.of(source, ((SemiJoinNode) node).getFilteringSource()));
}
if (node instanceof JoinNode) {
    PlanNode source = rewriteModifyTableScan(((JoinNode) node).getLeft(), handle);
    return replaceChildren(node, ImmutableList.of(source, ((JoinNode) node).getRight()));
```
Previously we required that the updated table scan is on the left side of a Join / SemiJoin.
I think this is an important property. I can't imagine how we could get a wrong SemiJoin, but getting a flipped Join is certainly possible. cc @kasiafi
Is it still validated somewhere? or are we going to hit
Actually that is still enforced by logic in findTableScanHandleForDeleteOrUpdate
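For intuition, here is a minimal, self-contained sketch of the invariant under discussion, using toy classes (not the actual Trino planner types): the rewrite descends only into the left/source child of a join, so a flipped join with the target scan on the right fails loudly instead of being silently mis-rewritten.

```java
// Toy stand-ins for plan nodes; the real Trino classes are much richer.
interface PlanNode {}
record TableScan(String table) implements PlanNode {}
record Join(PlanNode left, PlanNode right) implements PlanNode {}

final class RewriteSketch {
    // Rewrite the target table scan, descending only the left side of joins.
    static PlanNode rewriteModifyTableScan(PlanNode node, String target) {
        if (node instanceof TableScan scan && scan.table().equals(target)) {
            return new TableScan(target + "$begun"); // mark the scan as rewritten
        }
        if (node instanceof Join join) {
            PlanNode source = rewriteModifyTableScan(join.left(), target);
            return new Join(source, join.right());
        }
        // Target scan not found on the descended side (e.g. a flipped join):
        // fail instead of producing a wrong plan.
        throw new IllegalStateException("expected target table scan on left side");
    }
}
```

With the scan on the left the rewrite succeeds; with the operands flipped it throws, which is the kind of enforcement the comment above attributes to `findTableScanHandleForDeleteOrUpdate`.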
core/trino-spi/src/main/java/io/trino/spi/connector/BeginTableExecuteResult.java
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveTableExecuteHandle.java
plugin/trino-hive/src/test/java/io/trino/plugin/hive/TestHiveConnectorTest.java
core/trino-main/src/main/java/io/trino/sql/planner/iterative/rule/RemoveEmptyTableExecute.java
```java
import static java.util.Objects.requireNonNull;

public final class TableExecuteHandle
```
What does this handle identify, exactly? It's not clear from the name.
I updated javadoc a bit. Hope it is better now.
core/trino-main/src/main/java/io/trino/metadata/AbstractCatalogPropertyManager.java
core/trino-parser/src/main/antlr4/io/trino/sql/parser/SqlBase.g4
This commit should be folded into whichever commit introduces execution for ALTER TABLE EXECUTE. Otherwise, that other commit would be "broken".
In fact, I would squash all the commits related to adding the new feature into a single one before it gets merged. You can leave the unrelated changes separate (as long as they have value and stand on their own).

> In fact, I would squash all the commits related to adding the new feature into a single one before it gets merged

Yeah, that is the plan.
Docs PR here; can make additional updates as needed, but otherwise waiting until this code is merged. #9682
AC. I will squash commits that are not self-contained later today or tomorrow. @martint please let me know if you want to dig deeper into this one.
Squashed implementation commits and rebased on top of current master.
Allow using a different key than the catalog name for storing properties metadata. This is preparatory work for introducing table procedures, where each table procedure, even from a single catalog, may allow a different set of properties.
Commit adds SPI and execution support for new table procedures.

New syntax extends the ALTER TABLE family, allowing for proper semantic analysis of the target table (unlike what we have with CALL, where table schema and table name are passed as string literals via table arguments).

New syntax examples:

* ALTER TABLE <table> EXECUTE procedure
* ALTER TABLE <table> EXECUTE procedure(value1, value2, ...)
* ALTER TABLE <table> EXECUTE procedure(param1 => value1, param2 => value2, ...) WHERE ...

New table procedures allow for rewriting table data, which makes them feasible for implementing data cleansing routines like:

* compacting small files into larger ones for a Hive table
* changing the file sorting or bucketing scheme for a table
* Iceberg OPTIMIZE

Currently the execution flow which _does not_ rewrite table data is not implemented. It will be implemented as a followup. Then the current procedures available via `CALL` will be migrated to the new mechanism.

Procedures are exposed by connectors to the engine via a set of new SPI methods and classes:

* io.trino.spi.connector.TableProcedureMetadata
* io.trino.spi.connector.TableProcedureExecutionMode
* io.trino.spi.connector.ConnectorTableExecuteHandle
* io.trino.spi.connector.ConnectorMetadata#getTableHandleForExecute
* io.trino.spi.connector.ConnectorMetadata#getLayoutForTableExecute
* io.trino.spi.connector.ConnectorMetadata#beginTableExecute
* io.trino.spi.connector.ConnectorMetadata#finishTableExecute
Add support for compacting small files for non-transactional, non-bucketed Hive tables: ALTER TABLE xxxxx EXECUTE OPTIMIZE WITH (file_size_threshold = ...)
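As a rough illustration of the file_size_threshold idea, here is a self-contained sketch with invented names (not the actual Hive connector code): OPTIMIZE considers only files smaller than the threshold as compaction candidates, leaving already-large files untouched.

```java
import java.util.List;
import java.util.Map;

final class CompactionSketch {
    // Pick the files whose size falls below the threshold; these would be
    // rewritten into larger files, while files at or above it are skipped.
    static List<String> filesToCompact(Map<String, Long> fileSizes, long thresholdBytes) {
        return fileSizes.entrySet().stream()
                .filter(entry -> entry.getValue() < thresholdBytes)
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```

For example, with a 100 MB threshold a 10 kB file is selected for compaction while a 500 MB file is not.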
PR adds SPI and execution support for new table procedures.

New syntax extends the ALTER TABLE family, allowing for proper semantic analysis of the target table (unlike what we have with CALL, where table schema and table name are passed as string literals via table arguments).

New syntax examples:

* `ALTER TABLE <table> EXECUTE procedure`
* `ALTER TABLE <table> EXECUTE procedure(value1, value2, ...)`
* `ALTER TABLE <table> EXECUTE procedure(param1 => value1, param2 => value2, ...) WHERE ...`

New table procedures allow for rewriting table data, which makes them feasible for implementing data cleansing routines like:

* compacting small files into larger ones for a Hive table
* changing the file sorting or bucketing scheme for a table
* Iceberg OPTIMIZE

Currently the execution flow which does not rewrite table data is not implemented. It will be implemented as a followup. Then the current procedures available via `CALL` will be migrated to the new mechanism.

Procedures are exposed by connectors to the engine via a set of new SPI methods and classes:

* `io.trino.spi.connector.TableProcedureMetadata`
* `io.trino.spi.connector.TableProcedureExecutionMode`
* `io.trino.spi.connector.ConnectorTableExecuteHandle`
* `io.trino.spi.connector.ConnectorMetadata#getTableHandleForExecute`
* `io.trino.spi.connector.ConnectorMetadata#getLayoutForTableExecute`
* `io.trino.spi.connector.ConnectorMetadata#beginTableExecute`
* `io.trino.spi.connector.ConnectorMetadata#finishTableExecute`