Support time travel in Hudi connector#15084
Support time travel in Hudi connector#15084albericgenius wants to merge 1 commit intotrinodb:masterfrom
Conversation
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiConfig.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiErrorCode.java
Outdated
Show resolved
Hide resolved
...in/trino-hudi/src/main/java/io/trino/plugin/hudi/query/HudiReadOptimizedDirectoryLister.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiCopyOnWriteConnectorTest.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/AbstractTestQueryFramework.java
Outdated
Show resolved
Hide resolved
testing/trino-testing/src/main/java/io/trino/testing/H2QueryRunner.java
Outdated
Show resolved
Hide resolved
This isn't a time travel feature. Please take a look at Iceberg page in the issue. |
@ebyhr Thanks for pointing to the commit. Time travel is just a fancy name for a versioned snapshot query. While we're using |
@codope Yes. |
|
Got it. I think this commit is more relevant for us 1c23c69 which adds the necessary connector metadata APIs to support versioned snapshot queries. If we implement those APIs in |
a6c49d2 to
87553d8
Compare
...ain/src/main/java/io/trino/sql/planner/iterative/rule/HudiPushTableVersionIntoTableScan.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/io/trino/sql/planner/iterative/rule/HudiPushTableVersionIntoTableScan.java
Outdated
Show resolved
Hide resolved
...ain/src/main/java/io/trino/sql/planner/iterative/rule/HudiPushTableVersionIntoTableScan.java
Outdated
Show resolved
Hide resolved
|
When are we expecting this to be merged ? |
|
It depends on @albericgenius. I've waited that @albericgenius addresses comments. |
Got you, I will continue to work on this. |
|
Hey @albericgenius - Any update on this? |
87553d8 to
4c6c0d6
Compare
I updated a new revision, removed the Hudi specific code in the planner |
4c6c0d6 to
147db05
Compare
147db05 to
4b5412d
Compare
ebyhr
left a comment
There was a problem hiding this comment.
Could you add "Time travel queries" section to hudi.rst likes Iceberg documentation? It would be nice to mention the way to find versions from existing Hudi tables.
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiTableHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/BaseHudiConnectorTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiSmokeTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-hudi/src/test/java/io/trino/plugin/hudi/TestHudiSmokeTest.java
Outdated
Show resolved
Hide resolved
4b5412d to
719cd8a
Compare
719cd8a to
7c7cac5
Compare
7c7cac5 to
56975af
Compare
56975af to
e0a084c
Compare
e0a084c to
da2b230
Compare
|
@ebyhr i used the UI merge tool to resolve conflict issue, after that there is CI error: PR requires a rebase. Found: 1 merge.commit. and the cassandra CI test is not relative to my modification. |
|
Please rebase on master instead. |
f13e1d7 to
980d11a
Compare
Got you |
|
I guess this is close to getting merged 👀 |
|
@ebyhr still need your help :) |
980d11a to
350bf9b
Compare
ebyhr
left a comment
There was a problem hiding this comment.
Please rebase on master to resolve confclits.
| - ``varchar`` | ||
| - Current state of the instant | ||
|
|
||
| Rolling back to a previous version |
| String after = formatter.format(new Date()); | ||
|
|
||
| try { | ||
| assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + " FOR VERSION AS OF " + before)).hasNoRows(); |
There was a problem hiding this comment.
It would be better to get the version from $timeline table.
| onHudi().executeQuery("ALTER TABLE default." + tableName + " DROP COLUMNS (new_col)"); | ||
| onHudi().executeQuery("INSERT INTO default." + tableName + " VALUES (4, 'a4', 20, 1000)"); | ||
| String afterDropColumn = formatter.format(new Date()); | ||
| assertThat(onTrino().executeQuery("SELECT id, name FROM hudi.default." + tableName + " FOR VERSION AS OF " + afterDropColumn)) |
There was a problem hiding this comment.
The assertions having FOR VERSION AS OF specifies the latest version. That's not a common scenario when using time travel. I would recommend verifying with the past version. Also, it should verify the table definition.
| "hive", | ||
| forHostPath(configDir.getPath("hive.properties")), | ||
| CONTAINER_TRINO_ETC + "/catalog/hive.properties"); | ||
|
|
| @Test(groups = {HUDI, PROFILE_SPECIFIC_TESTS}) | ||
| public void testTimeTravelQuery() | ||
| { | ||
| String tableName = "test_hudi_cow_select_session_props" + randomNameSuffix(); |
There was a problem hiding this comment.
Why the table name contains select_session_props?
| internally used for providing the previous state of the table:: | ||
|
|
||
| SELECT * | ||
| FROM example.testdb.customer_orders FOR TIMESTAMP AS OF TIMESTAMP '2022-03-23 09:59:29.803 Europe/Vienna' |
There was a problem hiding this comment.
Does this PR contain a test case for FOR TIMESTAMP AS OF TIMESTAMP?
| formatter.setTimeZone(TimeZone.getTimeZone(ZoneOffset.UTC)); | ||
| String before = formatter.format(new Date()); | ||
| onHudi().executeQuery("SET hoodie.schema.on.read.enable=true"); | ||
| createNonPartitionedTable(tableName, COW_TABLE_TYPE); |
There was a problem hiding this comment.
Please add another test for MOR table type.
|
@ebyhr @albericgenius I have also implemented the time travel feature in my forked repo and the branch was cut from 418 tag, should i raise the PR to incorporate the feature? Please let me know if it sounds good i will raise the PR with the desired changes |
|
needs rebase |
|
any word on this getting released - we've built an EntityMatch feature into InsuranceLake which stores data in an an Entity Primary Table that uses Hudi. We'd like to be able to use Athena to do 'Time Travel'... |
|
Is this getting any love anytime soon? |
|
👋 @albericgenius - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that. We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks. |
|
This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua |
|
Closing this PR due to inactivity. Anyone interested please feel free to pick it back up and continue work on it here or in a new PR. |
Description
Fix #15003
Additional context and related issues
https://issues.apache.org/jira/browse/HUDI-2692
https://issues.apache.org/jira/browse/HUDI-2693
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text: