-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-25006: Commit Iceberg writes in HiveMetaHook instead of TezAM #2161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@pvary @lcspinter can you please take a look? |
| throws MetaException { | ||
| // construct the job context | ||
| JobConf jobConf = new JobConf(conf); | ||
| String tableName = TableIdentifier.of(table.getDbName(), table.getTableName()).toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How this will get the Catalog information when @lcspinter's catalog related changes get in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure yet, I'll look into it. The catalog name might be stored in the table params, or we might need to load the Iceberg table to find out if we can. @lcspinter might have some more insights
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergMetaHook.java
Show resolved
Hide resolved
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
Outdated
Show resolved
Hide resolved
iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java
Outdated
Show resolved
Hide resolved
...andler/src/test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithEngine.java
Show resolved
Hide resolved
iceberg-handler/src/test/java/org/apache/iceberg/mr/hive/TestHiveShell.java
Show resolved
Hide resolved
|
@pvary I've uploaded a set of new commits addressing your review comments + changing the long-term approach with the temporary listing solution, as discussed. The listing solution is mostly based on the current downstream TP solution just simplified and refined a bit to save the information into the session conf. This should make it easy to swap out later with the proper vertex id-based implementation. |
d3ed117 to
3493405
Compare
Note: CI build will fail since Tez dependency is SNAPSHOT, using: apache/tez#101
TezTaskqueries the DAGClient to get each reducer (or mapper, if map-only) vertex's ID and number of tasks. There's some necessary translation between vertex ID and job ID.Assumption: for each table, there is only one vertex that writes to it.