Skip to content

Conversation

@zhangbutao
Copy link
Contributor

@zhangbutao zhangbutao commented Apr 20, 2023

What changes were proposed in this pull request?

Why are the changes needed?

https://iceberg.apache.org/releases/#121-release Iceberg1.2.1(include 1.2.0) has lots of improvement, e.g. branch commit and position_deletes metadata table.

And delete the patched code BaseUpdatePartitionSpec.java which was added for iceberg1.1.0.

Does this PR introduce any user-facing change?

No

How was this patch tested?

@aturoczy
Copy link

Why is this test failed? I don't see any relation to this commit. Needs to retrigger?

@aturoczy
Copy link

Actually it could have relation of the PR:

Client Execution succeeded but contained differences (error code = 1) after executing describe_iceberg_metadata_tables.q

java.lang.AssertionError:
Client Execution succeeded but contained differences (error code = 1) after executing dynamic_partition_writes.q
352,353c352,353

@zhangbutao
Copy link
Contributor Author

Actually it could have relation of the PR:

Client Execution succeeded but contained differences (error code = 1) after executing describe_iceberg_metadata_tables.q

java.lang.AssertionError: Client Execution succeeded but contained differences (error code = 1) after executing dynamic_partition_writes.q 352,353c352,353

@TuroczyX I am fixing it. :)

@zhangbutao zhangbutao changed the title Iceberg: Upgrade iceberg to 1.2.1 HIVE-27273: Iceberg: Upgrade iceberg to 1.2.1 Apr 23, 2023
@deniskuzZ
Copy link
Member

@zhangbutao have you checked if we need to migrate some of the changes from iceberg mr package as well? cc @InvisibleProgrammer

@InvisibleProgrammer
Copy link
Contributor

@zhangbutao have you checked if we need to migrate some of the changes from iceberg mr package as well? cc @InvisibleProgrammer

I think it is just a dependency upgrade.

@zhangbutao
Copy link
Contributor Author

@zhangbutao have you checked if we need to migrate some of the changes from iceberg mr package as well? cc @InvisibleProgrammer

I think you mean this PR of iceberg mr package: apache/iceberg#7123. I will port this change here.

In addition, i find several PR from iceberg hive-metastore package which not exist in Hive iceberg-handler module and I will check if these PR are valuable to Hive iceberg-handler. If ok ,I will port them in a separate PR.

WDYT? @deniskuzZ @InvisibleProgrammer

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 3 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

@InvisibleProgrammer
Copy link
Contributor

I started to play with the porting, let me share some extra information and details:

There are two modules that we 'copy' into Hive: mr and hive-metastore.

When we do a Hive-Iceberg upgrade, it is worth checking if is there anything that should be ported from there as well - or otherwise, we can get unexpected behavior.

I was able to narrow down the promising commits to 7:

To gather them, I got the git commits between the 1.1.0 and 1.2.1 tags but only for the mr and hive-metastore folders with those commands:

zsoltmiskolczi@zsmiskolczi-MBP16 iceberg % git log apache-iceberg-1.1.0...apache-iceberg-1.2.1 --pretty=tformat:"%H %s %cs" --reverse -- hive-metastore
6b8f7e0e31a81029b478e7757aba749f5ed27f42 Hive: Set the database owner on namespace creation (#6045) 2022-11-28
6725dc4d5168f41180d4e4acb73fde5b78abf062 Hive: Merge identical catch branch (#6477) 2022-12-22
fede493d59f17ff2bfc0744b296d90bd36130386 Hive: Lock hardening (#6451) 2023-01-11
df87b2e214c7576184e85e1be145069fc0c4d871 Hive: Make UGI current user the owner of new Hive objects (#6324) 2023-01-18
881be5e5d3746da0d8f1a837d2d1bb0f83776b81 Build: Fix minor error-prone warnings (#6629) 2023-01-23
81bf8d30766b1b129b87abde15239645cb127046 Core: Avoid creating new metadata file on registerTable (#6591) 2023-01-25
333227fbd13821365cec1bdbfcb9314a239bea0f Hive: Refactor commit lock mechanism from HiveTableOperations (#6648) 2023-02-03
ba4818b871d36eb34fcdd20dc7f824447f5daa4a Use UGI shortUserName as the default owner of Hive objects (#6955) 2023-03-07

zsoltmiskolczi@zsmiskolczi-MBP16 iceberg % git log apache-iceberg-1.1.0...apache-iceberg-1.2.1 --pretty=tformat:"%H %s %cs" --reverse -- mr
b5102a6f6603b4ec776c849b2c0e5f212366f11d Build: Bump jackson-annotations from 2.14.0 to 2.14.1 (#6280) 2022-11-28
fede493d59f17ff2bfc0744b296d90bd36130386 Hive: Lock hardening (#6451) 2023-01-11
ab6ba6801cddd70d261357b514103fd8e95775f9 Build: Bump jackson-annotations from 2.14.1 to 2.14.2 (#6687) 2023-02-16

As you can see, there are 8 commits but one of them modified both folders.

I started to cherry-pick them but I haven't finished with it yet. I'll share the result when I'm finished.
According to my current knowledge, there are two commits that are hard to handle: fede493d59f17ff2bfc0744b296d90bd36130386 and 333227fbd13821365cec1bdbfcb9314a239bea0f. Both of them modify HiveTableOperations.java and this file contains a lot of differences between Hive-Iceberg and Iceberg. We have to make sure that we don't introduce new issues during the porting: there can be extra development in the Hive repository that is completely different from the Iceberg part. And also, based on my experience, there can be old changes that haven't been ported at all.

@zhangbutao , what do you thing about that?

@zhangbutao
Copy link
Contributor Author

@InvisibleProgrammer Thanks for sharing your thought! The two commits fede493d59f17ff2bfc0744b296d90bd36130386 and 333227fbd13821365cec1bdbfcb9314a239bea0f are realy hard to deal. We shoud carefully handle these commits. I also hope the author of two commits can give us some info to evaluate if it is very necessary to port in current Hive iceberg-handler moduel @pvary .

And also, based on my experience, there can be old changes that haven't been ported at all.

I think that is possible. That's great if you can find these valuable old changes. Thanks for your hard work!

@zhangbutao
Copy link
Contributor Author

@deniskuzZ I am thinking if we can merge PR firstly as i don't want this to block new feature(iceberg branch&tag) integration. In the meantime, @InvisibleProgrammer can continue to port other iceberg commits.
Thanks.

@InvisibleProgrammer
Copy link
Contributor

InvisibleProgrammer commented Apr 26, 2023

Feel free to do that. The only drawback is that you can miss some catalog-related changes. We did the same approach in the previous upgrade.

Anyway, it doesn't block any upgrade as if the catalog changes are not ported, it is not a full upgrade :D

@pvary
Copy link
Contributor

pvary commented Apr 26, 2023

@InvisibleProgrammer Thanks for sharing your thought! The two commits fede493d59f17ff2bfc0744b296d90bd36130386 and 333227fbd13821365cec1bdbfcb9314a239bea0f are realy hard to deal. We shoud carefully handle these commits. I also hope the author of two commits can give us some info to evaluate if it is very necessary to port in current Hive iceberg-handler moduel @pvary .

There is a new commit which could be helpful, because it removes the need to issue a HiveLock on alter table:

  • c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - This is based on 333227fbd13821365cec1bdbfcb9314a239bea0f and fede493d59f17ff2bfc0744b296d90bd36130386. Has to be a parallel change on Hive/Impala and every writers of the Iceberg table, but fixes stability and enhances commit performance
  • 333227fbd13821365cec1bdbfcb9314a239bea0f - Hive: Refactor commit lock mechanism from HiveTableOperations. This is mostly a refactoring to make it possible to do c3232b664745ebf761b6a74f4c5b55cc48bfd209
  • fede493d59f17ff2bfc0744b296d90bd36130386 - Hive: Lock hardening (#6451) - makes sure that the Lock used by the Iceberg commit are cleared up... If you do not have stability issues with stuck Hive Locks then you might skip backporting it.

@zhangbutao
Copy link
Contributor Author

@pvary Great! Thanks a lot!

@InvisibleProgrammer
Copy link
Contributor

@pvary , let me reflect on your writings one by one:

[* 333227fbd13821365cec1bdbfcb9314a239bea0f](c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - This is based on 333227fbd13821365cec1bdbfcb9314a239bea0f and fede493d59f17ff2bfc0744b296d90bd36130386. Has to be a parallel change on Hive/Impala and every writers of the Iceberg table, but fixes stability and enhances commit performance)c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - This is based on 333227fbd13821365cec1bdbfcb9314a239bea0f and fede493d59f17ff2bfc0744b296d90bd36130386. Has to be a parallel change on Hive/Impala and every writers of the Iceberg table, but fixes stability and enhances commit performance

That is a pretty cool change. I'm pretty sure it is worth porting. But I'm not sure if we have to port it during the 1.2.1 upgrade. What if I create a ticket to port it after we finish the 1.2.1 upgrade?

333227fbd13821365cec1bdbfcb9314a239bea0f - Hive: Refactor commit lock mechanism from HiveTableOperations. This is mostly a refactoring to make it possible to do c3232b664745ebf761b6a74f4c5b55cc48bfd209

That looks trivial, it is easy to port it during this update. Thank your for the context.

fede493d59f17ff2bfc0744b296d90bd36130386 - Hive: Lock hardening (#6451) - makes sure that the Lock used by the Iceberg commit are cleared up... If you do not have stability issues with stuck Hive Locks then you might skip backporting it.

That is the tricky and ugly one that concerns me the most: The HiveTableOperations class is almost 1000 lines long in the Hive repository and about 500 lines in the Iceberg one. And it looks like that commit is the root cause of the difference. But as you wrote it should be released together with all writers, it makes it not only ugly but evil as well.
What do you think, even if we have no stability issues with Hive Locks, is that worth porting?
What do you think, what would be the best way to handle it?
My first thought is the same as at 333227fbd13821365cec1bdbfcb9314a239bea0f: we should port it in a separated ticket after the 1.2.1 upgrade. But I don't want to keep that significant difference between the two repositories for a long time.
And also, do you have any suggestion about how it should be handled in the community? I mean, I assume we have to start a conversation that includes Impala and other components as well. I'm not even know, do you know how many project can be affected?

@deniskuzZ
Copy link
Member

deniskuzZ commented Apr 27, 2023

@zhangbutao, @InvisibleProgrammer could we please create a backport ticket that includes:

  1. c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882;
  2. 333227fbd13821365cec1bdbfcb9314a239bea0f - Hive: Refactor commit lock mechanism from HiveTableOperations;
  3. fede493d59f17ff2bfc0744b296d90bd36130386 - Hive: Lock hardening
    '+' what is in Zsolt's list

@deniskuzZ deniskuzZ merged commit 6331e98 into apache:master Apr 27, 2023
@zhangbutao
Copy link
Contributor Author

@deniskuzZ Sure, will create tickets to backport these iceberg commits. I guess @InvisibleProgrammer is woking on that and i am also pleasure to do these stuff if @InvisibleProgrammer need some assistance.
Thanks.

@InvisibleProgrammer
Copy link
Contributor

https://issues.apache.org/jira/browse/HIVE-27306

@pvary
Copy link
Contributor

pvary commented May 2, 2023

@pvary , let me reflect on your writings one by one:

[* 333227fbd13821365cec1bdbfcb9314a239bea0f](c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - This is based on 333227fbd13821365cec1bdbfcb9314a239bea0f and fede493d59f17ff2bfc0744b296d90bd36130386. Has to be a parallel change on Hive/Impala and every writers of the Iceberg table, but fixes stability and enhances commit performance)c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 - This is based on 333227fbd13821365cec1bdbfcb9314a239bea0f and fede493d59f17ff2bfc0744b296d90bd36130386. Has to be a parallel change on Hive/Impala and every writers of the Iceberg table, but fixes stability and enhances commit performance

That is a pretty cool change. I'm pretty sure it is worth porting. But I'm not sure if we have to port it during the 1.2.1 upgrade. What if I create a ticket to port it after we finish the 1.2.1 upgrade?

This is the somewhat evil one, as if the feature is turned on, then it should happen for all of the writers at the same time (there are configs which makes it possible for all of the use-cases we were able to come up during the review rounds).

fede493d59f17ff2bfc0744b296d90bd36130386 - Hive: Lock hardening (#6451) - makes sure that the Lock used by the Iceberg commit are cleared up... If you do not have stability issues with stuck Hive Locks then you might skip backporting it.

That is the tricky and ugly one that concerns me the most: The HiveTableOperations class is almost 1000 lines long in the Hive repository and about 500 lines in the Iceberg one.

This is just changes some of the retry mechanism. Hard to backport, but no other risk, definitely no compatibility concerns here.
You might wanted to talk about c3232b664745ebf761b6a74f4c5b55cc48bfd209 when you mention interproject sync, but I think it is not strictly necessary because of the configuration possibilities.

yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
…by Peter Vary, Zsolt Miskolczi, Denys Kuzmenko)

Closes apache#4252
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
…by Peter Vary, Zsolt Miskolczi, Denys Kuzmenko)

Closes apache#4252
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants