Skip to content

Conversation

@InvisibleProgrammer
Copy link
Contributor

@InvisibleProgrammer InvisibleProgrammer commented May 4, 2023

It is about updating Iceberg catalog related changes after upgrading the Iceberg dependency to 1.2.1 (#4252)

It contains commits cherry picked that changed files in the mr folder (iceberg-handler) and the hive-metastore folder (iceberg-catalog):

git log apache-iceberg-1.1.0...apache-iceberg-1.2.1 --pretty=oneline --reverse -- mr
git log apache-iceberg-1.1.0...apache-iceberg-1.2.1 --pretty=oneline --reverse -- hive-metastore

And additionally, it contains

  • locking mechanism refactor (it is released in Iceberg 1.3.0 but we decided to port them as well
  • some extra commits (I will call them prerequisites) that are necessary to do the porting. They should be ported back a long-time ago.
  • and some extra refactor to get the Iceberg and Hive repositories closer. For locking, Iceberg and Hive uses different naming and code structure. I did some refactor so after this, the Hive version of the code will be way closer to the Iceberg version.
  • and fixes to make the tests green. Hive and Iceberg has different locking level. And also, after the locking refactors, Iceberg wants to handle locking as well. Iceberg uses EXCLUSE locks, Hive uses EXCL_WRITE locks. Now, we keep the EXCL_WRITE locks and propagate Hive transaction ids to the Iceberg code.

What changes were proposed in this pull request?

Commits in porting order

b215c4861a0c967f8d2a2b0444697d7877d33992 Hive: Set the Table owner on table creation (#5763)
6b8f7e0e31a81029b478e7757aba749f5ed27f42 Hive: Set the database owner on namespace creation (#6045) 2022-11-28
b5102a6f6603b4ec776c849b2c0e5f212366f11d Build: Bump jackson-annotations from 2.14.0 to 2.14.1 (#6280) 2022-11-28 — Skip - build.gradle changes only
6725dc4d5168f41180d4e4acb73fde5b78abf062 Hive: Merge identical catch branch (#6477) 2022-12-22

Prerequisite for Locking hardening: https://issues.apache.org/jira/browse/HIVE-26576 - Alter table calls on Iceberg tables can inadvertently change metadata_location

fede493d59f17ff2bfc0744b296d90bd36130386 Hive: Lock hardening (#6451) 2023-01-11.
df87b2e214c7576184e85e1be145069fc0c4d871 Hive: Make UGI current user the owner of new Hive objects (#6324) 2023-01-18
881be5e5d3746da0d8f1a837d2d1bb0f83776b81 Build: Fix minor error-prone warnings (#6629) 2023-01-23

2843db8cbe2a2b6eba2209ec6f758d2530d5c94b Core: Add SerializableTable (#2403)

Prerequisite for commit lock mechanism:
apache/iceberg@28b5d0e

333227fbd13821365cec1bdbfcb9314a239bea0f Hive: Refactor commit lock mechanism from HiveTableOperations (#6648) 2023-02-03

ba4818b871d36eb34fcdd20dc7f824447f5daa4a Use UGI shortUserName as the default owner of Hive objects (#6955) 2023-03-07
ab6ba6801cddd70d261357b514103fd8e95775f9 Build: Bump jackson-annotations from 2.14.1 to 2.14.2 (#6687) 2023-02-16 - can be skipped. No build.gradle files in the hive repo
c3232b664745ebf761b6a74f4c5b55cc48bfd209: Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882;

extra commits to pass the tests:
2c2903250293adc91ce179478a2a3a368780cd7e: Nessie: Use AssertJ assertions (#2684)

Extra commits during the review:
484c5e7 - Port apache/iceberg@81bf8d3: Core: Avoid creating new metadata file on registerTable (apache/iceberg#6591)

Why are the changes needed?

Last upgrade was when we upgraded from 0.14.1 to 1.1.0

Does this PR introduce any user-facing change?

No.

Commits added with 1.3.0 release:
ee08ce5455df57a347081594dfc6ea5ca1380c74 MR: Remove deprecated AssertHelpers (#7159) 2023-03-21 - Done
ef5c7318aa98c016cc32c4595cc65d70ec1ab7be Hive: Support customizable ClientPool (#6698) 2023-03-24 - Done
37cb7caa75e7419a683f84793b77190f4cd3c899 MR: Skip filter translation if there are no leaves (#7123) 2023-04-24 - Nothing to do, it leads to empty commit
c3232b664745ebf761b6a74f4c5b55cc48bfd209 Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 (#6570) 2023-04-25 - Nothing to do. Already ported.
bbacaf41e23923af3640f3cba3e7aec834abb9e8 Hive: Clean up expired metastore clients (#7310) 2023-04-26 - Done
79c88a1775c4e2019fff00de7520826388158424 Hive: Support connecting to multiple HMS-Catalog on same HMS URL (#7441) 2023-05-01 - Done
cad9c6e795ddd5d3e7b98e1b19940daf1fdb1450 Hive: Remove deprecated AssertHelpers (#7482) 2023-05-02 - Done
f07bedc86fae9a271f6d585081345eabb8982cee Build: Bump com.fasterxml.jackson.core:jackson-annotations (#7601) 2023-05-17 - Done
be6235cf753aa2518404177b1b6f884bff60bf04 Build: Bump com.esotericsoftware:kryo-shaded from 4.0.2 to 4.0.3 (#7669) 2023-05-22 - Done

Extra necessary commit to fix failing TestCatalogs#testLoadTableFromLocation test: apache/iceberg#5657

How was this patch tested?

Precommit tests

@InvisibleProgrammer
Copy link
Contributor Author

Hi, @pvary .

This is the PR that contains the changes that we discussed in #4252.

At current phase, it is a draft and I'm in the middle of porting the commits but I think it is the time to start a conversation about it.

First of all, I have a question about the namings: do you know why Iceberg and Hive has different naming for the class using for locking? In Hive, it is called HiveCommitLog and in Iceberg, it is HiveMetastoreLock.

Thanks,
Zsolt

@pvary
Copy link
Contributor

pvary commented May 8, 2023

First of all, I have a question about the namings: do you know why Iceberg and Hive has different naming for the class using for locking? In Hive, it is called HiveCommitLog and in Iceberg, it is HiveMetastoreLock.

It was after my time, but I expect that @szlta give up on getting his refactor into Iceberg, but needed it in Hive.

@szlta
Copy link
Contributor

szlta commented May 10, 2023

First of all, I have a question about the namings: do you know why Iceberg and Hive has different naming for the class using for locking? In Hive, it is called HiveCommitLog and in Iceberg, it is HiveMetastoreLock.

It was after my time, but I expect that @szlta give up on getting his refactor into Iceberg, but needed it in Hive.

Yup, I did that refactor with 2e916a3 so that we can acquire the commit lock from HiveIcebergMetaHook too. The reason why it was not ported to Iceberg was that Iceberg was depending on an old Hive version where some of the hooks (especially for alter_table) didn't exist yet.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug C 8 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 75 Code Smells

No Coverage information No Coverage information
No Duplication information No Duplication information

Copy link
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@simhadri-g simhadri-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1 (non-binding)

Copy link
Member

@deniskuzZ deniskuzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@deniskuzZ deniskuzZ merged commit 462a6e0 into apache:master Jun 28, 2023
yeahyung pushed a commit to yeahyung/hive that referenced this pull request Jul 20, 2023
… Miskolczi, Dmitriy Fingerman, reviewed by Ayush Saxena, Denys Kuzmenko, Simhadri Govindappa, Sourabh Badhya)

Closes (apache#4291)
tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023
… Miskolczi, Dmitriy Fingerman, reviewed by Ayush Saxena, Denys Kuzmenko, Simhadri Govindappa, Sourabh Badhya)

Closes (apache#4291)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.