[GOBBLIN-1774] Util for detecting non optional uniontype columns based on Hive Table metadata #3632
Merged
Will-Lo merged 1 commit into apache:master on Feb 13, 2023
Conversation
Codecov Report
@@             Coverage Diff              @@
##             master    #3632      +/-   ##
============================================
+ Coverage     46.56%   46.61%   +0.04%
- Complexity    10666    10707      +41
============================================
  Files          2133     2133
  Lines         83541    83612      +71
  Branches       9288     9299      +11
============================================
+ Hits          38905    38977      +72
  Misses        41074    41074
+ Partials       3562     3561       -1
bd2f2fc to
79323de
Compare
umustafi
reviewed
Feb 1, 2023
...ive-registration/src/test/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtilsTest.java
Outdated
vikrambohra
suggested changes
Feb 6, 2023
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
79323de to
e7255cb
Compare
vikrambohra
approved these changes
Feb 7, 2023
 * See https://github.com/apache/iceberg/issues/189
 * Util for detecting if a table has a complex union (aka non-optional unions) column types.
 *
 * @param t
ZihanLi58
reviewed
Feb 8, 2023
return t.getColumns().stream()
    .map(HiveRegistrationUnit.Column::getType)
    .map(Object::toString)
    .anyMatch(columnType -> columnType.contains("uniontype"));
Contributor
Seems to me we only check union type but not non-optional union type. Is this intentional? If so, please update the description to reflect that as well.
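To make the distinction raised in this comment concrete, here is a minimal sketch that compares a check matching any uniontype with a narrower one that also skips type strings mentioning void. The class and method names are hypothetical, and the sketch assumes an optional union shows up in the Hive type string with a void member (for example `uniontype<void,int>`), as the `!columnType.contains("void")` check later in this thread suggests. It is an illustration, not the code under review.

```java
import java.util.List;

/** Hypothetical illustration only; not the Gobblin implementation. */
final class UnionColumnCheck {

  /** Matches any column whose type string mentions a uniontype, optional or not. */
  static boolean containsAnyUnion(List<String> columnTypes) {
    return columnTypes.stream().anyMatch(t -> t.contains("uniontype"));
  }

  /**
   * Narrower check: skip type strings that mention void.
   * ASSUMPTION: an optional union carries a void member, e.g. uniontype<void,int>.
   */
  static boolean containsUnionWithoutVoid(List<String> columnTypes) {
    return columnTypes.stream()
        .anyMatch(t -> t.contains("uniontype") && !t.contains("void"));
  }
}
```

Both checks are plain substring tests, so they stay coarse: a nested type that holds a non-optional union next to a void elsewhere in the same string would be skipped by the narrower check.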
ZihanLi58
reviewed
Feb 9, 2023
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
homatthew
commented
Feb 10, 2023
if (!isAvroFormat(hiveTable)) {
  // All values in ORC are optional / nullable
  return false;
if (hiveTable.getProps().contains("avro.schema.literal")) {
Contributor
Author
This case is true for all tables written and managed by Gobblin
homatthew
commented
Feb 10, 2023
    .map(HiveRegistrationUnit.Column::getType)
    .map(Object::toString)
    .anyMatch(columnType -> columnType.contains("uniontype") && !columnType.contains("void"));
if (isNonAvroFormat(hiveTable)) {
Contributor
Author
This is a fallback case for when the schema literal is not set, where we can use the ORC type parser to determine whether the column is a non-optional union.
This does not work if the underlying table is not ORC based.
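As a rough illustration of what parsing the type string buys over a substring test, the sketch below walks a Hive-style type string and inspects the members of every uniontype, including nested ones. It uses a small hand-rolled scanner rather than the ORC type parser mentioned above, all names are hypothetical, and it assumes a union counts as optional only when it has exactly two members and one of them is void. The schema-literal branch is not covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; the optional-union rule below is an assumption, not Gobblin code. */
final class UnionTypeStringSketch {
  private static final String UNION_PREFIX = "uniontype<";

  /** Splits "a,b<c,d>,e(1,2)" at commas that are not nested in brackets or parentheses. */
  static List<String> splitTopLevel(String members) {
    List<String> parts = new ArrayList<>();
    int depth = 0;
    int start = 0;
    for (int i = 0; i < members.length(); i++) {
      char c = members.charAt(i);
      if (c == '<' || c == '(') {
        depth++;
      } else if (c == '>' || c == ')') {
        depth--;
      } else if (c == ',' && depth == 0) {
        parts.add(members.substring(start, i).trim());
        start = i + 1;
      }
    }
    parts.add(members.substring(start).trim());
    return parts;
  }

  /** Returns the index of the '>' closing the bracket that was opened just before start. */
  static int findClose(String type, int start) {
    int depth = 1;
    for (int i = start; i < type.length(); i++) {
      char c = type.charAt(i);
      if (c == '<') {
        depth++;
      } else if (c == '>' && --depth == 0) {
        return i;
      }
    }
    throw new IllegalArgumentException("Unbalanced type string: " + type);
  }

  /** True if any uniontype in the string, nested or not, fails the optional rule. */
  static boolean hasNonOptionalUnion(String type) {
    int from = 0;
    int idx;
    while ((idx = type.indexOf(UNION_PREFIX, from)) >= 0) {
      int open = idx + UNION_PREFIX.length();
      int close = findClose(type, open);
      List<String> members = splitTopLevel(type.substring(open, close));
      // ASSUMPTION: optional means exactly two members, one of which is void.
      boolean optional = members.size() == 2 && members.contains("void");
      if (!optional) {
        return true;
      }
      from = open; // keep scanning so unions nested inside this one are checked too
    }
    return false;
  }

  /** Applies the check to every column type string of a table. */
  static boolean anyColumnHasNonOptionalUnion(List<String> columnTypes) {
    return columnTypes.stream().anyMatch(UnionTypeStringSketch::hasNonOptionalUnion);
  }
}
```

Unlike a substring test, this reports a type such as `struct<a:uniontype<int,string>,b:uniontype<void,int>>` as containing a non-optional union, because each union is judged on its own members.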
fd8276c to
12318bf
Compare
vikrambohra
suggested changes
Feb 10, 2023
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
...in-hive-registration/src/main/java/org/apache/gobblin/hive/metastore/HiveMetaStoreUtils.java
Outdated
a4b426d to
ab7ef60
Compare
- This util will be used across GMIP and compaction to handle non-optional unions. Non-optional unions are supported in Avro and ORC but not in Iceberg, so special workarounds are necessary to have tables with both types of data.
ab7ef60 to
91aaa25
Compare
Contributor
Author
Squashed
phet
added a commit
to phet/gobblin
that referenced
this pull request
Feb 13, 2023
* upstream/master:
  [GOBBLIN-1774] Util for detecting non optional uniontypes Hive tables (apache#3632)
  [GOBBLIN-1773] Fix bugs in quota manager (apache#3636)
  [GOBBLIN-1782] Fix Merge State for Flow Pending Resume statuses (apache#3639)
  [GOBBLIN-1755] Support extended ACLs and sticky bit for file based distcp (apache#3616)
  [GOBBLIN-1780] Refactor/rename YarnServiceIT to YarnServiceTest (apache#3637)
  [GOBBLIN-1778] Add house keeping thread in DagManager to periodically sync in memory state with mysql table (apache#3635)
  Register gauge metrics for change monitors (apache#3634)
phet
added a commit
to phet/gobblin
that referenced
this pull request
Mar 24, 2023
* upstream/master:
  [GOBBLIN-1774] Util for detecting non optional uniontypes Hive tables (apache#3632)
  [GOBBLIN-1773] Fix bugs in quota manager (apache#3636)
  [GOBBLIN-1782] Fix Merge State for Flow Pending Resume statuses (apache#3639)
  [GOBBLIN-1755] Support extended ACLs and sticky bit for file based distcp (apache#3616)
  [GOBBLIN-1780] Refactor/rename YarnServiceIT to YarnServiceTest (apache#3637)
  [GOBBLIN-1778] Add house keeping thread in DagManager to periodically sync in memory state with mysql table (apache#3635)
  Register gauge metrics for change monitors (apache#3634)
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Problem Statement:
uniontypes (i.e. uniontypes that can be null). These are supported as struct types instead.
Tests
Commits