-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[HUDI-57] Added Orc Writer to Support Orc in Hudi #2320
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
manijndl7
commented
Dec 10, 2020
- Added Orc Writer Integrated it with Factory with all Major supported Data Types.
- Tested On docker env with hive-exec-2.3.1 and orc-core-1.3.3
- Added Orc Utils to Support Major Operation with Orc files.
|
@vinothchandar @n3nash can you please have a look |
Codecov Report
@@ Coverage Diff @@
## master #2320 +/- ##
============================================
- Coverage 52.29% 52.18% -0.12%
Complexity 2630 2630
============================================
Files 329 331 +2
Lines 14743 14773 +30
Branches 1484 1488 +4
============================================
- Hits 7710 7709 -1
- Misses 6419 6449 +30
- Partials 614 615 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
Hi @manijndl7 , thanks for your contribution. Excited to see this feature coming! |
|
Thanks @garyli1019 for checking this !! yea its in my pipeline , i am currently focusing on writing ORC Reader then i will commit the test cases. |
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good start! I will mark this PR WIP for now.
Is there a good first milestone we are targeting here? for e.g Write data using Spark Datasource and read via Hive, for COW tables.
| <dependency> | ||
| <groupId>org.apache.hive</groupId> | ||
| <artifactId>hive-exec</artifactId> | ||
| <version>2.3.1</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we use the hive.version property that we already have?
| <dependency> | ||
| <groupId>org.apache.orc</groupId> | ||
| <artifactId>orc-core</artifactId> | ||
| <version>1.3.3</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
introduce a property for orc version just like we ahve for parquet?
| private static final String DEFAULT_MERGE_DATA_VALIDATION_CHECK_ENABLED = "false"; | ||
|
|
||
|
|
||
| public static final String ORC_STRIPE_SIZE = "hoodie.hive.exec.orc.default.stripe.size"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could move these to HoodieStorageConfig along with other parquet options?
|
|
||
| public class OrcUtils { | ||
|
|
||
| public static TypeInfo getOrcField(Schema fieldSchema) throws IllegalArgumentException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this something you have written from scratch? wondering if there is no built in way like parquet-avro to aid this conversion. Guess not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this code is borrowed or reused from somewhere else, then we would have to cite/credit the source
|
AFAIK @prashantwason @vingov are also interested in this. @manijndl7 are you still actively working on this? or do you mind if one of them take this across finish line/merge efforts? |
|
Yes, we are interested in ORC support. We have already started an internal project towards this end. |
|
@prashantwason please feel free to grab HUDI-57. |
|
Hi @vinothchandar , my progress is bit slow because of current project going on i would be happy if someone make it to the finish line |
|
@manijndl7 thanks for letting us know! no worries! hope to see you back soon |
|
@vinothchandar can i close this PR ? since other people will work on this. |
|
yes. agreed |