@manijndl7

  • Added an ORC writer and integrated it with the factory, covering all major supported data types.
  • Tested in the Docker environment with hive-exec-2.3.1 and orc-core-1.3.3.
  • Added OrcUtils to support the major operations on ORC files.

@manijndl7
Author

@vinothchandar @n3nash, can you please have a look?

@codecov-io

codecov-io commented Dec 10, 2020

Codecov Report

Merging #2320 (f0d5b72) into master (bd9ccec) will decrease coverage by 0.11%.
The diff coverage is 6.45%.

@@             Coverage Diff              @@
##             master    #2320      +/-   ##
============================================
- Coverage     52.29%   52.18%   -0.12%     
  Complexity     2630     2630              
============================================
  Files           329      331       +2     
  Lines         14743    14773      +30     
  Branches       1484     1488       +4     
============================================
- Hits           7710     7709       -1     
- Misses         6419     6449      +30     
- Partials        614      615       +1     
| Flag | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| hudicli | 38.50% <ø> (ø) | 0.00 <ø> (ø) |
| hudiclient | 100.00% <ø> (ø) | 0.00 <ø> (ø) |
| hudicommon | 55.08% <100.00%> (-0.02%) | 0.00 <0.00> (ø) |
| hudihadoopmr | 32.39% <0.00%> (-0.56%) | 0.00 <0.00> (ø) |
| huditimelineservice | 65.30% <ø> (ø) | 0.00 <ø> (ø) |
| hudiutilities | 70.11% <ø> (ø) | 0.00 <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| ...a/org/apache/hudi/hadoop/HoodieOrcInputFormat.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| .../hadoop/realtime/HoodieOrcRealtimeInputFormat.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...org/apache/hudi/common/model/HoodieFileFormat.java | 100.00% <100.00%> (ø) | 3.00 <0.00> (ø) |
| ...e/hudi/common/table/log/HoodieLogFormatWriter.java | 77.11% <0.00%> (-1.70%) | 23.00% <0.00%> (ø%) |

@garyli1019
Member

Hi @manijndl7, thanks for your contribution. Excited to see this feature coming!
Would you write some test cases to verify that this works? An end-to-end test should be easy to write as well. Please check the test cases in the hudi-spark module.

@manijndl7
Author

Thanks @garyli1019 for checking this! Yes, it's in my pipeline. I am currently focusing on writing the ORC reader; after that I will commit the test cases.

Member

@vinothchandar left a comment

Looks like a good start! I will mark this PR WIP for now.

Is there a good first milestone we are targeting here? For example: write data using the Spark datasource and read it via Hive, for COW tables.

<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-exec</artifactId>
<version>2.3.1</version>
Member

Could we use the hive.version property that we already have?

<dependency>
<groupId>org.apache.orc</groupId>
<artifactId>orc-core</artifactId>
<version>1.3.3</version>
Member

Could we introduce a property for the ORC version, just like we have for Parquet?

private static final String DEFAULT_MERGE_DATA_VALIDATION_CHECK_ENABLED = "false";


public static final String ORC_STRIPE_SIZE = "hoodie.hive.exec.orc.default.stripe.size";
Member

Could we move these to HoodieStorageConfig, along with the other Parquet options?
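
A minimal sketch of what grouping the ORC setting with the other storage options could look like. Only the key `hoodie.hive.exec.orc.default.stripe.size` comes from the diff above; the class name, the getter, and the 64 MB default are assumptions for illustration, not Hudi's actual HoodieStorageConfig API.

```java
import java.util.Properties;

// Hypothetical stand-in for a storage config holder; not Hudi's real HoodieStorageConfig.
final class OrcStorageConfigSketch {

  // Key taken from the diff under review.
  static final String ORC_STRIPE_SIZE = "hoodie.hive.exec.orc.default.stripe.size";

  // Assumed default of 64 MB, chosen for illustration only.
  static final String DEFAULT_ORC_STRIPE_SIZE = String.valueOf(64L * 1024 * 1024);

  private final Properties props;

  OrcStorageConfigSketch(Properties props) {
    this.props = props;
  }

  // Returns the configured stripe size in bytes, falling back to the assumed default.
  long getOrcStripeSize() {
    return Long.parseLong(props.getProperty(ORC_STRIPE_SIZE, DEFAULT_ORC_STRIPE_SIZE));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    // No override set: the assumed default is used.
    System.out.println(new OrcStorageConfigSketch(props).getOrcStripeSize());
    // Explicit override of the stripe size (128 MB).
    props.setProperty(ORC_STRIPE_SIZE, "134217728");
    System.out.println(new OrcStorageConfigSketch(props).getOrcStripeSize());
  }
}
```

Keeping the key and its default next to each other in one config class means the writer only ever asks for a typed value and never parses raw property strings itself.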


public class OrcUtils {

public static TypeInfo getOrcField(Schema fieldSchema) throws IllegalArgumentException {
Member

Is this something you have written from scratch? I am wondering whether there is a built-in way, like parquet-avro, to aid this conversion. Guess not?

Member

If this code is borrowed or reused from somewhere else, then we would have to cite/credit the source.
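
For context, the kind of conversion under discussion can be sketched for primitive types only. This is an illustrative, dependency-free sketch and not the PR's OrcUtils code: it maps Avro primitive type names to the corresponding Hive/ORC type names as plain strings, and the class and method names are hypothetical. A real implementation would work on Avro `Schema` objects and also handle records, arrays, maps, unions, and logical types.

```java
import java.util.Map;

// Hypothetical helper; it does not use the Avro, Hive, or ORC libraries.
final class AvroToOrcTypeSketch {

  // Avro primitive type name -> Hive/ORC type name.
  private static final Map<String, String> PRIMITIVES = Map.of(
      "boolean", "boolean",
      "int", "int",
      "long", "bigint",
      "float", "float",
      "double", "double",
      "string", "string",
      "bytes", "binary");

  // Looks up the ORC type name, rejecting anything outside the primitive set.
  static String toOrcTypeName(String avroPrimitive) {
    String orcType = PRIMITIVES.get(avroPrimitive);
    if (orcType == null) {
      throw new IllegalArgumentException("Unsupported Avro type: " + avroPrimitive);
    }
    return orcType;
  }

  public static void main(String[] args) {
    System.out.println(toOrcTypeName("long"));
    System.out.println(toOrcTypeName("bytes"));
  }
}
```

Throwing `IllegalArgumentException` for unmapped types mirrors the signature of `getOrcField` shown in the diff hunk.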

@vinothchandar vinothchandar self-assigned this Dec 16, 2020
@vinothchandar vinothchandar added the status:in-progress Work in progress label Dec 16, 2020
@vinothchandar
Member

AFAIK @prashantwason and @vingov are also interested in this. @manijndl7, are you still actively working on this? Or do you mind if one of them takes this across the finish line or merges efforts?

@prashantwason
Member

Yes, we are interested in ORC support. We have already started an internal project towards this end.

@vinothchandar
Member

@prashantwason please feel free to grab HUDI-57.

@manijndl7
Author

Hi @vinothchandar, my progress is a bit slow because of a project I am currently working on. I would be happy if someone took this to the finish line.

@vinothchandar
Member

@manijndl7, thanks for letting us know! No worries! Hope to see you back soon.

@manijndl7
Author

manijndl7 commented Feb 12, 2021

@vinothchandar, can I close this PR, since other people will work on this?

@vinothchandar
Member

Yes, agreed.
