-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Implement HoodieLogFormat replacing Avro as the default log format #162
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@prazanna why does it say CLA not signed for you ? |
vinothchandar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall ..
We still need to implement rollbacks and as-it-is, this will break the HoodieRealtimeInputFormat until that's fixed. I will make the changes there..
|
|
||
| public enum HoodieFileFormat { | ||
| PARQUET(".parquet"), AVRO(".avro"); | ||
| PARQUET(".parquet"), HOODIE_LOG(".log"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be just LOG instead of HOODIE_LOG ?
| /** | ||
| * Writer interface to allow appending block to this file format | ||
| */ | ||
| interface Writer extends Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sub interfaces. ;). I like these and also nested classes (which most folks seem unexcited for some reason)
| return HoodieCorruptBlock.fromBytes(content); | ||
| } | ||
|
|
||
| private boolean isBlockCorrupt(int blocksize) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of seeking once to find if its corrupt and seek again to read the data, can we determine the corrupt block from just the exception thrown when we attempt to read?
| @@ -0,0 +1,164 @@ | |||
| /* | |||
| * Copyright (c) 2016 Uber Technologies, Inc. (hoodie-dev-group@uber.com) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2017..
| return HoodieLogBlockType.AVRO_DATA_BLOCK; | ||
| } | ||
|
|
||
| public static HoodieLogBlock fromBytes(byte[] content, Schema readerSchema) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be move to AvroUtils?
| String baseCommitTime, int version) { | ||
| return String.format("%s_%s%s.%d", fileId, baseCommitTime, logFileExtension, version); | ||
| String baseCommitTime, int version) { | ||
| return "." + String.format("%s_%s%s.%d", fileId, baseCommitTime, logFileExtension, version); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets pull "." into a constant..
…ith hudi incr source (apache#7132) (apache#162) Co-authored-by: Sivabalan Narayanan <n.siva.b@gmail.com>
Fixes #149
Implemented a custom log format for hoodie merge on read