[HUDI-1468] Support custom clustering strategies and preserve commit … #3
@codope Can you tag the places where you made code changes on top of the sk/clusteringImprovements branch? This is a big PR, so it will help me understand your changes better.
0054ac9 to b247b6e
  /**
   * Pluggable implementation for writing data into new file groups based on ClusteringPlan.
   */
- public abstract class ClusteringExecutionStrategy<T extends HoodieRecordPayload,I,K,O> implements Serializable {
+ public abstract class ClusteringExecutionStrategy<T extends HoodieRecordPayload, I, K, O> implements Serializable {
@satishkotha One change in this class is that I moved the transform() method into the corresponding strategy classes. This duplicates some code, but the virtual key changes made it necessary. Otherwise we would need a significant refactoring, i.e. moving HoodieSparkKeyGeneratorFactory and several key generators from hudi-spark-client to hudi-client-common.
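The trade-off described in this comment can be sketched in plain Java. This is only an illustration under stated assumptions: `KeyedRecordSketch`, `ExecutionStrategySketch`, and `EngineStrategySketch` are hypothetical stand-ins, not Hudi classes, and the `Function` key generator stands in for whatever engine-specific key generator the real strategy uses. The point it shows is that the base class no longer needs to know how keys are built, because each concrete strategy carries its own transform().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a keyed record; not a Hudi type.
final class KeyedRecordSketch {
    final String key;
    final Map<String, Object> data;

    KeyedRecordSketch(String key, Map<String, Object> data) {
        this.key = key;
        this.data = data;
    }
}

// Base strategy: defines the contract only. It has no transform() and
// therefore no compile-time dependency on any key generator type.
abstract class ExecutionStrategySketch {
    abstract List<KeyedRecordSketch> execute(List<Map<String, Object>> rows);
}

// Engine-specific strategy: owns its private transform(), so the key
// generator dependency stays inside this (engine-specific) module.
final class EngineStrategySketch extends ExecutionStrategySketch {
    private final Function<Map<String, Object>, String> keyGenerator;

    EngineStrategySketch(Function<Map<String, Object>, String> keyGenerator) {
        this.keyGenerator = keyGenerator;
    }

    @Override
    List<KeyedRecordSketch> execute(List<Map<String, Object>> rows) {
        List<KeyedRecordSketch> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            out.add(transform(row));
        }
        return out;
    }

    // Duplicated per strategy in this design; the cost is repeated code,
    // the benefit is that the base class stays free of engine types.
    private KeyedRecordSketch transform(Map<String, Object> row) {
        return new KeyedRecordSketch(keyGenerator.apply(row), row);
    }
}
```

A second strategy for another engine would repeat transform() with its own key generator, which is the duplication the comment accepts in exchange for avoiding the module move.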
      HoodieFileReader<R> baseFileReader, HoodieMergedLogRecordScanner scanner, Schema schema, String payloadClass,
      Option<Pair<String,String>> simpleKeyGenFieldsOpt) throws IOException {
    Iterator<R> baseIterator = baseFileReader.getRecordIterator(schema);
    while (baseIterator.hasNext()) {
      GenericRecord record = (GenericRecord) baseIterator.next();
-     HoodieRecord<T> hoodieRecord = simpleKeyGenFieldsOpt.isPresent()
+     HoodieRecord<? extends HoodieRecordPayload> hoodieRecord = simpleKeyGenFieldsOpt.isPresent()
@satishkotha Here's another change I had to make to accommodate the virtual keys changes: https://github.com/apache/hudi/pull/3315/files#diff-ec6e010de1137a42412dad69d088607b543c7904e3e5a68e460fc64c3e7868d1
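The comment does not spell out why the declared type changed from HoodieRecord<T> to HoodieRecord<? extends HoodieRecordPayload>. As a general Java illustration only, the sketch below shows what a bounded wildcard permits: a single variable can hold a record whose payload type is decided at runtime. All names here (`PayloadSketch`, `RecordSketch`, `RecordFactorySketch`, and the two payload classes) are hypothetical and are not Hudi types, and this is an assumption about the motivation, not a statement of it.

```java
// Hypothetical payload hierarchy; not Hudi types.
interface PayloadSketch {
    String describe();
}

final class VirtualKeyPayloadSketch implements PayloadSketch {
    @Override
    public String describe() {
        return "virtual";
    }
}

final class MetaFieldPayloadSketch implements PayloadSketch {
    @Override
    public String describe() {
        return "meta";
    }
}

// Generic record parameterized by its payload type.
final class RecordSketch<P extends PayloadSketch> {
    final String key;
    final P payload;

    RecordSketch(String key, P payload) {
        this.key = key;
        this.payload = payload;
    }
}

final class RecordFactorySketch {
    // The two branches produce records with different payload types, so the
    // result cannot be declared as RecordSketch<T> for one fixed T without
    // an unchecked cast. A bounded wildcard accepts both branches.
    static RecordSketch<? extends PayloadSketch> create(String key, boolean useVirtualKeys) {
        RecordSketch<? extends PayloadSketch> record = useVirtualKeys
            ? new RecordSketch<VirtualKeyPayloadSketch>(key, new VirtualKeyPayloadSketch())
            : new RecordSketch<MetaFieldPayloadSketch>(key, new MetaFieldPayloadSketch());
        return record;
    }
}
```

The cost of the wildcard is that callers can read the payload only as the bound type (`PayloadSketch` here), which is usually acceptable when the record is just being passed along.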
  /**
   * Transform IndexedRecord into HoodieRecord.
   */
  private HoodieRecord<T> transform(IndexedRecord indexedRecord) {
The transform() method was moved here from ClusteringExecutionStrategy.
codope left a comment
@satishkotha I have commented on the places where I made changes. Apart from those files, I added tests in TestHoodieClientOnCopyOnWriteStorage and TestHoodieMergeOnReadTable.
…metadata as part of clustering
Fix clustering tests with commit metadata
Resolve conflicts
Resolve checkstyle issues
b247b6e to c734fc2
…metadata as part of clustering
Fix clustering tests with commit metadata