[HUDI-2480] FileSlice after pending compaction-requested instant-time… #3703
@@ -151,8 +151,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
       // Load files from the global paths if it has defined to be compatible with the original mode
       val inMemoryFileIndex = HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
       val fsView = new HoodieTableFileSystemView(metaClient,
-        metaClient.getActiveTimeline.getCommitsTimeline
-          .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+        // file-slice after pending compaction-requested instant-time is also considered valid
+        metaClient.getCommitsAndCompactionTimeline.filterCompletedAndCompactionInstants,
+        inMemoryFileIndex.allFiles().toArray)
public HoodieTimeline getCommitsAndCompactionTimeline() {
hudi/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieFileGroup.java
Line 120 in 5515a0d
private boolean isFileSliceCommitted(FileSlice slice) {
The point I'm confused about is how we can decide that log files whose base commit time is that of a pending compaction action were committed successfully. I see some code that compares timestamps, but that is not enough: intermediate or corrupt files may also be log files with the pending compaction instant time as their base commit time, right?
Yes, there could be pending writes like that. Let me grok this and get back to you.
We generally filter log blocks, not log files, i.e. we would consider all log files written against the same base commit time and read through them to resolve
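To make the block-versus-file distinction concrete, here is a minimal sketch in Java. It is a simplified model, not Hudi's reader: `LogBlock`, `LogFile`, and `readValidBlocks` are hypothetical names, and the rule that a block is valid when the instant that wrote it appears in a set of completed instants is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model; these are not Hudi classes.
class LogBlockFilterSketch {

    // A log block tagged with the instant time of the write that produced it (assumed).
    static final class LogBlock {
        final String instantTime;
        final String payload;

        LogBlock(String instantTime, String payload) {
            this.instantTime = instantTime;
            this.payload = payload;
        }
    }

    // A log file written against a base instant; it may hold blocks from several writes.
    static final class LogFile {
        final String baseInstantTime;
        final List<LogBlock> blocks;

        LogFile(String baseInstantTime, List<LogBlock> blocks) {
            this.baseInstantTime = baseInstantTime;
            this.blocks = blocks;
        }
    }

    // Reads every log file of the slice and keeps only blocks written by completed instants.
    // No log file is skipped as a whole; the decision is made per block.
    static List<LogBlock> readValidBlocks(List<LogFile> logFiles, Set<String> completedInstants) {
        List<LogBlock> valid = new ArrayList<>();
        for (LogFile file : logFiles) {
            for (LogBlock block : file.blocks) {
                if (completedInstants.contains(block.instantTime)) {
                    valid.add(block);
                }
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        // "005" plays the role of a pending compaction instant used as the base commit time.
        LogFile file = new LogFile("005", List.of(
                new LogBlock("006", "from a completed delta commit"),
                new LogBlock("007", "from a write that never completed")));
        for (LogBlock block : readValidBlocks(List.of(file), Set.of("006"))) {
            System.out.println(block.instantTime + ": " + block.payload);
        }
    }
}
```

In this model a file whose base commit time is the pending compaction instant is still read, and leftovers from an incomplete write are dropped at the block level.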
This has the effect of including `compaction.requested` etc. in the timeline passed to the fs view.
`getCommitsAndCompactionTimeline()` is really `getCommitsOrCompactionTimeline()`
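The effect of the timeline change on file-slice visibility can be sketched with a toy model in Java. Everything here is hypothetical and simplified: `Instant`, `State`, `completedOnly`, `completedOrCompaction`, and `isSliceVisible` are illustrative names rather than Hudi APIs, and the assumption is that a slice is visible when its base instant time appears in the timeline handed to the file-system view.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical, simplified model of a timeline; these are not Hudi classes.
class TimelineFilterSketch {

    enum State { REQUESTED, INFLIGHT, COMPLETED }

    static final class Instant {
        final String timestamp;
        final String action;
        final State state;

        Instant(String timestamp, String action, State state) {
            this.timestamp = timestamp;
            this.action = action;
            this.state = state;
        }
    }

    // Models the old call in the diff: only completed instants reach the view.
    static Set<String> completedOnly(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.state == State.COMPLETED)
                .map(i -> i.timestamp)
                .collect(Collectors.toSet());
    }

    // Models the new call: completed instants OR compaction instants in any state.
    static Set<String> completedOrCompaction(List<Instant> timeline) {
        return timeline.stream()
                .filter(i -> i.state == State.COMPLETED || "compaction".equals(i.action))
                .map(i -> i.timestamp)
                .collect(Collectors.toSet());
    }

    // Assumed rule: a slice is visible when its base instant is in the filtered timeline.
    static boolean isSliceVisible(String baseInstantTime, Set<String> visibleInstants) {
        return visibleInstants.contains(baseInstantTime);
    }

    public static void main(String[] args) {
        List<Instant> timeline = List.of(
                new Instant("001", "commit", State.COMPLETED),
                new Instant("002", "compaction", State.REQUESTED),
                new Instant("003", "deltacommit", State.INFLIGHT));
        System.out.println(isSliceVisible("002", completedOnly(timeline)));
        System.out.println(isSliceVisible("002", completedOrCompaction(timeline)));
    }
}
```

The filter is a union of two conditions, which is the same "or" reading suggested for the method name above: a slice based on the requested compaction becomes visible, while a slice based on an inflight delta commit still does not.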