@@ -303,8 +303,8 @@ private List<MergeOnReadInputSplit> buildFileIndex() {
}

  HoodieTableFileSystemView fsView = new HoodieTableFileSystemView(metaClient,
-     metaClient.getActiveTimeline().getCommitsTimeline()
-         .filterCompletedInstants(), fileStatuses);
+     // file-slice after pending compaction-requested instant-time is also considered valid
+     metaClient.getCommitsAndCompactionTimeline().filterCompletedAndCompactionInstants(), fileStatuses);
Member: this has the effect of including compaction.requested etc in the timeline passed to the fs view

Member: getCommitsAndCompactionTimeline() is really getCommitsOrCompactionTimeline()

String latestCommit = fsView.getLastInstant().get().getTimestamp();
final String mergeType = this.conf.getString(FlinkOptions.MERGE_TYPE);
final AtomicInteger cnt = new AtomicInteger(0);
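
To make the two review comments above concrete, here is a minimal, self-contained toy sketch of the difference between the old and the new timeline filter. It does not use Hudi's real timeline classes, and the semantics of filterCompletedAndCompactionInstants (completed instants plus compaction instants in any state) are an assumption read off this diff and the discussion:

import java.util.List;
import java.util.stream.Collectors;

// Toy model only: shows which instants each filter keeps; not Hudi's real timeline API.
public class TimelineFilterSketch {

  enum State { REQUESTED, INFLIGHT, COMPLETED }

  record Instant(String timestamp, String action, State state) { }

  // Old behaviour: only fully completed commits / delta-commits are visible to the fs view.
  static List<Instant> filterCompletedInstants(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> i.state() == State.COMPLETED)
        .collect(Collectors.toList());
  }

  // Assumed new behaviour: completed instants plus compaction instants in any state,
  // i.e. "commit OR compaction" as the second comment points out.
  static List<Instant> filterCompletedAndCompactionInstants(List<Instant> timeline) {
    return timeline.stream()
        .filter(i -> i.state() == State.COMPLETED || "compaction".equals(i.action()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Instant> timeline = List.of(
        new Instant("001", "deltacommit", State.COMPLETED),
        new Instant("002", "compaction", State.REQUESTED),    // pending compaction plan
        new Instant("003", "deltacommit", State.COMPLETED));  // delta commit after the plan

    // Drops the pending compaction instant "002".
    System.out.println(filterCompletedInstants(timeline));
    // Keeps "002" as well, so file slices keyed on it are considered valid.
    System.out.println(filterCompletedAndCompactionInstants(timeline));
  }
}

Under that reading, a file slice created against the pending compaction instant "002" becomes visible to the file-system view, which is what the Flink input format, the log-grouping utility, and the Spark relation changes below all rely on.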
@@ -463,6 +463,32 @@ void testReadWithWiderSchema(HoodieTableType tableType) throws Exception {
TestData.assertRowDataEquals(result, TestData.DATA_SET_INSERT);
}

/**
* Test reading file groups with compaction plan scheduled and delta logs.
* File-slice after pending compaction-requested instant-time should also be considered valid.
*/
@Test
void testReadMORWithCompactionPlanScheduled() throws Exception {
Map<String, String> options = new HashMap<>();
// compact for each commit
options.put(FlinkOptions.COMPACTION_DELTA_COMMITS.key(), "1");
options.put(FlinkOptions.COMPACTION_ASYNC_ENABLED.key(), "false");
beforeEach(HoodieTableType.MERGE_ON_READ, options);

// write three commits
for (int i = 0; i < 6; i += 2) {
List<RowData> dataset = TestData.dataSetInsert(i + 1, i + 2);
TestData.writeData(dataset, conf);
}

InputFormat<RowData, ?> inputFormat1 = this.tableSource.getInputFormat();
assertThat(inputFormat1, instanceOf(MergeOnReadInputFormat.class));

List<RowData> actual = readData(inputFormat1);
final List<RowData> expected = TestData.dataSetInsert(1, 2, 3, 4, 5, 6);
TestData.assertRowDataEquals(actual, expected);
}

// -------------------------------------------------------------------------
// Utilities
// -------------------------------------------------------------------------
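
In the test above, the intent appears to be that with COMPACTION_DELTA_COMMITS set to 1 and async compaction disabled, a compaction plan is scheduled after every delta commit but never executed, so reading back all six rows forces the input format to treat file slices sitting after a pending compaction-requested instant as valid.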
@@ -237,7 +237,7 @@ public static List<Pair<Option<HoodieBaseFile>, List<String>>> groupLogsByBaseFi
try {
// Both commit and delta-commits are included - pick the latest completed one
Option<HoodieInstant> latestCompletedInstant =
-       metaClient.getActiveTimeline().getCommitsTimeline().filterCompletedInstants().lastInstant();
+       metaClient.getCommitsAndCompactionTimeline().filterCompletedAndCompactionInstants().lastInstant();

Stream<FileSlice> latestFileSlices = latestCompletedInstant
.map(instant -> fsView.getLatestMergedFileSlicesBeforeOrOn(relPartitionPath, instant.getTimestamp()))
@@ -151,8 +151,9 @@ class MergeOnReadSnapshotRelation(val sqlContext: SQLContext,
// Load files from the global paths if it has defined to be compatible with the original mode
val inMemoryFileIndex = HoodieSparkUtils.createInMemoryFileIndex(sqlContext.sparkSession, globPaths.get)
val fsView = new HoodieTableFileSystemView(metaClient,
-     metaClient.getActiveTimeline.getCommitsTimeline
-       .filterCompletedInstants, inMemoryFileIndex.allFiles().toArray)
+     // file-slice after pending compaction-requested instant-time is also considered valid
+     metaClient.getCommitsAndCompactionTimeline.filterCompletedAndCompactionInstants,
+     inMemoryFileIndex.allFiles().toArray)
Contributor Author: Hi @vinothchandar @nsivabalan, I need your help with this view. I dove into the code a little, and this line confused me:

    public HoodieTimeline getCommitsAndCompactionTimeline() {

and this one:

    private boolean isFileSliceCommitted(FileSlice slice) {

The point I'm confused about is: how can we decide that log files whose base commit time is a pending compaction instant were committed successfully? I see some code that compares timestamps, but that alone does not seem enough; intermediate or corrupt writes may also leave log files with the pending compaction instant time as their base commit time, right?

Member: yes there could be pending writes like that. let me grok this and get back to you

Member: We generally filter log blocks, not log files, i.e. we would consider all log files written against the same base commit time and read through them to resolve.

val partitionPaths = fsView.getLatestBaseFiles.iterator().asScala.toList.map(_.getFileStatus.getPath.getParent)
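
A rough sketch of the block-level filtering described in the reply above, assuming each log block carries the instant time of the write that produced it and that blocks from pending or failed writes are skipped because their instant never completed (illustrative only, not Hudi's actual log scanner):

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: filter log blocks, not whole log files, when reading a file slice.
public class LogBlockFilterSketch {

  record LogBlock(String instantTime, String payload) { }

  record LogFile(String baseCommitTime, List<LogBlock> blocks) { }

  // Read through every log file written against the same base commit time, but keep
  // only blocks whose writing instant actually completed; blocks left behind by
  // in-flight or failed writes are dropped.
  static List<LogBlock> committedBlocks(List<LogFile> logFiles, Set<String> completedInstants) {
    return logFiles.stream()
        .flatMap(f -> f.blocks().stream())
        .filter(b -> completedInstants.contains(b.instantTime()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Two log files share the base commit time "002" (a pending compaction plan's instant).
    List<LogFile> logFiles = List.of(
        new LogFile("002", List.of(new LogBlock("003", "block from a completed delta commit"))),
        new LogFile("002", List.of(new LogBlock("004", "block from an in-flight / failed write"))));

    Set<String> completedInstants = Set.of("001", "003");

    // Only the block written by the completed instant "003" survives the scan.
    System.out.println(committedBlocks(logFiles, completedInstants));
  }
}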

