Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@
import org.apache.hudi.common.engine.HoodieLocalEngineContext;
import org.apache.hudi.common.model.FileSlice;
import org.apache.hudi.common.model.HoodieBaseFile;
import org.apache.hudi.common.model.HoodieLogFile;
import org.apache.hudi.common.model.HoodieTableQueryType;
import org.apache.hudi.common.table.HoodieTableConfig;
import org.apache.hudi.common.table.HoodieTableMetaClient;
Expand Down Expand Up @@ -253,6 +254,7 @@ private List<FileStatus> listStatusForSnapshotMode(JobConf job,
partitionedFileSlices.values()
.stream()
.flatMap(Collection::stream)
.filter(fileSlice -> checkIfValidFileSlice(fileSlice))
.map(fileSlice -> createFileStatusUnchecked(fileSlice, fileIndex, virtualKeyInfoOpt))
.collect(Collectors.toList())
);
Expand All @@ -261,6 +263,20 @@ private List<FileStatus> listStatusForSnapshotMode(JobConf job,
return targetFiles;
}

protected boolean checkIfValidFileSlice(FileSlice fileSlice) {
Option<HoodieBaseFile> baseFileOpt = fileSlice.getBaseFile();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkIfValidFileSlice -> filterValidFileSlice ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkIfValidFileSlice -> filterValidFileSlice ?

I'm not sure that if a boolean returned method return may throw an IllegalXXXXException, it may be more appropriate to be named with checkXXXX. Is it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the naming convention, isValidFileSlice sounds better. Also, I notice that the override of this method in HoodieMergeOnReadTableInputFormat is logically same. So, why do we need to override?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, i misread. I take it back. It is not the same. I think this is good to land. Any method rename you can take up in a subsequent PR as we're pretty close to the release.

Copy link
Contributor Author

@5herhom 5herhom Aug 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per the naming convention, isValidFileSlice sounds better. Also, I notice that the override of this method in HoodieMergeOnReadTableInputFormat is logically same. So, why do we need to override?

Re: Also, I notice that the override of this method in HoodieMergeOnReadTableInputFormat is logically same. So, why do we need to override?

If do not override checkIfValidFileSlice(), the valid file slice in mor realtime query which match the condition !baseFileOpt.isPresent() && latestLogFileOpt.isPresent() will be removed in the filter.

Option<HoodieLogFile> latestLogFileOpt = fileSlice.getLatestLogFile();

if (baseFileOpt.isPresent()) {
return true;
} else if (latestLogFileOpt.isPresent()) {
// It happens when reading optimized query to mor.
return false;
} else {
throw new IllegalStateException("Invalid state: base-file has to be present for " + fileSlice.getFileId());
}
}

private void validate(List<FileStatus> targetFiles, List<FileStatus> legacyFileStatuses) {
List<FileStatus> diff = CollectionUtils.diff(targetFiles, legacyFileStatuses);
checkState(diff.isEmpty(), "Should be empty");
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,18 @@ protected FileStatus createFileStatusUnchecked(FileSlice fileSlice, HiveHoodieTa
}
}

@Override
protected boolean checkIfValidFileSlice(FileSlice fileSlice) {
Option<HoodieBaseFile> baseFileOpt = fileSlice.getBaseFile();
Option<HoodieLogFile> latestLogFileOpt = fileSlice.getLatestLogFile();

if (baseFileOpt.isPresent() || latestLogFileOpt.isPresent()) {
return true;
} else {
throw new IllegalStateException("Invalid state: either base-file or log-file has to be present for " + fileSlice.getFileId());
}
}

/**
* Keep the logic of mor_incr_view as same as spark datasource.
* Step1: Get list of commits to be fetched based on start commit and max commits(for snapshot max commits is -1).
Expand Down