Fix NPE when counting entries #1077
|
When calling |
|
@rdblue @jerryshao Would you please help to review this PR at your convenience? It could be moved into the assert clause at the end, but IMHO a separate line might make it more clear and straightforward |
private Schema lazySchema() {
  if (schema == null) {
-   if (requestedSchema != null) {
+   if (requestedSchema != null && requestedSchema.size() != 0) {
I don't think this is correct. Iceberg should support projecting 0 columns.
|
Looks like there are two problems. First,
  public Type findType(String name) {
    Preconditions.checkArgument(!name.isEmpty(), "Invalid column name: (empty)");
-   return findType(lazyNameToId().get(name));
+   Integer id = lazyNameToId().get(name);
+   if (id != null) {
+     return findType(id);
+   }
+   return null;
  }
Second, if the file projection is null, it is still dereferenced. So we need to fix that as well in
  protected CloseableIterable<FileScanTask> planFiles(
      TableOperations ops, Snapshot snapshot, Expression rowFilter, boolean caseSensitive, boolean colStats) {
    CloseableIterable<ManifestFile> manifests = CloseableIterable.withNoopClose(snapshot.manifests());
-   Schema fileSchema = new Schema(schema().findType("data_file").asStructType().fields());
+   Type fileProjection = schema().findType("data_file");
+   Schema fileSchema = fileProjection != null ? new Schema(fileProjection.asStructType().fields()) : new Schema();
    String schemaString = SchemaParser.toJson(schema());
    String specString = PartitionSpecParser.toJson(PartitionSpec.unpartitioned());
    ResidualEvaluator residuals = ResidualEvaluator.unpartitioned(rowFilter);
After that, adding this to the entries table test case works: |
|
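For readers without the Iceberg sources at hand, the two fixes in the review comment above can be reduced to a small standalone sketch. It is not Iceberg code and not the test case mentioned above: `EmptyProjectionSketch`, `MiniSchema`, `add`, and `fileFields` are hypothetical stand-ins, and the field names are only illustrative. The assumption is that the original NPE comes from passing a null `Integer` (the result of a failed name lookup) to a method taking an `int`, which unboxes it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EmptyProjectionSketch {

  // Hypothetical stand-in for a schema: name -> id, id -> nested field names.
  public static final class MiniSchema {
    private final Map<String, Integer> nameToId = new HashMap<>();
    private final Map<Integer, List<String>> idToFields = new HashMap<>();

    public MiniSchema add(String name, int id, List<String> fields) {
      nameToId.put(name, id);
      idToFields.put(id, fields);
      return this;
    }

    public List<String> findType(int id) {
      return idToFields.get(id);
    }

    // Null-safe lookup: an unknown name yields null instead of unboxing a null Integer.
    public List<String> findType(String name) {
      if (name.isEmpty()) {
        throw new IllegalArgumentException("Invalid column name: (empty)");
      }
      Integer id = nameToId.get(name);
      if (id != null) {
        return findType(id.intValue());
      }
      return null;
    }
  }

  // Falls back to an empty field list when "data_file" was not projected.
  public static List<String> fileFields(MiniSchema projected) {
    List<String> fileProjection = projected.findType("data_file");
    return fileProjection != null ? fileProjection : Collections.<String>emptyList();
  }

  public static void main(String[] args) {
    MiniSchema zeroColumns = new MiniSchema();
    MiniSchema withDataFile = new MiniSchema()
        .add("data_file", 2, Arrays.asList("file_path", "record_count"));

    System.out.println(fileFields(zeroColumns));
    System.out.println(fileFields(withDataFile));
  }
}
```

With a zero-column projection the lookup returns null and the caller falls back to an empty field list, which is the behavior the reviewer asks for when 0 columns are projected.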
@rdblue thanks for reviewing and sharing your idea! |
|
@rdblue The PR is updated. is updated as When meeting the null file projection (yielded by projecting 0 columns), it returns null instead of an empty schema, to address the issue of |
|
Part of the problem is that we changed this table over to use a As for why you're getting the current error, I'm not sure why your manifest doesn't have the Anyway, I think the solution is to remove the |
|
@waterlx, do you want to update this PR? |
|
@rdblue Yes, I will work on it during the weekend. |
|
@rdblue The PR is updated to yield an empty schema when projecting 0 columns, as you suggested here. Would you please help to review the current changes at your earliest convenience? Thanks! |
|
After looking at this again, I don't think this should do any projection. From my last comment:
Data tasks should not project data, so this should ignore attempted projection. |
|
@waterlx, I tried the approach I suggested and it doesn't seem to match what we do for the file tables. Since this PR is a simple fix, I merged it. Thanks for fixing this! Also, I had to rebase and squash by hand since this was a bit old. That's why it's marked closed instead of merged. |
…rows an IllegalArgumentException for partitioned tables)
- When running a Spark aggregation query on the "entries" metadata table, an empty projection is passed in.
- However, data_file is a required field, so the empty projection fails with java.lang.IllegalArgumentException: Missing required field: data_file in BuildAvroProjection.record
- apache#1077 fixes it only for non-partitioned tables, and only due to the (expected?) behavior in PruneColumns where empty structs are not pruned.
This PR is to address an NPE when counting a table's entries (found by @jerryshao), which could be triggered by
The stacktrace is as follows: