-
Notifications
You must be signed in to change notification settings - Fork 3k
Returns isUnpartitioned=true for VoidTransform on all fields #3059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I agree with Russell. It would be great to add some tests to this. The overall change looks good. |
|
@RussellSpitzer @rdblue Added the tests but not sure if I put them at the right place/file. PTAL :) |
RussellSpitzer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, I only had a minor question about whether we should test v1 tables who have had spec columns added and then removed but I think this isn't strictly necessary.
@RussellSpitzer Sorry for taking this long. I've added the test to check unpartitioned spec after removing all columns. PTAL. |
|
@RussellSpitzer spark is failing but I don't think it's relevant to this PR. I'm going to rebase and force push |
|
@xinbinhuang Let's double check that test, I'm worried that there may be some part of the code that assumes "isUnpartitioned" means empty partition spec when we actually define it now as "empty or all void". Let's make sure the error isn't because of that |
|
Ok so the test failure is real and is caused by the check here iceberg/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java Lines 543 to 544 in f5a7537
Which uses "isUnpartitioned" to check whether or not to use UnpartitionedDataWriter. UnpartitionedWriter passes through "null" as the value of the Spec which causes an NPE Where we attempt to copy through the values into the spec (which is all void transforms). Before this patch V1 Tables with only void transforms would be considered "partitioned" by this code and deal with this correctly (but perhaps more confusingly.) As far as I can tell we have a few ways to go forward here.
I think I am leaning towards 1 or 3. I feel like we are dealing with an unpartitioned spec and because of V1 this means we have to be a bit careful about how we deal with that. |
api/src/main/java/org/apache/iceberg/transforms/VoidTransform.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@RussellSpitzer @rdblue I did it slightly different from what Russell suggested, as I think this's slightly cleaner and easier to reason about. WDYT?
| if (isPartitioned) { | ||
| this.partitionData = copyPartitionData(spec, newPartition, partitionData); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only copy if the spec is partitionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather we fix this in one of the other ways we discussed. This ends up feeling a little strange to me since you can call "withPartition" and the arg is ignored. I don't like when functions can take arguments but then not doing anything with them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aokolnychyi Do you mind taking a look at this change?
I have a chat with @RussellSpitzer offline, and would love to have your thoughts on this.
The current implementation
- works similar to how we do check for other methods inside the builder
iceberg/core/src/main/java/org/apache/iceberg/DataFiles.java
Lines 238 to 245 in 73ee251
public Builder withPartitionPath(String newPartitionPath) { Preconditions.checkArgument(isPartitioned || newPartitionPath.isEmpty(), "Cannot add partition data for an unpartitioned table"); if (!newPartitionPath.isEmpty()) { this.partitionData = fillFromPath(spec, newPartitionPath, partitionData); } return this; } iceberg/core/src/main/java/org/apache/iceberg/DataFiles.java
Lines 259 to 266 in 73ee251
public Builder withSplitOffsets(List<Long> offsets) { if (offsets != null) { this.splitOffsets = copyList(offsets); } else { this.splitOffsets = null; } return this; }
- Compatible with builder's whole building logics
- builder assign
partitionDatato benullif the spec is not partitioned:this.partitionData = isPartitioned ? newPartitionData(spec) : null; - And it do another check and pass in
nullto the finalGenericDataFile:specId, filePath, format, isPartitioned ? partitionData.copy() : null,
- builder assign
Alternatively, we are thinking:
GenericRecord emptyRecord = GenericRecord.create(spec.schema());
delegate = writerFactory.newDataWriter(outputFile, spec, emptyRecord);at around here:
iceberg/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java
Line 594 in 1e5abce
| delegate = writerFactory.newDataWriter(outputFile, spec, null); |
| this.spec = spec; | ||
| this.specId = spec.specId(); | ||
| this.isPartitioned = spec.fields().size() > 0; | ||
| this.isPartitioned = !spec.isUnpartitioned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More inclusive check for isPartitioned to includes V1 table
| } | ||
|
|
||
| static PartitionData copyPartitionData(PartitionSpec spec, StructLike partitionData, PartitionData reuse) { | ||
| Preconditions.checkArgument(!spec.isUnpartitioned(), "Can't copy partition data to a unpartitioned table"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying method explicitly check for isUnpartitioned() in case this method is called outside the builder. (FileMetadata.java)
| this.spec = spec; | ||
| this.specId = spec.specId(); | ||
| this.isPartitioned = spec.fields().size() > 0; | ||
| this.isPartitioned = !spec.isUnpartitioned(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do similar thing to FileMetadata.java
| if (isPartitioned) { | ||
| this.partitionData = DataFiles.copyPartitionData(spec, newPartition, partitionData); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do similar thing to FileMetadata.java
|
@rdblue @aokolnychyi @kbendick would you help take a look when you have time? Would love to get this merged before it goes into stale. thank you :) |
|
Sorry it took me so long to get to this, I think this is good to go once the PR has been rebased |
3a5de40 to
a415f3a
Compare
|
(@RussellSpitzer sorry didn't see the last message from you) |
|
@xinbinhuang Looks good to me, once tests pass I think we are good to go. |
|
My commit title was inverted, mia culpa. For anyone looking this up in the future I meant that "all void transforms should be false" |
closes: #3014