-
Notifications
You must be signed in to change notification settings - Fork 2.9k
[Python] support custom target name in partition spec builder #2689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Hi Jun, I think I have a very similar PR queued up on this. Let me review what you have here, and see what else needs to be added(I can rebase me PR). There is a lot of work in the partition spec that needs to be rolled out and I've been lagging on getting the PR's to the Open source |
|
@TGooch44 Thanks for the comment. Please let me know other related changes and I can add them. Also, we may ship them separately with multiple PRs. |
TGooch44
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. I'm not sure about the tox mypy issue for py38 when I run tox -epy38 locally on a fresh env, I don't get any errors.
python/iceberg/api/partition_spec.py
Outdated
| self.partition_names.add(name) | ||
| return self | ||
|
|
||
| def check_for_redundant_partitions(self, field): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worthwhile introducing a function that checks for redundant partitions and adds the field? Each transform method does the same block creating the partition field and then adding it. Something like this:
def check_redundant_and_add_field(self, field_id: int, name: str, transform: 'Transform') -> None:
part_field = (PartitionField(field_id,
self.__next_field_id(),
name,
transform))
self.check_for_redundant_partitions(part_field)
self.fields.append(part_field)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, good idea. We can actually update check_for_redundant_partitions method to include adding field.
| PartitionSpec.builder_for(schema).truncate("dec", 10).build(), | ||
| PartitionSpec.builder_for(schema).truncate("s", 10).build(), | ||
| # todo support them | ||
| # PartitionSpec.builder_for(schema).add_without_field_id(6, "dec_unsupported", "unsupported").build(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this not supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, put todos for unimplemented Transforms, e.g. BucketUUID, etc. Also, need to update from_string to support unrecognized cases. In Java, it returns return new UnknownTransform<>(type, transform);
So I think it might be better to have them in a different PR.
|
Seems the failures are from linters |
TGooch44
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I'll get my merge access sorted out today, and I can get this merged. We may need to do something to address the linting issue, although I'm still able to run tox on your branch and get no linting errors.
Support custom target name in partition spec builder and also fix some other issues. Additionally, add unit tests for partition spec.