Python: Add year, month, day, and hour transforms #5462
@Fokko can you help to review it? Thanks.
Fokko
left a comment
Some small comments, but this looks great @jun-he
    assert TestType.parse_raw(json).__root__ == transform


@pytest.mark.parametrize(
A bit of a nit, but this test covers both str and repr; I think it makes sense to split it into two tests (one for repr and one for str).
We haven't discussed this, but I'm not a fan of pytest's parametrize functionality, mostly because the decorator's arguments are evaluated before the actual tests run. For example, if there is an error in the YearTransform() constructor, the failure happens immediately at collection time rather than inside a test. This conflicts a bit with my personal TDD approach to development.
Therefore I just split everything into separate tests: https://github.com/apache/iceberg/blob/master/python/tests/test_types.py#L546
This way you don't have to check on which input the test is failing.
I've also added this to the list of Python topics for the next community sync.
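To illustrate the point about decorator arguments, here is a minimal sketch. The `cases` decorator is a toy stand-in for a parametrizing decorator, and `BrokenTransform` and `YearTransform` are hypothetical classes written for this example, not the pyiceberg ones. It shows that an argument expression runs while the decorator line is evaluated, and what the split str/repr tests might look like.

```python
def cases(*values):
    """Toy stand-in for a parametrizing decorator (hypothetical, not pytest)."""
    def wrap(func):
        func.cases = values
        return func
    return wrap


class BrokenTransform:
    """Hypothetical transform whose constructor has a bug."""
    def __init__(self):
        raise RuntimeError("constructor bug")


class YearTransform:
    """Hypothetical stand-in with the str/repr behaviour under test."""
    def __str__(self):
        return "year"

    def __repr__(self):
        return "YearTransform()"


def define_parametrized_test():
    # BrokenTransform() is called while the decorator line is evaluated,
    # so the error is raised before any test body gets a chance to run.
    @cases(BrokenTransform())
    def test_str(transform):
        assert str(transform) == "broken"
    return test_str


# Split tests: each object is built inside its own test, so a failure
# points directly at the test (and input) that caused it.
def test_year_transform_str():
    assert str(YearTransform()) == "year"


def test_year_transform_repr():
    assert repr(YearTransform()) == "YearTransform()"
```

With real pytest the same effect would show up as an error while the test module is imported, rather than as a failing test.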
+1 for splitting out cases.
👌 Split it into two tests.
Let's chat in the sync meeting to see if we should avoid parametrize altogether. I can follow up with changes if needed.
@jun-he, when in doubt, do not parameterize. If you need to do anything special whatsoever, like updating an assert to use __root__ instead of isinstance(..., YearTransform) then parameters are not a good idea.
Hey @jun-he We're thinking of doing a first Python release, and I would love to have this in as well. Would you have some time to dig into the comments? Thanks!
python/pyiceberg/transforms.py
Outdated
    """

    __root__: Literal["year"] = Field(default="year")
    _source_type: IcebergType = PrivateAttr()
Looks like these can all be removed because they aren't used.
Yep, removed _source_type from most of the transforms.
python/tests/test_transforms.py
Outdated
)
def test_datetime_transform_serde(transform, json):
    assert transform.json() == json
    assert TestType.parse_raw(json).__root__ == transform
This is too generic. Each class should be tested individually to validate that parsing the string results in an instance of the correct transform.
Split this into different serialize and deserialize tests.
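As a rough sketch of that split, assuming each transform serializes to its name as a JSON string: the classes and the `parse_transform` helper below are hypothetical stand-ins built on the standard library, not the pydantic models from this PR. Each class gets its own serialize test and its own deserialize test, and the deserialize test checks the concrete type.

```python
from json import dumps, loads


class _NamedTransform:
    """Hypothetical base: a transform that serializes to its name."""
    name = ""

    def json(self) -> str:
        return dumps(self.name)


class YearTransform(_NamedTransform):
    name = "year"


class MonthTransform(_NamedTransform):
    name = "month"


# Hypothetical registry used only by this example's parser.
_BY_NAME = {cls.name: cls for cls in (YearTransform, MonthTransform)}


def parse_transform(raw: str) -> _NamedTransform:
    """Hypothetical parser: JSON string -> instance of the matching class."""
    return _BY_NAME[loads(raw)]()


def test_year_transform_serialize():
    assert YearTransform().json() == '"year"'


def test_year_transform_deserialize():
    # Checking the concrete type catches a parser that returns the wrong class.
    assert isinstance(parse_transform('"year"'), YearTransform)


def test_month_transform_serialize():
    assert MonthTransform().json() == '"month"'


def test_month_transform_deserialize():
    assert isinstance(parse_transform('"month"'), MonthTransform)
```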
python/pyiceberg/transforms.py
Outdated
    SECOND = 0


class TimeTransform(Transform[S, int]):
Can these also be Singleton?
👍
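For context, one common way to make stateless classes like these singletons is to override `__new__`. The `Singleton` base and the two subclasses below are a hypothetical sketch of that pattern, not necessarily the helper used in pyiceberg.

```python
class Singleton:
    """Hypothetical base class: each subclass shares one instance."""
    _instance = None

    def __new__(cls):
        # The isinstance check keeps a subclass from reusing an instance
        # that was created for one of its parent classes.
        if not isinstance(cls._instance, cls):
            cls._instance = super().__new__(cls)
        return cls._instance


class YearTransform(Singleton):
    """Illustrative stateless transform."""


class MonthTransform(Singleton):
    """Illustrative stateless transform."""


first = YearTransform()
second = YearTransform()
```

Here `first` and `second` refer to the same object, while `MonthTransform()` returns a separate instance of its own class.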
Thanks @jun-he looks great! Thanks! 🚀
To split up PR #3450, this opens a new PR that adds the time (year, month, day, hour) transforms.
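As a sketch of what these four transforms compute, assuming the definitions in the Iceberg table spec (each value is a count of years, months, days, or hours since the Unix epoch): the function names below are illustrative only. The PR itself implements these as Transform classes, not as free functions.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)  # naive datetime, treated as UTC here


def year_transform(d: date) -> int:
    """Years since 1970 (negative for earlier dates)."""
    return d.year - EPOCH_DATE.year


def month_transform(d: date) -> int:
    """Months since 1970-01."""
    return (d.year - EPOCH_DATE.year) * 12 + (d.month - EPOCH_DATE.month)


def day_transform(d: date) -> int:
    """Days since 1970-01-01."""
    return (d - EPOCH_DATE).days


def hour_transform(ts: datetime) -> int:
    """Hours since 1970-01-01 00:00:00."""
    # Floor division keeps pre-1970 timestamps in the correct negative bucket.
    return (ts - EPOCH_DATETIME) // timedelta(hours=1)
```

Each function returns an int, which matches the `Transform[S, int]` signature of `TimeTransform` in the diff above.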