Change types into dataclasses #4767
python/src/iceberg/types.py
Outdated
-    def __new__(cls, *fields: NestedField):
+    def __new__(cls, *fields: NestedField, **kwargs):
         if "fields" in kwargs:
Doesn't this need to check that *fields is empty?
There seems to be a subtle behavioral change that requires this. Without this logic, the following test is failing:
assert str(type_var) == str(eval(repr(type_var)))
When we dig into this:
>>> repr(type_var)
StructType(
fields=(
NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True),
NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False),
...
)
)
It looks like eval tries to initialize it again using a keyword argument instead of variable unpacking:
>>> eval(repr(type_var))
{TypeError}__new__() got an unexpected keyword argument 'fields'
Using this logic we give priority to the keyword argument:
StructType(
NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True),
NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False),
)
or
StructType(
fields=(
NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True),
NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False),
)
)
I've added an additional check to see if fields is empty.
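The failure described above can be reproduced outside the library. The sketch below is not the code from this PR: the class names `PositionalOnlyStruct` and `FlexibleStruct` are hypothetical, plain strings stand in for `NestedField`, and the "only use the keyword when no positional fields were given" rule is an assumption about what the added check does.

```python
class PositionalOnlyStruct:
    """Accepts fields only as *args, but its repr prints a keyword argument."""

    def __init__(self, *fields):
        self.fields = fields

    def __repr__(self):
        return f"PositionalOnlyStruct(fields={self.fields!r})"


class FlexibleStruct:
    """Accepts fields as *args or as fields=(...), so eval(repr(x)) works."""

    def __init__(self, *fields, **kwargs):
        # Assumed rule: fall back to the keyword only when no positional
        # fields were passed.
        if not fields and "fields" in kwargs:
            fields = tuple(kwargs["fields"])
        self.fields = fields

    def __repr__(self):
        return f"FlexibleStruct(fields={self.fields!r})"

    def __eq__(self, other):
        return isinstance(other, FlexibleStruct) and self.fields == other.fields


# The repr uses a keyword the constructor does not accept, so eval fails.
try:
    eval(repr(PositionalOnlyStruct("a", "b")))
    error = None
except TypeError as exc:
    error = str(exc)

# With the keyword accepted, the round trip yields an equal object.
original = FlexibleStruct("a", "b")
round_tripped = eval(repr(original))
```

Here `error` holds the "unexpected keyword argument 'fields'" message, while `round_tripped` compares equal to `original`.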
I really like how much this removes! Nice work.
python/src/iceberg/types.py
Outdated
     @property
     def type(self) -> IcebergType:
-        return self._type
+        return self.field_type
Should we just rename this property to "field_type"? It might require some small changes in other parts of the library, but it's probably better not to shadow the built-in type, and field_type is the argument name anyway. We can also do this in a follow-up PR.
Yes, I had exactly the same thought. I would love to change this to field_type to avoid clashing with the built-in type.
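For reference, a minimal sketch of the shape discussed here: the dataclass stores the value as `field_type`, and a `type` property remains as an alias until callers are migrated. `NestedFieldSketch` is a hypothetical stand-in, and a plain string replaces `IcebergType` to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedFieldSketch:
    field_id: int
    name: str
    field_type: str  # stand-in for IcebergType in this sketch
    is_optional: bool = True

    @property
    def type(self) -> str:
        # Alias kept for existing callers; it returns the dataclass field.
        return self.field_type


example_field = NestedFieldSketch(field_id=1, name="optional_field", field_type="int")
```

Both `example_field.type` and `example_field.field_type` return the same value, and the generated repr only mentions `field_type`, so removing the alias in a follow-up would not change the representation.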
     def __init__(self, *fields: NestedField, **kwargs):
         if "fields" in kwargs:
             fields = kwargs["fields"]
         object.__setattr__(self, "fields", fields)
This should be nested in the if statement right above it, right? If there's no fields kwarg, we can't set it.
This logic allows you to pass in the fields either through args or through kwargs:
StructType(
NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True),
NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False),
)
or
StructType(
fields=(
NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True),
NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False),
)
)
This was a change required as explained in #4767 (comment)
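To show why the assignment sits outside the if statement and why it goes through `object.__setattr__`, here is a small sketch of the same pattern on a frozen dataclass. `StructSketch` is a hypothetical name, strings stand in for `NestedField`, and the `not fields` guard is an assumption modelled on the empty check mentioned earlier in this thread.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Tuple


@dataclass(frozen=True, init=False)
class StructSketch:
    fields: Tuple[str, ...] = field(default=())

    def __init__(self, *fields: str, **kwargs):
        # Positional fields are already bound to `fields`; the keyword is
        # only consulted when no positional fields were given.
        if not fields and "fields" in kwargs:
            fields = tuple(kwargs["fields"])
        # The instance is frozen, so normal assignment would raise. The
        # assignment runs in both cases, hence it is not nested in the if.
        object.__setattr__(self, "fields", fields)


positional = StructSketch("a", "b")
keyword = StructSketch(fields=("a", "b"))

# Regular attribute assignment is rejected once the class is frozen.
try:
    positional.fields = ()
    is_frozen = False
except FrozenInstanceError:
    is_frozen = True
```

Both construction styles produce equal instances, and because the generated repr prints `fields=(...)`, `eval(repr(positional))` goes through the keyword branch.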
Proposal to change the types into dataclasses. This has several improvements:
- We can use the dataclass field(repr=True) to include the fields in the representation, instead of building our own strings
- We can assign the types in __post_init__ when they are dynamic (Lists, Maps, Structs, etc.), or just override them when they are static (Primitives)
- We don't have to implement any __eq__ methods because they come for free
- The types are frozen, which is kind of nice since we re-use them
- The code is much more concise
- We can assign the min/max of the int/long/float as Final as of 3.8: https://peps.python.org/pep-0591/
My inspiration was the comment by Kyle: apache#4742 (comment)
This would entail implementing __eq__, but why not use the generated one since we're comparing all the attributes :)
Would love to get your input
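A rough illustration of the points in this proposal, not the code from the PR: `FixedTypeSketch`, its `description` attribute, and the `INT_MAX` constant are hypothetical names chosen for the example. It shows a generated repr and equality, a derived attribute filled in `__post_init__`, and a `Final` constant.

```python
from dataclasses import dataclass, field
from typing import Final

# Final (PEP 591, Python 3.8+) tells type checkers this name is never rebound.
INT_MAX: Final = 2_147_483_647


@dataclass(frozen=True)
class FixedTypeSketch:
    length: int
    # Derived value: not an __init__ argument and hidden from the repr.
    description: str = field(init=False, repr=False)

    def __post_init__(self):
        # Frozen instances reject normal assignment, so set it directly.
        object.__setattr__(self, "description", f"fixed[{self.length}]")


first = FixedTypeSketch(length=5)
second = FixedTypeSketch(length=5)
```

No `__eq__`, `__hash__`, or `__repr__` is written by hand here: `first == second` holds because the generated comparison covers all fields, and `repr(first)` only lists `length`.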
Add missing words to spelling