Skip to content

Change types into dataclasses#4767

Merged
danielcweeks merged 5 commits intoapache:masterfrom
Fokko:fd-change-types-to-dataclass
May 17, 2022
Merged

Change types into dataclasses#4767
danielcweeks merged 5 commits intoapache:masterfrom
Fokko:fd-change-types-to-dataclass

Conversation

@Fokko
Copy link
Copy Markdown
Contributor

@Fokko Fokko commented May 13, 2022

Proposal to change the types into dataclasses.

This has several improvments:

  • We can use the dataclasss field(repr=True) to include the fields in the representation, instead of building our own strings
  • We can assign the types in the post_init when they are dynamic (List, Maps, Structs etc) , or just override them when they are static (Primitives)
  • We don't have to implement any eq methods because they come for free
  • The types are frozen, which is kind of nice since we re-use them
  • The code is much more consise
  • We can assign the min/max of the int/long/float as Final as of 3.8: https://peps.python.org/pep-0591/

My inspiration was the comment by Kyle:
#4742 (comment)

This would entail implementing eq, but why not use the generated one since we're comparing all the attributes :)

Would love to get you input


def __new__(cls, *fields: NestedField):
def __new__(cls, *fields: NestedField, **kwargs):
if "fields" in kwargs:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this need to check that *fields is empty?

Copy link
Copy Markdown
Contributor Author

@Fokko Fokko May 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a subtle behavioral change that requires this. Without this logic, the following test is failing:

assert str(type_var) == str(eval(repr(type_var)))

When we dig into this:

>> repr(type_var)
StructType(
    fields=(
        NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True), 
        NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False), 
        ...
    )
)

It looks like it tries to initialize it again using a keyword argument:

>> eval(repr(type_var))
{TypeError}__new__() got an unexpected keyword argument 'fields'

Instead of variable unpacking.

Using this logic we give priority to the keyword argument:

StructField(
    NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True), 
    NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False), 
)
or
StructField(
    fields=(
        NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True), 
        NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False), 
    )
)

I've added an additional check to see if fields is empty.

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented May 14, 2022

I really like how much this removes! Nice work.

Copy link
Copy Markdown
Contributor

@samredai samredai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @Fokko ! Left a few comments but overall LGTM 👍

@property
def type(self) -> IcebergType:
return self._type
return self.field_type
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just rename this property to "field_type"? It might require some small changes in other parts of the library but it's probably better to not clash with a keyword and field_type is the argument name anyway. We can also do this in a follow-up PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I had exactly the same thoughts. I would love to change this to field_type to avoid clashing with the internal type.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relative small change: f466165 :)

def __init__(self, *fields: NestedField, **kwargs):
if "fields" in kwargs:
fields = kwargs["fields"]
object.__setattr__(self, "fields", fields)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be nested in the if statement right above it right? If there's no fields kwarg we can't set it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic allows you to both pass in the fields through args and kwargs:

StructField(
    NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True), 
    NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False), 
)
or
StructField(
    fields=(
        NestedField(field_id=1, name='optional_field', field_type=IntegerType(), is_optional=True), 
        NestedField(field_id=2, name='required_field', field_type=FixedType(length=5), is_optional=False), 
    )
)

This was a change required as explained in #4767 (comment)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, makes sense!

@Fokko Fokko force-pushed the fd-change-types-to-dataclass branch from 7c3d870 to 88bddb7 Compare May 16, 2022 15:59
Fokko added 3 commits May 16, 2022 18:05
Proposal to change the types into dataclasses.

This has several improvments:

- We can use the dataclasss field(repr=True) to include the fields in the representation, instead of building our own strings
- We can assign the types in the post_init when they are dynamic (List, Maps, Structs etc) , or just override them when they are static (Primitives)
- We don't have to implement any eq methods because they come for free
- The types are frozen, which is kind of nice since we re-use them
- The code is much more consise
- We can assign the min/max of the int/long/float as Final as of 3.8: https://peps.python.org/pep-0591/

My inspiration was the comment by Kyle:
apache#4742 (comment)

This would entail implementing eq, but why not use the generated one since we're comparing all the attributes :)

Would love to get you input
Add missing words to spelling
@Fokko Fokko force-pushed the fd-change-types-to-dataclass branch from 88bddb7 to 5ac86f4 Compare May 16, 2022 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants