Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add clarifying docs to transform result types #1211

Merged
merged 5 commits into from
Oct 9, 2024

Conversation

kevinzwang
Copy link
Contributor

Follow-up from #1208, documents the conclusions from the discussion.

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the PR.

WDYT of adding a comment in the DayTransform's result_type function explaining that using DateType is to enable engines to display a human-readable representation of the partition value

def result_type(self, source: IcebergType) -> IcebergType: ...
def result_type(self, source: IcebergType) -> IcebergType:
"""
Return the type of the result of the transform.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Return the type of the result of the transform.
Returns the `IcebergType` produced by this transform given a source type.

https://github.com/apache/iceberg/blob/main/api/src/main/java/org/apache/iceberg/transforms/Transform.java#L71-L77

@kevinjqliu
Copy link
Contributor

CI is failing due to dead loom link, removing here #1213

"""
Return the type of the result of the transform.

This type does not need to conform to the Iceberg spec, as long as it is converted to the correct type in the stored metadata.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reading this as someone less familiar with the discussions in the other MR, this doesn't tell me much. Would it be good to link that MR here so future folks have the context of the discussion? Otherwise, might be good to go into more detail here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, I agree it's not capturing the whole context, let me see if I can come up with a suggestion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinjqliu just following up on this. Do you have an alternative in mind? Otherwise, I will link the relevant PR here.

"""
Return the type of the result of the transform.

This type does not need to conform to the Iceberg spec, as long as it is converted to the correct type in the stored metadata.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This type does not need to conform to the Iceberg spec, as long as it is converted to the correct type in the stored metadata.
This method defines both the physical and display representation of the partition field.
The physical representation must conform to the Iceberg spec. The display representation
can deviate from the spec, such as by transforming the value into a more human-readable format.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinzwang @corleyma wdyt? does this capture the context?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is good, maybe also update the DayTranform result_type function
something like,

    The physical representation conforms to the Iceberg spec as DateType is internally
    converted to int. The DateType returned here provides a more human-readable way to
    display the partition field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me!

@kevinzwang
Copy link
Contributor Author

@kevinjqliu @corleyma updated the doc strings! Let me know if there's any other changes I should make

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for adding docs around this

@kevinjqliu
Copy link
Contributor

CI is blocked on #1213, i'll work to resolve it

@kevinjqliu
Copy link
Contributor

#1213 is merged, so the CI issue should be fixed
@kevinzwang do you mind rebasing off main?

@kevinzwang
Copy link
Contributor Author

@kevinjqliu oh I merged instead of rebase. Either way, I think it makes sense to squash and merge this PR into main anyway

@kevinzwang
Copy link
Contributor Author

Also I don't think I have perms to merge myself so feel free to push the button whenever

@kevinjqliu kevinjqliu merged commit 7cf0c22 into apache:main Oct 9, 2024
7 checks passed
@kevinjqliu
Copy link
Contributor

Merged! Thanks @kevinzwang for the contribution!

sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* add clarifying docs to transform result types

* add more context to docs

* re-add fixed first line

* fix lint
sungwy pushed a commit to sungwy/iceberg-python that referenced this pull request Dec 7, 2024
* add clarifying docs to transform result types

* add more context to docs

* re-add fixed first line

* fix lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants