Skip to content

Conversation

@kbendick
Copy link
Contributor

@kbendick kbendick commented Aug 4, 2022

This is an offshoot of #5305 and partially closes #5349.

Adds a system.truncate function that can be used in Spark SQL, as well as can be used as a FunctionCatalog function that can be turned into a transform for storage partitioned joins.

This also breaks the definition of Truncate out into utility functions inside of TruncateUtil. Because different usages validate input at different times, all of the functions in TruncateUtil do not validate their input and instead assume that the input is validated by the calling code. This allows for the Truncate transforms to validate their width one time (on instantiation), and for the Spark truncate function to skip input validation for faster generated code.

@kbendick kbendick force-pushed the kb-add-spark-truncate-function branch 5 times, most recently from a602e50 to eae6bb8 Compare August 4, 2022 18:09
@rdblue rdblue changed the title [SPARK] - Add Truncate Function to FunctionCatalog and Break Out Truncate Definition Into Shared Utility Class Spark: Support truncate in FunctionCatalog Aug 4, 2022
@kbendick kbendick force-pushed the kb-add-spark-truncate-function branch from ddb851c to f33b356 Compare August 11, 2022 17:23
@kbendick kbendick force-pushed the kb-add-spark-truncate-function branch from 1d63a04 to 9e86776 Compare August 11, 2022 19:02
@kbendick
Copy link
Contributor Author

@aokolnychyi @rdblue I addressed all of your feedback.

Please take a look when you get a chance. Also, on the subject of nullability for the width column, I was able to make it work but reverted it based on this comment thread: #5431 (comment)

Knowing that Spark isn't the best at keeping track of nullability, I think this is better and adheres to the contract laid out in ScalarFunction as quoted by @aokolnychyi.

But take a look at the commit that added nullability-checking on the width field if you'd like: ad5343f

@kbendick kbendick force-pushed the kb-add-spark-truncate-function branch from fd7ef4b to ce6dba1 Compare August 11, 2022 20:58
@rdblue
Copy link
Contributor

rdblue commented Aug 11, 2022

Looks good to me. @aokolnychyi, do you want to take another look?

@aokolnychyi
Copy link
Contributor

@rdblue, let me take a quick now.

Copy link
Contributor

@aokolnychyi aokolnychyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@aokolnychyi aokolnychyi merged commit 6a5051b into apache:master Aug 12, 2022
@aokolnychyi
Copy link
Contributor

Thanks, @kbendick! Could you cherry-pick this to 3.2?

@kbendick kbendick deleted the kb-add-spark-truncate-function branch August 12, 2022 18:19
@kbendick
Copy link
Contributor Author

Here's the PR for bucket. All the feedback from this PR was more or less applied there as well - #5513

@kbendick
Copy link
Contributor Author

Thanks, @kbendick! Could you cherry-pick this to 3.2?

Sure thing!

zhongyujiang pushed a commit to zhongyujiang/iceberg that referenced this pull request Apr 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Spark’s FunctionCatalog for Existing Transformations

3 participants