Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coerce BinaryView/Utf8View to LargeBinary/LargeUtf8 on output. #12271

Merged
merged 7 commits into from
Sep 9, 2024

Conversation

wiedld
Copy link
Contributor

@wiedld wiedld commented Aug 31, 2024

Which issue does this PR close?

Closes #12119

Rationale for this change

We plan to change Datafusion to use BinaryVIew and Utf8View by default, however, the rest of the arrow ecosystem may not be ready for these data types. As such, we want to provide the option to cast the query output to a non-view type.

Because either Utf8 or LargeUtf8 can represented as a Utf8View, the cast on output converts to the larger type. Same with BinaryView to LargeBinary.

What changes are included in this PR?

During type coercion, after the plan has been coerced -- then based upon the final output type determine if an additional cast is needed.

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.
I made one existing API public (no longer crate private), and we added the optimizer config option expand_views_at_output.

@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Aug 31, 2024
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 31, 2024
@wiedld wiedld marked this pull request as ready for review September 4, 2024 20:53
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @wiedld -- this looks about perfect. I think it needs one more test, but otherwise it looks great

I had some small code improvement suggestions too but I don't think they are necessary

datafusion/optimizer/src/analyzer/type_coercion.rs Outdated Show resolved Hide resolved
datafusion/optimizer/src/analyzer/type_coercion.rs Outdated Show resolved Hide resolved
datafusion/optimizer/src/analyzer/type_coercion.rs Outdated Show resolved Hide resolved
datafusion/optimizer/src/analyzer/type_coercion.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @wiedld -- looks great to me

let if_not_coerced = "Projection: a\n Sort: a ASC NULLS FIRST\n Projection: a\n EmptyRelation";
do_not_coerce_on_output(plan.clone(), if_not_coerced)?;
// Plan B: coerce requested: Utf8View => LargeUtf8 only on outermost
let if_coerced = "Projection: CAST(a AS LargeUtf8)\n Sort: a ASC NULLS FIRST\n Projection: a\n EmptyRelation";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@alamb alamb merged commit 391f5cb into apache:main Sep 9, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 9, 2024

🚀

@alamb alamb deleted the 12119/coerce-on-output branch September 9, 2024 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate documentation Improvements or additions to documentation logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Convert Utf8View/BinaryView --> Utf8 / Binary at output
2 participants