Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support casting between BinaryView <--> Utf8 and LargeUtf8 #6180

Merged
merged 5 commits into from
Aug 8, 2024

Conversation

xinlifoobar
Copy link
Contributor

Which issue does this PR close?

Closes #6162 and related #6163, apache/datafusion#11752

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 2, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @xinlifoobar -- I think this code does what the ticket described

However, I think this implementation's speed could be improved significantly.

Any chance you are also willing to add a benchmark in this (or another PR)? Perhaps in

let string_view_array = cast(&dict_array, &DataType::Utf8View).unwrap();

I think important benchmarks would be:

Utf8Array --> StringViewArray
Utf8Array --> BinaryViewArray

BinaryViewArray --> Utf8Array
StringViewArray --> Utf8Array

StringViewArray --> BinaryViewArray
BinaryViewArray --> StringViewArray

arrow-cast/src/cast/mod.rs Outdated Show resolved Hide resolved
@xinlifoobar
Copy link
Contributor Author

Thank you @xinlifoobar -- I think this code does what the ticket described

However, I think this implementation's speed could be improved significantly.

Any chance you are also willing to add a benchmark in this (or another PR)? Perhaps in

let string_view_array = cast(&dict_array, &DataType::Utf8View).unwrap();

I think important benchmarks would be:

Utf8Array --> StringViewArray Utf8Array --> BinaryViewArray

BinaryViewArray --> Utf8Array StringViewArray --> Utf8Array

StringViewArray --> BinaryViewArray BinaryViewArray --> StringViewArray

Done

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @xinlifoobar and @XiangpengHao -- this is great.

I went over this PR carefully and left some code suggestions but I also think they could be done as a follow on PR. I'll plan to merge this PR tomorrow to give a chance to respond.

Thanks again

arrow-cast/src/cast/mod.rs Outdated Show resolved Hide resolved
arrow-cast/src/cast/mod.rs Outdated Show resolved Hide resolved
arrow-cast/src/cast/mod.rs Show resolved Hide resolved
arrow-cast/src/cast/mod.rs Outdated Show resolved Hide resolved
arrow-cast/src/cast/mod.rs Outdated Show resolved Hide resolved
arrow-cast/src/cast/mod.rs Outdated Show resolved Hide resolved
arrow-cast/src/cast/mod.rs Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @xinlifoobar -- this looks really nice

@alamb alamb merged commit 4bd737d into apache:master Aug 8, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Aug 8, 2024

🚀

@xinlifoobar xinlifoobar deleted the dev/xinli/bin_view_str branch August 9, 2024 01:27
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support casting BinaryView --> Utf8 and LargeUtf8
3 participants