Skip to content

Conversation

@zhuqi-lucas
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

  1. Port arrow-rs optimization for get_buffer_memory_size for gc string view
  2. Add fast path for no buffer case, so we don't need to do any calculation further.

What changes are included in this PR?

  1. Port arrow-rs optimization for get_buffer_memory_size for gc string view
  2. Add fast path for no buffer case, so we don't need to do any calculation further.

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the physical-plan Changes to the physical-plan crate label Aug 1, 2025
@zhuqi-lucas
Copy link
Contributor Author

FYI @alamb , i try to do some optimization in this PR:

  1. Port arrow-rs optimization for get_buffer_memory_size for gc string view
  2. Add fast path for no buffer case, so we don't need to do any calculation further.

Thanks!

@alamb
Copy link
Contributor

alamb commented Aug 1, 2025

Another potential idea is to simply switch to using the coalesce kernel directly (so we don't have to maintain a separate copy)

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @zhuqi-lucas -- this makes sense to me

@alamb alamb added the performance Make DataFusion faster label Aug 1, 2025
@zhuqi-lucas
Copy link
Contributor Author

Another potential idea is to simply switch to using the coalesce kernel directly (so we don't have to maintain a separate copy)

Thank you @alamb for the review! I agree that switching to the upstream coalesce kernel would be the ideal follow-up solution.
Is there anything I can do to help push this forward? I'd be happy to contribute.

@alamb
Copy link
Contributor

alamb commented Aug 3, 2025

Another potential idea is to simply switch to using the coalesce kernel directly (so we don't have to maintain a separate copy)

Thank you @alamb for the review! I agree that switching to the upstream coalesce kernel would be the ideal follow-up solution. Is there anything I can do to help push this forward? I'd be happy to contribute.

Thanks @zhuqi-lucas -- as always super appreciated

Maybe once we upgrade to DataFusion 56 you can revive #16249 ? I think it is basically ok, but would benefit from having someone else go through the code and double check. The existing failures I think were due to other changes made at the time, and I suspect (hope) it will "just work"

@alamb alamb merged commit 6d9b76e into apache:main Aug 3, 2025
27 checks passed
@zhuqi-lucas
Copy link
Contributor Author

Another potential idea is to simply switch to using the coalesce kernel directly (so we don't have to maintain a separate copy)

Thank you @alamb for the review! I agree that switching to the upstream coalesce kernel would be the ideal follow-up solution. Is there anything I can do to help push this forward? I'd be happy to contribute.

Thanks @zhuqi-lucas -- as always super appreciated

Maybe once we upgrade to DataFusion 56 you can revive #16249 ? I think it is basically ok, but would benefit from having someone else go through the code and double check. The existing failures I think were due to other changes made at the time, and I suspect (hope) it will "just work"

Thank you @alamb , i will revive it after we upgrade to 56!

Standing-Man pushed a commit to Standing-Man/datafusion that referenced this pull request Aug 4, 2025
…ast path for no buffer for gc string view (apache#17008)

* Port arrow-rs optimization for get_buffer_memory_size for gc string view

* add comments and fast path
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Make DataFusion faster physical-plan Changes to the physical-plan crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Port arrow-rs optimization for get_buffer_memory_size for gc string view

2 participants