Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade arrow2/convert and use native buffers for the tensor u8 types #1375

Merged
merged 2 commits into from
May 9, 2023

Conversation

jleibs
Copy link
Member

@jleibs jleibs commented Feb 22, 2023

Depends on upstream: DataEngineeringLabs/arrow2-convert#103

Using Buffer types here should keep us from making a copy on deserialize.

TODO:

  • Show this works based on memory profiling.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I've added a line to CHANGELOG.md (if this is a big enough change to warrant it)

@jleibs jleibs marked this pull request as ready for review February 22, 2023 01:34
@emilk emilk marked this pull request as draft February 22, 2023 13:20
@emilk
Copy link
Member

emilk commented Feb 22, 2023

Converted to draft until we have an arrow2_convert release

@jleibs jleibs force-pushed the jleibs/buffer_for_u8_tensor branch from 902bb08 to 92b759c Compare February 22, 2023 16:28
@jleibs jleibs added 🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc labels Feb 22, 2023
@emilk
Copy link
Member

emilk commented Feb 24, 2023

Sadly, this does not help with the memory duplication of tensors.
This can be seen with RERUN_TRACK_ALLOCATIONS=1 cargo r -p test_image_memory:

image

We see that the archive step still creates a complete copy of the memory.

The only real solution to this is:

@Wumpf
Copy link
Member

Wumpf commented Apr 19, 2023

@jleibs do we have the necessary arrow2convert version by now?

@emilk
Copy link
Member

emilk commented Apr 19, 2023

No; there hasn't been a new arrow2-convert for over two months. It is time we fork arrow2-convert and make our own version of it.

@jleibs jleibs force-pushed the jleibs/buffer_for_u8_tensor branch from 755b81a to 90f51b6 Compare May 9, 2023 12:36
@jleibs jleibs removed the blocked can't make progress right now label May 9, 2023
@jleibs jleibs marked this pull request as ready for review May 9, 2023 12:36
@jleibs jleibs changed the title Use buffers for the tensor u8 types Upgrade arrow2/convert and use native buffers for the tensor u8 types May 9, 2023
@jleibs jleibs merged commit ba5e177 into main May 9, 2023
@jleibs jleibs deleted the jleibs/buffer_for_u8_tensor branch May 9, 2023 13:43
jprochazk pushed a commit that referenced this pull request May 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants