Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow2 erased refcounted clones performance issues #1746

Closed
teh-cmc opened this issue Mar 31, 2023 · 1 comment · Fixed by #1727
Closed

arrow2 erased refcounted clones performance issues #1746

teh-cmc opened this issue Mar 31, 2023 · 1 comment · Fixed by #1727
Labels
🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Mar 31, 2023

That one is very bad, this is obviously something we do all over the place.

Background:

  • Basically all arrow Arrays are Buffers when you reach the actual physical type (which can be quite deep because all our components are compound types etc)
  • Buffer is literally an Arc over some bytes, so cheap to clone
  • But then we carry around erased arrow arrays, i.e. trait objects: Box<dyn Array>
  • Trait objects cannot implement Clone because they literally dont know their size/layout at compile time
  • But there's a hack for that
  • All of the above is what I mean when I say "erased refcounted clone"

See #1745 for detailed benchmarks.

This is yet another example where fixing the issue once in DataCell and then using DataCell everywhere would be nice...

@teh-cmc teh-cmc added 🏹 arrow concerning arrow ⛃ re_datastore affects the datastore itself 🚀 performance Optimization, memory use, etc labels Mar 31, 2023
@teh-cmc
Copy link
Member Author

teh-cmc commented Apr 1, 2023

Written down some detailed thoughts about this in #1745 (comment).

The TL;DR of it being: wrap the body of a DataCell in an Arc, and enjoy a life with less problems.

The core issue is the same as in #1738: see #1738 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏹 arrow concerning arrow 🚀 performance Optimization, memory use, etc ⛃ re_datastore affects the datastore itself
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant