This repository has been archived by the owner on Feb 22, 2023. It is now read-only.

Possible to factor out the null bytemap and/or use more of arrow compute API? #191

Closed
davesque opened this issue Oct 7, 2020 · 1 comment

Comments

davesque commented Oct 7, 2020

I noticed that fletcher converts the null bitmap into a null bytemap as a step in many computations on arrays that contain null values. Would you be interested in eventually factoring this step out, or in accepting PRs that do? I think that would require a fair bit of custom Cython or Numba code that manually iterates over the null bitmap alongside the values buffer. It might be worth the effort, though: it could narrow the gap with pandas, or even overtake it, on some of the benchmarks in your benchmarking suite.

Also, I noticed a number of other places where it might be possible to make simple calls to the Arrow compute API. As an experiment, I changed the FletcherBaseArray.sum method to call pyarrow.compute.sum directly. The drawback is that you can no longer specify any special behavior regarding nulls via skipna. However, it speeds things up considerably (35-40% faster than pandas or fletcher). It makes me wonder whether it would be worth implementing more of fletcher's internals via Cython and Arrow's compute API.

What are your thoughts on these things?

xhochy (Owner) commented Feb 22, 2023

This project has been archived because development ceased around 2021.
With the support of Apache Arrow-backed extension arrays in pandas, the major goal of this project has been fulfilled.

xhochy closed this as completed Feb 22, 2023