This repository has been archived by the owner on Feb 22, 2023. It is now read-only.
I noticed that fletcher converts the null bitmap into a null bytemap as a step in many computations for arrays that have null values. Do you have any interest in eventually factoring this step out or accepting PRs that do? I think that would involve a fair bit of custom Cython or Numba code that manually iterates over the null bitmap along with the values buffer. But it might be worth doing and could narrow the gap or even overtake Pandas on some of the benchmarks in your benchmarking suite.
Also, I noticed a number of other places where it might be possible to make simple calls to the Arrow compute API. I made a simple modification to the `FletcherBaseArray.sum` method so that it calls `pyarrow.compute.sum` directly. This means you can't specify any special behavior regarding nulls via `skipna`. However, it speeds things up by a lot (35-40% faster than Pandas or Fletcher). It makes me wonder whether it would be worth implementing more of Fletcher's internals via Cython and Arrow's compute API.
What are your thoughts on these things?
This project has been archived as development has ceased around 2021.
With the support of Apache Arrow-backed extension arrays in pandas, the major goal of this project has been fulfilled.