This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

pandas out_flavor for ctable #184

Open. ARF1 wants to merge 1 commit into master.


ARF1 commented May 3, 2015

Closes #176.
Simplifies implementation of #66.

Summary:

  • introduction of an abstraction layer for the "results array"
  • implementation of a numpy specialisation of the abstraction layer
  • implementation of a pandas specialisation of the abstraction layer
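The three bullets above can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the class names `ResultArray`, `NumpyResult` and `PandasResult`, the `set_column`/`finalize` methods and the `fill` helper are all hypothetical. They only show how one query loop could fill either a row-major NumPy structured array or a column-major pandas DataFrame through a single interface.

```python
import numpy as np
import pandas as pd


class ResultArray:
    """Hypothetical abstraction for the container a query fills (not bcolz API)."""

    def __init__(self, dtype, nrows):
        self.dtype = np.dtype(dtype)
        self.nrows = nrows

    def set_column(self, name, values):
        """Store one column's worth of values in the result."""
        raise NotImplementedError

    def finalize(self):
        """Return the object handed back to the user."""
        raise NotImplementedError


class NumpyResult(ResultArray):
    """Row-major flavor: one structured array with interleaved fields."""

    def __init__(self, dtype, nrows):
        super().__init__(dtype, nrows)
        self._out = np.empty(nrows, dtype=self.dtype)

    def set_column(self, name, values):
        # Strided write: this field's values are spread across all records
        self._out[name] = values

    def finalize(self):
        return self._out


class PandasResult(ResultArray):
    """Column-major flavor: one contiguous array per column, then a DataFrame."""

    def __init__(self, dtype, nrows):
        super().__init__(dtype, nrows)
        self._cols = {}

    def set_column(self, name, values):
        # Contiguous copy of the column with the field's dtype
        self._cols[name] = np.array(values, dtype=self.dtype[name], copy=True)

    def finalize(self):
        # Keep the column order defined by the structured dtype
        return pd.DataFrame({n: self._cols[n] for n in self.dtype.names})


def fill(result, columns):
    """Hypothetical stand-in for the query loop: write each column, then finalize."""
    for name in result.dtype.names:
        result.set_column(name, columns[name])
    return result.finalize()


# Usage: the same loop produces either output flavor
dt = np.dtype([("a", "i8"), ("b", "f8")])
cols = {"a": np.arange(5), "b": np.linspace(0.0, 1.0, 5)}
arr = fill(NumpyResult(dt, 5), cols)
df = fill(PandasResult(dt, 5), cols)
```

With this split, the query code never needs to know which flavor it is producing; adding another flavor means adding another subclass.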

This is a quick hack to demonstrate the possible performance gains from using an output flavor with column-major ordering, here the pandas DataFrame.
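To illustrate what "column-major" means here, a small sketch assuming an arbitrary three-column schema (not taken from the PR): in a NumPy structured array the fields of each record are interleaved, so a single column is a strided, non-contiguous view, whereas with one array per column (the form a DataFrame can be built from) each column is a single contiguous buffer.

```python
import numpy as np
import pandas as pd

n = 1_000
# Arbitrary example schema; with align=False (the default) the record is 8 + 8 + 4 = 20 bytes
dt = np.dtype([("a", "i8"), ("b", "f8"), ("c", "i4")])

# Row-major: one structured array, fields interleaved record by record
rows = np.zeros(n, dtype=dt)
# Consecutive values of field "b" are one whole record (dt.itemsize bytes) apart
field_stride = rows["b"].strides[0]
field_is_contiguous = rows["b"].flags["C_CONTIGUOUS"]  # False: stride != 8 bytes

# Column-major: one contiguous buffer per column
cols = {name: np.zeros(n, dtype=dt[name]) for name in dt.names}
column_is_contiguous = cols["b"].flags["C_CONTIGUOUS"]  # True: stride == 8 bytes

# A DataFrame can be assembled directly from the per-column buffers
df = pd.DataFrame(cols)
```

Filling a result column by column therefore means strided writes in the first layout and sequential writes in the second.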

The architecture would need to be improved, since this implementation suffers a 3-4x performance penalty for db[1]-type (single-row) queries due to the increased Python overhead. For queries returning a larger number of rows, this penalty disappears.

Timing results are in #176.

FrancescAlted (Member) commented:
Would you mind to add some benchmarks in the 'bench/' directory showing the advantage of this approach? My idea is to set up a speed regression check based on different benchmarks there.
Thanks!

ARF1 force-pushed the pandas_out_flavor branch from 1534fc4 to 5766048 on May 5, 2015, 17:42.
ARF1 (Author) commented May 5, 2015

@FrancescAlted

Would you mind to add some benchmarks in the 'bench/' directory showing the advantage of this approach?

I would be happy to. I just need to clarify what you are looking for:

This PR (pandas out_flavor) was only intended as a proof of concept; it was not really intended for inclusion in the code base. The architecture of the more general #187 (abstraction layer) is more performant (and easier to read).

Would you like me to provide, instead, a sample implementation of a pandas "out_flavor" for the new #187 (abstraction layer), together with a benchmark analogous to bench/getitem.py?

Or would you prefer a "rawer" benchmark that avoids __getitem__() (and its overhead) and shows only the best possible performance for filling a pandas DataFrame, similar to what bench/pandas-todataframe.py does?
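A "rawer" benchmark of the second kind might look roughly like this sketch. Everything in it is hypothetical: the schema, the row count and the function names are assumptions, and it does not use bcolz at all. It only compares two ways of producing a DataFrame from columnar source data: through an intermediate row-major structured array, or directly from per-column copies.

```python
"""Hypothetical raw benchmark: DataFrame construction via a row-major
structured array vs. directly from column-major buffers (no __getitem__)."""
import timeit

import numpy as np
import pandas as pd

N = 200_000
DT = np.dtype([("f0", "i8"), ("f1", "f8"), ("f2", "i4")])

# Source data held one array per column, as in a columnar store
SOURCE = {
    "f0": np.arange(N, dtype="i8"),
    "f1": np.random.default_rng(0).random(N),
    "f2": np.arange(N, dtype="i4") % 7,
}


def via_structured():
    # Row-major route: fill a structured array field by field, then convert
    out = np.empty(N, dtype=DT)
    for name in DT.names:
        out[name] = SOURCE[name]
    return pd.DataFrame(out)


def via_columns():
    # Column-major route: copy each column into its own buffer
    return pd.DataFrame({name: SOURCE[name].copy() for name in DT.names})


if __name__ == "__main__":
    for fn in (via_structured, via_columns):
        # Best of 3 repeats, averaged over 5 calls each
        best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
        print(f"{fn.__name__:>15}: {best * 1e3:.2f} ms per call")
```

Keeping each route in a plain function makes it easy to reuse them in a speed regression check, alongside a correctness check that both produce the same frame.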

ARF1 (Author) commented May 5, 2015

@FrancescAlted On reflection, I was probably not as clear as I could have been. When you speak of "this approach", do you mean

  • the column-major (vs. row-major) result array in isolation or
  • the abstraction layer (in whatever version) plus the pandas out_flavor implementation (vs. the current non-abstracted out_flavor)?

esc (Member) commented May 23, 2015

What do you want us to do with this pull request?

Development: successfully merging this pull request may close the issue "Pandas out_flavor for better ctable performance".
3 participants