Python: high performance backend #8

Open
imagovrn opened this issue Jan 26, 2018 · 4 comments

@imagovrn
Contributor

imagovrn commented Jan 26, 2018

More Efficient Python Implementation

The current flatdata-py implementation is pure Python. So far we have used it only for processing smaller datasets and for inspection/debugging, and on large datasets it turned out to be quite slow. It would be useful to have an implementation whose performance is not too far from the C++ one. To achieve that, we could do the following:

  • Benchmark the two implementations on the same data to quantify the gap, and monitor the benchmarks in CI (Performance benchmarks #9).
  • Optimize the pure-Python implementation.
  • Introduce parallel processing in the pure-Python implementation (or ease integration with a library that would do it for us, like dask).
  • As an alternative approach, create a flatdata-py-ext implementation that builds and uses binary extensions to improve performance.
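To make the benchmarking item more concrete, here is a minimal sketch of timing two decoders on identical input with the standard `timeit` module. It is a hypothetical illustration, not flatdata code: `DATA`, `sum_pure_python`, `sum_struct` and `bench` are invented names, and the buffer is a plain array of little-endian `uint32` values rather than a real flatdata archive. It contrasts a per-record Python loop with `struct.iter_unpack`, whose decoding loop runs in C, which hints at the kind of gain a compiled backend could bring.

```python
import struct
import timeit

# Hypothetical stand-in for a flatdata vector: N little-endian uint32 records.
# This is NOT the flatdata format; it only illustrates the measurement approach.
N = 100_000
DATA = struct.pack(f"<{N}I", *range(N))


def sum_pure_python(buf: bytes) -> int:
    """Decode each 4-byte record by hand, one Python-level iteration per record."""
    total = 0
    for offset in range(0, len(buf), 4):
        total += int.from_bytes(buf[offset:offset + 4], "little")
    return total


def sum_struct(buf: bytes) -> int:
    """Decode all records via struct.iter_unpack, whose parsing loop runs in C."""
    return sum(value for (value,) in struct.iter_unpack("<I", buf))


def bench(fn, buf: bytes, repeat: int = 5, number: int = 3) -> float:
    """Return the best observed time per call, in seconds."""
    times = timeit.repeat(lambda: fn(buf), repeat=repeat, number=number)
    return min(times) / number


if __name__ == "__main__":
    expected = N * (N - 1) // 2
    for fn in (sum_pure_python, sum_struct):
        # Both decoders must agree before their timings are worth comparing.
        assert fn(DATA) == expected
        print(f"{fn.__name__}: {bench(fn, DATA) * 1e3:.2f} ms")
```

Taking the minimum over several repeats, as the `timeit` documentation recommends, reduces noise from other processes. A harness of this shape, pointed at real archives, is the kind of thing that could run in CI as proposed in #9.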
@boxdot
Collaborator

boxdot commented Jan 27, 2018

As far as I understand, the Python implementation is fully functional.

I think we should make this issue more precise, e.g. by specifying which performance problems you are seeing right now. Some benchmark numbers would also help. This would let us either split this issue or introduce a precise checklist of what needs to be done.

@imagovrn
Contributor Author

@boxdot Thanks for the comment. I've updated the issue. I should also stop filing items here from my phone, so as not to confuse anybody.

@gferon
Contributor

gferon commented Feb 6, 2018

I'm curious: do you already have something we could commit to produce performance figures, i.e. a comparison of the C++ implementation against the Python implementation on different Python runtimes (CPython, PyPy, ...)?

@imagovrn
Contributor Author

imagovrn commented Feb 9, 2018

@gferon Not yet. That would be #9.

Development

No branches or pull requests

3 participants