- Python High Performance - Second Edition
- Python and performance
- Many things Performance in Python (Kaggle Kernel)
- NumPy aware dynamic Python compiler using LLVM | Numba
- Profiling in Python - by Markus Kunesch
- Speed up Python/Pandas
- How to optimise your Pandas code
- Python Itertools: For a faster and memory efficient code
- Python/Pandas performance (notebook)
- High-Performance Pandas: eval() and query()
- Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects
- From Python to Numpy
- How do you speed up your numerical calculations in NumPy and Pandas? Use a small library called NumExpr, which evaluates symbolic expressions, plus a few other tricks (a minimal NumExpr sketch follows this group of links)
- Speeding up python code using numpy
- Speed Up Your #Python and #Pandas with NumExpr via T. Scott Clendaniel
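A minimal sketch of the NumExpr idea above, assuming `numexpr` is installed; the array sizes and column names are illustrative, not taken from any of the linked posts. `DataFrame.eval()` and `DataFrame.query()` use the same expression engine when NumExpr is available:

```python
# Minimal sketch: the same expression via plain NumPy, NumExpr, and pandas eval()/query().
# Array sizes and column names are illustrative only.
import numexpr as ne
import numpy as np
import pandas as pd

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

plain = 2 * a + 3 * b                     # NumPy builds intermediate temporary arrays
fast = ne.evaluate("2 * a + 3 * b")       # NumExpr compiles the whole expression and
                                          # evaluates it in chunks, often on multiple cores
assert np.allclose(plain, fast)

df = pd.DataFrame({"a": a, "b": b})
df["c"] = df.eval("2 * a + 3 * b")        # pandas delegates to NumExpr when it is installed
subset = df.query("a > 0.5 and b < 0.5")  # query() uses the same expression engine
```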
- Using Cython, Nuitka, Numba, ShedSkin, Pythran, Transonic
- using Dask / Vaex / Modin to speed up Pandas-like operations
- Pandas GroupBy speedup (a minimal groupby sketch follows this group of links)
- Improving the performance of Pandas GroupBy
- Pandas groupby
- Faster pandas with parallel processing
- Optimize Custom Grouping Function
- Speed up pandas 4x
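A minimal sketch of the kind of groupby speedup the links above discuss; the DataFrame and the range-per-group statistic are made up for illustration. Built-in aggregations avoid calling a Python function once per group, which is what makes `groupby().apply()` slow:

```python
# Minimal sketch: built-in groupby aggregations vs. a Python-level apply().
# Column names and data are illustrative only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": np.random.randint(0, 1_000, size=1_000_000),
    "value": np.random.rand(1_000_000),
})

# Slower: apply() calls a Python lambda once per group.
slow = df.groupby("key")["value"].apply(lambda s: s.max() - s.min())

# Faster: express the same statistic with built-in (compiled) aggregations.
grouped = df.groupby("key")["value"]
fast = grouped.max() - grouped.min()

assert np.allclose(slow, fast)
```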
- Faster than `for` loops, or fast loops (a minimal filtering sketch follows this group of links)
- Speeding up Python Code: Fast Filtering and Slow Loops
- If you have slow loops in Python, you can fix it…until you can’t
- PythonSpeed: PerformanceTips: Loops
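A minimal sketch of the fast-filtering point from the loop links above, with made-up data: a NumPy boolean mask does the selection in compiled code instead of a per-element Python loop:

```python
# Minimal sketch: filtering with a boolean mask instead of a Python loop.
import numpy as np

data = np.random.rand(1_000_000)
threshold = 0.75

# Slow: explicit Python loop with per-element interpreter overhead.
selected_loop = [x for x in data if x > threshold]

# Fast: vectorised boolean mask evaluated in C by NumPy.
selected_mask = data[data > threshold]

# Both preserve the original order, so the results match element-wise.
assert np.allclose(selected_loop, selected_mask)
```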
- Pandas-Tricks: notebook
- Pandas: Enhancing performance
- A Beginner’s Guide to Optimizing Pandas Code for Speed
- Ten Tricks To Speed Up Your Python Codes by @perishleaf (TDS)
- Become a Pro at Pandas, Python’s data manipulation Library
- Python Pandas at Extreme Performance
- Vectorizations (a minimal vectorization sketch follows this group of links)
- Python & Vectorization
- SO: what-is-vectorization?
- numpy.vectorize()
- Array Programming With NumPy: What is Vectorization?
- numba.vectorize()
- Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects (See Vectorization section towards lower half of the page)
- Chapter 4. NumPy Basics: Arrays and Vectorized Computation by Wes McKinney
- Backtest Trading Strategies with Pandas — Vectorized Backtesting
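To make the vectorization links above concrete, here is a minimal sketch (the formula is made up) comparing a Python loop, a NumPy array expression, `numpy.vectorize()` (a convenience wrapper, not a speedup), and a `numba.vectorize()` ufunc; it assumes Numba is installed:

```python
# Minimal sketch: a scalar Python function vs. vectorised alternatives.
# The formula is illustrative only.
import numpy as np
from numba import vectorize

def slow_py(x, y):
    # Scalar Python function, called once per element when looped over.
    return x * x + 2 * y

@vectorize(["float64(float64, float64)"])
def numba_ufunc(x, y):
    # Compiled into a NumPy ufunc that loops in machine code.
    return x * x + 2 * y

x = np.random.rand(100_000)
y = np.random.rand(100_000)

looped = np.array([slow_py(a, b) for a, b in zip(x, y)])  # slow Python loop
np_vec = np.vectorize(slow_py)(x, y)   # np.vectorize: convenience only, still a Python-level loop
vectorised = x * x + 2 * y             # NumPy array expressions run in C
ufunced = numba_ufunc(x, y)            # Numba-compiled ufunc

assert np.allclose(looped, vectorised)
assert np.allclose(np_vec, ufunced)
```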
- numba
- Tools and frameworks: Dask, Swifter, Modin, etc.
- dtype_diet: Attempt to shrink Pandas dtypes without losing data so you have more RAM (and maybe more speed); see the dtype-shrinking sketch after these entries
- dtype_diet
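A minimal sketch of the technique dtype_diet automates, written with plain pandas calls rather than dtype_diet's own API; column names and values are made up. Downcasting numeric columns and converting low-cardinality strings to `category` saves RAM (note that downcasting floats trades precision for memory):

```python
# Minimal sketch: shrink Pandas dtypes to reduce memory usage.
# Uses plain pandas, not dtype_diet's API; columns are illustrative only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "counts": np.random.randint(0, 255, size=1_000_000),  # defaults to int64
    "price": np.random.rand(1_000_000),                   # defaults to float64
    "city": np.random.choice(["London", "Paris", "Tokyo"], size=1_000_000),
})
print(df.memory_usage(deep=True).sum())

small = pd.DataFrame({
    "counts": pd.to_numeric(df["counts"], downcast="unsigned"),  # int64 -> uint8
    "price": pd.to_numeric(df["price"], downcast="float"),       # float64 -> float32 (loses precision)
    "city": df["city"].astype("category"),                       # strings -> category codes
})
print(small.memory_usage(deep=True).sum())
```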
- Speed Up Pandas apply function using Dask or Swifter (tutorial)
- Swifter using only single core issue (`allow_dask_on_strings(enable=True)`)
- Notes for Swifter:
- Call `set_npartitions()` and mind your data type: if you are applying a function to text, you need to say so explicitly via `allow_dask_on_strings`, otherwise it's really slow (see the sketch after these notes)
- Swifter runs Dask under the hood
- look into the extra options
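A minimal sketch of the Swifter notes above, assuming the `.swifter` accessor methods chain as in the Swifter docs; the DataFrame and the `clean` function are made up for illustration:

```python
# Minimal sketch: parallel apply() on text with Swifter.
# The data and the clean() function are illustrative only.
import pandas as pd
import swifter  # noqa: F401  (importing registers the .swifter accessor on pandas objects)

df = pd.DataFrame({"text": ["  Some Example TEXT  "] * 100_000})

def clean(s: str) -> str:
    return s.strip().lower()

df["clean"] = (
    df["text"]
    .swifter
    .set_npartitions(8)                   # control how the work is split across partitions
    .allow_dask_on_strings(enable=True)   # required for text, otherwise Swifter stays slow
    .apply(clean)
)
```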
- sk-dist: Distributed scikit-learn meta-estimators in PySpark (optimise model training)
- Reddit & GitHub discussions
- Reddit: Thoughts on Dask
- Reddit: Speeding up Text-preprocessing using Dask
- GitHub issue: Query: What is the difference between Dask and Modin?
- Reddit: how to use Docker to create your own "Micro-Cluster" Lab to experiment with Spark & Dask!
- Reddit: Python Pandas at Extreme Performance
- Reddit: Data Analysis With Dask – A Python Scale-Out, Parallel Computation Framework For Big Data
- Reddit: Dask vs Modin vs Spark?
- Reddit: Parallelizing Feature Engineering with Dask
- Reddit: Fast Parallel Data Analysis and Processing in Python with Dask Dataframes
- Reddit: Parallel programming in Python
- Reddit: How to efficiently use dask or any other parallel...
- Reddit: Dask golem
- High fidelity benchmark runner | Homepage
- ReBench: Execute and Document Benchmarks Reproducibly | implementation
- ipython_memory_usage: IPython tool to report memory usage deltas for every command you type.
- perfplot | github
- Opytimizer • A Nature-Inspired Python Optimizer. Did you ever reach a bottleneck in your computational experiments?
- How the CPython compiler works
- High Performance Python talk by Ian Ozsvald: Blogs: 1 or 2 | Slides | Useful resources shared | Python Performance 2nd Edition git repo
- Making Pandas Fly (EuroPython 2020) | Blog
- Making Pandas Fly (PyDataAmsterdam 2020) | Blog
- Making Pandas Fly (PyDataUK 2020) | Blog
- Making Pandas Fly (PyDataBudapest 2020) | Blog
- Flying Pandas - Dask, Modin and Vaex (Remote Pizza Python 2020) | Blog
- Process 120 million taxi trips and explore in real-time with Dash, Plotly and Vaex: Interactive Dashboard | Blogpost | Code | Tutorial for data processing
- Tools for Higher Performance python (ODSC 2019) | Blog
- Tools for Higher Performance python (PyDataCambridge 2019) | Blog
- Sprinting Pandas
- High Performance Python book by Ian Ozsvald & Micha Gorelick | High Performance Python book examples github repo
- Performance highlights (notes)
- best practice - when and how to focus on performance
- profiling - understand what's slow to focus your efforts (timeit, line_profiler, py-spy; see the profiling sketch after these notes)
- making numerical code faster - better algorithms, numpy, numba, joblib for parallelisation
- faster pandas - solving tasks faster, avoiding subtle errors that will eat your time
- unit tests - use of unit tests to support correctness during optimisation
- estimate benefits in scenarios for optimisation vs refactoring vs buying hardware vs other options
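A minimal sketch of the profiling step in the notes above, using `timeit` and `line_profiler` on a made-up function (py-spy is shown only as a shell comment since it attaches to a running process):

```python
# Minimal sketch: coarse timing with timeit, then line-by-line profiling.
# The summed_squares() function is a made-up example.
import timeit
from line_profiler import LineProfiler

def summed_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# 1) Coarse timing with timeit
print(timeit.timeit("summed_squares(100_000)", globals=globals(), number=50))

# 2) Line-by-line profiling with line_profiler
lp = LineProfiler(summed_squares)
lp.runcall(summed_squares, 100_000)
lp.print_stats()

# 3) For an already-running process, sample it from the command line, e.g.:
#    py-spy top --pid <PID>
```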
Contributions are very welcome; please share back with the wider community (and get credited for it)!
Please have a look at the CONTRIBUTING guidelines, and also have a read of our licensing policy.
Back to main page (table of contents)