- Python High Performance - Second Edition
- Python and performance
- Many things Performance in Python (Kaggle Kernel)
- NumPy aware dynamic Python compiler using LLVM | Numba
- Profiling in Python - by Markus Kunesch
- Speed up Python/Pandas
- How to optimise your Pandas code
- Python Itertools: For a faster and memory efficient code
- Python/Pandas performance (notebook)
- High-Performance Pandas: eval() and query()
- Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects
- From Python to Numpy
- How do you speed up your numerical calculations in NumPy and Pandas? Use a small library called NumExpr, which evaluates symbolic expressions, plus a few other tricks (a minimal NumExpr sketch follows this group of links)
- Speeding up python code using numpy
- Speed Up Your #Python and #Pandas with NumExpr via T. Scott Clendaniel
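A minimal sketch of the NumExpr idea above, assuming `numexpr` is installed; the array sizes and column names are illustrative, not taken from any of the linked posts. `DataFrame.eval()` and `DataFrame.query()` use the same expression engine when NumExpr is available:

```python
# Minimal sketch: the same expression via plain NumPy, NumExpr, and pandas eval()/query().
# Array sizes and column names are illustrative only.
import numexpr as ne
import numpy as np
import pandas as pd

a = np.random.rand(1_000_000)
b = np.random.rand(1_000_000)

plain = 2 * a + 3 * b                     # NumPy builds intermediate temporary arrays
fast = ne.evaluate("2 * a + 3 * b")       # NumExpr compiles the whole expression and
                                          # evaluates it in chunks, often on multiple cores
assert np.allclose(plain, fast)

df = pd.DataFrame({"a": a, "b": b})
df["c"] = df.eval("2 * a + 3 * b")        # pandas delegates to NumExpr when it is installed
subset = df.query("a > 0.5 and b < 0.5")  # query() uses the same expression engine
```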
- Using Cython, Nuitka, Numba, ShedSkin, Pythran, Transonic
- using Dask / Vaex / Modin to speed up Pandas-like operations
- Pandas GroupBy speedup (a minimal groupby sketch follows this group of links)
- Improving the performance of Pandas GroupBy
- Pandas groupby
- Faster pandas with parallel processing
- Optimize Custom Grouping Function
- Speed up pandas 4x
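A minimal sketch of the kind of groupby speedup the links above discuss; the DataFrame and the range-per-group statistic are made up for illustration. Built-in aggregations avoid calling a Python function once per group, which is what makes `groupby().apply()` slow:

```python
# Minimal sketch: built-in groupby aggregations vs. a Python-level apply().
# Column names and data are illustrative only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "key": np.random.randint(0, 1_000, size=1_000_000),
    "value": np.random.rand(1_000_000),
})

# Slower: apply() calls a Python lambda once per group.
slow = df.groupby("key")["value"].apply(lambda s: s.max() - s.min())

# Faster: express the same statistic with built-in (compiled) aggregations.
grouped = df.groupby("key")["value"]
fast = grouped.max() - grouped.min()

assert np.allclose(slow, fast)
```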
- Faster than `for` loops, or fast loops (a minimal filtering sketch follows this group of links)
- Speeding up Python Code: Fast Filtering and Slow Loops
- If you have slow loops in Python, you can fix it…until you can’t
- PythonSpeed: PerformanceTips: Loops
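A minimal sketch of the fast-filtering point from the loop links above, with made-up data: a NumPy boolean mask does the selection in compiled code instead of a per-element Python loop:

```python
# Minimal sketch: filtering with a boolean mask instead of a Python loop.
import numpy as np

data = np.random.rand(1_000_000)
threshold = 0.75

# Slow: explicit Python loop with per-element interpreter overhead.
selected_loop = [x for x in data if x > threshold]

# Fast: vectorised boolean mask evaluated in C by NumPy.
selected_mask = data[data > threshold]

# Both preserve the original order, so the results match element-wise.
assert np.allclose(selected_loop, selected_mask)
```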
- Pandas-Tricks: notebook
- Pandas: Enhancing performance
- A Beginner’s Guide to Optimizing Pandas Code for Speed
- Ten Tricks To Speed Up Your Python Codes by @perishleaf (TDS)
- Become a Pro at Pandas, Python’s data manipulation Library
- Python Pandas at Extreme Performance
- Vectorizations (a minimal vectorization sketch follows this group of links)
- Python & Vectorization
- SO: what-is-vectorization?
- numpy.vectorize()
- Array Programming With NumPy: What is Vectorization?
- numba.vectorize()
- Fast, Flexible, Easy and Intuitive: How to Speed Up Your Pandas Projects (See Vectorization section towards lower half of the page)
- Chapter 4. NumPy Basics: Arrays and Vectorized Computation by Wes McKinney
- Backtest Trading Strategies with Pandas — Vectorized Backtesting
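To make the vectorization links above concrete, here is a minimal sketch (the formula is made up) comparing a Python loop, a NumPy array expression, `numpy.vectorize()` (a convenience wrapper, not a speedup), and a `numba.vectorize()` ufunc; it assumes Numba is installed:

```python
# Minimal sketch: a scalar Python function vs. vectorised alternatives.
# The formula is illustrative only.
import numpy as np
from numba import vectorize

def slow_py(x, y):
    # Scalar Python function, called once per element when looped over.
    return x * x + 2 * y

@vectorize(["float64(float64, float64)"])
def numba_ufunc(x, y):
    # Compiled into a NumPy ufunc that loops in machine code.
    return x * x + 2 * y

x = np.random.rand(100_000)
y = np.random.rand(100_000)

looped = np.array([slow_py(a, b) for a, b in zip(x, y)])  # slow Python loop
np_vec = np.vectorize(slow_py)(x, y)   # np.vectorize: convenience only, still a Python-level loop
vectorised = x * x + 2 * y             # NumPy array expressions run in C
ufunced = numba_ufunc(x, y)            # Numba-compiled ufunc

assert np.allclose(looped, vectorised)
assert np.allclose(np_vec, ufunced)
```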
- numba
- Tools and frameworks: Dask, Swifter, Modin, etc.
- dtype_diet: Attempt to shrink Pandas dtypes without losing data so you have more RAM (and maybe more speed); see the dtype-shrinking sketch after these entries
- dtype_diet
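A minimal sketch of the technique dtype_diet automates, written with plain pandas calls rather than dtype_diet's own API; column names and values are made up. Downcasting numeric columns and converting low-cardinality strings to `category` saves RAM (note that downcasting floats trades precision for memory):

```python
# Minimal sketch: shrink Pandas dtypes to reduce memory usage.
# Uses plain pandas, not dtype_diet's API; columns are illustrative only.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "counts": np.random.randint(0, 255, size=1_000_000),  # defaults to int64
    "price": np.random.rand(1_000_000),                   # defaults to float64
    "city": np.random.choice(["London", "Paris", "Tokyo"], size=1_000_000),
})
print(df.memory_usage(deep=True).sum())

small = pd.DataFrame({
    "counts": pd.to_numeric(df["counts"], downcast="unsigned"),  # int64 -> uint8
    "price": pd.to_numeric(df["price"], downcast="float"),       # float64 -> float32 (loses precision)
    "city": df["city"].astype("category"),                       # strings -> category codes
})
print(small.memory_usage(deep=True).sum())
```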
- Speed Up Pandas apply function using Dask or Swifter (tutorial)
- Swifter using only single core issue (`allow_dask_on_strings(enable=True)`)
- Notes for Swifter:
- Call `set_npartitions()` and mind your data type: if you are applying a function to text, you need to say so explicitly via `allow_dask_on_strings`, otherwise it's really slow (see the sketch after these notes)
- Swifter runs Dask under the hood
- look into the extra options
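A minimal sketch of the Swifter notes above, assuming the `.swifter` accessor methods chain as in the Swifter docs; the DataFrame and the `clean` function are made up for illustration:

```python
# Minimal sketch: parallel apply() on text with Swifter.
# The data and the clean() function are illustrative only.
import pandas as pd
import swifter  # noqa: F401  (importing registers the .swifter accessor on pandas objects)

df = pd.DataFrame({"text": ["  Some Example TEXT  "] * 100_000})

def clean(s: str) -> str:
    return s.strip().lower()

df["clean"] = (
    df["text"]
    .swifter
    .set_npartitions(8)                   # control how the work is split across partitions
    .allow_dask_on_strings(enable=True)   # required for text, otherwise Swifter stays slow
    .apply(clean)
)
```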
- sk-dist: Distributed scikit-learn meta-estimators in PySpark (optimise model training)
- Reddit & GitHub discussions
- Reddit: Thoughts on Dask
- Reddit: Speeding up Text-preprocessing using Dask
- GitHub issue: Query: What is the difference between Dask and Modin?
- Reddit: how to use Docker to create your own "Micro-Cluster" Lab to experiment with Spark & Dask!
- Reddit: Python Pandas at Extreme Performance
- Reddit: Data Analysis With Dask – A Python Scale-Out, Parallel Computation Framework For Big Data
- Reddit: Dask vs Modin vs Spark?
- Reddit: Parallelizing Feature Engineering with Dask
- Reddit: Fast Parallel Data Analysis and Processing in Python with Dask Dataframes
- Reddit: Parallel programming in Python
- Reddit: How to efficiently use dask or any other parallel...
- Reddit: Dask golem
- High fidelity benchmark runner | Homepage
- ReBench: Execute and Document Benchmarks Reproducibly | implementation
- ipython_memory_usage: IPython tool to report memory usage deltas for every command you type.
- perfplot | github
- Opytimizer • A Nature-Inspired Python Optimizer. Did you ever reach a bottleneck in your computational experiments?
- How the CPython compiler works
- High Performance Python talk by Ian Ozsvald: Blogs: 1 or 2 | Slides | Useful resources shared | Python Performance 2nd Edition git repo
- Making Pandas Fly (EuroPython 2020) | Blog
- Making Pandas Fly (PyDataAmsterdam 2020) | Blog
- Making Pandas Fly (PyDataUK 2020) | Blog
- Making Pandas Fly (PyDataBudapest 2020) | Blog
- Flying Pandas - Dask, Modin and Vaex (Remote Pizza Python 2020) | Blog
- Process 120 million taxi trips and explore in real-time with Dash, Plotly and Vaex: Interactive Dashboard | Blogpost | Code | Tutorial for data processing
- Tools for Higher Performance python (ODSC 2019) | Blog
- Tools for Higher Performance python (PyDataCambridge 2019) | Blog
- Sprinting Pandas
- High Performance Python book by Ian Ozsvald & Micha Gorelick | High Performance Python book examples github repo
- Performance highlights (notes)
- best practice - when and how to focus on performance
- profiling - understand what's slow to focus your efforts (timeit, line_profiler, py-spy; see the profiling sketch after these notes)
- making numerical code faster - better algorithms, numpy, numba, joblib for parallelisation
- faster pandas - solving tasks faster, avoiding subtle errors that will eat your time
- unit tests - use of unit tests to support correctness during optimisation
- estimate benefits in scenarios for optimisation vs refactoring vs buying hardware vs other options
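A minimal sketch of the profiling step in the notes above, using `timeit` and `line_profiler` on a made-up function (py-spy is shown only as a shell comment since it attaches to a running process):

```python
# Minimal sketch: coarse timing with timeit, then line-by-line profiling.
# The summed_squares() function is a made-up example.
import timeit
from line_profiler import LineProfiler

def summed_squares(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

# 1) Coarse timing with timeit
print(timeit.timeit("summed_squares(100_000)", globals=globals(), number=50))

# 2) Line-by-line profiling with line_profiler
lp = LineProfiler(summed_squares)
lp.runcall(summed_squares, 100_000)
lp.print_stats()

# 3) For an already-running process, sample it from the command line, e.g.:
#    py-spy top --pid <PID>
```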
Contributions are very welcome; please share back with the wider community (and get credited for it)!
Please have a look at the CONTRIBUTING guidelines, and also have a read of our licensing policy.
Back to main page (table of contents)