Python vs Cython Performance Benchmark

A comprehensive benchmark comparing pure Python and Cython implementations for generating and writing a 1 million row pandas DataFrame to a parquet file with snappy compression.

Overview

This benchmark evaluates the performance difference between:

Pure Python: Standard Python implementation using pandas and list comprehensions
Cython: Optimized implementation using Cython with C-level optimizations, static typing, and numpy arrays

Benchmark Task

The benchmark measures the time to:

1. Generate a pandas DataFrame with 1,000,000 rows containing:
   - id: Sequential integer IDs
   - value1: Computed float values (i * 2.5)
   - value2: Computed float values (sqrt(i))
   - category: String categories (10 unique values)
   - flag: Boolean values (alternating True/False)
2. Write the DataFrame to a parquet file with snappy compression
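The task above can be sketched in plain Python as follows. This is a minimal illustration, not the repository's actual `python_impl.py`: the function names, the `cat_0` ... `cat_9` labels, the zero-based IDs, and the choice to start the flag at True are all assumptions made for the example.

```python
import math


def generate_columns(n_rows):
    """Build the five benchmark columns as plain Python lists."""
    # Hypothetical labels: the README only states that there are 10 unique values.
    categories = [f"cat_{k}" for k in range(10)]
    return {
        "id": list(range(n_rows)),                        # assumed to start at 0
        "value1": [i * 2.5 for i in range(n_rows)],
        "value2": [math.sqrt(i) for i in range(n_rows)],
        "category": [categories[i % 10] for i in range(n_rows)],
        "flag": [i % 2 == 0 for i in range(n_rows)],      # assumed to start at True
    }


def write_parquet(columns, path):
    """Write the columns to a snappy-compressed parquet file via pandas/pyarrow."""
    # Imported lazily so generate_columns() can be used without pandas installed.
    import pandas as pd

    df = pd.DataFrame(columns)
    df.to_parquet(path, engine="pyarrow", compression="snappy")
    return df
```

Calling `write_parquet(generate_columns(1_000_000), "out.parquet")` would then cover both measured phases; the output file name here is only a placeholder.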

Installation

Prerequisites

Python 3.8+
A C compiler such as GCC (for building Cython extensions)

Setup

Clone the repository:

git clone https://github.com/nasirus/bench_python_c.git
cd bench_python_c

Install dependencies:

pip install -r requirements.txt

Build the Cython extension:

python setup.py build_ext --inplace

Usage

Run the benchmark:

python benchmark.py

The benchmark will:

Run each implementation 5 times
Calculate average, minimum, and maximum times
Display detailed performance comparison
Clean up generated parquet files automatically
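The repeat-and-summarise loop described above might look roughly like this. It is a sketch only: `time_runs` and `format_summary` are hypothetical helper names, and the real `benchmark.py` may time generation and writing separately or use a different timer.

```python
import statistics
import time


def time_runs(func, runs=5):
    """Call func() `runs` times and summarise wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "avg": statistics.mean(durations),
        "min": min(durations),
        "max": max(durations),
    }


def format_summary(label, stats):
    """Render one line of the comparison report."""
    return (f"{label}: avg {stats['avg']:.4f}s, "
            f"min {stats['min']:.4f}s, max {stats['max']:.4f}s")
```

`time.perf_counter()` is used here because it is a monotonic, high-resolution clock, which makes it a common choice for short benchmarks.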

Benchmark Results

Test Environment

Python 3.12.3
pandas 2.2.3
pyarrow 18.0.0
Cython 3.0.11
numpy 2.1.3

Performance Results

Pure Python Implementation:

Average Generation Time: 0.9574s
Average Writing Time: 0.1696s
Average Total Time: 1.1270s

Cython Implementation:

Average Generation Time: 0.1371s
Average Writing Time: 0.1562s
Average Total Time: 0.2934s

Performance Comparison

| Metric | Speedup (Python time / Cython time) |
| --- | --- |
| DataFrame Generation | 6.98x |
| Parquet Writing | 1.09x |
| Total | 3.84x |

For the complete workflow, Cython runs 3.84x as fast as pure Python. That is 284.1% faster, or about 74% less wall-clock time.
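The speedup figures follow directly from the average times reported under Performance Results. The short script below reproduces the arithmetic; the variable and function names are illustrative only.

```python
# Average times in seconds, copied from the Performance Results section.
python_times = {"generation": 0.9574, "writing": 0.1696, "total": 1.1270}
cython_times = {"generation": 0.1371, "writing": 0.1562, "total": 0.2934}


def speedup(baseline, optimized):
    """How many times faster the optimized run is than the baseline."""
    return baseline / optimized


speedups = {
    phase: speedup(python_times[phase], cython_times[phase])
    for phase in python_times
}

# "X% faster" compares speeds; "X% less time" compares durations.
percent_faster = (speedups["total"] - 1) * 100
percent_less_time = (1 - cython_times["total"] / python_times["total"]) * 100

for phase, ratio in speedups.items():
    print(f"{phase}: {ratio:.2f}x")
print(f"{percent_faster:.1f}% faster, {percent_less_time:.1f}% less time")
```

The two percentages describe the same result from different angles: a 3.84x speedup means the work finishes in roughly a quarter of the original time.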

Key Findings

1. DataFrame Generation: Cython shows the largest improvement (~7x faster) due to:
   - Static typing and C-level loops
   - Direct numpy array manipulation
   - Elimination of Python interpreter overhead
   - Use of C math functions (sqrt)
2. Parquet Writing: Minimal difference (~1.09x) because:
   - Both implementations use the same pyarrow engine
   - Write time is dominated by compression and disk I/O
   - There is little left to optimize at the Python level
3. Overall Performance: Cython provides a ~3.84x speedup on this workload, which suggests it is a good fit for:
   - Data generation and transformation tasks
   - Compute-intensive operations
   - Large datasets where per-element loops dominate the runtime
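One way to read these findings is to look at how each implementation's time splits between the two phases. The sketch below derives those shares from the reported averages; `phase_shares` is a hypothetical helper, and the shares are computed from the per-phase averages rather than the reported totals.

```python
# Average per-phase times in seconds, copied from the Performance Results section.
python_times = {"generation": 0.9574, "writing": 0.1696}
cython_times = {"generation": 0.1371, "writing": 0.1562}


def phase_shares(times):
    """Return each phase's fraction of the summed phase times."""
    total = sum(times.values())
    return {phase: seconds / total for phase, seconds in times.items()}


python_shares = phase_shares(python_times)
cython_shares = phase_shares(cython_times)

for label, shares in (("Pure Python", python_shares), ("Cython", cython_shares)):
    parts = ", ".join(f"{phase} {share:.0%}" for phase, share in shares.items())
    print(f"{label}: {parts}")
```

With these numbers, generation accounts for about 85% of the pure Python run but under half of the Cython run, so parquet writing becomes the larger share once generation is optimized.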

Implementation Details

Pure Python (`python_impl.py`)

Uses standard Python loops and list comprehensions
Relies on pandas DataFrame constructor
Simple and readable implementation

Cython (`cython_impl.pyx`)

Uses static typing with cdef
Pre-allocates numpy arrays for efficiency
Utilizes C math functions from libc.math
Disables bounds checking and negative-index wraparound on array access for maximum performance
Uses C division semantics, which skip Python's division-by-zero and sign-handling checks

Files

benchmark.py: Main benchmark runner script
python_impl.py: Pure Python implementation
cython_impl.pyx: Cython implementation
setup.py: Build script for Cython extension
requirements.txt: Python dependencies

License

MIT License

