apache · andygrove · Dec 31, 2025 · Dec 30, 2025
diff --git a/microbenchmarks/README.md b/microbenchmarks/README.md
@@ -0,0 +1,86 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Microbenchmarks
+
+This directory contains microbenchmarks for comparing DataFusion and DuckDB performance on individual SQL functions. Unlike the TPC-H and TPC-DS benchmarks, which test full query execution, these microbenchmarks focus on the performance of specific SQL functions and expressions.
+
+## Overview
+
+The benchmarks generate synthetic data, write it to Parquet format, and then measure the execution time of various SQL functions across both DataFusion and DuckDB. Results include per-function timing comparisons and summary statistics.
+
+## Setup
+
+Create a virtual environment and install dependencies:
+
+```shell
+cd microbenchmarks
+python3 -m venv venv
+source venv/bin/activate
+pip install -r requirements.txt
+```
+
+## Usage
+
+Run a benchmark:
+
+```shell
+python microbenchmarks.py
+```
+
+### Options
+
+| Option | Default | Description |
+|--------|---------|-------------|
+| `--rows` | `1000000` | Number of rows in the generated test data |
+| `--warmup` | `2` | Number of warmup iterations before timing |
+| `--iterations` | `5` | Number of timed iterations (results are averaged) |
+| `--output` | stdout | Output file path for markdown results |
+
+### Examples
+
+Run the benchmark with default settings:
+
+```shell
+python microbenchmarks.py
+```
+
+Run the benchmark with 10 million rows:
+
+```shell
+python microbenchmarks.py --rows 10000000
+```
+
+Run the benchmark and save results to a file:
+
+```shell
+python microbenchmarks.py --output results.md
+```
+
+## Output
+
+The benchmark outputs a markdown table comparing execution times:
+
+| Function | DataFusion (ms) | DuckDB (ms) | Speedup | Faster |
+|----------|----------------:|------------:|--------:|--------|
+| trim | 12.34 | 15.67 | 1.27x | DataFusion |
+| lower | 8.90 | 7.50 | 1.19x | DuckDB |
+| ... | ... | ... | ... | ... |
+
+A summary section shows overall statistics including how many functions each engine won and total execution times.
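
The README describes warmup iterations, averaged timed iterations, and a Speedup/Faster column, but the diff does not show how they are computed. The following is a minimal sketch of one way it could work, not the script's actual code. The helper names `time_ms` and `compare` are hypothetical. The speedup definition (slower time divided by faster time) is an assumption, chosen because it matches the two sample rows in the table above.

```python
import time
from statistics import mean


def time_ms(fn, warmup=2, iterations=5):
    """Call fn `warmup` times untimed, then return the mean wall time in
    milliseconds over `iterations` timed calls (hypothetical helper)."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def compare(name, datafusion_ms, duckdb_ms):
    """Format one results row. Speedup is assumed to be the slower time
    divided by the faster time, so it is never below 1."""
    if datafusion_ms <= duckdb_ms:
        faster, speedup = "DataFusion", duckdb_ms / datafusion_ms
    else:
        faster, speedup = "DuckDB", datafusion_ms / duckdb_ms
    return (
        f"| {name} | {datafusion_ms:.2f} | {duckdb_ms:.2f} "
        f"| {speedup:.2f}x | {faster} |"
    )


# Reproduce the two sample rows from the README's output table.
print(compare("trim", 12.34, 15.67))
print(compare("lower", 8.90, 7.50))
```

Under this reading, the defaults `--warmup 2` and `--iterations 5` mean each function's query runs seven times per engine, with only the last five runs contributing to the reported average.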