apache · andygrove · Dec 13, 2025 · Dec 13, 2025
diff --git a/docs/source/contributor-guide/benchmarking.md b/docs/source/contributor-guide/benchmarking.md
@@ -43,4 +43,185 @@ Available benchmarking guides:
 - [Benchmarking on macOS](benchmarking_macos.md)
 - [Benchmarking on AWS EC2](benchmarking_aws_ec2)
 
-We also have many micro benchmarks that can be run from an IDE located [here](https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark).
+## Comet Benchmark Suite
+
+Comet includes a comprehensive benchmark suite located in `spark/src/test/scala/org/apache/spark/sql/benchmark/` that provides detailed performance measurements for various aspects of query execution. The benchmark suite compares Spark native execution with Comet acceleration to measure performance improvements across different workload patterns.
+
+### Benchmark Categories
+
+#### Expression-Level Benchmarks (Micro)
+These benchmarks focus on specific expression types and operations:
+
+- CometArithmeticBenchmark: Tests arithmetic operations (add, subtract, multiply, divide) on integer and decimal types
+- CometStringExpressionBenchmark: Tests string operations including substring, space, trim, length, concat, and regex_replace
+- CometConditionalExpressionBenchmark: Tests conditional expressions like CASE WHEN and IF statements
+- CometPredicateExpressionBenchmark: Tests predicate operations including IN expressions and filtering
+- CometDatetimeExpressionBenchmark: Tests date/time operations including truncation at various granularities (year, month, day, hour, etc.)
+
+#### Operator-Level Benchmarks
+These benchmarks focus on query execution operators:
+
+- CometAggregateBenchmark: Tests aggregation operations (SUM, MIN, MAX, COUNT, COUNT DISTINCT) with varying cardinalities
+- CometShuffleBenchmark: Tests shuffle operations using CometShuffleManager with various data types and partition counts
+- CometExecBenchmark: Tests general execution performance with project and filter operations at different selectivity levels
+
+#### I/O and Scan Benchmarks
+- CometReadBenchmark: Comprehensive scan performance testing including:
+  - Direct Parquet reader performance across all data types
+  - Iceberg table format support
+  - Encrypted Parquet column scanning
+  - Dictionary encoding optimizations
+  - Wide table scanning (single column from tables with 10-100 columns)
+  - Filter pushdown with varying selectivity
+
+#### Full Query Suite Benchmarks
+- CometTPCHQueryBenchmark: Runs all 22 TPC-H queries with configurable scale factors
+- CometTPCDSQueryBenchmark: Runs TPC-DS queries (v1.4 and v2.7.0 variants) with configurable scale factors
+- CometTPCDSMicroBenchmark: Runs 25 targeted micro-benchmarks using TPC-DS data, focusing on specific operations like decimal handling, joins, aggregations, and string functions
+
+### Running Benchmarks
+
+#### Basic Command Structure
+```bash
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-<CLASS_NAME> [-- --args]
+```
+
+The `SPARK_GENERATE_BENCHMARK_FILES=1` environment variable is required to output benchmark results to files in the `spark/benchmarks/` directory.
+
+#### Expression and Operator Benchmark Examples
+```bash
+# Arithmetic operations
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometArithmeticBenchmark
+
+# String operations
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometStringExpressionBenchmark
+
+# Aggregation operations
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometAggregateBenchmark
+
+# Shuffle operations
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometShuffleBenchmark
+
+# Scan operations
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometReadBenchmark
+```
+
+#### TPC Benchmark Examples
+
+First, generate the required test data:
+
+```bash
+# Generate TPC-H data (1GB scale factor)
+cd $COMET_HOME
+make benchmark-org.apache.spark.sql.GenTPCHData -- --location /tmp --scaleFactor 1
+
+# Generate TPC-DS data (requires tpcds-kit)
+cd /tmp && git clone https://github.com/databricks/tpcds-kit.git
+cd tpcds-kit/tools && make OS=MACOS  # Use OS=LINUX on Linux
+cd $COMET_HOME && mkdir /tmp/tpcds
+make benchmark-org.apache.spark.sql.GenTPCDSData -- --dsdgenDir /tmp/tpcds-kit/tools --location /tmp/tpcds --scaleFactor 1
+```
+
+Then run the benchmarks:
+
+```bash
+# Run TPC-H benchmarks
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometTPCHQueryBenchmark -- --data-location /tmp/tpch/sf1_parquet
+
+# Run TPC-DS benchmarks
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometTPCDSQueryBenchmark -- --data-location /tmp/tpcds
+
+# Run TPC-DS micro benchmarks
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometTPCDSMicroBenchmark -- --data-location /tmp/tpcds
+```
+
+#### Benchmark Options
+Many benchmarks support additional options:
+
+```bash
+# Filter specific queries (e.g., only run TPC-H queries 3, 5, and 13)
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometTPCHQueryBenchmark -- --data-location /tmp/tpch/sf1_parquet --query-filter q3,q5,q13
+
+# Enable Cost-Based Optimization
+SPARK_GENERATE_BENCHMARK_FILES=1 make benchmark-org.apache.spark.sql.benchmark.CometTPCDSQueryBenchmark -- --data-location /tmp/tpcds --cbo
+
+# Use larger scale factors
+make benchmark-org.apache.spark.sql.GenTPCHData -- --location /tmp --scaleFactor 10
+```
+
+### Understanding Benchmark Results
+
+#### Execution Modes Compared
+The benchmarks typically compare several execution modes:
+
+1. Spark Native: Standard Spark execution without Comet
+2. Comet Scan: Comet scan acceleration with Spark execution
+3. Comet Scan + Exec: Full Comet acceleration (scan + execution operators)
+4. Comet Native DataFusion: Full native DataFusion execution path
+5. Comet Native Iceberg: Iceberg-compatible native execution
+
+#### Key Metrics
+- Execution Time: Wall clock time for query execution
+- Throughput: Rows processed per second
+- Speedup Ratio: Performance improvement over Spark native
+- Memory Usage: Peak memory consumption during execution
+
+#### Output Location
+Benchmark results are written to:
+- Directory: `spark/benchmarks/`
+- File format: `<BenchmarkClass>-results.txt`
+- Content: Detailed timing results and performance comparisons
+
+### Advanced Configuration
+
+#### Environment Variables
+- `SPARK_GENERATE_BENCHMARK_FILES=1`: Required for file output
+- `MAVEN_OPTS`: JVM options (default: `-Xmx20g`)
+- `PROFILES`: Maven profiles for Spark versions (default: Spark 3.5)
+
+#### Key Configuration Options
+Benchmarks can be tuned using Spark configuration:
+
+```bash
+# Memory configuration
+--conf spark.executor.memory=8g
+--conf spark.executor.memoryOverhead=2g
+
+# Shuffle configuration
+--conf spark.sql.shuffle.partitions=200
+--conf spark.comet.columnar.shuffle.async.thread.num=4
+--conf spark.comet.columnar.shuffle.spill.threshold=134217728
+
+# Comet-specific settings
+--conf spark.comet.enabled=true
+--conf spark.comet.exec.enabled=true
+--conf spark.comet.nativeScan.impl=COMET  # or DATAFUSION, ICEBERG_COMPAT
+```
+
+### Benchmark Development
+
+All benchmarks extend `CometBenchmarkBase`, which provides:
+
+- Test Data Generation: Helper methods for creating Parquet tables with various compression and encryption options
+- Comet Integration: Automatic setup of SparkSession with Comet extensions
+- Comparison Framework: `runWithComet()` method to run tests with Comet both enabled and disabled
+- Result Collection: Standardized timing and metrics collection
+
+When developing new benchmarks, consider:
+
+1. Test Data Variety: Include different data types, null patterns, and cardinalities
+2. Realistic Workloads: Base tests on common query patterns
+3. Multiple Scenarios: Test various configurations (dictionary encoding, compression, etc.)
+4. Clear Naming: Use descriptive benchmark and case names
+5. Resource Management: Ensure proper cleanup of temporary tables and files
+
+### Benchmark Suite Summary
+
+| Benchmark Type | Count | Focus Area | Example Use Case |
+|---|---|---|---|
+| Expression-Level | 5 | Individual operations | Optimize arithmetic functions |
+| Operator-Level | 3 | Query operators | Optimize aggregation algorithms |
+| I/O & Scan | 1 | Data access | Optimize Parquet reading |
+| Full Query Suite | 3 | End-to-end | Validate overall performance |
+
+The benchmark suite provides comprehensive coverage from low-level expression evaluation to full TPC query execution, enabling both targeted optimization and overall performance validation.