Add sort integration benchmark
2010YOUY01 committed Nov 8, 2024
1 parent 34d9d3a commit ec00de9
Showing 3 changed files with 376 additions and 1 deletion.
24 changes: 24 additions & 0 deletions benchmarks/README.md
@@ -330,6 +330,30 @@ steps.
The tests sort the entire dataset using several different sort
orders.

## Sort Integration

Tests the performance of end-to-end sort SQL queries. While the `Sort` benchmark focuses on a single sort executor, this benchmark measures how sorting performs across multiple CPU cores by sorting an entire relational table.

The sort integration benchmark runs whole-table sort queries on the TPCH `lineitem` table, varying characteristics such as the number of sort keys, the cardinality of the sort keys, and the number of payload columns.

See [`sort_integration.rs`](src/bin/sort_integration.rs) for more details.

### Sort Integration Benchmark Example Runs
1. Run all queries with the default settings:
```bash
cargo run --release --bin sort_integration -- benchmark -p '....../datafusion/benchmarks/data/tpch_sf1' -o '/tmp/sort_integration.json'
```

2. Run a specific query:
```bash
cargo run --release --bin sort_integration -- benchmark -p '....../datafusion/benchmarks/data/tpch_sf1' -o '/tmp/sort_integration.json' --query 2
```

3. Run all queries with the `bench.sh` script:
```bash
./bench.sh run sort_integration
```
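
To run several specific queries one after another, the second example above can be wrapped in a small loop. The following is a minimal dry-run sketch, not part of the benchmark itself: it only prints the commands. `TPCH_PATH`, `OUT_DIR`, the helper `sort_integration_cmd`, the per-query output file names, and the query IDs 1 to 3 are all assumptions made for illustration; only the `-p`, `-o`, and `--query` flags come from the examples above.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of executing them.
# TPCH_PATH and OUT_DIR are hypothetical variables; point them at your own paths.
TPCH_PATH="${TPCH_PATH:-/path/to/tpch_sf1}"
OUT_DIR="${OUT_DIR:-/tmp}"

# Build the command line for one query ID, reusing the flags from the examples above.
# Writing one result file per query is an assumption of this sketch.
sort_integration_cmd() {
    local query_id="$1"
    printf '%s\n' "cargo run --release --bin sort_integration -- benchmark -p '${TPCH_PATH}' -o '${OUT_DIR}/sort_integration_q${query_id}.json' --query ${query_id}"
}

# Query IDs 1 to 3 are illustrative; check sort_integration.rs for the valid IDs.
for id in 1 2 3; do
    sort_integration_cmd "$id"
done
```

Once the printed commands look right, they can be pasted into a shell or the helper's output piped to `bash`.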

## IMDB

Run Join Order Benchmark (JOB) on IMDB dataset.
19 changes: 18 additions & 1 deletion benchmarks/bench.sh
@@ -175,6 +175,10 @@ main() {
# same data as for tpch
data_tpch "1"
;;
sort_integration)
# same data as for tpch
data_tpch "1"
;;
*)
echo "Error: unknown benchmark '$BENCHMARK' for data generation"
usage
@@ -252,6 +256,9 @@ main() {
external_aggr)
run_external_aggr
;;
sort_integration)
run_sort_integration
;;
*)
echo "Error: unknown benchmark '$BENCHMARK' for run"
usage
@@ -546,7 +553,17 @@ run_external_aggr() {
# number-of-partitions), and by default `--partitions` is set to number of
# CPU cores, we set a constant number of partitions to prevent this
# benchmark from failing on some machines.
$CARGO_COMMAND --bin external_aggr -- benchmark --partitions 4 --iterations 5 --path "${TPCH_DIR}" -o "${RESULTS_FILE}"
$CARGO_COMMAND --bin external_aggr -- benchmark --partitions 5 --iterations 5 --path "${TPCH_DIR}" -o "${RESULTS_FILE}"
}

# Runs the sort integration benchmark
run_sort_integration() {
    TPCH_DIR="${DATA_DIR}/tpch_sf1"
    RESULTS_FILE="${RESULTS_DIR}/sort_integration.json"
    echo "RESULTS_FILE: ${RESULTS_FILE}"
    echo "Running sort integration benchmark..."

    $CARGO_COMMAND --bin sort_integration -- benchmark --iterations 5 --path "${TPCH_DIR}" -o "${RESULTS_FILE}"
}
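
The three `bench.sh` hunks above follow one pattern: a new benchmark gets an arm in the data-generation `case`, an arm in the run `case`, and a `run_*` function. The sketch below is a simplified stand-in for that wiring, not the actual script: `dispatch` is a hypothetical helper that merges the two `case` statements into one, and the function bodies only echo what the real ones would do.

```bash
#!/usr/bin/env bash
# Simplified stand-in for the bench.sh wiring; the bodies only echo.
data_tpch() { echo "generate tpch sf$1"; }
run_sort_integration() { echo "run sort_integration"; }

# Hypothetical helper: route a (command, benchmark) pair to its function.
dispatch() {
    local command="$1" benchmark="$2"
    case "$command:$benchmark" in
        data:sort_integration) data_tpch "1" ;;        # same data as for tpch
        run:sort_integration)  run_sort_integration ;;
        *)
            echo "Error: unknown benchmark '$benchmark' for $command" >&2
            return 1
            ;;
    esac
}

dispatch data sort_integration
dispatch run sort_integration
```

An unknown benchmark name falls through to the `*)` arm and returns a non-zero status, mirroring the error arms in the hunks above.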

