-
Notifications
You must be signed in to change notification settings - Fork 70
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add benchmark derived from TPC-H #355
Merged
Merged
Changes from 6 commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
d3e7414
Add benchmark derived from TPC-H
andygrove 68f1afd
remove test
andygrove 57ee9b2
Misc updates
andygrove e46b15f
rat
andygrove 6305a6f
rat
andygrove bc84bbd
python lint
andygrove 3680bf4
add copyright to create_tables.sql
andygrove File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
data | ||
results.csv |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,78 @@ | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
# DataFusion Python Benchmarks Derived from TPC-H | ||
|
||
## Create Release Build | ||
|
||
From repo root: | ||
|
||
```bash | ||
maturin develop --release | ||
``` | ||
|
||
Note that release builds take a really long time, so you may want to temporarily comment out this section of the | ||
root Cargo.toml when frequently building. | ||
|
||
```toml | ||
[profile.release] | ||
lto = true | ||
codegen-units = 1 | ||
``` | ||
|
||
## Generate Data | ||
|
||
```bash | ||
./tpch-gen.sh 1 | ||
``` | ||
|
||
## Run Benchmarks | ||
|
||
```bash | ||
python tpch.py ./data ./queries | ||
``` | ||
|
||
A summary of the benchmark timings will be written to `results.csv`. For example: | ||
|
||
```csv | ||
setup,1.4 | ||
q1,2978.6 | ||
q2,679.7 | ||
q3,2943.7 | ||
q4,2894.9 | ||
q5,3592.3 | ||
q6,1691.4 | ||
q7,3003.9 | ||
q8,3818.7 | ||
q9,4237.9 | ||
q10,2344.7 | ||
q11,526.1 | ||
q12,2284.6 | ||
q13,1009.2 | ||
q14,1738.4 | ||
q15,1942.1 | ||
q16,499.8 | ||
q17,5178.9 | ||
q18,4127.7 | ||
q19,2056.6 | ||
q20,2162.5 | ||
q21,8046.5 | ||
q22,754.9 | ||
total,58513.2 | ||
``` |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
-- Licensed to the Apache Software Foundation (ASF) under one | ||
-- or more contributor license agreements. See the NOTICE file | ||
-- distributed with this work for additional information | ||
-- regarding copyright ownership. The ASF licenses this file | ||
-- to you under the Apache License, Version 2.0 (the | ||
-- "License"); you may not use this file except in compliance | ||
-- with the License. You may obtain a copy of the License at | ||
-- | ||
-- http://www.apache.org/licenses/LICENSE-2.0 | ||
-- | ||
-- Unless required by applicable law or agreed to in writing, | ||
-- software distributed under the License is distributed on an | ||
-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
-- KIND, either express or implied. See the License for the | ||
-- specific language governing permissions and limitations | ||
-- under the License. | ||
|
||
CREATE EXTERNAL TABLE customer ( | ||
c_custkey INT NOT NULL, | ||
c_name VARCHAR NOT NULL, | ||
c_address VARCHAR NOT NULL, | ||
c_nationkey INT NOT NULL, | ||
c_phone VARCHAR NOT NULL, | ||
c_acctbal DECIMAL(15, 2) NOT NULL, | ||
c_mktsegment VARCHAR NOT NULL, | ||
c_comment VARCHAR NOT NULL, | ||
c_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/customer.csv'; | ||
|
||
CREATE EXTERNAL TABLE lineitem ( | ||
l_orderkey INT NOT NULL, | ||
l_partkey INT NOT NULL, | ||
l_suppkey INT NOT NULL, | ||
l_linenumber INT NOT NULL, | ||
l_quantity DECIMAL(15, 2) NOT NULL, | ||
l_extendedprice DECIMAL(15, 2) NOT NULL, | ||
l_discount DECIMAL(15, 2) NOT NULL, | ||
l_tax DECIMAL(15, 2) NOT NULL, | ||
l_returnflag VARCHAR NOT NULL, | ||
l_linestatus VARCHAR NOT NULL, | ||
l_shipdate DATE NOT NULL, | ||
l_commitdate DATE NOT NULL, | ||
l_receiptdate DATE NOT NULL, | ||
l_shipinstruct VARCHAR NOT NULL, | ||
l_shipmode VARCHAR NOT NULL, | ||
l_comment VARCHAR NOT NULL, | ||
l_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/lineitem.csv'; | ||
|
||
CREATE EXTERNAL TABLE nation ( | ||
n_nationkey INT NOT NULL, | ||
n_name VARCHAR NOT NULL, | ||
n_regionkey INT NOT NULL, | ||
n_comment VARCHAR NOT NULL, | ||
n_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/nation.csv'; | ||
|
||
CREATE EXTERNAL TABLE orders ( | ||
o_orderkey INT NOT NULL, | ||
o_custkey INT NOT NULL, | ||
o_orderstatus VARCHAR NOT NULL, | ||
o_totalprice DECIMAL(15, 2) NOT NULL, | ||
o_orderdate DATE NOT NULL, | ||
o_orderpriority VARCHAR NOT NULL, | ||
o_clerk VARCHAR NOT NULL, | ||
o_shippriority INT NULL, | ||
o_comment VARCHAR NOT NULL, | ||
o_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/orders.csv'; | ||
|
||
CREATE EXTERNAL TABLE part ( | ||
p_partkey INT NOT NULL, | ||
p_name VARCHAR NOT NULL, | ||
p_mfgr VARCHAR NOT NULL, | ||
p_brand VARCHAR NOT NULL, | ||
p_type VARCHAR NOT NULL, | ||
p_size INT NULL, | ||
p_container VARCHAR NOT NULL, | ||
p_retailprice DECIMAL(15, 2) NOT NULL, | ||
p_comment VARCHAR NOT NULL, | ||
p_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/part.csv'; | ||
|
||
CREATE EXTERNAL TABLE partsupp ( | ||
ps_partkey INT NOT NULL, | ||
ps_suppkey INT NOT NULL, | ||
ps_availqty INT NOT NULL, | ||
ps_supplycost DECIMAL(15, 2) NOT NULL, | ||
ps_comment VARCHAR NOT NULL, | ||
ps_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/partsupp.csv'; | ||
|
||
CREATE EXTERNAL TABLE region ( | ||
r_regionkey INT NOT NULL, | ||
r_name VARCHAR NOT NULL, | ||
r_comment VARCHAR NOT NULL, | ||
r_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/region.csv'; | ||
|
||
CREATE EXTERNAL TABLE supplier ( | ||
s_suppkey INT NOT NULL, | ||
s_name VARCHAR NOT NULL, | ||
s_address VARCHAR NOT NULL, | ||
s_nationkey INT NOT NULL, | ||
s_phone VARCHAR NOT NULL, | ||
s_acctbal DECIMAL(15, 2) NOT NULL, | ||
s_comment VARCHAR NOT NULL, | ||
s_extra VARCHAR NOT NULL, | ||
) | ||
STORED AS CSV | ||
WITH HEADER ROW DELIMITER '|' | ||
LOCATION '$PATH/supplier.csv'; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
-- Benchmark Query 1 derived from TPC-H query 1 under the terms of the TPC Fair Use Policy. | ||
-- TPC-H queries are Copyright 1993-2022 Transaction Processing Performance Council. | ||
select | ||
l_returnflag, | ||
l_linestatus, | ||
sum(l_quantity) as sum_qty, | ||
sum(l_extendedprice) as sum_base_price, | ||
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price, | ||
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge, | ||
avg(l_quantity) as avg_qty, | ||
avg(l_extendedprice) as avg_price, | ||
avg(l_discount) as avg_disc, | ||
count(*) as count_order | ||
from | ||
lineitem | ||
where | ||
l_shipdate <= date '1998-12-01' - interval '68 days' | ||
group by | ||
l_returnflag, | ||
l_linestatus | ||
order by | ||
l_returnflag, | ||
l_linestatus; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
-- Benchmark Query 10 derived from TPC-H query 10 under the terms of the TPC Fair Use Policy. | ||
-- TPC-H queries are Copyright 1993-2022 Transaction Processing Performance Council. | ||
select | ||
c_custkey, | ||
c_name, | ||
sum(l_extendedprice * (1 - l_discount)) as revenue, | ||
c_acctbal, | ||
n_name, | ||
c_address, | ||
c_phone, | ||
c_comment | ||
from | ||
customer, | ||
orders, | ||
lineitem, | ||
nation | ||
where | ||
c_custkey = o_custkey | ||
and l_orderkey = o_orderkey | ||
and o_orderdate >= date '1993-07-01' | ||
and o_orderdate < date '1993-07-01' + interval '3' month | ||
and l_returnflag = 'R' | ||
and c_nationkey = n_nationkey | ||
group by | ||
c_custkey, | ||
c_name, | ||
c_acctbal, | ||
c_phone, | ||
n_name, | ||
c_address, | ||
c_comment | ||
order by | ||
revenue desc limit 20; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
-- Benchmark Query 11 derived from TPC-H query 11 under the terms of the TPC Fair Use Policy. | ||
-- TPC-H queries are Copyright 1993-2022 Transaction Processing Performance Council. | ||
select | ||
ps_partkey, | ||
sum(ps_supplycost * ps_availqty) as value | ||
from | ||
partsupp, | ||
supplier, | ||
nation | ||
where | ||
ps_suppkey = s_suppkey | ||
and s_nationkey = n_nationkey | ||
and n_name = 'ALGERIA' | ||
group by | ||
ps_partkey having | ||
sum(ps_supplycost * ps_availqty) > ( | ||
select | ||
sum(ps_supplycost * ps_availqty) * 0.0001000000 | ||
from | ||
partsupp, | ||
supplier, | ||
nation | ||
where | ||
ps_suppkey = s_suppkey | ||
and s_nationkey = n_nationkey | ||
and n_name = 'ALGERIA' | ||
) | ||
order by | ||
value desc; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
-- Benchmark Query 12 derived from TPC-H query 12 under the terms of the TPC Fair Use Policy. | ||
-- TPC-H queries are Copyright 1993-2022 Transaction Processing Performance Council. | ||
select | ||
l_shipmode, | ||
sum(case | ||
when o_orderpriority = '1-URGENT' | ||
or o_orderpriority = '2-HIGH' | ||
then 1 | ||
else 0 | ||
end) as high_line_count, | ||
sum(case | ||
when o_orderpriority <> '1-URGENT' | ||
and o_orderpriority <> '2-HIGH' | ||
then 1 | ||
else 0 | ||
end) as low_line_count | ||
from | ||
orders, | ||
lineitem | ||
where | ||
o_orderkey = l_orderkey | ||
and l_shipmode in ('FOB', 'SHIP') | ||
and l_commitdate < l_receiptdate | ||
and l_shipdate < l_commitdate | ||
and l_receiptdate >= date '1995-01-01' | ||
and l_receiptdate < date '1995-01-01' + interval '1' year | ||
group by | ||
l_shipmode | ||
order by | ||
l_shipmode; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
-- Benchmark Query 13 derived from TPC-H query 13 under the terms of the TPC Fair Use Policy. | ||
-- TPC-H queries are Copyright 1993-2022 Transaction Processing Performance Council. | ||
select | ||
c_count, | ||
count(*) as custdist | ||
from | ||
( | ||
select | ||
c_custkey, | ||
count(o_orderkey) | ||
from | ||
customer left outer join orders on | ||
c_custkey = o_custkey | ||
and o_comment not like '%express%requests%' | ||
group by | ||
c_custkey | ||
) as c_orders (c_custkey, c_count) | ||
group by | ||
c_count | ||
order by | ||
custdist desc, | ||
c_count desc; |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
-- Benchmark Query 14 derived from TPC-H query 14 under the terms of the TPC Fair Use Policy. | ||
-- TPC-H queries are Copyright 1993-2022 Transaction Processing Performance Council. | ||
select | ||
100.00 * sum(case | ||
when p_type like 'PROMO%' | ||
then l_extendedprice * (1 - l_discount) | ||
else 0 | ||
end) / sum(l_extendedprice * (1 - l_discount)) as promo_revenue | ||
from | ||
lineitem, | ||
part | ||
where | ||
l_partkey = p_partkey | ||
and l_shipdate >= date '1995-02-01' | ||
and l_shipdate < date '1995-02-01' + interval '1' month; |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this also contain the disclaimer about being derived from TPC-H since it is performing DDL? Don't know the answer there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I will go ahead and add this.