15 changes: 5 additions & 10 deletions src/query/service/tests/it/sql/planner/optimizer/data/README.md
@@ -6,8 +6,9 @@ This directory contains test data for TPC-DS optimizer tests. The tests are stru

```
data
-├── tables/ # SQL table definitions
-└── yaml/ # YAML test case definitions
+├── tables/ # SQL table definitions
+├── statistics/ # SQL table definitions
+└── cases/ # YAML test case definitions and golden files
```
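
The layout above puts each YAML case next to its golden file under `cases/`, and the updated README steps in this diff mention `UPDATE_GOLDENFILES`. As a rough illustration only, the Python sketch below pairs case files with golden files and compares or rewrites them. It is not the project's test harness: the helper names `find_cases`, `check_golden`, and `update_requested` are hypothetical, the `.yaml` extension is assumed, and so is the rule that any non-empty `UPDATE_GOLDENFILES` value enables update mode.

```python
import os
from pathlib import Path


def find_cases(data_dir: Path) -> list[tuple[Path, Path]]:
    """Pair each case definition with the golden file expected next to it."""
    cases_dir = data_dir / "cases"
    pairs = []
    # Assumption: case definitions use the .yaml extension and the golden
    # file shares the same stem with a .txt extension (e.g. Q01.txt).
    for case_file in sorted(cases_dir.glob("*.yaml")):
        pairs.append((case_file, case_file.with_suffix(".txt")))
    return pairs


def check_golden(golden: Path, actual: str, update: bool) -> bool:
    """Return True if `actual` matches the golden file.

    In update mode the golden file is overwritten with `actual` instead,
    so the change shows up in version control for review.
    """
    if update:
        golden.write_text(actual)
        return True
    return golden.exists() and golden.read_text() == actual


def update_requested() -> bool:
    # Assumption: any non-empty value of UPDATE_GOLDENFILES enables updates.
    return bool(os.environ.get("UPDATE_GOLDENFILES"))
```

A caller would loop over `find_cases(...)`, produce the plan text for each case, and pass `update_requested()` as the `update` argument. Because update mode rewrites files in place, reviewing the resulting diff is what confirms the plan changes are intended.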

## YAML Test Case Format
@@ -37,12 +38,6 @@ column_statistics: # Column statistics
ndv: 10 # Number of distinct values
null_count: 0 # Number of null values

-raw_plan: | # Expected raw plan
-...
-
-optimized_plan: | # Expected optimized plan
-...
-
good_plan: | # Optional expected good plan
...
```
@@ -63,5 +58,5 @@ To add a new test case:

If the expected output of a test changes (e.g., due to optimizer improvements):

-1. Run the test to see the actual output.
-2. Update the `raw_plan`, `optimized_plan`, or `good_plan` field in the YAML file to match the actual output.
+1. Run the test with UPDATE_GOLDENFILES to generate new file.
+2. Checking that changes to files are as expected.
172 changes: 172 additions & 0 deletions src/query/service/tests/it/sql/planner/optimizer/data/cases/Q01.txt
@@ -0,0 +1,172 @@
Raw plan:
Limit
├── limit: [100]
├── offset: [0]
└── Sort
├── sort keys: [default.customer.c_customer_id (#79) ASC NULLS LAST]
├── limit: [NONE]
└── EvalScalar
├── scalars: [customer.c_customer_id (#79) AS (#79)]
└── Filter
├── filters: [gt(ctr1.ctr_total_return (#48), SUBQUERY), eq(store.s_store_sk (#49), ctr1.ctr_store_sk (#7)), eq(store.s_state (#73), 'TN'), eq(ctr1.ctr_customer_sk (#3), customer.c_customer_sk (#78))]
├── subquerys
│ └── Subquery (Scalar)
│ ├── output_column: derived.sum(ctr_total_return) / if(count(ctr_total_return) = 0, 1, count(ctr_total_return)) * 1.2 (#147)
│ └── EvalScalar
│ ├── scalars: [multiply(divide(sum(ctr_total_return) (#145), if(eq(count(ctr_total_return) (#146), 0), 1, count(ctr_total_return) (#146))), 1.2) AS (#147)]
│ └── Aggregate(Initial)
│ ├── group items: []
│ ├── aggregate functions: [sum(ctr_total_return) AS (#145), count(ctr_total_return) AS (#146)]
│ └── EvalScalar
│ ├── scalars: [ctr2.ctr_total_return (#144) AS (#144), ctr2.ctr_total_return (#144) AS (#144)]
│ └── Filter
│ ├── filters: [eq(ctr1.ctr_store_sk (#7), ctr2.ctr_store_sk (#103))]
│ └── EvalScalar
│ ├── scalars: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103), Sum(sr_return_amt) (#144) AS (#144)]
│ └── Aggregate(Initial)
│ ├── group items: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103)]
│ ├── aggregate functions: [Sum(sr_return_amt) AS (#144)]
│ └── EvalScalar
│ ├── scalars: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103), store_returns.sr_return_amt (#107) AS (#107)]
│ └── Filter
│ ├── filters: [eq(store_returns.sr_returned_date_sk (#96), date_dim.d_date_sk (#116)), eq(date_dim.d_year (#122), 2001)]
│ └── Join(Cross)
│ ├── build keys: []
│ ├── probe keys: []
│ ├── other filters: []
│ ├── Scan
│ │ ├── table: default.store_returns (#4)
│ │ ├── filters: []
│ │ ├── order by: []
│ │ └── limit: NONE
│ └── Scan
│ ├── table: default.date_dim (#5)
│ ├── filters: []
│ ├── order by: []
│ └── limit: NONE
└── Join(Cross)
├── build keys: []
├── probe keys: []
├── other filters: []
├── Join(Cross)
│ ├── build keys: []
│ ├── probe keys: []
│ ├── other filters: []
│ ├── EvalScalar
│ │ ├── scalars: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7), Sum(sr_return_amt) (#48) AS (#48)]
│ │ └── Aggregate(Initial)
│ │ ├── group items: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7)]
│ │ ├── aggregate functions: [Sum(sr_return_amt) AS (#48)]
│ │ └── EvalScalar
│ │ ├── scalars: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7), store_returns.sr_return_amt (#11) AS (#11)]
│ │ └── Filter
│ │ ├── filters: [eq(store_returns.sr_returned_date_sk (#0), date_dim.d_date_sk (#20)), eq(date_dim.d_year (#26), 2001)]
│ │ └── Join(Cross)
│ │ ├── build keys: []
│ │ ├── probe keys: []
│ │ ├── other filters: []
│ │ ├── Scan
│ │ │ ├── table: default.store_returns (#0)
│ │ │ ├── filters: []
│ │ │ ├── order by: []
│ │ │ └── limit: NONE
│ │ └── Scan
│ │ ├── table: default.date_dim (#1)
│ │ ├── filters: []
│ │ ├── order by: []
│ │ └── limit: NONE
│ └── Scan
│ ├── table: default.store (#2)
│ ├── filters: []
│ ├── order by: []
│ └── limit: NONE
└── Scan
├── table: default.customer (#3)
├── filters: []
├── order by: []
└── limit: NONE

Optimized plan:
Limit
├── limit: [100]
├── offset: [0]
└── Sort
├── sort keys: [default.customer.c_customer_id (#79) ASC NULLS LAST]
├── limit: [100]
└── EvalScalar
├── scalars: [customer.c_customer_id (#79) AS (#79), ctr1.ctr_total_return (#48) AS (#154), scalar_subquery_147 (#147) AS (#155), store.s_store_sk (#49) AS (#156), ctr1.ctr_store_sk (#7) AS (#157), store.s_state (#73) AS (#158), ctr1.ctr_customer_sk (#3) AS (#159), customer.c_customer_sk (#78) AS (#160)]
└── Join(Inner)
├── build keys: [sr_store_sk (#103)]
├── probe keys: [sr_store_sk (#7)]
├── other filters: [gt(ctr1.ctr_total_return (#48), scalar_subquery_147 (#147))]
├── Join(Inner)
│ ├── build keys: [customer.c_customer_sk (#78)]
│ ├── probe keys: [ctr1.ctr_customer_sk (#3)]
│ ├── other filters: []
│ ├── Aggregate(Final)
│ │ ├── group items: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7)]
│ │ ├── aggregate functions: [Sum(sr_return_amt) AS (#48)]
│ │ └── Aggregate(Partial)
│ │ ├── group items: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7)]
│ │ ├── aggregate functions: [Sum(sr_return_amt) AS (#48)]
│ │ └── EvalScalar
│ │ ├── scalars: [store_returns.sr_customer_sk (#3) AS (#3), store_returns.sr_store_sk (#7) AS (#7), store_returns.sr_return_amt (#11) AS (#11), store_returns.sr_returned_date_sk (#0) AS (#148), date_dim.d_date_sk (#20) AS (#149), date_dim.d_year (#26) AS (#150)]
│ │ └── Join(Inner)
│ │ ├── build keys: [date_dim.d_date_sk (#20)]
│ │ ├── probe keys: [store_returns.sr_returned_date_sk (#0)]
│ │ ├── other filters: []
│ │ ├── Scan
│ │ │ ├── table: default.store_returns (#0)
│ │ │ ├── filters: []
│ │ │ ├── order by: []
│ │ │ └── limit: NONE
│ │ └── Scan
│ │ ├── table: default.date_dim (#1)
│ │ ├── filters: [eq(date_dim.d_year (#26), 2001)]
│ │ ├── order by: []
│ │ └── limit: NONE
│ └── Scan
│ ├── table: default.customer (#3)
│ ├── filters: []
│ ├── order by: []
│ └── limit: NONE
└── Join(Inner)
├── build keys: [sr_store_sk (#103)]
├── probe keys: [store.s_store_sk (#49)]
├── other filters: []
├── Scan
│ ├── table: default.store (#2)
│ ├── filters: [eq(store.s_state (#73), 'TN')]
│ ├── order by: []
│ └── limit: NONE
└── EvalScalar
├── scalars: [outer.sr_store_sk (#103) AS (#103), multiply(divide(sum(ctr_total_return) (#145), if(eq(count(ctr_total_return) (#146), 0), 1, count(ctr_total_return) (#146))), 1.2) AS (#147)]
└── Aggregate(Final)
├── group items: [outer.sr_store_sk (#103) AS (#103)]
├── aggregate functions: [sum(ctr_total_return) AS (#145), count(ctr_total_return) AS (#146)]
└── Aggregate(Partial)
├── group items: [outer.sr_store_sk (#103) AS (#103)]
├── aggregate functions: [sum(ctr_total_return) AS (#145), count(ctr_total_return) AS (#146)]
└── Aggregate(Final)
├── group items: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103)]
├── aggregate functions: [Sum(sr_return_amt) AS (#144)]
└── Aggregate(Partial)
├── group items: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103)]
├── aggregate functions: [Sum(sr_return_amt) AS (#144)]
└── EvalScalar
├── scalars: [store_returns.sr_customer_sk (#99) AS (#99), store_returns.sr_store_sk (#103) AS (#103), store_returns.sr_return_amt (#107) AS (#107), store_returns.sr_returned_date_sk (#96) AS (#151), date_dim.d_date_sk (#116) AS (#152), date_dim.d_year (#122) AS (#153)]
└── Join(Inner)
├── build keys: [date_dim.d_date_sk (#116)]
├── probe keys: [store_returns.sr_returned_date_sk (#96)]
├── other filters: []
├── Scan
│ ├── table: default.store_returns (#4)
│ ├── filters: []
│ ├── order by: []
│ └── limit: NONE
└── Scan
├── table: default.date_dim (#5)
├── filters: [eq(date_dim.d_year (#122), 2001)]
├── order by: []
└── limit: NONE

@@ -0,0 +1,60 @@
name: "Q01"
description: "TPC-DS Query 1 optimizer test"

sql: |
  WITH customer_total_return
       AS (SELECT sr_customer_sk AS ctr_customer_sk,
                  sr_store_sk AS ctr_store_sk,
                  Sum(sr_return_amt) AS ctr_total_return
           FROM store_returns,
                date_dim
           WHERE sr_returned_date_sk = d_date_sk
                 AND d_year = 2001
           GROUP BY sr_customer_sk,
                    sr_store_sk)
  SELECT c_customer_id
  FROM customer_total_return ctr1,
       store,
       customer
  WHERE ctr1.ctr_total_return > (SELECT Avg(ctr_total_return) * 1.2
                                 FROM customer_total_return ctr2
                                 WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
        AND s_store_sk = ctr1.ctr_store_sk
        AND s_state = 'TN'
        AND ctr1.ctr_customer_sk = c_customer_sk
  ORDER BY c_customer_id
  LIMIT 100

# Reference to external statistics file
statistics_file: statistics.yaml

# Converted from tabular format to tree format based on parent-child relationships
good_plan: |
Result
└── SortWithLimit [sortKey: (CUSTOMER.C_CUSTOMER_ID ASC NULLS LAST), rowCount: 100]
└── InnerJoin [joinKey: (CTR1.CTR_CUSTOMER_SK = CUSTOMER.C_CUSTOMER_SK)]
├── InnerJoin [joinKey: (STORE.S_STORE_SK = CTR1.CTR_STORE_SK)]
│ ├── Filter [STORE.S_STATE = 'TN']
│ │ └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE] [S_STORE_SK, S_STATE] [partitions: 1/1, bytes: 135,680]
│ └── InnerJoin [joinKey: (CTR2.CTR_STORE_SK = CTR1.CTR_STORE_SK), joinFilter: (CTR1.CTR_TOTAL_RETURN) > (((SUM(CTR2.CTR_TOTAL_RETURN)) / (NVL(COUNT(CTR2.CTR_TOTAL_RETURN), 0))) * 1.2)]
│ ├── Filter [(SUM(CTR2.CTR_TOTAL_RETURN) IS NOT NULL) AND (COUNT(CTR2.CTR_TOTAL_RETURN) IS NOT NULL)]
│ │ └── Aggregate [aggExprs: [SUM(CTR2.CTR_TOTAL_RETURN), COUNT(CTR2.CTR_TOTAL_RETURN)], groupKeys: [CTR2.CTR_STORE_SK]]
│ │ └── JoinFilter [joinKey: (STORE.S_STORE_SK = CTR1.CTR_STORE_SK)]
│ │ └── WithReference [CTR2]
│ │ └── Filter [STORE_RETURNS.SR_STORE_SK IS NOT NULL]
│ │ └── WithClause [CUSTOMER_TOTAL_RETURN]
│ │ └── Aggregate [aggExprs: [SUM(SUM(SUM(STORE_RETURNS.SR_RETURN_AMT)))], groupKeys: [STORE_RETURNS.SR_CUSTOMER_SK, STORE_RETURNS.SR_STORE_SK]]
│ │ └── Aggregate [aggExprs: [SUM(SUM(STORE_RETURNS.SR_RETURN_AMT))], groupKeys: [STORE_RETURNS.SR_CUSTOMER_SK, STORE_RETURNS.SR_STORE_SK]]
│ │ └── InnerJoin [joinKey: (DATE_DIM.D_DATE_SK = STORE_RETURNS.SR_RETURNED_DATE_SK)]
│ │ ├── Filter [DATE_DIM.D_YEAR = 2001]
│ │ │ └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.DATE_DIM] [D_DATE_SK, D_YEAR] [partitions: 1/1, bytes: 2,138,624]
│ │ └── Aggregate [aggExprs: [SUM(STORE_RETURNS.SR_RETURN_AMT)], groupKeys: [STORE_RETURNS.SR_CUSTOMER_SK, STORE_RETURNS.SR_STORE_SK, STORE_RETURNS.SR_RETURNED_DATE_SK]]
│ │ └── Filter [STORE_RETURNS.SR_RETURNED_DATE_SK IS NOT NULL]
│ │ └── JoinFilter [joinKey: (DATE_DIM.D_DATE_SK = STORE_RETURNS.SR_RETURNED_DATE_SK)]
│ │ └── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.STORE_RETURNS] [SR_RETURNED_DATE_SK, SR_CUSTOMER_SK, SR_STORE_SK, SR_RETURN_AMT] [partitions: 7070/7070, bytes: 124,763,446,272]
│ └── JoinFilter [joinKey: (STORE.S_STORE_SK = CTR1.CTR_STORE_SK)]
│ └── WithReference [CTR1]
│ └── Filter [(STORE_RETURNS.SR_STORE_SK IS NOT NULL) AND (STORE_RETURNS.SR_CUSTOMER_SK IS NOT NULL)]
│ └── WithClause [CUSTOMER_TOTAL_RETURN] (reference to earlier WITH clause)
└── JoinFilter [joinKey: (CTR1.CTR_CUSTOMER_SK = CUSTOMER.C_CUSTOMER_SK)]
└── TableScan [SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.CUSTOMER] [C_CUSTOMER_SK, C_CUSTOMER_ID] [partitions: 261/261, bytes: 2,328,538,624]