Skip to content

Commit

Permalink
fix(query): cte recursive check (#17427)
Browse files Browse the repository at this point in the history
* fix(query): cte check

* fix(query): cte check

* fix(query): cte check

* update

* fix(query): group checker subquery

* fix(query): group checker subquery

* fix(query): group checker subquery

* fix(query): fix tests
  • Loading branch information
sundy-li authored Feb 8, 2025
1 parent a8f694f commit f227b91
Show file tree
Hide file tree
Showing 13 changed files with 87 additions and 48 deletions.
9 changes: 7 additions & 2 deletions .github/workflows/links.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,13 @@ jobs:
id: lychee
uses: lycheeverse/[email protected]
with:
args: "--base . --cache --max-cache-age 1d . --exclude 'https://github.com/databendlabs/databend/issues/' --exclude 'https?://twitter\\.com(?:/.*$)?$'"

args: >-
--base .
--cache
--max-cache-age 1d .
--exclude 'https://github\.com/datafuselabs/databend/issues/.*'
--exclude 'https?://github\.com/databendlabs/databend/issues/.*'
--exclude 'https?://twitter\.com(?:/.*$)?$'
- name: Save lychee cache
uses: actions/cache/save@v3
if: always()
Expand Down
18 changes: 9 additions & 9 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
<a href="https://docs.databend.com/guides/cloud">Databend Serverless Cloud (beta)</a> |
<a href="https://docs.databend.com/">Documentation</a> |
<a href="https://benchmark.clickhouse.com/">Benchmarking</a> |
<a href="https://github.com/datafuselabs/databend/issues/11868">Roadmap (v1.3)</a>
<a href="https://github.com/databendlabs/databend/issues/11868">Roadmap (v1.3)</a>

</h4>

Expand All @@ -22,7 +22,7 @@

<br>

<a href="https://github.com/datafuselabs/databend/actions/workflows/release.yml">
<a href="https://github.com/databendlabs/databend/actions/workflows/release.yml">
<img src="https://img.shields.io/github/actions/workflow/status/datafuselabs/databend/release.yml?branch=main" alt="CI Status" />
</a>

Expand All @@ -35,11 +35,11 @@
</div>
</div>

<img src="https://github.com/datafuselabs/databend/assets/172204/9997d8bc-6462-4dbd-90e3-527cf50a709c" alt="databend" />
<img src="https://github.com/databendlabs/databend/assets/172204/9997d8bc-6462-4dbd-90e3-527cf50a709c" alt="databend" />

## 🐋 Introduction

**Databend**, built in Rust, is an open-source cloud data warehouse that serves as a cost-effective [alternative to Snowflake](https://github.com/datafuselabs/databend/issues/13059). With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.
**Databend**, built in Rust, is an open-source cloud data warehouse that serves as a cost-effective [alternative to Snowflake](https://github.com/databendlabs/databend/issues/13059). With its focus on fast query execution and data ingestion, it's designed for complex analysis of the world's largest datasets.

**Production-Proven Scale:**
- 🤝 **Enterprise Adoption**: Trusted by over **50 organizations** processing more than **100 million queries daily**
Expand All @@ -53,15 +53,15 @@

</div>

![Databend vs. Snowflake](https://github.com/datafuselabs/wizard/assets/172204/d796acf0-0a66-4b1d-8754-cd2cd1de04c7)
![Databend vs. Snowflake](https://github.com/databendlabs/wizard/assets/172204/d796acf0-0a66-4b1d-8754-cd2cd1de04c7)

<div align="center">

[Data Ingestion Benchmark: Databend Cloud vs. Snowflake](https://docs.databend.com/guides/benchmark/data-ingest)

</div>

![Databend vs. Snowflake](https://github.com/datafuselabs/databend/assets/172204/c61d7a40-f6fe-4fb9-83e8-06ea9599aeb4)
![Databend vs. Snowflake](https://github.com/databendlabs/databend/assets/172204/c61d7a40-f6fe-4fb9-83e8-06ea9599aeb4)


## 🚀 Why Databend
Expand Down Expand Up @@ -90,7 +90,7 @@

## 📐 Architecture

![Databend Architecture](https://github.com/datafuselabs/databend/assets/172204/68b1adc6-0ec1-41d4-9e1d-37b80ce0e5ef)
![Databend Architecture](https://github.com/databendlabs/databend/assets/172204/68b1adc6-0ec1-41d4-9e1d-37b80ce0e5ef)

## 🚀 Try Databend

Expand Down Expand Up @@ -280,15 +280,15 @@ Here are some resources to help you get started:
For guidance on using Databend, we recommend starting with the official documentation. If you need further assistance, explore the following community channels:

- [Slack](https://link.databend.com/join-slack) (For live discussion with the Community)
- [GitHub](https://github.com/datafuselabs/databend) (Feature/Bug reports, Contributions)
- [GitHub](https://github.com/databendlabs/databend) (Feature/Bug reports, Contributions)
- [Twitter](https://twitter.com/DatabendLabs/) (Get the news fast)
- [I'm feeling lucky](https://link.databend.com/i-m-feeling-lucky) (Pick up a good first issue now!)

## 🛣️ Roadmap

Stay updated with Databend's development journey. Here are our roadmap milestones:

- [Roadmap 2025](https://github.com/datafuselabs/databend/issues/14167)
- [Roadmap 2025](https://github.com/databendlabs/databend/issues/14167)

## 📜 License

Expand Down
36 changes: 32 additions & 4 deletions src/query/sql/src/planner/binder/bind_context.rs
Original file line number Diff line number Diff line change
Expand Up @@ -216,6 +216,12 @@ impl CteContext {
pub fn set_cte_context(&mut self, cte_context: CteContext) {
self.cte_map = cte_context.cte_map;
}

// Set cte context to current `BindContext`.
pub fn set_cte_context_and_name(&mut self, cte_context: CteContext) {
self.cte_map = cte_context.cte_map;
self.cte_name = cte_context.cte_name;
}
}

#[derive(Clone, Debug)]
Expand Down Expand Up @@ -252,9 +258,31 @@ impl BindContext {
}
}

pub fn with_parent(parent: Box<BindContext>) -> Self {
BindContext {
parent: Some(parent.clone()),
pub fn depth(&self) -> usize {
if let Some(ref p) = self.parent {
return p.depth() + 1;
}
1
}

pub fn with_opt_parent(parent: Option<&BindContext>) -> Result<Self> {
if let Some(p) = parent {
Self::with_parent(p.clone())
} else {
Self::with_parent(Self::new())
}
}

pub fn with_parent(parent: BindContext) -> Result<Self> {
const MAX_DEPTH: usize = 4096;
if parent.depth() >= MAX_DEPTH {
return Err(ErrorCode::Internal(
"Query binder exceeds the maximum iterations",
));
}

Ok(BindContext {
parent: Some(Box::new(parent.clone())),
columns: vec![],
bound_internal_columns: BTreeMap::new(),
aggregate_info: Default::default(),
Expand All @@ -273,7 +301,7 @@ impl BindContext {
expr_context: ExprContext::default(),
planning_agg_index: false,
window_definitions: DashMap::new(),
}
})
}

/// Create a new BindContext with self's parent as its parent
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -32,18 +32,14 @@ impl Binder {
// If the subquery is a lateral subquery, we need to let it see the columns
// from the previous queries.
let (result, mut result_bind_context) = if lateral {
let mut new_bind_context = BindContext::with_parent(Box::new(bind_context.clone()));
let mut new_bind_context = BindContext::with_parent(bind_context.clone())?;
self.bind_query(&mut new_bind_context, subquery)?
} else {
let mut new_bind_context = BindContext::with_parent(
bind_context
.parent
.clone()
.unwrap_or_else(|| Box::new(BindContext::new())),
);
let mut new_bind_context =
BindContext::with_opt_parent(bind_context.parent.as_ref().map(|c| c.as_ref()))?;
new_bind_context
.cte_context
.set_cte_context(bind_context.cte_context.clone());
.set_cte_context_and_name(bind_context.cte_context.clone());
self.bind_query(&mut new_bind_context, subquery)?
};

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ impl Binder {
self.ctx
.add_streams_ref(&catalog, &database, &table_name, consume);
}
let mut new_bind_context = BindContext::with_parent(Box::new(bind_context.clone()));
let mut new_bind_context = BindContext::with_parent(bind_context.clone())?;
let tokens = tokenize_sql(query.as_str())?;
let (stmt, _) = parse_sql(&tokens, self.dialect)?;
let Statement::Query(query) = &stmt else {
Expand Down Expand Up @@ -246,7 +246,7 @@ impl Binder {
let tokens = tokenize_sql(query.as_str())?;
let (stmt, _) = parse_sql(&tokens, self.dialect)?;
// For view, we need use a new context to bind it.
let mut new_bind_context = BindContext::with_parent(Box::new(bind_context.clone()));
let mut new_bind_context = BindContext::with_parent(bind_context.clone())?;
new_bind_context.view_info = Some((database.clone(), table_name));
if let Statement::Query(query) = &stmt {
self.metadata.write().add_table(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -324,7 +324,7 @@ impl Binder {
alias,
..
} => {
let mut bind_context = BindContext::with_parent(Box::new(parent_context.clone()));
let mut bind_context = BindContext::with_parent(parent_context.clone())?;
let func_name = normalize_identifier(name, &self.name_resolution_ctx);

if BUILTIN_FUNCTIONS
Expand Down
3 changes: 1 addition & 2 deletions src/query/sql/src/planner/binder/ddl/index.rs
Original file line number Diff line number Diff line change
Expand Up @@ -166,8 +166,7 @@ impl Binder {
for (index_id, _, index_meta) in indexes {
let tokens = tokenize_sql(&index_meta.query)?;
let (stmt, _) = parse_sql(&tokens, self.dialect)?;
let mut new_bind_context =
BindContext::with_parent(Box::new(bind_context.clone()));
let mut new_bind_context = BindContext::with_parent(bind_context.clone())?;
new_bind_context.planning_agg_index = true;
if let Statement::Query(query) = &stmt {
let (s_expr, _) = self.bind_query(&mut new_bind_context, query)?;
Expand Down
15 changes: 7 additions & 8 deletions src/query/sql/src/planner/binder/table.rs
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,7 @@ impl Binder {
}
}
}
let bind_context = BindContext::with_parent(Box::new(bind_context.clone()));
let bind_context = BindContext::with_parent(bind_context.clone())?;
Ok((
SExpr::create_leaf(Arc::new(DummyTableScan.into())),
bind_context,
Expand Down Expand Up @@ -162,15 +162,14 @@ impl Binder {
cte_info: &CteInfo,
) -> Result<(SExpr, BindContext)> {
if let Some(cte_name) = &bind_context.cte_context.cte_name {
// `cte_name` exists, which means the current cte is a nested cte
// If the `cte_name` is the same as the current cte's name, it means the cte is recursive
if cte_name == table_name {
return Err(ErrorCode::SemanticError(
"The cte is not recursive, but it references itself.".to_string(),
)
return Err(ErrorCode::SemanticError(format!(
"The cte {table_name} is not recursive, but it references itself.",
))
.set_span(span));
}
}

let mut new_bind_context = BindContext {
parent: Some(Box::new(bind_context.clone())),
bound_internal_columns: BTreeMap::new(),
Expand Down Expand Up @@ -236,7 +235,7 @@ impl Binder {
cte_name: &str,
alias: &Option<TableAlias>,
) -> Result<(SExpr, BindContext)> {
let mut new_bind_ctx = BindContext::with_parent(Box::new(bind_context.clone()));
let mut new_bind_ctx = BindContext::with_parent(bind_context.clone())?;
let mut metadata = self.metadata.write();
let mut columns = cte_info.columns.clone();
for (index, column_name) in cte_info.columns_alias.iter().enumerate() {
Expand Down Expand Up @@ -357,7 +356,7 @@ impl Binder {
change_type: Option<ChangeType>,
sample: &Option<SampleConfig>,
) -> Result<(SExpr, BindContext)> {
let mut bind_context = BindContext::with_parent(Box::new(bind_context.clone()));
let mut bind_context = BindContext::with_parent(bind_context.clone())?;

let table = self.metadata.read().table(table_index).clone();
let table_name = table.name();
Expand Down
15 changes: 10 additions & 5 deletions src/query/sql/src/planner/optimizer/decorrelate/flatten_plan.rs
Original file line number Diff line number Diff line change
Expand Up @@ -71,9 +71,14 @@ impl SubqueryRewriter {
let mut metadata = self.metadata.write();
// Currently, we don't support left plan's from clause contains subquery.
// Such as: select t2.a from (select a + 1 as a from t) as t2 where (select sum(a) from t as t1 where t1.a < t2.a) = 1;
let table_index = metadata
.table_index_by_column_indexes(correlated_columns)
.unwrap();
let table_index =
match metadata.table_index_by_column_indexes(correlated_columns) {
Some(index) => index,
None => return Err(ErrorCode::SemanticError(
"Join left plan's from clause can't contain subquery to dcorrelated join right plan",
)),
};

let mut data_types = Vec::with_capacity(correlated_columns.len());
let mut scalar_items = vec![];
let mut scan_columns = ColumnSet::new();
Expand Down Expand Up @@ -231,7 +236,7 @@ impl SubqueryRewriter {
self.flatten_expression_scan(plan, scan, correlated_columns)
}

_ => Err(ErrorCode::Internal(
_ => Err(ErrorCode::SemanticError(
"Invalid plan type for flattening subquery",
)),
}
Expand Down Expand Up @@ -694,7 +699,7 @@ impl SubqueryRewriter {
.iter()
.any(|index| correlated_columns.contains(index))
{
return Err(ErrorCode::Internal(
return Err(ErrorCode::SemanticError(
"correlated columns in window functions not supported",
));
}
Expand Down
6 changes: 3 additions & 3 deletions src/query/sql/src/planner/semantic/type_check.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3134,11 +3134,11 @@ impl<'a> TypeChecker<'a> {
);

// Create new `BindContext` with current `bind_context` as its parent, so we can resolve outer columns.
let mut bind_context = BindContext::with_parent(Box::new(self.bind_context.clone()));
let mut bind_context = BindContext::with_parent(self.bind_context.clone())?;
let (s_expr, output_context) = binder.bind_query(&mut bind_context, subquery)?;
self.bind_context
.cte_context
.set_cte_context(output_context.cte_context);
.set_cte_context_and_name(output_context.cte_context);

if (typ == SubqueryType::Scalar || typ == SubqueryType::Any)
&& output_context.columns.len() > 1
Expand Down Expand Up @@ -4644,7 +4644,7 @@ impl<'a> TypeChecker<'a> {
expr: &Expr,
list: &[Expr],
) -> Result<Box<(ScalarExpr, DataType)>> {
let mut bind_context = BindContext::with_parent(Box::new(self.bind_context.clone()));
let mut bind_context = BindContext::with_parent(self.bind_context.clone())?;
let mut values = Vec::with_capacity(list.len());
for val in list.iter() {
values.push(vec![val.clone()])
Expand Down
3 changes: 3 additions & 0 deletions tests/sqllogictests/suites/query/cte/cte.test
Original file line number Diff line number Diff line change
Expand Up @@ -483,6 +483,9 @@ CREATE OR REPLACE TABLE sales (
net_paid DECIMAL(10, 2) NOT NULL
) row_per_block=51113;

query error
WITH t4(x) AS (select x + 3 from (select * from t4) where x < 10) SELECT * FROM t4;

query error
WITH InitialSales AS (
SELECT
Expand Down
8 changes: 6 additions & 2 deletions tests/sqllogictests/suites/query/subquery.test
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@ DROP TABLE IF EXISTS c
statement ok
DROP TABLE IF EXISTS o


query error 1065
SELECT * FROM (SELECT 1 AS x) AS ss1 LEFT OUTER JOIN (SELECT 2 DIV 228 AS y) AS ss2 ON TRUE, LATERAL (SELECT ss2.y AS z LIMIT 1) AS ss3

statement ok
CREATE TABLE c (c_id INT NULL, bill VARCHAR NULL)

Expand Down Expand Up @@ -817,7 +821,7 @@ CREATE TABLE `merge_log` (
`created_at` TIMESTAMP NULL
) ENGINE = FUSE;

query
query
SELECT
(SELECT MAX(created_at)
FROM merge_log
Expand Down Expand Up @@ -851,7 +855,7 @@ statement ok
insert into test_group values ('2024-01-01', 100),('2024-01-01', 200),('2024-01-02', 400),('2024-01-02', 800);

query TI
select input_date, sum(value),
select input_date, sum(value),
(select sum(value)
from test_group b
where b.input_date >= DATE_TRUNC(YEAR, input_date)
Expand Down
4 changes: 2 additions & 2 deletions tests/suites/0_stateless/05_hints/05_0001_set_var.result
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ storage_read_buffer_size 1048576
America/Toronto
America/Toronto
1
2022-02-02 03:00:00
2022-02-02 03:00:00
2022-02-02 03:00:00.000000
2022-02-02 03:00:00.000000
1 13
Asia/Shanghai

0 comments on commit f227b91

Please sign in to comment.