
Recursive CTEs: Stage 3 - add execution support #8840

Merged

Conversation

@matthewgapp (Contributor, author) commented Jan 12, 2024

Adds execution support to recursive CTEs based on a worktable physical plan that mirrors the previous iteration's results back into the recursive query's execution plan.

Part of #462
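The iteration this implements can be modeled with plain Python lists standing in for RecordBatches. This is an illustrative sketch of the semantics only (the function names here are made up, not DataFusion's API):

```python
def recursive_query(static_term, recursive_term):
    """Evaluate a recursive CTE: run the static term once, then repeatedly
    mirror the previous iteration's output back into the recursive term via
    a 'work table' until an iteration produces no rows (the fixpoint)."""
    results = list(static_term())   # static (base) term runs exactly once
    work_table = list(results)      # its output seeds the work table
    while work_table:
        # the recursive term only sees the previous iteration's rows
        new_rows = list(recursive_term(work_table))
        results.extend(new_rows)
        work_table = new_rows       # next iteration reads only the new rows
    return results

# Roughly: WITH RECURSIVE t(n) AS (SELECT 1 UNION ALL
#          SELECT n + 1 FROM t WHERE n < 5) SELECT n FROM t
rows = recursive_query(
    lambda: [1],
    lambda wt: [n + 1 for n in wt if n < 5],
)
print(rows)  # [1, 2, 3, 4, 5]
```

With is_distinct=false (as in this PR), rows are appended UNION ALL-style with no deduplication, so termination depends on the recursive term eventually producing an empty batch.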

Todos

  • Fill out docs and clean up.
  • Add a test that contains GROUP BY within the recursive CTE
  • Add a test that contains a WindowFunction within the recursive CTE
  • Add EXPLAIN tests for these queries to show how they work
  • Add tests that produce more than one batch per iteration (using batches much smaller than the default 8K rows), perhaps by setting target_batch_size to 3 or so.


@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Jan 12, 2024
time,
1 as "val"
FROM
(SELECT DISTINCT "time" FROM "beg_account_balance")
@matthewgapp (author) commented:
This test fails, but any of the following changes makes it pass:

  • removing time or 1 as "val" from the sub_cte projection
  • removing the reference to beg_account_balance in the FROM clause (effectively removing the sub-query from the sub_cte)
  • removing the FULL JOIN "sub_cte" on 1=1

The test fails with "index out of bounds: the len is 0 but the index is 0", but other incarnations of the error seen while playing with the repro were:

Internal error: PhysicalExpr Column references column 'time' at index 0 (zero-based) but input schema only has 0 columns: [].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] WITH RECURSIVE "recursive_cte" AS 
External error: query failed: DataFusion error: ProjectionPushdown
caused by
Internal error: PhysicalExpr Column references column 'time' at index 0 (zero-based) but input schema only has 0 columns: [].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] WITH RECURSIVE "recursive_cte" AS (
External error: query failed: DataFusion error: ProjectionPushdown
caused by
Internal error: PhysicalExpr Column references column 'LEAD(beg_account_balance.time,Int64(1)) ORDER BY [beg_account_balance.time ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW' at index 1 (zero-based) but input schema only has 1 columns: ["LEAD(beg_account_balance.time,Int64(1)) ORDER BY [beg_account_balance.time ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"].
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
[SQL] WITH RECURSIVE "recursive_cte" AS (

Note that the columns referenced in the error messages differ from the repro above, because the repro changed after I collected these errors.

@matthewgapp (author) commented:

The same PR based off of 393e48f98872c696a90fce033fa584533d2326fa (from Nov 18th) works, but based on latest main it fails.

@alamb or @jonahgao, does this ring a bell for you?

Comment from a contributor:
That sounds like an empty schema is being used somewhere (the len is zero). Maybe there is a bug in the output schema calculations.

@matthewgapp (author) commented:

After taking a quick pass at PRs that touched projection_pushdown.rs, I'm still stuck. Here's the log of my investigation. Guess I'll have to buckle up and start debugging 🙃

  • Tried to revert https://github.com/apache/arrow-datafusion/pull/8662/files but couldn't get the code to compile, so I gave up, which is a shame since that PR looks the most relevant
  • Reverted #8573 but it didn't fix it
  • Reverted #8454 but it didn't fix it
  • Reverted #8485 but it didn't fix it
  • Didn't try to revert https://github.com/apache/arrow-datafusion/pull/8327/files because it doesn't look relevant

@matthewgapp commented Jan 12, 2024:

Fixed the minimal repro in commit ea4cb6e. It's still breaking in our downstream product, so I'll add the fuller test case and work to fix against that.

@matthewgapp (author) commented:

I can't seem to reproduce outside of our project just yet. I'm getting the following panic:

The application panicked (crashed).
Message:  primitive array
Location: /Users/matthewgapp/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-array-49.0.0/src/cast.rs:751

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ BACKTRACE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
                                ⋮ 15 frames hidden ⋮
  16: core::panicking::panic_display
      at library/core/src/panicking.rs:196
  17: core::panicking::panic_str
      at library/core/src/panicking.rs:171
  18: core::option::expect_failed
      at library/core/src/option.rs:1980
  19: core::option::Option<T>::expect
      at library/core/src/option.rs:894
  20: arrow_array::cast::AsArray::as_primitive
      at arrow-array-49.0.0/src/cast.rs:751
          self.as_primitive_opt().expect("primitive array")
  21: <datafusion_physical_plan::aggregates::group_values::primitive::GroupValuesPrimitive<T> as datafusion_physical_plan::aggregates::group_values::GroupValues>::intern
      at datafusion/physical-plan/src/aggregates/group_values/primitive.rs:116
          for v in cols[0].as_primitive::<T>() {
  22: datafusion_physical_plan::aggregates::row_hash::GroupedHashAggregateStream::group_aggregate_batch
      at datafusion/physical-plan/src/aggregates/row_hash.rs:548
          self.group_values.intern(group_values, &mut self.current_group_indices)?;
  23: <datafusion_physical_plan::aggregates::row_hash::GroupedHashAggregateStream as futures_core::stream::Stream>::poll_next
      at datafusion/physical-plan/src/aggregates/row_hash.rs:438
          extract_ok!(self.group_aggregate_batch(batch));
  24: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
      at futures-core-0.3.29/src/stream.rs:120
  25: futures_util::stream::stream::StreamExt::poll_next_unpin
      at futures-util-0.3.29/src/stream/stream/mod.rs:1638
  26: <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll
      at futures-util-0.3.29/src/stream/stream/next.rs:32
  27: datafusion_physical_plan::repartition::RepartitionExec::pull_from_input::{{closure}}
      at datafusion/physical-plan/src/repartition/mod.rs:702
          let result = stream.next().await;
  28: tokio::runtime::task::core::Core<T,S>::poll::{{closure}}
      at tokio-1.35.0/src/runtime/task/core.rs:328
  29: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
      at tokio-1.35.0/src/loom/std/unsafe_cell.rs:16
  30: tokio::runtime::task::core::Core<T,S>::poll
      at tokio-1.35.0/src/runtime/task/core.rs:317
  31: tokio::runtime::task::harness::poll_future::{{closure}}
      at tokio-1.35.0/src/runtime/task/harness.rs:485

@matthewgapp commented Jan 13, 2024:

This issue seems to arise from (or at least be correlated with) the MemoryExec. I think this because of the difference in MemoryExec partition sizes (more on that below), and because when I remove try_swapping_with_memory (https://github.com/matthewgapp/arrow-datafusion/blob/ea4cb6e9c6b8a0bc892ef76578fd7853ddc692ab/datafusion/core/src/physical_optimizer/projection_pushdown.rs#L110), our project no longer blows up.

The difference between the plans that fail (in our project) and succeed (outside of our project) is that the offending MemoryExec has a partition size of 2 (MemoryExec: partitions=1, partition_sizes=[2]) while the successful plan has a partition size of 1 (MemoryExec: partitions=1, partition_sizes=[1]).

I'm not sure how to force DataFusion to take on a larger partition size so that I can reproduce outside of our project 🤔

Offending physical plan from logs:

Input physical plan:
AggregateExec: mode=FinalPartitioned, gby=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement], aggr=[]
  AggregateExec: mode=Partial, gby=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement], aggr=[]
    ProjectionExec: expr=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement]
      RecursiveQueryExec: is_distinct=false
        ProjectionExec: expr=[40 as emd, 0 as beg, prices_row_num@5 as prices_row_num, prices_row_num_advancement@6 as prices_row_num_advancement]
          NestedLoopJoinExec: join_type=Left
            NestedLoopJoinExec: join_type=Left
              HashJoinExec: mode=Partitioned, join_type=Inner, on=[(prices_row_num@3, prices_row_num@0)]
                MemoryExec: partitions=1, partition_sizes=[2]
                AggregateExec: mode=FinalPartitioned, gby=[prices_row_num@0 as prices_row_num], aggr=[]
                  AggregateExec: mode=Partial, gby=[prices_row_num@0 as prices_row_num], aggr=[]
                    ProjectionExec: expr=[prices_row_num@0 as prices_row_num]
                      ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num]
                        AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                          AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                            MemoryExec: partitions=1, partition_sizes=[2]
              ProjectionExec: expr=[prices_row_num@0 as prices_row_num]
                ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num]
                  AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                    AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                      MemoryExec: partitions=1, partition_sizes=[2]
            ProjectionExec: expr=[prices_row_num_advancement@0 as prices_row_num_advancement]
              ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num_advancement]
                AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                  AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                    NestedLoopJoinExec: join_type=Inner, filter=prices_row_num@0 > prices_row_num@1
                      MemoryExec: partitions=1, partition_sizes=[2]
                      ProjectionExec: expr=[prices_row_num@0 as prices_row_num]
                        ProjectionExec: expr=[MIN(prices_with_row_num_2.prices_row_num)@0 as prices_row_num]
                          AggregateExec: mode=Final, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                            AggregateExec: mode=Partial, gby=[], aggr=[MIN(prices_with_row_num_2.prices_row_num)]
                              MemoryExec: partitions=1, partition_sizes=[2]
        ProjectionExec: expr=[emd@0 + 40 as emd, emd@0 as beg, prices_row_num@8 as prices_row_num, prices_row_num_advancement@9 as prices_row_num_advancement]
          ProjectionExec: expr=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement, Index@4 as Index, product@5 as product, price@6 as price, prices_row_num@7 as prices_row_num, prices_row_num@9 as prices_row_num, prices_row_num_advancement@10 as prices_row_num_advancement]
            HashJoinExec: mode=Partitioned, join_type=Left, on=[(coalesce(prices_with_row_num_2.prices_row_num,recursive_cte.prices_row_num_advancement)@8, prices_row_num@0)]
              ProjectionExec: expr=[emd@0 as emd, beg@1 as beg, prices_row_num@2 as prices_row_num, prices_row_num_advancement@3 as prices_row_num_advancement, Index@4 as Index, product@5 as product, price@6 as price, prices_row_num@7 as prices_row_num, coalesce(prices_row_num@7, prices_row_num_advancement@3) as coalesce(prices_with_row_num_2.prices_row_num,recursive_cte.prices_row_num_advancement)]
                HashJoinExec: mode=Partitioned, join_type=Left, on=[(prices_row_num_advancement@3, prices_row_num@3)]
                  FilterExec: prices_row_num_advancement@3 IS NOT NULL
                    WorkTableExec: name=recursive_cte
                  MemoryExec: partitions=1, partition_sizes=[2]
              ProjectionExec: expr=[prices_row_num@0 as prices_row_num, LEAD(prices_with_row_num_2.prices_row_num,Int64(1)) ORDER BY [prices_with_row_num_2.prices_row_num ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@1 as prices_row_num_advancement]
                BoundedWindowAggExec: wdw=[LEAD(prices_with_row_num_2.prices_row_num,Int64(1)) ORDER BY [prices_with_row_num_2.prices_row_num ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Ok(Field { name: "LEAD(prices_with_row_num_2.prices_row_num,Int64(1)) ORDER BY [prices_with_row_num_2.prices_row_num ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW", data_type: UInt64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(UInt64(NULL)), end_bound: CurrentRow }], mode=[Sorted]
                  AggregateExec: mode=FinalPartitioned, gby=[prices_row_num@0 as prices_row_num], aggr=[]
                    AggregateExec: mode=Partial, gby=[prices_row_num@0 as prices_row_num], aggr=[]
                      ProjectionExec: expr=[prices_row_num@3 as prices_row_num]
                        MemoryExec: partitions=1, partition_sizes=[2]

@matthewgapp commented Jan 13, 2024:

I was able to increase the partition size to 2 within the MemoryExec by setting the batch size to 50 on the session config, but I'm still unable to reproduce in my test crate outside of our larger project.

@matthewgapp (author) commented:

I was able to reproduce, and I think I'm getting to the bottom of this issue, which indeed appears to be a separate one. It arises when creating an upstream table with the ROW_NUMBER() window function. Some sort of subtle data type mismatch causes downstream execution to blow up. We can work around it by wrapping the upstream ROW_NUMBER() column in a CAST to BIGINT.

})
.collect()
) -> Option<Vec<usize>> {
if source.is_none() {
@matthewgapp (author) commented:

Related to fixing the panic which caused a unit test in cte.slt to fail. More info here: #8840 (comment)

Comment from a member:

I'm afraid that this bug might not have been fixed.
Returning None here seems to select all the columns of the CSV table balance.

DataFusion CLI v34.0.0
❯ CREATE EXTERNAL TABLE balance STORED as CSV WITH HEADER ROW LOCATION '../testing/data/csv/r_cte_balance.csv';
0 rows in set. Query took 0.026 seconds.

❯ set datafusion.optimizer.max_passes=0;
0 rows in set. Query took 0.002 seconds.

❯ select time from balance;
ProjectionPushdown
caused by
Internal error: PhysicalOptimizer rule 'ProjectionPushdown' failed, due to generate a different schema, 

original schema: Schema { fields: [
    Field { name: "time", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }, 

new schema: Schema { fields: 
    Field { name: "time", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
    Field { name: "name", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, 
    Field { name: "account_balance", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }.

This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker

Comment from a member:

I filed issue #9004, and I think PR #9005 should be able to fix the ProjectionPushdown bug.

There might be another issue here (why no projection was pushed down to the CSV table), but I think it will not affect functionality.

@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/execution-support branch 2 times, most recently from a993efb to f67763a Compare January 15, 2024 22:04
@matthewgapp matthewgapp marked this pull request as ready for review January 15, 2024 22:19
testing (outdated; resolved)
@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/execution-support branch from a7fd8bc to 8c8a1d4 Compare January 16, 2024 18:05
@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/execution-support branch from 06fcbea to 68c9fe7 Compare January 18, 2024 19:56
@matthewgapp (author) commented:

The tests are failing because CI can't find the commit that contains the new test files. I pushed the new commit to my branch in apache/arrow-testing#93, so they should start passing again when re-run.

@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/execution-support branch from 68c9fe7 to b11ca8b Compare January 21, 2024 08:49
@github-actions github-actions bot removed logical-expr Logical plan and expressions optimizer Optimizer rules labels Jan 21, 2024
@alamb (Contributor) commented Jan 22, 2024:

FYI @jonahgao

I'll try and find time to review this over the next day or two. Thanks @matthewgapp

@matthewgapp (author) commented:

> FYI @jonahgao
>
> I'll try and find time to review this over the next day or two. Thanks @matthewgapp

Thank you, looking forward to getting this over the finish line!


/// Take the previously written batches from the work table.
/// This will be called by the [`WorkTableExec`] when it is executed.
fn take(&self) -> Vec<RecordBatch> {
@jonahgao (Member) commented Jan 24, 2024:

The work table might be read multiple times if there are multiple TableScans on it in the recursive term.

WITH RECURSIVE my_cte AS (
    SELECT 1 as a
    UNION ALL
    SELECT my_cte.a+2 as a
    FROM my_cte join my_cte c2 using(a)
    WHERE my_cte.a<5
)
SELECT a FROM my_cte;

We might need to clone the batches here, or we could choose not to support this, as PostgreSQL does:

postgres=# WITH RECURSIVE my_cte AS (
    SELECT 1 as a
    UNION ALL
    SELECT my_cte.a+2 as a
    FROM my_cte join my_cte c2 using(a)
    WHERE my_cte.a<5
)
SELECT a FROM my_cte;
ERROR:  recursive reference to query "my_cte" must not appear more than once
LINE 5:     FROM my_cte join my_cte c2 using(a)

@matthewgapp (author) commented:

I think not supporting it for now would be fine (rather than paying the clone tax on every query, regardless of whether the CTE is referenced multiple times).

@jonahgao commented Jan 25, 2024:

Or we could make the Worktable capable of being read repeatedly. For example, we could use something like Arc<Vec<RecordBatch>>. However, I think it's okay not to support it; we can refine it in subsequent PRs.
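The trade-off can be sketched with a toy Python work table (illustrative only, not DataFusion code): a take() that drains the stored batches leaves a second scan in the same iteration with nothing to read, which is why multiple recursive references need either cloning, shared ownership (e.g. something like Arc<Vec<RecordBatch>>), or an explicit "not supported" error.

```python
class WorkTable:
    """Toy work table whose take() drains the stored batches,
    mimicking a single-reader hand-off between iterations."""

    def __init__(self):
        self.batches = []

    def write(self, batches):
        # called at the end of each iteration with the new rows
        self.batches = list(batches)

    def take(self):
        # hand the batches to the reader and leave the table empty
        batches, self.batches = self.batches, []
        return batches

wt = WorkTable()
wt.write([[1, 2], [3]])

first_scan = wt.take()   # the first TableScan gets every batch
second_scan = wt.take()  # a second scan in the same iteration gets nothing
print(first_scan, second_scan)  # [[1, 2], [3]] []
```

A read-many version would simply return self.batches without clearing it (shared, immutable batches), at the cost of keeping them alive until the next write.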

@matthewgapp commented Jan 25, 2024:

Yeah, that makes sense. Off the top of your head, is there an easy way to traverse the recursive term of the logical plan to count the number of table references (relations) that match the CTE name? I'd like to avoid building that machinery myself if it already exists.

Btw, it looks like MemoryStream clones record batches anyway: https://github.com/matthewgapp/arrow-datafusion/blob/b11ca8bb7e61d78af831b8667fa551d26e4839c7/datafusion/physical-plan/src/memory.rs#L255. It feels like the poll_next implementation should instead drain the batches.

Comment from a member:

How about implementing it in assign_work_table as well?

fn assign_work_table(
    plan: Arc<dyn ExecutionPlan>,
    work_table: Arc<WorkTable>,
) -> Result<Arc<dyn ExecutionPlan>> {
    let mut work_table_refs = 0;
    plan.transform_down_mut(&mut |plan| {
        if let Some(exec) = plan.as_any().downcast_ref::<WorkTableExec>() {
            if work_table_refs > 0 {
                not_impl_err!(
                    "Multiple recursive references to the same CTE are not supported"
                )
            } else {
                work_table_refs += 1;
                Ok(Transformed::Yes(Arc::new(
                    exec.with_work_table(work_table.clone()),
                )))
            }
        } else if plan.as_any().is::<RecursiveQueryExec>() {
            not_impl_err!("Recursive queries cannot be nested")
        } else {
            Ok(Transformed::No(plan))
        }
    })
}

Yes, cloning in MemoryStream seems to be avoidable.
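A minimal Python analogue of that traversal (a hypothetical stand-in for transform_down over ExecutionPlans; the Node type and kind strings are made up, not DataFusion's real types):

```python
class Node:
    """Toy plan node: a kind string plus children."""

    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)
        self.work_table = None

def assign_work_table(plan, work_table):
    """Pre-order walk: wire the shared work table into the first
    work-table scan; reject a second reference or nested recursion."""
    refs = 0

    def visit(node):
        nonlocal refs
        if node.kind == "WorkTableExec":
            if refs > 0:
                raise NotImplementedError(
                    "Multiple recursive references to the same CTE are not supported")
            refs += 1
            node.work_table = work_table
        elif node.kind == "RecursiveQueryExec":
            raise NotImplementedError("Recursive queries cannot be nested")
        for child in node.children:
            visit(child)

    visit(plan)
    return plan

ok = Node("Join", [Node("WorkTableExec"), Node("MemoryExec")])
assign_work_table(ok, work_table={"name": "recursive_cte"})  # succeeds

bad = Node("Join", [Node("WorkTableExec"), Node("WorkTableExec")])
try:
    assign_work_table(bad, {})
except NotImplementedError as e:
    print(e)  # Multiple recursive references to the same CTE are not supported
```

The counter lives outside the visitor, so the check naturally covers references anywhere in the recursive term, mirroring the work_table_refs counter in the Rust snippet above.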

@matthewgapp (author) commented:

Implemented here: 2eaa48a (#8840). Thanks @jonahgao!

@matthewgapp (author) commented:

I'll create a separate PR to remove the unnecessary clones from memory stream.

@matthewgapp (author) commented:

Let me know if there are any other blocking comments or changes, @alamb or @jonahgao! TY

@alamb commented Jan 25, 2024:

I plan to review this tomorrow

@matthewgapp (author) commented:

> I plan to review this tomorrow

Thank you!

@jonahgao (Member) commented:

Except for the changes related to ProjectionPushdown, everything else looks good to me!
I believe that the bug related to ProjectionPushdown can be fixed after merging with main.

Thank you! @matthewgapp

add config flag for recursive ctes

update docs from script

update slt test for doc change

restore testing pin

add sql -> logical plan support

* impl cte as work table

* move SharedState to continuance

* impl WorkTableState

wip: readying pr to implement only logical plan

fix sql integration test

wip: add sql test for logical plan

wip: format test assertion

wip: remove uncessary with qualifier method

some docs

more docs

Add comments to `RecursiveQuery`

Update datfusion-cli Cargo.lock

Fix clippy

better errors and comments

impl execution support

partway through porting over isidentical's work

Continuing implementation with fixes and improvements

Lint fixes

ensure that repartitions are not added immediately after RecursiveExec
in the physical-plan

add trivial sqllogictest

more recursive tests

remove test that asserts recursive cte should fail

additional cte test

wip: remove tokio from physical plan dev deps

format cargo tomls

fix issue where CTE could not be referenced more than 1 time

wip: fixes after rebase but tpcds_physical_q54 keeps overflowing its stack

Impl NamedRelation as CteWorkTable

* impl cte as work table

* move SharedState to continuance

* impl WorkTableState

* upd

* assign work table state

* upd

* upd

fix min repro but still broken on larger test case

set config in sql logic tests

clean up cte slt tests

fixes

fix option

add group by test case and more test case files

wip

add window function recursive cte example

simplify stream impl for recrusive query stream

add explain to trivial test case

move setting of recursive ctes to slt file and add test to ensure multiple record batches are produced each iteration

remove tokio dep and remove mut

lint, comments and remove tokio stream

update submodule pin to feat branch that contains csvs
@matthewgapp matthewgapp force-pushed the matt/feat/recursive-ctes/execution-support branch from e65caee to 80069f7 Compare January 26, 2024 17:25
@matthewgapp (author) commented:

> Except for the changes related to ProjectionPushdown, everything else looks good to me! I believe that the bug related to ProjectionPushdown can be fixed after merging with main.
>
> Thank you! @matthewgapp

I rebased onto main which has the projection pushdown fix #9005

@alamb (Contributor) left a review:

Thank you @matthewgapp -- I reviewed this PR and it looks really nice (thank you @jonahgao for all the feedback that left it in such good shape)

The only thing I think is needed prior to merge is removing the arrow-testing pin -- the testing data doesn't really belong in the https://github.com/apache/arrow-testing repo, as that is shared across arrow implementations. Perhaps you could put it somewhere in the main repo, such as in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/data ?

Otherwise I think this PR is looking great


# setup
statement ok
CREATE EXTERNAL TABLE balance STORED as CSV WITH HEADER ROW LOCATION '../../testing/data/csv/r_cte_balance.csv'
Comment from a contributor:

BTW if you wanted to avoid having to check in external files, you could use CREATE TABLE AS VALUES

For example: https://github.com/apache/arrow-datafusion/blob/fc752557204f4b52ab4cb38b5caff99b1b73b902/datafusion/sqllogictest/test_files/aggregate.slt#L41-L45

@matthewgapp (author) replied:

Thanks for the tip. I'll leave the files for now, since some of them have a non-trivial number of values.

datafusion/physical-plan/src/work_table.rs (outdated; resolved)
@@ -84,7 +82,11 @@ impl TableProvider for CteWorkTable {
_filters: &[Expr],
_limit: Option<usize>,
) -> Result<Arc<dyn ExecutionPlan>> {
not_impl_err!("scan not implemented for CteWorkTable yet")
// TODO: pushdown filters and limits
Comment from a contributor:

I recommend tracking this in separate tickets or as tasks on the main recursive CTE ticket; otherwise things like this can easily be forgotten.

Partitioning::UnknownPartitioning(1)
}

// TODO: control these hints and see whether we can
Comment from a contributor:

yeah, this would be pretty tricky -- I think the choices you have below are the simple (and correct) ones, and excellent choices for the first implementation 👍

datafusion/physical-plan/src/recursive_query.rs (outdated; resolved)
cx: &mut Context<'_>,
) -> Poll<Option<Self::Item>> {
// TODO: we should use this poll to record some metrics!
if let Some(static_stream) = &mut self.static_stream {
Comment from a contributor:

this is very nicely written and easy to follow 👍

@matthewgapp (author) commented:

> Thank you @matthewgapp -- I reviewed this PR and it looks really nice (thank you @jonahgao for all the feedback that left it in such good shape)
>
> The only thing I think is needed prior to merge is removing the arrow-testing pin -- the testing data doesn't really belong in the https://github.com/apache/arrow-testing repo, as that is shared across arrow implementations. Perhaps you could put it somewhere in the main repo, such as in https://github.com/apache/arrow-datafusion/tree/main/datafusion/core/tests/data ?
>
> Otherwise I think this PR is looking great

@alamb thanks so much for the review. I'll move the test files and remove the pin now.

@matthewgapp (author) commented:

@alamb I removed the pin and updated to reflect your tweaks. Should be good to go :D

@alamb (Contributor) left a review:

Epic work @matthewgapp

And thank you so much for all your help @jonahgao

@alamb commented Jan 27, 2024:

I can't wait to get this hardened enough to turn on by default ❤️

@alamb alamb merged commit a6cdd0d into apache:main Jan 27, 2024
22 checks passed
@alamb commented Mar 11, 2024:

Filed #9554 to track enabling this feature by default
