pushdowns not being applied correctly during optimization #2616

Closed
universalmind303 opened this issue Aug 5, 2024 · 0 comments · Fixed by #2635
A simple query to find the first 10 transactions that have "pizza" in the description:

df = daft.read_csv('~/Downloads/transactions.csv')
df = (df
  .where(df['description'].str.lower().str.like('%pizza%'))
  .select(daft.col('description'))
  .limit(10)
)

The unoptimized plan is as one would expect:

== Unoptimized Logical Plan ==

* Limit: 10
|
* Project: col(description)
|
* Filter: like(lower(col(description)), lit("%pizza%"))
|
* GlobScanOperator
|   Glob paths = [~/Downloads/transactions.csv]
|   File schema = transaction_date#Date, posted_date#Date, card_no#Int64,
|     description#Utf8, category#Utf8, debit#Float64, credit#Float64
|   Partitioning keys = []
|   Output schema = transaction_date#Date, posted_date#Date, card_no#Int64,
|     description#Utf8, category#Utf8, debit#Float64, credit#Float64

A few strange things I noticed:

  • The optimized plan has the filter and limit pushed down, but also shows them as their own nodes.
  • How does a limit get pushed down past a filter? Usually, if a filter is present, it needs to run first, acting as a pushdown barrier for the limit.
  • Why is the projection apparently not being pushed down at all? Neither the filter nor the limit is a pushdown barrier for the projection.
== Optimized Logical Plan ==

* Project: col(description)
|
* Limit: 10
|
* GlobScanOperator
|   Glob paths = [~/Downloads/transactions.csv]
|   File schema = transaction_date#Date, posted_date#Date, card_no#Int64,
|     description#Utf8, category#Utf8, debit#Float64, credit#Float64
|   Partitioning keys = []
|   Filter pushdown = like(lower(col(description)), lit("%pizza%"))
|   Limit pushdown = 10
|   Output schema = transaction_date#Date, posted_date#Date, card_no#Int64,
|     description#Utf8, category#Utf8, debit#Float64, credit#Float64
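To make the limit-vs-filter point concrete, here is a plain-Python sketch (not Daft code; the row data is made up) of why a Filter normally acts as a pushdown barrier for a Limit: taking the limit before the filter runs can return fewer, and different, rows.

```python
# Toy rows standing in for the CSV; only "description" matters here.
rows = [{"description": d} for d in
        ["pizza place", "grocery", "pizza delivery", "coffee", "pizza bar"]]

def filter_pizza(data):
    # The Filter node: keep rows whose description contains "pizza".
    return [r for r in data if "pizza" in r["description"].lower()]

def limit(data, n):
    # The Limit node: keep the first n rows.
    return data[:n]

correct = limit(filter_pizza(rows), 2)    # filter first, then limit
reordered = filter_pizza(limit(rows, 2))  # limit pushed below the filter

print(len(correct))    # 2
print(len(reordered))  # 1 -- the limit discarded rows before the filter ran
```

So if the scan really applies `Limit pushdown = 10` before `Filter pushdown = ...`, the query could silently return fewer than 10 matching rows.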

The physical plan shows no information about whether the filter/limit were actually pushed down, so it's hard to corroborate what is happening at the physical level.

== Physical Plan ==

* Project: col(description)
|   Clustering spec = { Num partitions = 1 }
|
* Limit: 10
|   Eager = false
|   Num partitions = 1
|
* TabularScan:
|   Num Scan Tasks = 1
|   Estimated Scan Bytes = 19366
|   Clustering spec = { Num partitions = 1 }
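For the projection question, pushing the Project into the scan should let it materialize only the one column the query needs instead of all seven. A plain-Python sketch of the intended effect (`scan` here is a hypothetical stand-in for the scan operator, not Daft's actual implementation):

```python
# File schema from the plans above: 7 columns.
FILE_SCHEMA = ["transaction_date", "posted_date", "card_no",
               "description", "category", "debit", "credit"]

def scan(columns):
    """Hypothetical scan: materialize only the requested columns per row."""
    return [{c: None for c in columns} for _ in range(3)]

# Without projection pushdown: scan all 7 columns, then project down to 1.
wide_rows = scan(FILE_SCHEMA)
unpushed = [{"description": r["description"]} for r in wide_rows]

# With projection pushdown: the Project is folded into the scan, so only
# 1 of the 7 columns is ever materialized.
pushed = scan(["description"])

print(len(wide_rows[0]))  # 7 columns materialized without pushdown
print(len(pushed[0]))     # 1 column materialized with pushdown
```

The final results are identical either way; the pushdown only changes how much the scan has to read, which is exactly why its absence is surprising here.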