-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT]: sql count(*) #2832
[FEAT]: sql count(*) #2832
Conversation
CodSpeed Performance ReportMerging #2832 will degrade performances by 28.91%Comparing Summary
Benchmarks breakdown
|
src/daft-sql/src/modules/aggs.rs
Outdated
Some(rel) => { | ||
let schema = rel.schema(); | ||
let expr = col(schema.fields[0].name.clone()) | ||
.count(daft_core::count_mode::CountMode::Valid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the right behavior for COUNT(*)? IIUC this code is saying "take the count of non-null entries in the 0th column"?
Instead, we need a way to correctly generate a CountRows plan when we encounter COUNT(*) I think. Which here might mean we propagate col("*")
upwards and perhaps handle that somewhere downstream
cc @kevinzwang here as well who might know better on where this would get resolved during plan creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaychia I just made some changes so that the sql implementation of select count(*) from df
mirrors the behavior of df.count()
col(schema.fields[0].name.clone()) | ||
.count(daft_core::count_mode::CountMode::All) | ||
.alias("count") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is ok for now, but we will need to change this behavior to enable "correct" counting behavior, when we do it for the non-SQL path as well (which is to add a special CountRows logical plan op instead of relying on our aggregation machinery to count the first column).
adds support for sql
count(*)
closes #2742