[FEAT]: sql count(*) #2832

Merged
merged 14 commits into from
Sep 12, 2024

Collaborator

@universalmind303 universalmind303 commented Sep 11, 2024

Adds support for SQL COUNT(*).

closes #2742

@github-actions github-actions bot added the enhancement New feature or request label Sep 11, 2024

codspeed-hq bot commented Sep 11, 2024

CodSpeed Performance Report

Merging #2832 will degrade performances by 28.91%

Comparing universalmind303:sql-count-star (656a754) with main (805fbce)

Summary

❌ 1 regressions
✅ 15 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | universalmind303:sql-count-star | Change |
|---|---|---|---|
| test_count[1 Small File] | 16.3 ms | 22.9 ms | -28.91% |

Some(rel) => {
    let schema = rel.schema();
    let expr = col(schema.fields[0].name.clone())
        .count(daft_core::count_mode::CountMode::Valid);

Contributor

Is this the right behavior for COUNT(*)? IIUC this code is saying "take the count of non-null entries in the 0th column"?

Instead, I think we need a way to correctly generate a CountRows plan when we encounter COUNT(*). Here that might mean propagating col("*") upwards and handling it somewhere downstream.

cc @kevinzwang here as well who might know better on where this would get resolved during plan creation
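
To make the concern above concrete, here is a minimal, self-contained Rust sketch of the difference between counting only non-null entries and counting every row. It is not Daft's implementation: the `CountMode` enum below is a hypothetical stand-in that only borrows the variant names `Valid` and `All` from the diff, and `count_column` and `sample_first_column` are illustrative helpers.

```rust
/// Hypothetical stand-in for the count modes named in the diff (not Daft's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum CountMode {
    /// Count every row, whether or not the value is null.
    All,
    /// Count only rows whose value is non-null.
    Valid,
}

/// Counts the entries of a nullable column under the given mode.
/// `None` models a null value.
pub fn count_column<T>(column: &[Option<T>], mode: CountMode) -> usize {
    match mode {
        CountMode::All => column.len(),
        CountMode::Valid => column.iter().filter(|v| v.is_some()).count(),
    }
}

/// Illustrative first column with three rows, one of which is null.
pub fn sample_first_column() -> Vec<Option<i64>> {
    vec![Some(1), None, Some(3)]
}
```

With this sample column, `count_column(&sample_first_column(), CountMode::Valid)` gives 2 while `CountMode::All` gives 3. Only the latter matches the number of rows, which is what COUNT(*) is expected to return.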

Collaborator Author

@jaychia I just made some changes so that the sql implementation of select count(*) from df mirrors the behavior of df.count()

    col(schema.fields[0].name.clone())
        .count(daft_core::count_mode::CountMode::All)
        .alias("count")
}

Contributor

This is OK for now, but we will need to change it to get "correct" counting behavior when we do the same for the non-SQL path as well: add a special CountRows logical plan op instead of relying on our aggregation machinery to count the first column.
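
As a rough illustration of the distinction drawn above, the following Rust sketch contrasts counting through the first column with a dedicated row count. Everything here is hypothetical: `Table`, `count_via_first_column`, and `count_rows` are invented for this example and are not Daft types or plan operators. The zero-column case is one possible motivation for a dedicated op, assumed here rather than stated in the thread.

```rust
/// Hypothetical in-memory table: a row count plus zero or more nullable columns.
pub struct Table {
    pub num_rows: usize,
    pub columns: Vec<Vec<Option<i64>>>,
}

/// Counts rows by looking at the first column, nulls included.
/// Returns `None` when the table has no columns to look at.
pub fn count_via_first_column(table: &Table) -> Option<usize> {
    table.columns.first().map(|column| column.len())
}

/// Dedicated row count: reads the table's row count directly,
/// without depending on any particular column.
pub fn count_rows(table: &Table) -> usize {
    table.num_rows
}
```

In this sketch both functions agree whenever at least one column exists, but only `count_rows` still has an answer for a table with rows and no columns.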

@universalmind303 universalmind303 merged commit 6594d87 into Eventual-Inc:main Sep 12, 2024
32 of 33 checks passed
Successfully merging this pull request may close these issues.

[SQL] Make sure that COUNT(*) is supported (equivalent to a DataFrame.count("*"))
2 participants