Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: Calling count with None arguments #768

Merged
merged 2 commits into from
Aug 1, 2024

Conversation

timsaucer
Copy link
Contributor

Which issue does this PR close?

None

Rationale for this change

This is a two line bugfix.

What changes are included in this PR?

If the user calls count() it should internally use count_star()

Are there any user-facing changes?

None

@Michael-J-Ward
Copy link
Contributor

Michael-J-Ward commented Jul 25, 2024

In #771, I flirted with the idea of removing count_star from the Rust side, but then added it back in for this release. It's really just porcelain for count(lit(1)) (I think).

Now that we have the benefit of the python wrappers, do you think we should try to push aliases such as this into the python side?

EDIT:

Thinking a little more, would count(None, distinct = True) be a potential foot-gun that we should avoid, anyway?

The other thing coming out of #771 is that we could use a better AggregateBuilder API / wrapper around the new AggregateExt trait

@timsaucer
Copy link
Contributor Author

In regards to aliases, I do think putting them on the wrappers makes sense. One thing I didn't do with #750 was to remove the alias function definitions on the rust side, even when there is an alias in the python wrapper. I'll add an issue to make sure that gets cleaned up.

Yes, count(None, distinct = True) would be a problem. What do you think - should we update count_star or maybe just add to the docstring to say that None arguments is a special case that ignores distinct?

In regards to AggregateBuilder would you be willing to wait one more release? We just merged in apache/datafusion#11550 which I'm guessing will be in datafusion 41.0. With that we get both the aggregate functions and window functions using the same API. That might make the minimal amount of churn. But if you do want to get in the aggregates for datafusion-python 40.0 I can probably knock that out in the next couple of days.

@Michael-J-Ward
Copy link
Contributor

In regards to AggregateBuilder would you be willing to wait one more release? We just merged in apache/datafusion#11550 which I'm guessing will be in datafusion 41.0.

First, excellent upstream PR. I literally have "improve upstream Builder API" in my post upgrade hit-list, so I'm very happy to see that you've knocked it out.

Second, we can absolutely wait till 41. I've been slow to the 40 upgrade, so let's get it out the door first.

What do you think - should we update count_star or maybe just add to the docstring to say that None arguments is a special case that ignores distinct?

Could count(None, distinct = True) map to count(lit(1), distinct=True) instead of count_star()? And before that, do we need to support count(None) at all?

…it is similar to count_star but still enables you to select distinct=True.
@timsaucer
Copy link
Contributor Author

Excellent suggestion. Pushed the change.

@timsaucer timsaucer mentioned this pull request Jul 26, 2024
@Michael-J-Ward Michael-J-Ward mentioned this pull request Jul 26, 2024
10 tasks
@andygrove andygrove merged commit 66bfe36 into apache:main Aug 1, 2024
14 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants