Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distinct_on to dataframe api #11012

Merged
merged 4 commits into from
Jun 21, 2024
Merged

Add distinct_on to dataframe api #11012

merged 4 commits into from
Jun 21, 2024

Conversation

Omega359
Copy link
Contributor

Which issue does this PR close?

Closes #11011

Rationale for this change

Add distinct_on to the dataframe api.

What changes are included in this PR?

code, tests, documentation

Are these changes tested?

Yes via tests in dataframe/mod.rs

Are there any user-facing changes?

dataframe API was amended.

@github-actions github-actions bot added the core Core DataFusion crate label Jun 19, 2024
@Omega359 Omega359 marked this pull request as ready for review June 19, 2024 20:22
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @Omega359

I left a suggestion to improve the example maybe with some comments. The semantics of distinct_on are a little mind blowing

datafusion/core/src/dataframe/mod.rs Outdated Show resolved Hide resolved
@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

I merged this branch up from main to resolve a merge conflict

/// # async fn main() -> Result<()> {
/// let ctx = SessionContext::new();
/// let df = ctx.read_csv("tests/data/example.csv", CsvReadOptions::new()).await?
/// // Return a single row (a, b) for each distinct value of a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if DISTINCT supports all types, incl binary, complex, etc? if its not we should be mentioning it in the doc, and double check it returns a respective error instead of crash/corruption

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good idea -- I filed #11052 to track

However, I think it is not required for this PR so I merged it in

@alamb alamb merged commit 30a6ed5 into apache:main Jun 21, 2024
24 checks passed
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jun 22, 2024
* Add distinct_on to dataframe api apache#11011

* cargo fmt

* Update datafusion/core/src/dataframe/mod.rs as per reviewer feedback

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jun 22, 2024
* Add distinct_on to dataframe api apache#11011

* cargo fmt

* Update datafusion/core/src/dataframe/mod.rs as per reviewer feedback

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
@Omega359 Omega359 deleted the distinct_on branch July 15, 2024 13:46
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Add distinct_on to dataframe api apache#11011

* cargo fmt

* Update datafusion/core/src/dataframe/mod.rs as per reviewer feedback

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add distinct_on to dataframe api
3 participants