Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add drop_columns to dataframe api #11010

Merged
merged 4 commits into from
Jun 21, 2024
Merged

Add drop_columns to dataframe api #11010

merged 4 commits into from
Jun 21, 2024

Conversation

Omega359
Copy link
Contributor

Which issue does this PR close?

Closes #11007

Rationale for this change

Add drop_columns to the dataframe api.

What changes are included in this PR?

code, tests, documentation

Are these changes tested?

Yes via tests in dataframe/mod.rs

Are there any user-facing changes?

dataframe API was amended.

@Omega359 Omega359 changed the title Add drop_columns to dataframe api Drop columns Add drop_columns to dataframe api Jun 19, 2024
@github-actions github-actions bot added the core Core DataFusion crate label Jun 19, 2024
@Omega359 Omega359 marked this pull request as ready for review June 19, 2024 17:13
Copy link
Member

@waynexia waynexia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @Omega359

I left a little suggestion about the container's type.

.schema()
.qualified_field_with_unqualified_name(name)
})
.collect::<Result<Vec<_>>>()?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

collect into a hash table might be good for wide table scenarios

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I'll take a look. The logic for this function was based on the select_columns fn which did things that way.

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Omega359
Appreciate if you add more tests, like drop with empty array
Drop where columns are with double quotes, drop non existent cols, drop duplicated

@alamb
Copy link
Contributor

alamb commented Jun 21, 2024

Thanks @Omega359 @comphead and @waynexia -- the new tests look good to me 🚀

@alamb alamb merged commit 5498a02 into apache:main Jun 21, 2024
24 checks passed
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jun 22, 2024
* Add drop_columns to dataframe api apache#11007

* Prettier cleanup

* Added additional drop_columns tests and fixed issue with nonexistent columns.
xinlifoobar pushed a commit to xinlifoobar/datafusion that referenced this pull request Jun 22, 2024
* Add drop_columns to dataframe api apache#11007

* Prettier cleanup

* Added additional drop_columns tests and fixed issue with nonexistent columns.
@Omega359 Omega359 deleted the drop_columns branch July 15, 2024 13:46
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Add drop_columns to dataframe api apache#11007

* Prettier cleanup

* Added additional drop_columns tests and fixed issue with nonexistent columns.
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add drop_columns to dataframe api
4 participants