dataframe.with_column_rename has unintuitive behavior when using case sensitive column names #8800
Comments
I agree this is confusing. In general, people using the DataFrame API have been confused by this, as DataFrame APIs are typically case sensitive while SQL is not (by default).
One of the concerns, yes. I was also expecting, as Andrew pointed out, that the names would be case sensitive. Using double quotes is a solution, however imho it's not an ideal one.
Ident normalization is a standard practice
parameter to override this behaviour. Please refer to
The signature and comments for
@Omega359 do you think there is anything left that can be improved?
Yes. Hiding the rename case behaviour behind a flag that, as far as I can find, isn't prominently noted in the docs isn't great. As a user of the API, it's really not obvious why it isn't working when you provided the 'correct' name. Having to dig into the source code, while it's awesome to have it available, is not an ideal way to solve an issue like this, especially for users who are still getting up to speed on Rust. I didn't have a solution to my issue until a core developer popped into the Discord channel to help. A few possible solutions I see, of which I'll help with #2, time permitting:
Somehow we missed the description for this param in https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/configs.md I'll create a PR as well as document the behavior for
UPD: My bad, we have this param documented in https://github.com/apache/arrow-datafusion/blob/main/docs/source/user-guide/configs.md
Describe the bug
If a data frame has column names such as FileId, FileType, SequenceId, attempting to rename them with calls such as
data_frame = data_frame.with_column_renamed("FileId", "file_id").unwrap();
will return success, however the column will not be renamed. After some investigation by Andy Grove, it was determined that the names have to be wrapped in quotes (à la SQL) for the columns to be renamed correctly:
data_frame = data_frame.with_column_renamed("\"FileId\"", "file_id").unwrap();
This is neither intuitive, documented, nor normal for any dataframe API I've encountered in the past.
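To illustrate why the unquoted name silently misses, here is a minimal standalone sketch of SQL-style identifier normalization. It is not DataFusion's actual code: `normalize_ident` and `rename_column` are hypothetical helpers written for this example, and the assumption is simply that unquoted names are lowercased before lookup while double-quoted names keep their case.

```rust
/// Normalize an identifier the way a SQL-style parser typically would
/// (an assumption for this sketch): a double-quoted name keeps its case,
/// an unquoted name is lowercased.
fn normalize_ident(raw: &str) -> String {
    let quoted = raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"');
    if quoted {
        // Strip the surrounding quotes and unescape doubled quotes ("" -> ").
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else {
        raw.to_lowercase()
    }
}

/// Hypothetical rename helper: returns a new column list with `old` renamed
/// to `new`. If the normalized `old` matches no column, the input comes back
/// unchanged and no error is raised, mirroring the silent no-op reported here.
fn rename_column(columns: &[String], old: &str, new: &str) -> Vec<String> {
    let target = normalize_ident(old);
    columns
        .iter()
        .map(|c| if *c == target { new.to_string() } else { c.clone() })
        .collect()
}
```

With columns `FileId` and `FileType`, `rename_column(&cols, "FileId", "file_id")` looks for `fileid`, finds nothing, and returns the list untouched, whereas passing `"\"FileId\""` matches and renames the column.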
To Reproduce
I wrote a quick test case when investigating this issue.
Expected behavior
For me the expectation is that referring to the column name in the first parameter would be 'as-is', without any configuration changes (say, to use case insensitive names or to require quotes à la SQL).
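As a rough sketch of that expectation (again standalone Rust, not the DataFusion API), an 'as-is' rename would compare the requested name exactly against the existing column names. The helper name `rename_column_exact` is hypothetical, and returning an error for a missing column is an extra assumption of this sketch rather than something proposed in the thread.

```rust
/// Hypothetical as-is rename: `old` is compared exactly (case sensitively)
/// with each column name, with no quoting or normalization involved.
/// Assumption for this sketch: a missing column is reported as an error
/// instead of being silently ignored.
fn rename_column_exact(
    columns: &[String],
    old: &str,
    new: &str,
) -> Result<Vec<String>, String> {
    if !columns.iter().any(|c| c.as_str() == old) {
        return Err(format!("no column named {:?}", old));
    }
    Ok(columns
        .iter()
        .map(|c| if c.as_str() == old { new.to_string() } else { c.clone() })
        .collect())
}
```

Under this behaviour, `"FileId"` renames the column directly, while `"fileid"` fails loudly rather than returning success without doing anything.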
Additional context
No response