CSV Column Rename #530
@onyxfish Is your opinion the same as in #310 (comment) ? |
I think so. This still feels like it crosses the line into the realm of things the command line is a bad environment for. Two comma-separated, quoted lists of column names are just not a very clear way of expressing this behavior, and the length of the commands gets unwieldy very fast. That being said, I had a need for something like this just this week, so I can feel the pain. @jpmckinney What do you think? |
I think the common case would be to rename one or two columns, not all the columns, in which case the length of the command is fine. I have needed this, too, when, for whatever reason, the government changed one header in one file in a set of files. |
It's been my experience that when those kinds of changes happen, columns are typically also inserted and removed, as was the case in #310. For instance, this is the case with the Census Bureau County Business Patterns data files, which suddenly pick up a new column in 2008. It's opening the door to that cascade of related "slight tweak" problems that I'm leery of. |
That's reasonable. That would also have resolved my issue with the CBP data. Happy to consider that as an extension to |
csvstack is currently streaming, and I'd like to preserve that. |
Well, for what it's worth, this is now implemented in agate. It should be pretty straightforward to duplicate the logic for the csvkit streaming interface. |
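A streaming rename only has to rewrite the header row and pass every later row through untouched, so memory use stays flat regardless of file size. Below is a minimal sketch of that idea using Python's standard csv module. It is not agate's or csvkit's code, and the function name rename_columns_streaming and its mapping argument are hypothetical.

```python
import csv
import io
import sys
from typing import Dict, TextIO


def rename_columns_streaming(infile: TextIO, outfile: TextIO, mapping: Dict[str, str]) -> None:
    """Copy CSV from infile to outfile, renaming header cells listed in mapping.

    Hypothetical helper: only the first row is rewritten; data rows are
    streamed through one at a time, so the file is never held in memory.
    """
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    # Fail loudly on names that are not in the header (e.g. typos).
    unknown = set(mapping) - set(header)
    if unknown:
        raise ValueError(f"columns not found: {sorted(unknown)}")
    writer.writerow([mapping.get(name, name) for name in header])
    for row in reader:
        writer.writerow(row)


if __name__ == "__main__":
    # Rename one column of a small in-memory CSV and print the result.
    src = io.StringIO("id,name,cnty\n1,Ada,Cook\n")
    rename_columns_streaming(src, sys.stdout, {"cnty": "county"})
```

Raising on unknown names is a design choice: it catches a mistyped column instead of silently writing the file unchanged.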
Noting that the method is |
Thank you for suggesting this new CSV tool. However, the maintainers have decided to not author, merge or maintain new tools; there is simply not enough time to do so. Our focus is instead on making the existing tools as good as possible. We encourage you to create and maintain your own tool as a separate Python package. You may want to use the agate library, which csvkit uses for most of its CSV reading and writing. Doing so will make it easier to maintain common behavior with csvkit’s tools. |
This is disappointing, quite frankly. This would be a very useful feature for quickly cleaning up data from the command line. I don't at all see how this crosses into "the realm of things the command line is a bad environment for". On the contrary, I've been able to write rather intricate scripts/pipelines for data munging, and this is the sort of thing I hate having to drop down to |
@metasoarous That was not the reason for closing the issue - re-read the last comment before closing. If you want this feature, implement it. The maintainers are not your free labor. |
I read the comment before writing. I also didn't demand that anyone work on it. The original poster indicated they'd already created an implementation. I was only making a plea/suggestion that it be considered for inclusion, and am saddened not only about this issue but that you and the rest of the team are categorically opposed to any new tools. It's your project though. Obviously you have the right to do what you like with it. As a user, I just wanted to let you know how I feel about it. |
Renaming columns can be done with csvsql. Just alias the names:
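The alias here is SQL's AS clause. With csvsql the command would be something like csvsql --query "SELECT id, name, cnty AS county FROM data" data.csv, assuming a file named data.csv (csvsql names the table after the file) and these hypothetical column names. The sketch below reproduces the same idea with Python's built-in sqlite3 and csv modules rather than csvsql itself, to show that the alias alone renames the output column. The function rename_via_sql and the table name data are illustrative assumptions.

```python
import csv
import io
import sqlite3


def rename_via_sql(csv_text: str, query: str) -> str:
    """Load CSV text into an in-memory SQLite table named 'data' and run query.

    Hypothetical helper: columns renamed with AS in the query come out
    under their new names in the CSV result.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # Quote identifiers (doubling embedded quotes) so any header is a valid column name.
    cols = ", ".join('"' + name.replace('"', '""') + '"' for name in header)
    conn.execute(f"CREATE TABLE data ({cols})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO data VALUES ({placeholders})", data)
    cursor = conn.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # cursor.description holds the result column names, including aliases.
    writer.writerow([d[0] for d in cursor.description])
    writer.writerows(cursor.fetchall())
    conn.close()
    return out.getvalue()


print(rename_via_sql("id,name,cnty\n1,Ada,Cook\n",
                     "SELECT id, name, cnty AS county FROM data"))
```

One trade-off: every column that should survive has to be listed in the SELECT, so for wide files the query grows with the number of columns even when only one is renamed.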
|
Open issue: #396 |
Hello @onyxfish,
thanks for csvkit and your other cool tools like agate and proof.
I wanted to ask whether you would integrate a new tool into csvkit.
CSVRename
csvrename would allow you to change the header columns of your dataset. agate has a similar tool for agate.Table, and I often have to rename columns. Right now I'm using the header shell script from the Data Science at the Command Line toolkit. It works, but everybody has to install it themselves, so I cannot share my data pipelines (Makefiles).
I already created csvrename, and it would work as follows:
Rename/Replace all headers
Replace all the column headers with new ones, as long as the new list has the same length as the number of columns.
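A minimal sketch of the "replace all headers" behavior, assuming the new names arrive as one comma-separated argument. Parsing that argument with csv.reader means quoted names containing commas or spaces still split correctly. The function replace_all_headers is hypothetical and not part of csvkit.

```python
import csv
import io


def replace_all_headers(csv_text: str, names_arg: str) -> str:
    """Replace the whole header row with names parsed from names_arg.

    Hypothetical 'replace all' mode: the number of new names must equal
    the number of columns, as proposed above.
    """
    # Parse the comma-separated argument like a one-line CSV so quoting works.
    new_names = next(csv.reader([names_arg]))
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("empty input")
    if len(new_names) != len(header):
        raise ValueError(f"expected {len(header)} names, got {len(new_names)}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_names)
    writer.writerows(reader)  # data rows pass through unchanged
    return out.getvalue()


print(replace_all_headers("a,b,c\n1,2,3\n", 'id,"full name",county'))
```

The length check is the safety net here: a short or long list is rejected rather than silently shifting names onto the wrong columns.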
Rename specific column headers
I would potentially add another argument to select the columns by index:
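Selecting columns by index could look like the sketch below, which renames only the listed positions and leaves the rest alone. Positions are 1-based here on the assumption that the option would follow csvkit's existing column numbering (as in csvcut -c 1,3); the function rename_by_index and its dict argument are hypothetical.

```python
import csv
import io
from typing import Dict


def rename_by_index(csv_text: str, renames: Dict[int, str]) -> str:
    """Rename selected columns by 1-based position (hypothetical option)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        raise ValueError("empty input")
    new_header = list(header)
    for position, new_name in renames.items():
        # Reject positions outside 1..number of columns.
        if not 1 <= position <= len(header):
            raise IndexError(f"column {position} out of range 1..{len(header)}")
        new_header[position - 1] = new_name
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(new_header)
    writer.writerows(reader)  # data rows pass through unchanged
    return out.getvalue()


print(rename_by_index("a,b,c\n1,2,3\n", {2: "B"}))
```

Renaming by position helps when a header changed spelling between files but stayed in the same place; it is brittle when columns are inserted or removed, which is the cascade described earlier in the thread.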
What do you think? Or do you have another easy way to do it?
Thanks.