Ability to add new columns before using csvstack to merge file with extra columns #310

Closed
philipashlock opened this issue Aug 18, 2014 · 3 comments

Sometimes I want to merge two files, but one file has an additional column that I want to preserve. Let's call these two files three_columns.csv and four_columns.csv. To merge them with csvstack, they need to have the same columns, so I first add an empty new column to three_columns.csv:

awk '{ print $0 "," }' < three_columns.csv

If the order of the columns is different, I might then need to process one with csvcut, and then I can merge them with csvstack:

csvstack three_columns.csv four_columns.csv > merged_file.csv

However, three_columns.csv doesn't have the heading for the new column, so I need to manually replace the column headings for the whole file, e.g.:

var="These,Are,My,Columns"
sed "1s/.*/$var/" merged_file.csv

Is there a better way of doing this?

It might be nice if there was a simpler way to add new columns and rename column headings using a command packaged with csvkit.
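For illustration, the add-a-column and rename-headings steps could be sketched in Python with the standard csv module. This is not a csvkit command; the function name `add_columns` and its parameters are hypothetical, and the sample data is made up. Unlike the awk one-liner, it parses rows properly, so quoted fields containing newlines should not be a problem.

```python
import csv
import io

def add_columns(src, dst, new_columns, rename=None):
    """Copy CSV rows from src to dst, appending empty columns.

    `rename` is an optional mapping of old heading -> new heading.
    (Hypothetical helper, not part of csvkit.)
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    rename = rename or {}

    # Rewrite the header row: apply renames, then append the new headings.
    header = [rename.get(name, name) for name in next(reader)]
    writer.writerow(header + list(new_columns))

    # Pad every data row with one empty value per new column.
    padding = [""] * len(new_columns)
    for row in reader:
        writer.writerow(row + padding)

# Made-up sample data standing in for three_columns.csv.
src = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
dst = io.StringIO()
add_columns(src, dst, ["d"], rename={"a": "id"})
result = dst.getvalue()
print(result)
```

With real files, `src` and `dst` would be file objects opened with `newline=""`, as the csv module documentation recommends.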

@onyxfish
Collaborator

onyxfish commented Sep 5, 2014

As it stands, I don't think there is a better way, and I don't think one will be in the offing with csvkit. My mantra for csvkit has always been "it's not an editor." Batch processing and conversion are in my wheelhouse, but this sort of manipulation falls on the other side of the fence, along with things like find/replace. On the other hand, if you or somebody else felt like building a tool for doing these sorts of granular alterations, I'd happily link to it from the docs.

onyxfish closed this as completed Sep 5, 2014

@andresgottlieb

I just experienced this exact situation and, after reading @onyxfish's argument, I wanted to say that, IMHO, this is not editing at all, but a very common use case in which you want to stack two or more CSV files that share only some columns. It's analogous to the csvjoin command, but transposed to rows instead of columns and, as such, it could be solved in a similar way to how it was solved there with --outer, --left and --right.

There should be (or at least I'd be very happy if there were) a modifier (e.g. --all) that tells the csvstack command to keep all the columns of every input file in the output file, leaving the values null for rows that come from files without a given column.
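To make the requested behaviour concrete, here is a minimal Python sketch of stacking with the union of all columns. It uses only the standard csv module and is not csvstack's implementation; the function name `stack_all` and the sample data are assumptions made for illustration.

```python
import csv
import io

def stack_all(sources, dst):
    """Stack CSV inputs, keeping every column seen in any input.

    Rows from inputs that lack a column get an empty value for it.
    (Hypothetical helper, not a csvkit API.)
    """
    readers = [csv.DictReader(src) for src in sources]

    # Build the union of headings, preserving first-seen order.
    fieldnames = []
    for reader in readers:
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)

    # restval fills in columns that a given row does not have.
    writer = csv.DictWriter(dst, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for reader in readers:
        for row in reader:
            writer.writerow(row)

# Made-up stand-ins for three_columns.csv and four_columns.csv.
three = io.StringIO("a,b,c\n1,2,3\n")
four = io.StringIO("a,b,c,d\n4,5,6,7\n")
out = io.StringIO()
stack_all([three, four], out)
stacked = out.getvalue()
print(stacked)
```

Because columns are matched by name, the inputs do not need to list them in the same order. One limitation of this sketch: a data row with more fields than its own header would make `DictWriter` raise a `ValueError`.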

@jpmckinney
Member

I think this may be solved when #245 is closed.

lcorbasson pushed a commit to lcorbasson/csvkit that referenced this issue Sep 7, 2020