Ability to add new columns before using csvstack to merge file with extra columns #310

philipashlock · 2014-08-18T22:42:56Z

Sometimes I want to merge two files, but one file has an additional column that I want to preserve. Let's call these two files three_columns.csv and four_columns.csv. To merge them with csvstack, I need both files to have the same columns, so I first add an empty new column to three_columns.csv:

awk '{ print $0 "," }' < three_columns.csv

If the order of the columns is different, I might then need to reorder one file with csvcut, and then I can merge them with csvstack:

csvstack three_columns.csv four_columns.csv > merged_file.csv

However, three_columns.csv doesn't have a heading for the new column, so I need to manually replace the column headings for the whole file, e.g.:

var="These,Are,My,Columns"
sed "1s/.*/$var/" merged_file.csv

Is there a better way of doing this?

It might be nice if there was a simpler way to add new columns and rename column headings using a command packaged with csvkit.
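The manual steps above can be folded into a single awk pass that adds the heading and the empty field at the same time, so no sed fix-up is needed afterwards. This is only a sketch under stated assumptions: the sample data and the column name `extra` are made up for illustration, the file `three_padded.csv` is a hypothetical intermediate, and the CSV is assumed to contain no quoted commas or embedded newlines.

```shell
# Sketch only: assumes simple CSV (no quoted commas, no embedded newlines).
# The sample rows and the column name "extra" are hypothetical.
cd "$(mktemp -d)"
printf 'a,b,c\n1,2,3\n' > three_columns.csv
printf 'a,b,c,extra\n4,5,6,7\n' > four_columns.csv

# Line 1 gets the new heading appended; every other line gets an empty field.
awk -v name="extra" 'NR == 1 { print $0 "," name; next } { print $0 "," }' \
    three_columns.csv > three_padded.csv

# Stack the two files: keep the first header, skip the second one.
{ cat three_padded.csv; tail -n +2 four_columns.csv; } > merged_file.csv
cat merged_file.csv
```

Once the padded file exists, the last step could equally be `csvstack three_padded.csv four_columns.csv > merged_file.csv`, as in the original command.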

onyxfish · 2014-09-05T21:06:45Z

As it stands I don't think there is a better way. I don't think a better way will be in the offing with csvkit. My mantra for csvkit has always been "it's not an editor." Batch processing and conversion are in my wheelhouse, but this sort of manipulation falls on the other side of the fence with things like find/replace. On the other hand, if you or somebody else felt like building a tool for doing these sorts of granular alterations, I'd happily link to it from the docs.

andresgottlieb · 2015-04-30T20:44:42Z

Just experienced this exact same situation and, after reading @onyxfish's argument, I wanted to say that, IMHO, this is not editing at all, but a very common use case in which you want to stack two or more CSV files that share only some of their columns. It's analogous to the csvjoin command, but transposed to rows instead of columns and, as such, it could be solved much as it was solved there with --outer, --left and --right.

There should be (or at least I'd be very happy if there were) a modifier (e.g. --all) that tells the csvstack command to keep all columns from every input file in the output, and just leave the values null for rows coming from files without that column.
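To make the proposed behaviour concrete, here is a rough awk sketch of a union-of-columns stack. It is not a csvstack feature, and `--all` is only the flag name suggested above. The sample data is hypothetical, and the script splits on bare commas, so it assumes no quoted commas or embedded newlines. Each file is named twice on the command line: the first pass collects every heading in first-seen order, the second pass writes each row under the combined header and leaves missing columns empty.

```shell
# Sketch only: naive comma splitting (no quoted commas, no embedded newlines).
# The sample rows are hypothetical; note the second file's different column order.
cd "$(mktemp -d)"
printf 'a,b,c\n1,2,3\n' > three_columns.csv
printf 'b,a,c,d\n5,4,6,7\n' > four_columns.csv

awk -F, '
    # Pass 1 (pass is unset): remember each new heading in first-seen order.
    !pass && FNR == 1 {
        for (i = 1; i <= NF; i++)
            if (!($i in seen)) { seen[$i] = 1; cols[++n] = $i }
    }
    !pass { next }

    # Pass 2, header line: map this file headings to field positions.
    FNR == 1 {
        split("", pos)                      # portable way to clear the array
        for (i = 1; i <= NF; i++) pos[$i] = i
        if (!printed++) {                   # print the combined header once
            line = ""
            for (j = 1; j <= n; j++) line = line (j > 1 ? "," : "") cols[j]
            print line
        }
        next
    }

    # Pass 2, data line: emit fields in combined order, empty when missing.
    {
        line = ""
        for (j = 1; j <= n; j++) {
            v = (cols[j] in pos) ? $(pos[cols[j]]) : ""
            line = line (j > 1 ? "," : "") v
        }
        print line
    }
' three_columns.csv four_columns.csv pass=1 three_columns.csv four_columns.csv > stacked.csv
cat stacked.csv
```

Because columns are matched by name rather than by position, the separate csvcut reordering step is not needed in this sketch.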

jpmckinney · 2015-11-23T18:36:38Z

I think this may be solved when #245 is closed.

onyxfish closed this as completed Sep 5, 2014

jpmckinney mentioned this issue Jan 27, 2016

CSV Column Rename #530

Closed

lcorbasson pushed a commit to lcorbasson/csvkit that referenced this issue Sep 7, 2020

Implement MappedSequence.get. Closes wireservice#310.

8e5f83c
