Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csvsql: Add option to use ON CONFLICT DO NOTHING/UPDATE #1186

Open
jschuur opened this issue Nov 23, 2022 · 2 comments
Open

csvsql: Add option to use ON CONFLICT DO NOTHING/UPDATE #1186

jschuur opened this issue Nov 23, 2022 · 2 comments

Comments

@jschuur
Copy link

jschuur commented Nov 23, 2022

I'm using csvsql to insert into to a Postregress database, and I'd like it to ignore any errors generated from unique constraints (basically skip importing the same data twice).

In Postgres and that would be handled like this:

INSERT INTO table (num1, num2) VALUES (1,1) ON CONFLICT DO UPDATE

However, it doesn't look like any of the current csvsql options allow me to modify the insert statement like this, since --prefix would add it direcly after INSERT and --after-insert executes a seperate statement. Postgres doesn't support the 'OR IGNORE' format that MySQL does.

Seems like a --suffix option to append something to the end of the query would be useful here, or is this a fundamental issue, since agate-sql doesn't support this?

Are there any alternatives for ignoring unique constraint insert attempt errors? Even just a flag to ignore any error from an individual insert statement and keep trying would help here.

@jpmckinney
Copy link
Member

jpmckinney commented Dec 20, 2022

Indeed. agate-sql uses SQLAlchemy, and --prefix causes it to call prefix_with(). Your feature would involve adding a CLI argument that causes it to call on_conflict_do_update or on_conflict_do_nothing.

If you write a PR for agate-sql, I can expose it in csvkit.

@jpmckinney jpmckinney changed the title Support '--suffix' in csvsql (for Postgres ON CONFLICT DO NOTHING e.g.) csvsql: Add option to use ON CONFLICT DO NOTHING/UPDATE Dec 20, 2022
@dmannarino
Copy link

In case it's helpful, what we do is use csvsql to create the table (including unique constraints) and then psql to load the data from the CSVs. See here:
https://github.com/wri/gfw-data-api/blob/master/batch/scripts/create_tabular_schema.sh
https://github.com/wri/gfw-data-api/blob/master/batch/scripts/load_tabular_data.sh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants