Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csvsql: Option to output INSERT statements #1049

Open
merkys opened this issue Nov 28, 2019 · 7 comments
Open

csvsql: Option to output INSERT statements #1049

merkys opened this issue Nov 28, 2019 · 7 comments

Comments

@merkys
Copy link

merkys commented Nov 28, 2019

I want to convert a CSV file to SQL CREATE TABLE and INSERT statements. However, csvsql insists on providing --db together with --insert. This is strange, as csvsql produces CREATE TABLE itself, so it knows the schema and should be able to create INSERT statements without any additional information.

@jpmckinney
Copy link
Member

csvkit (or agate-sql, on which it depends) doesn't have a facility to output INSERT statements. This would be a new feature. That said, you can insert rows into a database using csvsql, and then dump the database to get the INSERT statements as text.

@jpmckinney jpmckinney changed the title csvsql: why '--db' is needed with '--insert'? csvsql: Option to output INSERT statements Nov 29, 2019
@merkys
Copy link
Author

merkys commented Dec 2, 2019

Thanks for prompt response.

That said, you can insert rows into a database using csvsql, and then dump the database to get the INSERT statements as text.

Right, but this requires the 'live' database, when in principle outputting CREATE and INSERT could be done without it. This would be a nice feature.

@iainelder
Copy link

My use case: I have been given an Excel file containing codes to load into a SQL Server database.

I can't query the server directly from my workstation. The server environment does not have Python installed.

I considered generating the SQL statements locally and copying the file to where I can run SQL commands against the server.

As it's not possible, my workaround is to copy the CSV to the database server and load the data into a temporary table using SQL Server Integration Services.


As for a solution, I'm guessing what needs to change is to_sql function.

It looks like there is an opportunity to stringify the insert expression before executing it, or instead of executing.

https://github.com/wireservice/agate-sql/blob/02f46e0cac11a2d35e9d4441b27748b9be95ee9a/agatesql/table.py#L231-L300

def to_sql(self, connection_or_string, table_name, overwrite=False,
           create=True, create_if_not_exists=False, insert=True, prefixes=[],
           db_schema=None, constraints=True, unique_constraint=[], chunk_size=None, 
           min_col_len=1, col_len_multiplier=1):
    [...]

    if insert:
        insert = sql_table.insert()
        for prefix in prefixes:
            insert = insert.prefix_with(prefix)
        if chunk_size is None:
            connection.execute(insert, [dict(zip(self.column_names, row)) for row in self.rows])
        else:
            number_of_rows = len(self.rows)
            for index in range((number_of_rows - 1) // chunk_size + 1):
                end_index = (index + 1) * chunk_size
                if end_index > number_of_rows:
                    end_index = number_of_rows
                connection.execute(insert, [dict(zip(self.column_names, row)) for row in
                                            self.rows[index * chunk_size:end_index]])


    try:
        return sql_table
    finally:
        if engine is not None:
            connection.close()
            engine.dispose()

@jpmckinney
Copy link
Member

Hmm, this is awkward to implement, because the agate-* plugins are meant to write files (e.g. XLSX), return agate tables, return SQLAlchemy tables, etc.

To write this output, we could just print() to standard output from within agate-sql, but that seems like a bad API choice.

We could add a method to agate-sql that is similar to to_sql (we can maybe extract common logic to a third method), and this method would yield the Insert constructs to csvkit, which could then print them. That's probably the cleanest way, if anyone whats to write a PR.

@iainelder
Copy link

agate-* plugins are meant to write files

Could it write a file called insert_statements.sql that contains the desired output?

I haven't used csvsql since I worked around the problem in 2011, so I won't write the PR, but I look forward to someone who can :-)

@jpmckinney jpmckinney added this to the Priority milestone Oct 18, 2023
@jpmckinney
Copy link
Member

If yielding the Insert constructs from agate-sql is too much trouble, that could be another option.

@PRNDA
Copy link

PRNDA commented Mar 28, 2024

+1 for this feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants