Skip to content

This package takes a CSV and generates SQL statements to create a temp table and insert the data within it into the temp table. Good when you need to move data between warehouses for ad hoc analysis and the warehouses are not otherwise connected.

License

Notifications You must be signed in to change notification settings

ryanwithawhy/generate_temp_table_sql

Repository files navigation

Generate Temp Table SQL

Generate Temp Table SQL is a Python package that generates SQL statements for creating a temporary table and inserting data from a CSV file. It's useful when you need to move data between disconnected databases and data warehouses. You can now simply unload a CSV, turn that CSV into SQL statements creating a temp table and inserting data with a CLI command, and copy those SQL statements into your query editor so you can start using the data in a different warehouse.

Why did I build this? I often need to move data between disconnected warehouses. E.g. when I need to join data in my finance warehouse with data in my operational warehouse to see how a feature is linked to customer spend. I can use excel to convert a CSV to SQL statements...but it's a PITA.

With generate_temp_table_sql, I can now move the data in seconds. I simply download the CSV, run generate-tt-sql csv_name.csv in my terminal, and copy it into my query editor. It's immediately available to query so I can go straight to analysis.

If you have the same problem, this will save you a ton of time!

Also available in PyPi.

Features

  • Load data from a CSV file
  • Generate a CREATE TEMP TABLE SQL statement
  • Generate INSERT INTO SQL statements for the data
  • Command-line interface (CLI) for easy usage

Installation

Prerequisites

  • Python 3.6 or higher
  • pandas library

Installing

  1. Clone the repository:

    git clone https://github.com/rywaldor/generate_temp_table_sql.git
    cd generate_temp_table_sql
  2. Install the package locally:

    pip install -e .

Usage

Command-Line Interface (CLI)

After installing the package, you can use the generate-tt-sql command to generate SQL statements from a CSV file.

Basic Usage

To generate SQL statements and save them to a file:

generate-tt-sql path/to/your/file.csv

Additional Options

--o The path to the output SQL file.  Defaults to the director you call the command in.
--overwrite: Allow overwriting the output file if it exists.
--table_name: Specify the name of the temporary table to create.
--column_type: Specify the data type of the columns in the temporary table. Defaults to TEXT which works for Redshift and Snowflake. Use STRING for BigQuery.
--batch_size: Specify the number of rows inserted per insert statement.  If present, it creates multiple insert statements based on the batch size specified.

Example

Assume you have a CSV file example.csv with the following content:

name,age,city
John,30,New York
Jane,25,Los Angeles

Run the following command to generate SQL statements:

generate-tt-sql example.csv -o output.sql --table_name my_temp_table --column_type STRING

The output.sql file will contain:

CREATE TEMP TABLE my_temp_table (
    name STRING,
    age STRING,
    city STRING
);

--Insert Data SQL:
INSERT INTO my_temp_table (name, age, city) VALUES 
    ('John', '30', 'New York'),
    ('Jane', '25', 'Los Angeles');

All arguments are optional except the csv

Running Tests

To run the tests, use the following command:

python -m unittest discover -s tests

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-branch)
  3. Commit your changes (git commit -am 'Add new feature')
  4. Push to the branch (git push origin feature-branch)
  5. Create a new Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Ryan Waldorf - [email protected]
GitHub LinkedIn

About

This package takes a CSV and generates SQL statements to create a temp table and insert the data within it into the temp table. Good when you need to move data between warehouses for ad hoc analysis and the warehouses are not otherwise connected.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published