ETL stands for Extraction, Transformation and Loading.
This project uses Python modules such as mysql-connector-python, pandas, and SQLAlchemy to perform the ETL job.
Extraction: Reading an existing source database and copying its data to our machine. Here we extract data as a 1:1 unload, which means each table in the database is converted into one CSV file.
The connection details of the source database from which data is extracted are stored in the config3.json file. In this example we used the "sakila" sample database that ships with MySQL Workbench, pre-loaded with data. Using mysql-connector-python, the script connects to the database, loops through its tables, converts each table into a pandas DataFrame, and writes each DataFrame as a CSV file in the tables directory.
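A minimal sketch of that extraction step is shown below. The "source" key and its field names in config3.json are illustrative assumptions, not the project's actual schema:

```python
# extract.py -- minimal sketch of the extraction step.
# Assumes config3.json holds the source connection under a hypothetical
# "source" key: {"source": {"host": ..., "user": ..., "password": ...,
#                           "database": "sakila"}}
import json
import os

import mysql.connector
import pandas as pd

with open("config3.json") as f:
    config = json.load(f)

src = config["source"]
conn = mysql.connector.connect(
    host=src["host"], user=src["user"],
    password=src["password"], database=src["database"],
)

os.makedirs("tables", exist_ok=True)

cursor = conn.cursor()
cursor.execute("SHOW TABLES")
tables = [row[0] for row in cursor.fetchall()]

# 1:1 unload: one CSV file per source table.
for table in tables:
    df = pd.read_sql(f"SELECT * FROM `{table}`", conn)
    df.to_csv(os.path.join("tables", f"{table}.csv"), index=False)

conn.close()
```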
Transformation: Performing operations such as concatenating columns, dropping tables, and joining tables on the extracted data for analytics purposes.
The transformation script parses the transformations listed in config3.json and converts the required CSV files into pandas DataFrames. The transformation specified for each entry is then applied to its DataFrame. After all transformations, each DataFrame is written back out as a CSV file in the result directory.
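A minimal sketch of the transformation step, under the assumption that each entry in a "transformations" list names an input table, an operation, and its parameters (this schema is invented for illustration):

```python
# transform.py -- minimal sketch of the transformation step.
# The structure of the "transformations" list in config3.json is an
# assumption; each entry here names a table, an operation, and parameters.
import json
import os

import pandas as pd

with open("config3.json") as f:
    config = json.load(f)

os.makedirs("result", exist_ok=True)

for t in config["transformations"]:
    df = pd.read_csv(os.path.join("tables", t["table"] + ".csv"))

    if t["operation"] == "concat_columns":
        # e.g. build a full_name column from first_name and last_name
        df[t["new_column"]] = (
            df[t["columns"][0]].astype(str) + " " + df[t["columns"][1]].astype(str)
        )
    elif t["operation"] == "join":
        other = pd.read_csv(os.path.join("tables", t["other_table"] + ".csv"))
        df = df.merge(other, on=t["on"])
    # ...further operations (e.g. dropping tables or columns) would
    # extend this dispatch.

    df.to_csv(os.path.join("result", t["table"] + ".csv"), index=False)
```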
Loading: Exporting the transformed data into a target database for analysis.
The loading script converts all the transformed CSV files into pandas DataFrames and reads the target database details from config3.json. Using SQLAlchemy and sqlalchemy_utils, it creates the target database and loads the transformed DataFrames into it as tables.
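A minimal sketch of the loading step, assuming a hypothetical "target" key in config3.json that holds a SQLAlchemy connection URL:

```python
# load.py -- minimal sketch of the loading step.
# Assumes config3.json stores the target as a SQLAlchemy URL, e.g.
# {"target": {"url": "mysql+mysqlconnector://user:pass@localhost/etl_result"}}
import glob
import json
import os

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy_utils import create_database, database_exists

with open("config3.json") as f:
    config = json.load(f)

url = config["target"]["url"]

# Create the target database if it does not exist yet.
if not database_exists(url):
    create_database(url)

engine = create_engine(url)

# Each transformed CSV in the result directory becomes one table.
for path in glob.glob(os.path.join("result", "*.csv")):
    table = os.path.splitext(os.path.basename(path))[0]
    pd.read_csv(path).to_sql(table, engine, if_exists="replace", index=False)
```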