Code for the Data Engineering for Beginners e-book.
The code for the SQL, Python, and data model sections is written using Spark SQL. To run the code, you will need the prerequisites listed below.
Windows users: please set up WSL and a local Ubuntu virtual machine following the instructions here. Install the above prerequisites in your Ubuntu terminal; if you have trouble installing Docker, follow the steps here (only Step 1 is necessary). Please install the make command with sudo apt install make -y (if it's not already present).
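Before continuing, it may help to confirm that the command-line tools used later in this README are on your PATH. The sketch below assumes those tools are git, docker, and make; that list is inferred from the commands in this README and is not an official prerequisite list. check_tools is a hypothetical helper name.

```shell
# Hypothetical helper: report every tool that is not found on PATH.
# Returns 0 when all tools are found, 1 when at least one is missing.
check_tools() {
  ct_missing=0
  for ct_tool in "$@"; do
    # `command -v` succeeds only if the shell can resolve the name
    if ! command -v "$ct_tool" >/dev/null 2>&1; then
      echo "missing: $ct_tool"
      ct_missing=1
    fi
  done
  return "$ct_missing"
}

# Usage (the tool list is an assumption based on the commands in this README):
#   check_tools git docker make && echo "all tools found"
```

Any tool reported as missing can then be installed before you fork and clone the repository.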
Fork this repository: data_engineering_for_beginners_code.
After forking, clone the repo to your local machine and start the containers as shown below:
git clone https://github.com/your-user-name/data_engineering_for_beginners_code.git
cd data_engineering_for_beginners_code
docker compose up -d # to start the docker containers
sleep 30 # give the containers time to finish starting
Open the Starter Jupyter Notebook at http://localhost:8888/lab/tree/notebooks/starter-notebook.ipynb and try out the commands in the Data Engineering for Beginners e-book.
If you are creating a new notebook, make sure to select the Python 3 (ipykernel) Notebook.
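The fixed sleep 30 above is a simple way to wait for the containers. As an alternative, here is a minimal sketch that polls the Jupyter address until it responds. It assumes curl is installed; wait_for_url and its default retry values are hypothetical and not part of this repository.

```shell
# Hypothetical helper: poll a URL until it responds or the attempts run out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 2).
wait_for_url() {
  wfu_url=$1
  wfu_attempts=${2:-30}
  wfu_delay=${3:-2}
  wfu_i=1
  while [ "$wfu_i" -le "$wfu_attempts" ]; do
    # -s: silent, -f: fail on HTTP status >= 400, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$wfu_url"; then
      echo "ready: $wfu_url"
      return 0
    fi
    sleep "$wfu_delay"
    wfu_i=$((wfu_i + 1))
  done
  echo "timed out: $wfu_url" >&2
  return 1
}

# Usage, in place of `sleep 30`:
#   wait_for_url http://localhost:8888
```

With the defaults this waits up to roughly a minute, but returns as soon as the server answers.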
When you are done, stop the Docker containers with the command below:
docker compose down
For the Airflow, dbt, and capstone sections, go into the airflow directory and run the make commands as shown below.
Note: All the code in the dbt, Airflow, and capstone chapters is to be run from the terminal in the data_engineering_for_beginners_code/airflow directory.
docker compose down # Make sure to stop the Spark/Jupyter Notebook containers before starting Airflow's
cd airflow
make restart # This will ask for your password to create some folders
You can open the Airflow UI at http://localhost:8080 and log in with airflow as both the username and the password. In the Airflow UI, you can run the DAG.
After the DAG has run, run make dbt-docs in the terminal to have dbt serve the docs, which you can view at http://localhost:8081.
You can stop the containers and return to the parent directory as shown below:
make down
cd ..
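To confirm that the containers have actually stopped (for example, before switching between the Spark/Jupyter setup and the Airflow setup), you can count what Docker still reports as running. This is a minimal sketch; running_containers is a hypothetical helper, and note that docker ps lists every running container on the machine, including ones unrelated to this repository.

```shell
# Hypothetical helper: print the number of running containers.
# `docker ps -q` prints one container ID per line, so counting lines counts containers.
running_containers() {
  # tr strips the padding some `wc` implementations add around the number
  docker ps -q | wc -l | tr -d '[:space:]'
}

# Usage:
#   if [ "$(running_containers)" -eq 0 ]; then echo "no containers running"; fi
```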