
This project designs a dockerized ETL pipeline that ingests data into a data lake and transforms it for machine learning purposes.


skevin-dev/ad_challenge


Table of Contents
  1. About The Project
  2. Getting Started
  3. Contributing
  4. License
  5. Contact
  6. Acknowledgments

About The Project

Project Architecture

A dockerized Extract, Transform, Load (ETL) pipeline with PostgreSQL, Airflow, and DBT.

Data Engineering Tasks

  • Ingestion of the given raw data into a data lake of your choice.
  • Modeling of the data to reduce memory usage and improve the performance of fetch queries.
  • An ETL pipeline to enrich the data and load it into a data warehouse following your models.
  • A validator to check the correctness of the data for the ETL pipeline.
  • An interface to expose the actionable data for machine learning purposes.

Tech Stack used in this project

  • Docker
  • Postgres
  • Airflow
  • DBT

Getting Started

Prerequisites

Make sure you have the following installed on your local machine:

  • Docker
  • Docker Compose
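
A quick way to confirm the prerequisites before continuing is a small shell helper like the one below. It is a minimal sketch, not part of the repository: the function name `check_prereqs` is hypothetical, and it only checks that the named commands are on your `PATH`, not that the Docker daemon is running.

```shell
# Report whether each required command-line tool is available on PATH.
# Returns 0 if every tool is found, 1 if at least one is missing.
# check_prereqs is a hypothetical helper name, not part of this project.
check_prereqs() {
  local missing
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For this project you would call it as `check_prereqs docker docker-compose`.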

Installation

  1. Clone the repo
    git clone https://github.com/skevin-dev/ad_challenge
  2. Navigate to the folder
    cd ad_challenge
  3. Build the Airflow image
    docker build . --tag apache_dbt/airflow:2.3.3
  4. Start the services
    docker-compose up
  5. Open the Airflow web UI
    Navigate to `http://localhost:8089/` in your browser, then
    activate and trigger the `ingestion_data` DAG
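
The Airflow web server can take a while to come up after `docker-compose up`. If you script the steps above, a polling helper along these lines may be useful. This is a sketch under assumptions: `wait_for_url` is a hypothetical name, `curl` must be installed, and the default attempt count and delay are arbitrary.

```shell
# Poll a URL until it responds successfully or the attempts run out.
# Arguments: URL, max attempts (default 30), seconds between attempts (default 5).
# wait_for_url is a hypothetical helper name, not part of this project.
wait_for_url() {
  local url attempts delay i
  url="$1"
  attempts="${2:-30}"
  delay="${3:-5}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail makes curl exit non-zero on HTTP error responses (4xx/5xx).
    if curl --silent --fail --max-time 5 --output /dev/null "$url"; then
      echo "ready: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out: $url"
  return 1
}
```

For example, `wait_for_url http://localhost:8089/health` polls the Airflow web server's health endpoint on the port used in step 5, assuming the web server is published on that port.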

Airflow data lineage

[Image: Airflow data lineage]

dbt data lineage

[Image: dbt data lineage]

DBT documentation

https://curious-wisp-482466.netlify.app

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Shyaka Kevin - [email protected]

Acknowledgments
