A hands-on project built to deepen understanding of dbt modeling, testing, and documentation. Based on the Jaffle Shop dataset, the project showcases best practices in transforming and validating source data for business analytics using the modern data stack.

DataCody/jaffle-shop-data-transformation

🥪 dbt-jaffle-shop

A Modern Data Stack Project for End-to-End Analytics Engineering

This repository is a customized implementation of the dbt-labs/jaffle-shop project, tailored to demonstrate proficiency in data modeling, testing, documentation, and orchestration using dbt. It serves as a comprehensive example of building a robust analytics pipeline from raw data ingestion to curated data marts.

🚀 Project Overview

  • Source Data: Simulated e-commerce datasets representing customers, orders, and payments.
  • Transformation Layers:
      • Staging Models: Clean and standardize raw data.
      • Intermediate Models: Join and enrich data across multiple sources.
      • Mart Models: Provide business-ready datasets for analytics and reporting.
  • Testing: Implemented data quality tests including unique, not_null, and referential integrity checks.
  • Documentation: Auto-generated documentation with detailed model and column descriptions.
  • Orchestration: Configured dbt Cloud jobs for scheduled runs and testing.

🧰 Tech Stack

  • dbt Core: SQL-based data transformation framework.
  • dbt Cloud: Hosted environment for development and job orchestration.
  • Data Warehouse: BigQuery.
  • Version Control: GitHub for source code management.
  • CI/CD: [Optional—mention if integrated].

🗺️ Project Structure

dbt-jaffle-shop/
├── analyses/
├── docs/
├── models/
│   ├── staging/
│   ├── intermediate/
│   └── marts/
├── seeds/
├── tests/
├── macros/
├── snapshots/
├── dbt_project.yml
├── LICENSE
└── README.md
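
The layered layout above is usually wired up in dbt_project.yml. The following is a minimal sketch, not the repository's actual file: the project name, profile name, and the materialization chosen for each layer are assumptions made for illustration.

```yaml
# dbt_project.yml (illustrative sketch; names and materializations are assumptions)
name: jaffle_shop            # hypothetical project name
version: "1.0.0"
config-version: 2
profile: jaffle_shop         # hypothetical profile pointing at the BigQuery target

model-paths: ["models"]
seed-paths: ["seeds"]
test-paths: ["tests"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
analysis-paths: ["analyses"]

models:
  jaffle_shop:               # must match the project name above
    staging:
      +materialized: view       # light cleanup, cheap to rebuild
    intermediate:
      +materialized: ephemeral  # inlined as CTEs into downstream models
    marts:
      +materialized: table      # business-ready tables queried by analysts
```

Setting the materialization per folder keeps individual model files free of repeated config blocks, and any single model can still override it.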

🧪 Testing & Quality Assurance

Implemented comprehensive testing strategies to ensure data reliability:

  • Schema Tests: Enforced constraints like unique and not_null.
  • Custom Tests: Developed bespoke tests for business logic validation.
  • Data Freshness: Configured freshness checks on source data.
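
As a rough illustration of how the schema tests and freshness checks listed above are typically declared, here is a hedged sketch of a properties file. The source name, BigQuery project, schema, timestamp column, model names, and thresholds are all hypothetical, and the exact key placement can vary between dbt versions.

```yaml
# models/staging/_staging.yml (illustrative sketch; all names are assumptions)
version: 2

sources:
  - name: jaffle_shop               # hypothetical source name
    database: my-gcp-project        # hypothetical BigQuery project
    schema: raw                     # hypothetical dataset
    loaded_at_field: _loaded_at     # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders

models:
  - name: stg_orders                # hypothetical staging model
    columns:
      - name: order_id
        tests:                      # newer dbt versions also accept `data_tests:`
          - unique
          - not_null
      - name: customer_id
        tests:
          - not_null
          - relationships:          # referential integrity check
              to: ref('stg_customers')
              field: customer_id
```

With a file like this in place, dbt test runs the column tests and dbt source freshness evaluates the freshness thresholds.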

📊 Documentation & Lineage

  • Auto-Generated Docs: Utilized dbt docs generate for creating interactive documentation.
  • Data Lineage: Visualized model dependencies and data flow.
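
Model and column descriptions feed directly into the generated docs site. The sketch below shows the general shape only; the model name, column names, and description text are assumptions, not taken from this repository.

```yaml
# models/marts/_marts.yml (illustrative sketch; names and wording are assumptions)
version: 2

models:
  - name: customers                 # hypothetical mart model
    description: One row per customer, summarizing their order history.
    columns:
      - name: customer_id
        description: Primary key of the customer.
      - name: first_order_date      # hypothetical column
        description: Date of the customer's earliest order.
```

Running dbt docs generate picks up these descriptions, and dbt docs serve hosts the resulting site locally, including the lineage graph.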

📸 Documentation screenshot

[Image: Lineage Graph]

📸 DAG visualization

[Image: Lineage Graph]

🗓️ Scheduling & Orchestration

Configured dbt Cloud jobs with the following characteristics:

  • Environment: Production
  • Schedule: Daily runs at 9 AM AEST
  • Commands:
      • dbt seed
      • dbt run
      • dbt test
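
For reference, the job settings above can be summarized as follows. This is only an illustrative notation, not a dbt Cloud file format: the keys are hypothetical, and the cron line assumes the scheduler interprets cron expressions in UTC.

```yaml
# Illustrative summary only; the keys below are hypothetical and not read by dbt Cloud.
job:
  environment: Production
  # 09:00 AEST (UTC+10) corresponds to 23:00 UTC on the previous day
  schedule_cron_utc: "0 23 * * *"
  steps:                      # executed in order; a failing step stops the run
    - dbt seed
    - dbt run
    - dbt test
```

Running the seed step first ensures the seed tables exist before the models that reference them are built, and testing last validates the freshly built models.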

🧩 Key Features

  • Modular and scalable model architecture.
  • Adherence to dbt best practices and naming conventions.
  • Comprehensive testing and documentation.
  • Automated workflows through scheduled dbt Cloud jobs.

📈 Future Enhancements

  • Integration with BI tools like Looker or Tableau.
  • Implementation of snapshots for slowly changing dimensions.
  • Expansion of test coverage with more complex scenarios.

📄 License

This project is licensed under the MIT License.
