Allen_Wang_miniproj_11

Overview

This project demonstrates a complete data pipeline on Databricks: it extracts data from an external URL, transforms it with SQL and Python, and loads it into a structured format for analysis. The project includes a CI/CD setup to ensure code quality, reproducibility, and testing. The pipeline identifies trends in alcohol consumption and drug use across countries and age groups, with a focus on actionable insights from complex SQL queries.

Pipeline Overview

Data Pipeline Components:

Data Source: Drinks and drug use tables.
Data Sink: Transformed data is stored in Delta tables on Databricks.
Transformation: Missing (NA) values are filled in and new features are created.
Visualization: Analysis results are visualized using Python's Matplotlib and Seaborn.
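
The transformation component can be illustrated with a small local sketch. It is not the project's actual code: the column names (`beer_servings`, `spirit_servings`, `wine_servings`), the derived `total_servings` feature, and the sample rows are all assumptions made for illustration, and plain Python stands in for the Databricks transformations.

```python
import csv
import io

# Hypothetical sample rows; the column names are assumptions, not taken from the project.
SAMPLE_CSV = """country,beer_servings,spirit_servings,wine_servings
A,10,,5
B,,20,
C,1,2,3
"""

SERVING_COLUMNS = ("beer_servings", "spirit_servings", "wine_servings")


def transform(rows):
    """Fill missing serving counts with 0 and add a total_servings feature."""
    cleaned = []
    for row in rows:
        out = {"country": row["country"]}
        for col in SERVING_COLUMNS:
            value = (row.get(col) or "").strip()
            # An empty cell is treated as a missing value and replaced with 0.
            out[col] = int(value) if value else 0
        # New feature (assumed for illustration): sum of all serving types.
        out["total_servings"] = sum(out[col] for col in SERVING_COLUMNS)
        cleaned.append(out)
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = transform(rows)
```

Filling with 0 is only one possible choice; the fill value used in the project is not stated here.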

Pipeline Steps:

Extract data from the source URL.
Load data into a Databricks Delta table.
Apply transformations for data aggregation and filtering.
Visualize the results and save plots.
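
The first three steps above can be sketched locally as follows. This is a minimal sketch under stated assumptions, not the project's implementation: an in-memory SQLite database stands in for the Databricks Delta table, the CSV text is inlined rather than downloaded from a URL, and the table name `drinks`, the column `total_litres`, and the helper functions are hypothetical.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Parse CSV text into dict rows (the real pipeline would fetch this text from a URL)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def load(conn, rows):
    """Load rows into a table; SQLite is a local stand-in for a Delta table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS drinks (country TEXT, total_litres REAL)"
    )
    conn.executemany(
        "INSERT INTO drinks VALUES (?, ?)",
        # Empty cells are loaded as 0.0 (an assumed fill value).
        [(r["country"], float(r["total_litres"] or 0)) for r in rows],
    )
    conn.commit()


def top_countries(conn, limit=10):
    """Aggregate and filter step: return the highest-consuming countries."""
    cur = conn.execute(
        "SELECT country, total_litres FROM drinks "
        "ORDER BY total_litres DESC, country ASC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Hypothetical sample data with one missing value.
raw = "country,total_litres\nA,4.9\nB,\nC,12.4\n"
conn = sqlite3.connect(":memory:")
load(conn, extract(raw))
top = top_countries(conn, limit=2)
```

The rows in `top` could then be passed to a plotting library for the visualization step.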

Project Structure

mylib/: Python scripts for SQL queries, data extraction, and transformations.
.devcontainer/: Configuration for the development container.
Makefile: Provides commands for setup, formatting, linting, testing, and running SQL queries:
- make install: Installs dependencies.
- make format: Formats Python files.
- make lint: Lints Python files.
- make test: Runs unit tests.
- make all: Runs all tasks (install, format, lint, and test).
.github/workflows/CICD.yml: CI/CD pipeline configuration using GitHub Actions.
README.md: Setup instructions, usage guidelines, and project description.

Setup

Clone the repository:

```
git clone https://github.com/nogibjj/Allen_Wang_miniproj_11.git
cd Allen_Wang_miniproj_11
```

Install dependencies:
```
make install
```
Format code:
```
make format
```
Lint code:
```
make lint
```
Test code:
```
make test
```

