Allen_Wang_miniproj_11

Overview

This project demonstrates a complete data pipeline on Databricks: it extracts data from an external URL, transforms it with SQL and Python, and loads it into a structured format for analysis. The project includes a CI/CD setup to ensure code quality, reproducibility, and testing. The pipeline identifies trends in alcohol consumption and drug use across countries and age groups, with a focus on actionable insights from complex SQL queries.

Pipeline Overview

Data Pipeline Components:

Data Source: Drinks and drug-use tables.
Data Sink: Transformed data is stored in Delta tables on Databricks.
Transformation: Missing (NA) values are filled in and new features are created.
Visualization: Analysis results are visualized using Python's Matplotlib and Seaborn.
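
To make the transformation component concrete, here is a minimal sketch of filling missing values and deriving a new feature. It uses only the Python standard library rather than Databricks, and the sample rows, the column names (`beer_servings`, `wine_servings`, `total_servings`), and the function `fill_na_and_add_total` are all assumptions made for illustration, not names taken from this project.

```python
import csv
import io

# Hypothetical sample rows; the real tables are fetched from a URL.
SAMPLE_CSV = """country,beer_servings,wine_servings
A,100,50
B,,20
C,30,
"""


def fill_na_and_add_total(rows):
    """Replace empty numeric cells with 0 and add a total_servings column."""
    cleaned = []
    for row in rows:
        # An empty string means the value was missing in the source.
        beer = int(row["beer_servings"] or 0)
        wine = int(row["wine_servings"] or 0)
        cleaned.append(
            {
                "country": row["country"],
                "beer_servings": beer,
                "wine_servings": wine,
                # New feature derived from the existing columns.
                "total_servings": beer + wine,
            }
        )
    return cleaned


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
cleaned = fill_na_and_add_total(rows)
```

Filling with 0 is only one possible policy; a mean or median fill may suit the real data better.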

Pipeline Steps:

Extract data from the source URL.
Load data into a Databricks Delta table.
Apply transformations for data aggregation and filtering.
Visualize the results and save plots.
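
The steps above can be sketched locally as follows. This is a rough illustration under stated assumptions, not the project's implementation: an in-memory SQLite database stands in for the Databricks Delta table, an inline CSV string stands in for the downloaded file, and the table name `drinks`, its columns, and the functions `extract`, `load`, and `transform` are hypothetical. The plotting step is omitted.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the file downloaded from the source URL.
SAMPLE_CSV = """country,age_group,servings
A,18-25,100
A,26-35,50
B,18-25,20
C,18-25,80
"""


def extract(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def load(conn, rows):
    """Create the table and insert the extracted rows."""
    conn.execute(
        "CREATE TABLE drinks (country TEXT, age_group TEXT, servings INTEGER)"
    )
    conn.executemany(
        "INSERT INTO drinks VALUES (:country, :age_group, :servings)", rows
    )


def transform(conn, min_total):
    """Aggregate servings per country and keep totals >= min_total."""
    query = """
        SELECT country, SUM(servings) AS total
        FROM drinks
        GROUP BY country
        HAVING SUM(servings) >= ?
        ORDER BY total DESC
    """
    return conn.execute(query, (min_total,)).fetchall()


conn = sqlite3.connect(":memory:")  # SQLite here, not a Delta table
load(conn, extract(SAMPLE_CSV))
result = transform(conn, 50)
conn.close()
```

Keeping each step in its own function makes it possible to unit test the SQL logic without a Databricks cluster.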

Project Structure

mylib/: Python scripts for SQL queries, data extraction, and transformations.
.devcontainer/: Configuration for the development container.
Makefile: Provides commands for setup, formatting, linting, testing, and running SQL queries:
- make install: Installs dependencies.
- make format: Formats Python files.
- make lint: Lints Python files.
- make test: Runs unit tests.
- make all: Runs all tasks (install, format, lint, and test).
.github/workflows/CICD.yml: CI/CD pipeline configuration using GitHub Actions.
README.md: Setup instructions, usage guidelines, and project description.

Setup

Clone the repository:

git clone https://github.com/nogibjj/Allen_Wang_miniproj_11.git
cd Allen_Wang_miniproj_11

Install dependencies:
```
make install
```
Format code:
```
make format
```
Lint code:
```
make lint
```
Test code:
```
make test
```
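
As an illustration of the kind of check `make test` might run, here is a minimal sketch that assumes a pytest-style test runner. The helper `servings_to_int` and its test are hypothetical and do not come from `mylib/`.

```python
def servings_to_int(value):
    """Hypothetical helper: parse a CSV cell, treating blanks as 0."""
    value = (value or "").strip()
    return int(value) if value else 0


def test_servings_to_int():
    # Plain assert statements are enough for a pytest-style runner.
    assert servings_to_int("12") == 12
    assert servings_to_int("") == 0
    assert servings_to_int(None) == 0
    assert servings_to_int(" 7 ") == 7
```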
