Showcase data engineering and ETL skills using Python

neooooo28/spotify-etl

Introduction to Data Engineering using Spotify API

This personal project aims to showcase the data engineering skills I've learned by building a data processing pipeline from the ground up. In particular, I aim to pull my most recently played tracks (the last 24 hours) from the Spotify API and automate the ETL process using Airflow.
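As a rough illustration of the extract and transform steps, here is a minimal sketch, assuming the pipeline calls Spotify's recently-played endpoint with an `after` cutoff in Unix milliseconds. The function names (`since_ms`, `fetch_recently_played`, `to_rows`) and the output column names are hypothetical and not taken from this repository; the access token is assumed to be obtained elsewhere.

```python
import datetime as dt
import json
import urllib.parse
import urllib.request

# Spotify Web API endpoint for the current user's recently played tracks.
RECENTLY_PLAYED_URL = "https://api.spotify.com/v1/me/player/recently-played"


def since_ms(now, hours=24):
    """Return the Unix timestamp (milliseconds) for `hours` before `now`."""
    return int((now - dt.timedelta(hours=hours)).timestamp() * 1000)


def fetch_recently_played(token, after_ms, limit=50):
    """Extract: request tracks played after `after_ms` (needs a valid token)."""
    query = urllib.parse.urlencode({"after": after_ms, "limit": limit})
    request = urllib.request.Request(
        f"{RECENTLY_PLAYED_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def to_rows(payload):
    """Transform: flatten the API payload into one dict per played track."""
    rows = []
    for item in payload.get("items", []):
        track = item["track"]
        rows.append(
            {
                "song_name": track["name"],
                # Assumption: keep only the first listed artist.
                "artist_name": track["artists"][0]["name"],
                "played_at": item["played_at"],
            }
        )
    return rows
```

A run might look like `to_rows(fetch_recently_played(token, since_ms(dt.datetime.now(dt.timezone.utc))))`. Keeping `to_rows` free of network calls means it can be checked against a saved payload without a token.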

Requirements

pip install -r requirements.txt

Improvements

Right now this project relies on saving a .db file locally. In the future, the plan is to incorporate Airflow DAGs to automatically refresh the API tokens and pull data on a daily basis, and then probably level up the entire process by using the AWS ecosystem (S3, EMR, Redshift).
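For the local .db step, a minimal sketch of a load function using Python's built-in `sqlite3` module could look like the following. The table name `played_tracks`, its columns, and the `load_rows` helper are assumptions made for illustration, not this project's actual schema. Rows are assumed to be dicts with `played_at`, `song_name`, and `artist_name` keys, and `played_at` serves as the primary key so that a daily re-run does not duplicate plays.

```python
import sqlite3

# Hypothetical schema: one row per play, keyed by the play timestamp.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS played_tracks (
    played_at   TEXT PRIMARY KEY,
    song_name   TEXT NOT NULL,
    artist_name TEXT NOT NULL
)
"""

INSERT_ROW = """
INSERT OR IGNORE INTO played_tracks (played_at, song_name, artist_name)
VALUES (:played_at, :song_name, :artist_name)
"""

COUNT_ROWS = "SELECT COUNT(*) FROM played_tracks"


def load_rows(conn, rows):
    """Insert rows, skipping plays already stored; return how many were added."""
    conn.execute(CREATE_TABLE)
    before = conn.execute(COUNT_ROWS).fetchone()[0]
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_ROW, rows)
    after = conn.execute(COUNT_ROWS).fetchone()[0]
    return after - before
```

With a file-backed database this would be called as `load_rows(sqlite3.connect("played_tracks.db"), rows)`, where the file name is again a placeholder. Because the insert is idempotent, the same function could later be wrapped in a scheduled task without extra deduplication logic.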

Demo

This is a sample view of my played tracks on May 1st, 2021, as seen in a SQLite viewer.
