
DataCody/pvoutput-streaming-monitoring


🌞 Sunnybank – End-to-End Solar Energy Data Platform

Designed and implemented a complete data pipeline for solar panel performance monitoring using real-time web scraping, data cleaning (bronze/silver/gold layers), and interactive dashboards.

🔧 Technologies:
Python, Selenium, BeautifulSoup, pandas, dbt, Airflow, Databricks, PostgreSQL, Plotly/Dash

✨ Key Features:

  • 📅 Automated daily data extraction from pvoutput.org
  • 🧹 Data transformation into bronze → silver → gold layers using dbt
  • ⏱️ Scheduled workflows and job orchestration via Airflow
  • 📊 Interactive dashboards to visualize:
    • System efficiency
    • Power generation trends
    • Anomalies and system health
  • 🔁 Modular design with support for multiple solar systems (multi-SID, i.e. multiple PVOutput system IDs)
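In the repository, the bronze → silver → gold layering is done with dbt. To illustrate what each layer is responsible for, here is a minimal plain-Python sketch. It assumes a hypothetical raw row shape (`sid`, `timestamp` and `power_w` as scraped strings) and a fixed 5-minute reading interval. The field names, the interval, and the `to_silver`/`to_gold` helpers are illustrative assumptions, not the project's dbt models.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical bronze rows: raw strings as scraped, possibly duplicated or malformed.
BRONZE = [
    {"sid": "101", "timestamp": "2024-06-01 12:00", "power_w": "1500"},
    {"sid": "101", "timestamp": "2024-06-01 12:00", "power_w": "1500"},  # duplicate
    {"sid": "101", "timestamp": "2024-06-01 12:05", "power_w": "1800"},
    {"sid": "101", "timestamp": "2024-06-01 12:10", "power_w": "n/a"},  # malformed
    {"sid": "202", "timestamp": "2024-06-01 12:00", "power_w": "900"},
]

INTERVAL_H = 5 / 60  # assumed 5-minute reading interval, expressed in hours


def to_silver(bronze):
    """Silver: cast types, drop malformed/negative rows, dedupe on (sid, timestamp)."""
    clean = {}
    for row in bronze:
        try:
            sid = int(row["sid"])
            ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M")
            power = float(row["power_w"])
        except (KeyError, ValueError):
            continue  # a real pipeline might quarantine these rows instead
        if power >= 0:  # also rejects NaN, since NaN >= 0 is False
            clean[(sid, ts)] = power  # later duplicates overwrite earlier ones
    return [{"sid": s, "ts": t, "power_w": p} for (s, t), p in sorted(clean.items())]


def to_gold(silver):
    """Gold: per-SID daily energy (kWh) and peak power (W)."""
    agg = defaultdict(lambda: {"energy_kwh": 0.0, "peak_w": 0.0})
    for r in silver:
        day = agg[(r["sid"], r["ts"].date())]
        # Treat each reading as the average power over its interval: W x h / 1000 -> kWh.
        day["energy_kwh"] += r["power_w"] * INTERVAL_H / 1000
        day["peak_w"] = max(day["peak_w"], r["power_w"])
    return dict(agg)


if __name__ == "__main__":
    for (sid, day), metrics in sorted(to_gold(to_silver(BRONZE)).items()):
        print(sid, day, round(metrics["energy_kwh"], 3), metrics["peak_w"])
```

Keeping each layer a pure function of the previous one mirrors how dbt models select from upstream models. Because of that, a fix in the silver cleaning rules can be replayed from the untouched bronze data.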

How to use it

Create a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install the dependencies:

pip install -r requirements.txt

Leveraged PySpark in Databricks to aggregate multi-site solar energy data (30,000+ records), enabling system-wide performance benchmarking, anomaly detection, and cross-SID comparisons. The aggregated data is stored as Delta tables for efficient consumption by the downstream dashboards.
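The PySpark job itself is not included in this README. The plain-Python sketch below shows one common approach to cross-SID benchmarking and simple anomaly detection. It normalises each system's daily energy by its installed capacity (specific yield, kWh/kWp) and flags systems that fall well below the fleet median. The sample values, the capacity figures, and the 0.6 threshold are assumptions for illustration, not the project's actual logic.

```python
from statistics import median

# Hypothetical gold-layer inputs for one day: energy (kWh) and installed capacity (kWp) per SID.
DAILY_ENERGY_KWH = {101: 24.0, 202: 18.5, 303: 7.0, 404: 30.0}
CAPACITY_KWP = {101: 5.0, 202: 4.0, 303: 4.5, 404: 6.0}


def specific_yield(energy_kwh, capacity_kwp):
    """kWh per installed kWp, so systems of different sizes can be compared.

    SIDs with a missing or zero capacity are skipped to avoid dividing by zero.
    """
    return {
        sid: e / capacity_kwp[sid]
        for sid, e in energy_kwh.items()
        if capacity_kwp.get(sid)
    }


def flag_underperformers(yields, ratio=0.6):
    """Return SIDs whose specific yield is below `ratio` times the fleet median."""
    if not yields:
        return []
    threshold = ratio * median(yields.values())
    return sorted(sid for sid, y in yields.items() if y < threshold)


if __name__ == "__main__":
    yields = specific_yield(DAILY_ENERGY_KWH, CAPACITY_KWP)
    print(flag_underperformers(yields))
```

Comparing against the fleet median rather than the mean keeps one badly failing system from dragging the benchmark down. In Spark, the same idea would typically be a grouped aggregation followed by a join back to the per-SID rows.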

About

End-to-End PV Monitoring & Streaming Pipeline with Delta Lake
