Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
Updated Jul 7, 2024 - Python
An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.
Making data lakes work for time series.
A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
Extract, Transform, Load: Any SQL Database in 4 lines of Code.
A tool for building feature stores.
A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.
Data pipelines from re-usable components
Stetl, Streaming ETL, is a lightweight geospatial processing and ETL framework written in Python.
Lightweight library to write, orchestrate and test your SQL ETL. Writing ETL with data integrity in mind.
Simplified ETL process in Hadoop using Apache Spark. Includes a complete ETL pipeline for a data lake, with SparkSession extensions, DataFrame validation, Column extensions, SQL functions, and DataFrame transformations.
Python MLS and Real-Estate Data Scraper for the Realtor.ca Website
Airflow DAGs for the Stellar ETL project
TAC is an Airflow plugin that helps you extract, transform, and load your data a bit more easily.
Flowrunner is a lightweight package to organize and represent Data Engineering/Science workflows
Event-driven task execution framework based on Kafka
Introduction to data pipeline management with Airflow. Airflow schedules and maintains numerous ETL processes running on a large-scale enterprise data warehouse.
This project contains PySpark jobs that create data pipelines and shows how to distribute the project package on a cluster.