A simplified, lightweight ETL Framework based on Apache Spark
Updated Jan 24, 2024 - Scala
A simple Spark-powered ETL framework that just works 🍺
EtlFlow is an ecosystem of functional Scala libraries, based on ZIO, for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
Yet Another Spark Framework
Arrival-delay prediction for commercial flights (UPM Master in Data Science project for the Big Data course)
This project is a template for performing ETL using Kafka, Spark, and Hive.
SeaTunnel plugin development examples.
Batch ETL data pipeline built on HDP 3.0 to process daily sales and business data and produce Power BI reports. Pipelines are automated with Airflow.
Big-data processing (real-time ETL data pipeline) using Avro Schema Registry, Spark, Kafka, HDFS, Hive, Scala, Docker, and Spark Streaming
Data monitoring tool that monitors the result, not the run
Scala data pipeline for processing Amazon movie-review data using Kafka and Spark Streaming
STM data enrichment: Extract, Transform, Load (ETL)
Apache Spark-based data-flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
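A framework like the one described above usually reduces to a small read/transform/write pipeline abstraction. The following is an illustrative sketch only; the `Reader`, `Rule`, `Writer`, and `Pipeline` names are hypothetical and not taken from any of the listed projects.

```scala
// Hypothetical sketch of a multi-source, multi-sink ETL pipeline core.
// Real Spark-based frameworks would back these traits with DataFrame
// readers/writers; here plain collections keep the sketch self-contained.
trait Reader[A] { def read(): Seq[A] }          // one of possibly many sources
trait Rule[A]   { def apply(rows: Seq[A]): Seq[A] } // a transformation rule
trait Writer[A] { def write(rows: Seq[A]): Unit }   // one of possibly many sinks

final class Pipeline[A](source: Reader[A], rules: Seq[Rule[A]], sinks: Seq[Writer[A]]) {
  // Apply each rule in order, then fan the result out to every sink.
  def run(): Unit = {
    val out = rules.foldLeft(source.read())((rows, rule) => rule(rows))
    sinks.foreach(_.write(out))
  }
}
```

Chaining rules with `foldLeft` keeps each transformation independent, which is what lets such frameworks mix and match "categories" of rules per job.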
Data Tweak is a simplified, lightweight ETL framework based on Apache Spark.
Repository for experimenting with Spark