hadoop-hdfs

Here are 56 public repositories matching this topic...

MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services, with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization.

  • Updated Oct 2, 2019
  • Python

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

  • Updated Aug 17, 2024
  • Python

This ETL pipeline project is a practical demonstration of my skills in data engineering and automation using Python and Apache Airflow. By integrating MySQL for data storage and leveraging Airflow for task orchestration, the project simulates a scalable and modular ETL solution often required in enterprise data workflows.

  • Updated Aug 17, 2024
  • Python
