
Created a data pipeline to stream data and generate real-time alerts using NiFi, Kafka and Spark

Sapient-Data-Engineer-Challenge

This is my approach to the Sapient Talent Hunt for Data Engineers challenge hosted on Analytics Vidhya, in which I secured the second rank.

In this challenge, I had to generate alerts based on sensor data. The detailed problem statement is given here. In short, sensors generate data every minute; I had to consume this data in a streaming fashion and raise two kinds of alerts on it. Using a Kafka component that reads data from a CSV file and sends it to a streaming engine was compulsory.
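
The two alert rules themselves come from the detailed problem statement, which is a separate document; the rules below (a per-reading threshold breach and a sustained breach over a rolling window, with a made-up threshold and window size) are purely illustrative of the shape of the per-minute streaming logic:

```python
from collections import deque

THRESHOLD = 100.0   # assumed per-sensor limit; the real rule is in the problem statement
WINDOW = 5          # assumed minutes of rolling history kept per sensor

def check_reading(history, sensor_id, value):
    """Return the alerts raised by a single per-minute reading.

    Two illustrative alert kinds: an instantaneous threshold breach,
    and a sustained breach across the whole rolling window.
    """
    window = history.setdefault(sensor_id, deque(maxlen=WINDOW))
    window.append(value)
    alerts = []
    if value > THRESHOLD:
        alerts.append(("threshold_breach", sensor_id, value))
    if len(window) == WINDOW and all(v > THRESHOLD for v in window):
        alerts.append(("sustained_breach", sensor_id, value))
    return alerts
```

In the actual pipeline this kind of per-key state lives inside the streaming engine (Spark Streaming here) rather than in a plain dict.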

Software used:

  1. NiFi
  2. Kafka
  3. Spark (Streaming and Batch)
  4. Parquet
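
In the actual pipeline the compulsory CSV-to-Kafka step is handled by NiFi and Kafka, but its logic can be sketched in plain Python with the `kafka-python` client; the topic name, CSV layout, and broker address below are assumptions:

```python
import csv
import json

def rows_from_csv(path):
    """Yield each CSV row as a dict, ready to serialise as JSON."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield row

def publish(rows, producer, topic="sensor-readings"):
    """Send each row to the given Kafka topic as a JSON-encoded message."""
    for row in rows:
        producer.send(topic, json.dumps(row).encode("utf-8"))
    producer.flush()

# Against a live broker (assumed at localhost:9092) this would be driven by:
#   from kafka import KafkaProducer   # pip install kafka-python
#   publish(rows_from_csv("sensor_data.csv"),
#           KafkaProducer(bootstrap_servers="localhost:9092"))
```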

Please go through the following files:

  1. Problem Statement : This file contains the problem statement as well as the data description.
  2. Data Pipeline Document : It has detailed information about the pipeline, such as the data flow diagram, preprocessing, null-value imputation and future scope.
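
The Data Pipeline Document describes the imputation actually used; one common approach for per-minute sensor streams is forward-fill, sketched below under the assumption that readings arrive as `(sensor_id, value)` pairs with `None` marking a missing value:

```python
def forward_fill(readings):
    """Impute None values with the last seen value per sensor.

    Readings are (sensor_id, value) pairs in arrival order. A None
    arriving before any real reading for that sensor is left as None
    (a real pipeline might fall back to a batch-computed mean there).
    """
    last_seen = {}
    filled = []
    for sensor_id, value in readings:
        if value is None:
            value = last_seen.get(sensor_id)  # carry the last value forward, if any
        else:
            last_seen[sensor_id] = value
        filled.append((sensor_id, value))
    return filled
```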
