
IoT Trucking App with Spark Structured Streaming

This project reimplements the HDF IoT Trucking Apps with Spark Structured Streaming.

This project provides a couple of sample applications that leverage stream-stream join, window aggregation, and deduplication, respectively.

This project depends on the HDP version of Spark, so you may want to modify the Spark version before building the project to accommodate your installation of Spark.
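For example, assuming the Spark version is declared in build.sbt (the exact setting name in this repository may differ), you can locate the declaration and then edit the version string to match your installed Spark:

# find where the Spark dependency version is declared
grep -n -i "spark" build.sbt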

The apps depend on Truck Geo Events as well as Truck Speed Events, both of which are well explained in the README.md of sam-trucking-data-utils.

The apps also assume that both truck geo events and truck speed events are ingested into Kafka topics, which can easily be done via sam-trucking-data-utils.

To run an application from this project, follow the steps below:

Build the project

sbt assembly
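The spark-submit step below expects the assembled jar under ./build/; depending on the build configuration, sbt assembly may also leave it under target/scala-&lt;version&gt;/. A quick way to locate it:

# locate the assembled fat jar, wherever the build placed it
find . -name "iot-trucking-app*.jar"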

Create Kafka topics for the input events as well as the results

$KAFKA_HOME/bin/kafka-topics.sh --create --topic truck_events_stream \
--zookeeper localhost:2181 --replication-factor 1 \
--partitions 10

$KAFKA_HOME/bin/kafka-topics.sh --create --topic truck_speed_events_stream \
--zookeeper localhost:2181 --replication-factor 1 \
--partitions 10

$KAFKA_HOME/bin/kafka-topics.sh --create --topic trucking_app_query_progress \
--zookeeper localhost:2181 --replication-factor 1 \
--partitions 10

$KAFKA_HOME/bin/kafka-topics.sh --create --topic trucking_app_result \
--zookeeper localhost:2181 --replication-factor 1 \
--partitions 10

(optional) Create and set up a checkpoint directory

You will want to set up a directory for checkpointing. It can be either a local directory if you plan to run the app in a local environment, or an HDFS-like remote directory if you plan to run the app on a cluster.
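For example, to pre-create the checkpoint directory used by the spark-submit example below:

# local directory, if you run the app in local mode
mkdir -p /tmp/apps/iot_trucking_app_joined_abnormal_events/checkpoints

# HDFS directory, if you run the app on a cluster
hdfs dfs -mkdir -p /tmp/apps/iot_trucking_app_joined_abnormal_events/checkpoints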

Generate trucking events and ingest them into Kafka topics

Assuming you have cloned sam-trucking-data-utils, checked out the fix-for-spark-structured-streaming branch, and built the project:

nohup java -cp ./target/stream-simulator-jar-with-dependencies.jar \
hortonworks.hdf.sam.refapp.trucking.simulator.SimulationRunnerApp \
200000 \
hortonworks.hdf.sam.refapp.trucking.simulator.impl.domain.transport.Truck \
hortonworks.hdf.sam.refapp.trucking.simulator.impl.collectors.KafkaJsonEventCollector \
1 \
./routes/midwest/ \
100 \
localhost:9092 \
ALL_STREAMS \
NONSECURE &
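Once the simulator is running, you can sanity-check that events are actually arriving in Kafka (kafka-console-consumer.sh ships with Kafka; localhost:9092 matches the broker used above):

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic truck_events_stream --max-messages 5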

Run the application (the IotTruckingAppJoinedAbnormalEvents app, for example)

$SPARK_HOME/bin/spark-submit \
--class com.hortonworks.spark.IotTruckingAppJoinedAbnormalEvents \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 1 \
./build/iot-trucking-app-spark-structured-streaming.jar \
--brokers localhost:9092 \
--checkpoint /tmp/apps/iot_trucking_app_joined_abnormal_events/checkpoints \
--geo-events-topic truck_events_stream \
--speed-events-topic truck_speed_events_stream \
--query-status-topic trucking_app_query_progress \
--output-topic trucking_app_result
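Once the query is up, you can watch joined abnormal events land on the output topic:

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic trucking_app_result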

While the application is running, you can consume the Kafka topic you provided via --query-status-topic to analyze the progress of the query. See the link below for a couple of queries you might find useful for an operational view: https://gist.github.com/HeartSaVioR/9d53b39052d4779a4c77e71ff7e989a3
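For a quick raw look at the progress events before writing any queries against them:

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic trucking_app_query_progress --from-beginning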
