
Tutorial for data pipeline: Apache Kafka -> MongoDB -> R


pneff93/Kafka-MongoDB-R


Kafka-MongoDB-R


This small tutorial builds a data pipeline from Apache Kafka via MongoDB into R. It favors simplicity and can serve as a baseline for similar projects. You can read more about it in my blog article: Create a Data Analysis Pipeline with Apache Kafka and RStudio.

Prerequisites

Set up

docker-compose up -d

This starts the following containers:

  • Zookeeper
  • Kafka Broker
  • Kafka Producer
    • a locally built Docker image that executes the producer's fat JAR
  • Kafka Connect
  • MongoDB
  • RStudio
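Kafka Connect in particular may need some time after docker-compose up -d before its REST API answers, so the commands in the following sections can fail if they are run too early. A minimal readiness check, assuming curl is installed on the host; the helper name wait_for_url and its default timeout are illustrative and not part of the repository:

```shell
#!/usr/bin/env bash
# Sketch: poll an HTTP endpoint until it answers or a timeout expires.
# Assumes curl on the host. Ports match this README: Kafka Connect on 8083,
# RStudio on 8787. wait_for_url is a hypothetical helper name.

# wait_for_url URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 once URL answers with a non-error HTTP status, 1 on timeout.
wait_for_url() {
  local url="$1" timeout="${2:-120}" interval="${3:-5}"
  local deadline=$(( SECONDS + timeout ))
  # -f makes curl fail on HTTP errors (status >= 400), -s silences progress.
  until curl -sf -o /dev/null "$url"; do
    if (( SECONDS >= deadline )); then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Usage (after docker-compose up -d):
#   wait_for_url http://localhost:8083/ && echo "Kafka Connect is up"
#   wait_for_url http://localhost:8787/ && echo "RStudio is up"
```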

Kafka Producer

The Kafka Producer writes a fake JSON event from a driving truck to the topic truck-topic every two seconds. Verify that the events arrive:

docker-compose exec broker bash
kafka-console-consumer --bootstrap-server broker:9092 --topic truck-topic
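The consumer above runs until you stop it. A bounded variant can be run from the host and also checks that each event parses as JSON. This is a sketch that assumes jq is installed on the host (it is also used below) and that the Compose service is named broker, as in the commands above; count_valid_json is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: count how many lines on stdin are valid JSON.
# Assumes jq is installed. count_valid_json is a hypothetical helper name.

# Prints "<valid> valid, <invalid> invalid".
count_valid_json() {
  local line valid=0 invalid=0
  while IFS= read -r line; do
    # Skip blank lines instead of counting them as invalid.
    [[ -z "$line" ]] && continue
    if jq -e . >/dev/null 2>&1 <<<"$line"; then
      valid=$(( valid + 1 ))
    else
      invalid=$(( invalid + 1 ))
    fi
  done
  echo "$valid valid, $invalid invalid"
}

# Usage: -T disables TTY allocation so the output can be piped on the host;
# --from-beginning and --max-messages make the consumer stop after 5 events.
#   docker-compose exec -T broker kafka-console-consumer \
#     --bootstrap-server broker:9092 --topic truck-topic \
#     --from-beginning --max-messages 5 | count_valid_json
```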

Kafka Connect

We use Kafka Connect to transfer the data from Kafka to MongoDB. Verify that the MongoDB source and sink connector plugins are installed in Kafka Connect:

curl -s -XGET http://localhost:8083/connector-plugins | jq '.[].class'
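Rather than scanning the class list by eye, the check can be scripted. This sketch assumes the official MongoDB Kafka connector, whose plugin classes are com.mongodb.kafka.connect.MongoSinkConnector and com.mongodb.kafka.connect.MongoSourceConnector; adjust the names if your image ships a different connector. has_mongo_plugins is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if both MongoDB plugin classes appear in the JSON
# array returned by GET /connector-plugins (read from stdin).
# Assumes jq and the official MongoDB Kafka connector class names.
has_mongo_plugins() {
  jq -e --arg sink "com.mongodb.kafka.connect.MongoSinkConnector" \
        --arg source "com.mongodb.kafka.connect.MongoSourceConnector" \
    'map(.class) | any(. == $sink) and any(. == $source)' >/dev/null
}

# Usage:
#   curl -s http://localhost:8083/connector-plugins | has_mongo_plugins \
#     && echo "MongoDB plugins installed"
```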

Create and start the connector defined in MongoDBConnector.json:

curl -X POST -H "Content-Type: application/json" --data @MongoDBConnector.json http://localhost:8083/connectors | jq
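Running the POST a second time fails with 409 Conflict because a connector with that name already exists. The Kafka Connect endpoint PUT /connectors/<name>/config instead creates the connector if it is missing and updates it otherwise, which is convenient when you edit the configuration. A sketch, assuming MongoDBConnector.json has the {"name": ..., "config": {...}} shape that POST /connectors expects and that curl and jq are installed; upsert_connector and CONNECT_URL are hypothetical names:

```shell
#!/usr/bin/env bash
# Sketch: create or update a connector from a {"name", "config"} JSON file.
# Assumes curl and jq. upsert_connector and CONNECT_URL are hypothetical names.
CONNECT_URL="${CONNECT_URL:-http://localhost:8083}"

upsert_connector() {
  local file="$1" name config
  # Read the connector name; jq -e fails if .name is missing (null).
  name="$(jq -er '.name' "$file")" || return 1
  # PUT takes only the inner config map, not the {"name", "config"} wrapper.
  config="$(jq -ec '.config' "$file")" || return 1
  curl -sf -X PUT -H "Content-Type: application/json" \
    --data "$config" "$CONNECT_URL/connectors/$name/config"
}

# Usage:
#   upsert_connector MongoDBConnector.json | jq
```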

Verify that the connector is up and running:

curl localhost:8083/connectors/TestData/status | jq
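The status response contains a connector object with a state field and a tasks array with one state per task, so "up and running" can be reduced to a yes/no check. A sketch assuming jq; connector_running is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the connector and all of its tasks are RUNNING,
# reading the JSON of GET /connectors/<name>/status from stdin.
# Assumes jq. connector_running is a hypothetical helper name.
connector_running() {
  jq -e '.connector.state == "RUNNING"
         and (.tasks | length > 0)
         and all(.tasks[]; .state == "RUNNING")' >/dev/null
}

# Usage:
#   curl -s localhost:8083/connectors/TestData/status | connector_running \
#     && echo "TestData is running"
# If a task has failed, its stack trace is in the "trace" field:
#   curl -s localhost:8083/connectors/TestData/status \
#     | jq -r '.tasks[] | select(.state == "FAILED") | .trace'
```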

MongoDB

Start MongoDB Compass and create a new connection with:

username: user
password: password
authentication database: admin
or, equivalently:
URI: mongodb://user:password@localhost:27017/admin

You should see a database TruckData with a collection truck_1 that contains the stored events.
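The same check also works without Compass. This sketch builds the connection string from the fields listed above and counts the documents in TruckData.truck_1; it assumes the MongoDB Shell (mongosh) is installed on the host, and the helper names mongo_uri and count_truck_events are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: build the MongoDB URI from its parts and count stored events.
# Assumes mongosh on the host. Helper names are hypothetical.
# Note: mongo_uri does not percent-encode special characters in credentials.

# mongo_uri USER PASSWORD HOST PORT AUTH_DB
mongo_uri() {
  printf 'mongodb://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

count_truck_events() {
  local uri
  uri="$(mongo_uri user password localhost 27017 admin)"
  # The /admin path makes admin the authentication database; getSiblingDB
  # then switches to TruckData on the same connection.
  mongosh "$uri" --quiet \
    --eval 'db.getSiblingDB("TruckData").truck_1.countDocuments()'
}

# Usage: run it twice a few seconds apart; while the producer and the
# connector are running, the count should grow.
#   count_truck_events
```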

RStudio

Open RStudio in your browser at:

http://localhost:8787

The username is user and the password is password.

In /home you can run GetData.R, which connects to MongoDB with the mongolite package and retrieves the data.

