Note: Run the commands below from the directory that contains the docker-compose.yml file.
docker-compose up -d
docker-compose stop
docker-compose start
docker-compose rm -f
There are two modes to run the Producer and Consumer routines:
- Single topic mode
- Two-topic batch mode
Single topic mode is the simpler of the two: the producer pushes all the data into one Kafka topic ("aminer1"). In two-topic mode, the producer alternates between two topics ("aminer0" and "aminer1"), switching according to the configured batch size.
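The alternation in two-topic mode can be pictured with a small sketch. It assumes the producer sends one full batch to a topic before switching, and that it starts with "aminer0". The helper name `topic_for_message` and the starting order are illustrative assumptions, not taken from producer_batch.py.

```python
# Hypothetical helper illustrating two-topic alternation; not from producer_batch.py.
TOPICS = ("aminer0", "aminer1")  # topic names from the description above


def topic_for_message(index, batch_size):
    """Return the topic for the message at 0-based position `index`.

    Assumes a full batch goes to one topic before switching to the other.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch_number = index // batch_size
    return TOPICS[batch_number % len(TOPICS)]


if __name__ == "__main__":
    # With batch_size=3: messages 0-2 -> aminer0, 3-5 -> aminer1, 6-8 -> aminer0
    for i in range(9):
        print(i, topic_for_message(i, batch_size=3))
```

Using integer division keeps the mapping stateless, so the topic for any message can be computed from its position alone.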
Regardless of the run mode, you must first spin up the containers.
A. Load Docker images from the docker-compose file
docker-compose up
or
docker-compose up -d
B.i Producer Code: (make sure this file exists: project\kafka\data\aminer_papers_0.txt)
cd project\kafka
python producer.py
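The file check mentioned in this step can be scripted before starting the producer. The sketch below is a hypothetical pre-flight helper (`data_file_ready` is not part of the project) and assumes it is run from the project\kafka directory, so the data file sits under the relative `data` folder.

```python
import os

# Relative path from the step above; assumes the current directory is project\kafka.
DATA_FILE = os.path.join("data", "aminer_papers_0.txt")


def data_file_ready(path):
    """Return True if `path` is an existing, non-empty regular file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


if __name__ == "__main__":
    status = "found" if data_file_ready(DATA_FILE) else "missing or empty"
    print(DATA_FILE, "is", status)
```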
C.i Consumer Code: Run the producer just before running the consumer, so that messages are already published to the Kafka queue.
- Simple Consumer Test: Connect to the Spark Master container and run
python /opt/spark/code/consumer.py
- Spark Streaming Consumer:
docker exec spark-master bin/spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 --master spark://spark-master:7077 /opt/spark/code/consumerSpark.py
For two-topic mode, copy producer_batch.py from the batch_mode folder to the kafka folder. Also copy consumerSpark.py and consumerSpark2.py from the batch_mode folder to the spark/code folder.
B.ii Producer Code: (make sure this file exists: project\kafka\data\aminer_papers_0.txt)
cd project\kafka
python producer_batch.py
C.ii Consumer Code: Run the producer just before running the consumers, so that messages are already published to the Kafka queue.
- Open two separate terminal shells. In each terminal, go to the /spark/code folder.
- Run Spark Streaming Consumer 1 in one of the terminals:
docker exec spark-master bin/spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 --master spark://spark-master:7077 --executor-memory 1g --num-executors 2 --executor-cores 1 --total-executor-cores 2 /opt/spark/code/consumerSpark.py
- Run Spark Streaming Consumer 2 in the other terminal:
docker exec spark-master bin/spark-submit --verbose --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1 --master spark://spark-master:7077 --executor-memory 1g --num-executors 2 --executor-cores 1 --total-executor-cores 2 /opt/spark/code/consumerSpark2.py
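The two consumer commands differ only in the script path, so they can be assembled from one template. The following is a minimal sketch, assuming you prefer to launch both consumers from a single Python script on the host instead of two terminals. The function names `build_submit_command` and `launch_consumers` are hypothetical; every flag value is copied from the commands above.

```python
import subprocess

# Values copied from the commands above.
KAFKA_PACKAGE = "org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.1"
MASTER_URL = "spark://spark-master:7077"


def build_submit_command(script, container="spark-master"):
    """Build the `docker exec ... spark-submit` argument list for one consumer script."""
    return [
        "docker", "exec", container,
        "bin/spark-submit", "--verbose",
        "--packages", KAFKA_PACKAGE,
        "--master", MASTER_URL,
        "--executor-memory", "1g",
        "--num-executors", "2",
        "--executor-cores", "1",
        "--total-executor-cores", "2",
        script,
    ]


def launch_consumers(scripts):
    """Start one spark-submit process per script, then wait for all of them."""
    processes = [subprocess.Popen(build_submit_command(s)) for s in scripts]
    return [p.wait() for p in processes]
```

Calling `launch_consumers(["/opt/spark/code/consumerSpark.py", "/opt/spark/code/consumerSpark2.py"])` would start both consumers in parallel and return their exit codes. Passing the command as a list avoids shell quoting issues.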
D. Visualization:
1. Run local http server
```
cd project\guide
python http-server.py
```
This serves the guide folder on localhost, port 18001.
(Check) Try navigating to http://localhost:18001/AMiner.html
2. Connect to the Neo4j browser at http://localhost:7474/browser with username: neo4j and password: password
After connecting, the browser loads the AMiner.html tutorial page above by default,
OR
run this command in the query window:
```
:play http://localhost:18001/AMiner.html
```
Notes: If the port above is already in use and the URL cannot be opened, change the port in project\guide\http-server.py, then launch the guide from the Neo4j browser with the command above, using the new port in the URL. To have the guide launch automatically, update docker\db\config\neo4j.conf and restart the container.
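To find out whether the guide port is already taken before editing http-server.py, a quick TCP probe is enough. This is a minimal sketch using only the standard library. The helper names and the alternative ports 18002 and 18003 are arbitrary assumptions; only 18001 comes from the URLs above.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken
        return sock.connect_ex((host, port)) != 0


def first_free_port(candidates, host="127.0.0.1"):
    """Return the first free port from `candidates`, or None if all are in use."""
    for port in candidates:
        if port_is_free(port, host):
            return port
    return None


if __name__ == "__main__":
    # 18001 is the port used in the URLs above; the others are arbitrary alternatives
    print(first_free_port([18001, 18002, 18003]))
```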
Happy learning! Kafka (producer, consumer), Spark Streaming, and Neo4j, bound together with Docker images, enable scaling for distributed processing.