This project demonstrates the integration of Apache Flink with Kafka for real-time data processing, using SingleStore as the target database. The setup consists of two key components, each built as a Maven project and containerized using Docker.
- Kafka Producer: A service responsible for generating simulated data and pushing it to a Kafka topic.
- Flink Processor: A service that connects to Kafka, consumes the data, processes it using Flink’s stream processing capabilities, and stores the results into SingleStore using JDBC.
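To make the data flow concrete, below is a minimal sketch of what the producer side could look like. The topic name (`weather`), message format, and fixed one-second interval are illustrative assumptions rather than the project's actual values; the real producer reads its interval from the environment (see the configuration notes at the end).

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // KAFKA_SERVER is the broker address used by wait-for-it.sh in the Dockerfile;
        // reusing it for bootstrap.servers here is an assumption (e.g. "kafka:9092").
        props.put("bootstrap.servers", System.getenv("KAFKA_SERVER"));
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                // Simulated reading: id, timestamp, random value
                String payload = "sensor-1," + System.currentTimeMillis() + "," + Math.random();
                producer.send(new ProducerRecord<>("weather", payload)); // topic name is an assumption
                Thread.sleep(1000);
            }
        }
    }
}
```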
├── kafka-producer/ # Maven project for generating and producing simulation data to Kafka
│ └── Dockerfile # Dockerfile for Kafka producer
├── flink-processor/ # Maven project for consuming and processing Kafka data using Flink
│ └── Dockerfile # Dockerfile for Flink processor
├── docker-compose.yml # Docker Compose setup for Kafka, Zookeeper, producer, and processor
└── README.md # This file
Ensure you have the following installed:
- Java
- Maven
- Docker
- Docker Compose
Before running the Docker Compose setup, the Kafka producer and Flink processor Maven projects need to be packaged and built into Docker images. Follow the steps below:
The Flink processor uses JDBC to store processed data into a SingleStore database. Modify the database credentials in the Flink processor’s code (flink-processor/src/main/java/Main.java):
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
        .withUrl("jdbc:singlestore://<<hostname>>:<<port>>/<<database>>")
        .withDriverName("com.singlestore.jdbc.Driver")
        .withUsername("<<username>>")
        .withPassword("<<password>>")
        .build();
Replace `<<hostname>>`, `<<port>>`, `<<database>>`, `<<username>>`, and `<<password>>` with your SingleStore instance's details.
- Execute `create_table.sql` against your SingleStore database to create the table that the Flink processor writes to.
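For orientation, the sketch below shows roughly how these connection options can be wired into a Flink job that reads from Kafka and writes to SingleStore through the JDBC sink. The topic, table, and column names are illustrative assumptions, not the project's actual schema, and depending on the Flink connector version in the pom the Kafka source may be `FlinkKafkaConsumer` (shown here) or the newer `KafkaSource` API.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
import org.apache.flink.connector.jdbc.JdbcSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class PipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", System.getenv("KAFKA_SERVER"));
        kafkaProps.setProperty("group.id", "flink-processor");

        // Consume raw strings from the Kafka topic (topic name is an assumption)
        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer<>("weather", new SimpleStringSchema(), kafkaProps));

        // Write each record to SingleStore via the JDBC sink (table/column names are assumptions)
        stream.addSink(JdbcSink.sink(
                "INSERT INTO readings (payload) VALUES (?)",
                (statement, value) -> statement.setString(1, value),
                new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                        .withUrl("jdbc:singlestore://<<hostname>>:<<port>>/<<database>>")
                        .withDriverName("com.singlestore.jdbc.Driver")
                        .withUsername("<<username>>")
                        .withPassword("<<password>>")
                        .build()));

        env.execute("kafka-to-singlestore");
    }
}
```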
- Navigate to the `kafka-producer/` directory:
cd kafka-producer/
- Clean and package the Maven project:
mvn clean package
- Build the Docker image for the Kafka producer:
docker build -t kafka-producer .
- Navigate back to the project root directory.
- Navigate to the `flink-processor/` directory:
cd flink-processor/
- Clean and package the Maven project:
mvn clean package
- Build the Docker image for the Flink processor:
docker build -t flink-processor .
Once both projects are packaged and their Docker images are built, you can start the whole system using Docker Compose.
- Navigate back to the project root directory.
- Run the following command to start the services:
docker compose up
- Zookeeper: Manages Kafka cluster coordination.
- Kafka: The distributed streaming platform where the producer sends data and Flink reads data.
- Kafka Producer: Generates simulated data and sends it to Kafka at regular intervals.
- Flink Processor: Consumes data from Kafka, processes it, and stores the output in SingleStore.
# Kafka producer image
FROM openjdk:8u151-jdk-alpine3.7
# Install Bash
RUN apk add --no-cache bash libc6-compat
# Copy resources
WORKDIR /
COPY wait-for-it.sh wait-for-it.sh
COPY target/kafka2singlestore-1.0-SNAPSHOT-jar-with-dependencies.jar k-producer.jar
# Wait for Zookeeper and Kafka to be available and run application
CMD ./wait-for-it.sh -s -t 30 $ZOOKEEPER_SERVER -- ./wait-for-it.sh -s -t 30 $KAFKA_SERVER -- java -Xmx512m -jar k-producer.jar
# Flink processor image
FROM openjdk:8u151-jdk-alpine3.7
# Install Bash
RUN apk add --no-cache bash libc6-compat
# Copy resources
WORKDIR /
COPY wait-for-it.sh wait-for-it.sh
COPY target/flink-kafka2postgres-1.0-SNAPSHOT-jar-with-dependencies.jar flink-processor.jar
# Wait for Zookeeper and Kafka to be available and run application
CMD ./wait-for-it.sh -s -t 30 $ZOOKEEPER_SERVER -- ./wait-for-it.sh -s -t 30 $KAFKA_SERVER -- java -Xmx512m -jar flink-processor.jar
- Kafka Producer Interval: You can adjust the data generation interval by modifying `PRODUCER_INTERVAL` in the `docker-compose.yml` file (see the sketch after this list).
- SingleStore Credentials: Ensure that the database credentials in the Flink processor are updated with valid connection details.
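On the producer side, the interval would typically be picked up from that environment variable; a minimal sketch, assuming the value is in milliseconds and one second is the fallback:

```java
// Read the data generation interval from the environment set in docker-compose.yml,
// falling back to one second when PRODUCER_INTERVAL is not set.
String interval = System.getenv("PRODUCER_INTERVAL");
long intervalMs = (interval != null) ? Long.parseLong(interval) : 1000L;
Thread.sleep(intervalMs);
```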
Feel free to fork the project and create pull requests to improve the demo!
For questions or support, reach out via the project repository.