This project demonstrates a real-time data ingestion and processing pipeline designed to handle a high-velocity stream of vehicle telemetry data. It simulates a fleet of vehicles sending data every second, which is then ingested, processed, and stored for real-time analysis and dashboarding.
It showcases modern data engineering principles: a microservices architecture, asynchronous messaging, in-memory caching, and containerization.
The system follows a classic producer-consumer pattern, decoupled by a message broker.
- Telemetry Producer: a Spring Boot application that simulates multiple vehicles. It generates mock telemetry data (vehicle ID, location, speed, fuel level) and sends it to a Kafka topic at a high frequency (a hedged producer sketch follows this list).
- Apache Kafka: the central nervous system of the pipeline. It ingests the high-throughput data stream from the producer, providing durability and back-pressure.
- Telemetry Consumer: another Spring Boot application that subscribes to the Kafka topic. It consumes the telemetry messages, processes them, and stores the latest status of each vehicle in a Redis cache (a consumer sketch also follows this list).
- Redis: an in-memory data store that maintains the real-time state of each vehicle. It is ideal for powering a live dashboard that needs up-to-the-second vehicle information without querying a slower, disk-based database.
- Docker and Docker Compose: the entire infrastructure (producer, consumer, Kafka, Zookeeper, Redis) is containerized, providing a consistent, isolated, and easily reproducible development environment.
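To make the producer's role concrete, here is a minimal sketch of what the simulation loop could look like. The topic name vehicle-telemetry, the VehicleTelemetry record fields, and the fleet size are illustrative assumptions, not taken from the actual code; it also assumes a JSON value serializer is configured for the KafkaTemplate and that scheduling is enabled with @EnableScheduling.

```java
// Illustrative producer sketch; topic name, record shape, and fleet size are assumptions.
import java.time.Instant;
import java.util.concurrent.ThreadLocalRandom;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

// Hypothetical telemetry payload; the real project may use different fields.
record VehicleTelemetry(String vehicleId, double lat, double lon,
                        double speedKmh, double fuelLevel, Instant timestamp) {}

@Component
class TelemetrySimulator {

    private static final String TOPIC = "vehicle-telemetry"; // assumed topic name

    private final KafkaTemplate<String, VehicleTelemetry> kafkaTemplate;

    TelemetrySimulator(KafkaTemplate<String, VehicleTelemetry> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Emits one reading per simulated vehicle every second (requires @EnableScheduling).
    @Scheduled(fixedRate = 1000)
    void publishTelemetry() {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 1; i <= 10; i++) {
            String vehicleId = String.valueOf(i);
            VehicleTelemetry reading = new VehicleTelemetry(
                    vehicleId,
                    52.5 + rnd.nextDouble(-0.5, 0.5),   // mock latitude
                    13.4 + rnd.nextDouble(-0.5, 0.5),   // mock longitude
                    rnd.nextDouble(0, 130),             // mock speed in km/h
                    rnd.nextDouble(5, 100),             // mock fuel level in %
                    Instant.now());
            // Keying by vehicle ID keeps each vehicle's readings on one partition, preserving order.
            kafkaTemplate.send(TOPIC, vehicleId, reading);
        }
    }
}
```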
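On the other side, the consumer could look roughly like the following. Again, the topic, the consumer group, and the vehicle:<id> key format are assumptions (the key format simply matches the KEYS vehicle:* check in the verification steps below), and the sketch presumes spring-boot-starter-data-redis is on the classpath.

```java
// Illustrative consumer sketch; topic, group ID, and Redis key format are assumptions.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
class TelemetryConsumer {

    private final StringRedisTemplate redisTemplate;

    TelemetryConsumer(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // Reads each telemetry message and overwrites the latest known state for that vehicle.
    @KafkaListener(topics = "vehicle-telemetry", groupId = "telemetry-consumer")
    void onTelemetry(ConsumerRecord<String, String> message) {
        // One Redis key per vehicle, e.g. vehicle:1, holding the most recent payload.
        redisTemplate.opsForValue().set("vehicle:" + message.key(), message.value());
    }
}
```

Because each write simply replaces the previous value, Redis always holds exactly one, most recent record per vehicle, which is what a live dashboard needs.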
- Backend: Java 17, Spring Boot 3
- Messaging: Apache Kafka
- Cache/In-Memory DB: Redis
- Containerization: Docker, Docker Compose
- Build Tool: Gradle
- Java 17 or later
- Gradle
- Docker and Docker Compose
- Clone the repository:
git clone <repository-url>
cd real-time-data-pipeline
- Build the Spring Boot applications:
Navigate to the telemetry-producer and telemetry-consumer directories and run:
gradle build -x test
This will create the executable JAR files.
- Launch the entire stack using Docker Compose:
From the root directory of the project, run:
docker-compose up --build -d
This command will:
- Build the Docker images for the producer and consumer services.
- Start containers for Kafka, Zookeeper, Redis, and our two applications.
- Verify the pipeline is working:
- Check the producer logs to see data being sent:
docker-compose logs -f producer
- Check the consumer logs to see data being received:
docker-compose logs -f consumer
- Connect to Redis to see the stored data:
docker-compose exec redis redis-cli
Inside redis-cli, check for vehicle data (these are the same keys a dashboard read path would use; see the sketch after these steps):
KEYS vehicle:*
GET vehicle:1
- Shut down the application:
docker-compose down
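The vehicle:* keys checked above are exactly what a live dashboard would read. As a rough illustration (not part of this repository), a small read-side endpoint against the same cache might look like this; the route and key format are assumptions:

```java
// Illustrative read-side sketch; the endpoint and key format are assumptions, not part of this repo.
import org.springframework.data.redis.core.StringRedisTemplate;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
class VehicleStatusController {

    private final StringRedisTemplate redisTemplate;

    VehicleStatusController(StringRedisTemplate redisTemplate) {
        this.redisTemplate = redisTemplate;
    }

    // Returns the most recent cached telemetry for one vehicle, or null if nothing is cached yet.
    @GetMapping("/vehicles/{id}")
    String latestStatus(@PathVariable String id) {
        return redisTemplate.opsForValue().get("vehicle:" + id);
    }
}
```

Because the lookup is a single in-memory GET, such an endpoint stays responsive regardless of how fast telemetry is flowing into the pipeline.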