Change Data Capture (CDC)

Change data capture for Postgres and MSSQL databases using Debezium, with streaming data processing.

Change Data Capture (CDC) is a technique used to identify and track changes made to data within a database. It provides a mechanism to capture only the modified data, rather than the entire dataset, resulting in significant efficiency gains and reduced data transfer volumes.

Workflow diagram

Architecture Diagram

Benefits

  • Incremental Updates
  • Real-time Processing
  • Reduced Data Transfer
  • Audit and Logging
  • Event-Driven Architecture

Tools Used

  • Postgres and MSSQL (source databases)
  • Debezium (CDC connectors)
  • Apache Kafka, with Control Center and Schema Registry
  • Apache Flink (stream processing)
  • Elasticsearch (sink)
  • Docker and docker-compose

Background

Whenever a service writes data to any of the CDC-enabled tables in these databases (Postgres or MSSQL), the CDC framework Debezium picks up the changes and streams the CDC data to the corresponding Kafka topics. When CDC is enabled with Debezium, it automatically creates one topic per table and streams that table's changes to it.
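For concreteness, registering a Debezium Postgres connector with Kafka Connect looks roughly like the sketch below. The connector name, host, port, credentials, and table list are illustrative assumptions, not values taken from this repository's compose files.

```python
# Hypothetical sketch: register a Debezium Postgres connector via the Kafka Connect REST API.
# Host, port, credentials, and table names are assumptions; adjust to the compose setup.
import json
import requests

connector = {
    "name": "orders-postgres-connector",  # illustrative connector name
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",   # compose service name (assumed)
        "database.port": "5432",
        "database.user": "postgres",
        "database.password": "postgres",
        "database.dbname": "inventory",
        "topic.prefix": "cdc",             # Debezium 2.x; older versions use database.server.name
        "table.include.list": "public.orders,public.shipments",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",    # default Kafka Connect REST port (assumed)
    headers={"Content-Type": "application/json"},
    data=json.dumps(connector),
)
resp.raise_for_status()
```

With this in place, Debezium creates topics such as cdc.public.orders and cdc.public.shipments and streams every change to them.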

A Flink cluster reads the data from the respective Kafka topics and processes it. Once processing is complete, the result is either sent to another Kafka topic or written to an Elasticsearch sink. Flink supports many other sinks as well.
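As a rough illustration of what such a job can look like, the following PyFlink Table API sketch reads a Debezium-formatted topic and writes it to an Elasticsearch index. The topic name, schema, addresses, and connector version are assumptions, not the repository's actual main.py.

```python
# Hypothetical PyFlink sketch: Kafka (Debezium changelog) source -> Elasticsearch sink.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: Debezium-formatted changelog from Kafka (topic and fields assumed).
t_env.execute_sql("""
    CREATE TABLE orders_src (
        order_id BIGINT,
        customer_name STRING,
        quantity INT
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'cdc.public.orders',
        'properties.bootstrap.servers' = 'kafka:9092',
        'properties.group.id' = 'flink-cdc',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'debezium-json'
    )
""")

# Sink: Elasticsearch index (connector version depends on the jars available on the cluster).
t_env.execute_sql("""
    CREATE TABLE orders_sink (
        order_id BIGINT,
        customer_name STRING,
        quantity INT,
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'elasticsearch-7',
        'hosts' = 'http://elasticsearch:9200',
        'index' = 'orders'
    )
""")

# Continuously mirror the changelog into Elasticsearch (upserts keyed by order_id).
t_env.execute_sql("INSERT INTO orders_sink SELECT order_id, customer_name, quantity FROM orders_src")
```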

Requirements

Name             Version
docker           >= 20
docker-compose   >= 2
virtualenv       >= 20

Getting started

NOTE: These steps should be followed in order.

  • Use docker compose to start the MSSQL and Postgres databases.
  • Once the databases are up, the test tables orders and shipments (in Postgres) and products (in MSSQL) are created automatically.
  • Use docker compose to start the Kafka cluster. Once the Kafka cluster is ready, use docker compose to start the Debezium cluster.
  • At the same time, start the Kafka Control Center and Schema Registry.
  • When the Debezium cluster is ready, start the data producers.
  • Use docker compose to start load_products first; once it completes, start load_orders. The orders producer auto-generates a fake order and shipment every 10 seconds and stores them in the Postgres tables orders and shipments respectively (see the producer sketch after this list).
  • Start the Flink cluster (this can also be done with docker compose), and start the Elasticsearch cluster at the same time.
  • docker exec into the Flink jobmanager container.
  • Navigate to the /flink_jobs directory.
  • Run flink run -py main.py -d to submit the Flink job.
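For reference, a producer along the lines of load_orders could look like the minimal sketch below. The table columns, connection details, and Faker-generated fields are assumptions for illustration, not the repository's actual producer code.

```python
# Hypothetical sketch of an order/shipment producer (not the repo's load_orders code).
# Connection details, table columns, and fake fields are assumptions; adapt to the real schema.
import random
import time

import psycopg2
from faker import Faker

fake = Faker()
conn = psycopg2.connect(
    host="localhost", port=5432,
    dbname="inventory", user="postgres", password="postgres",  # assumed credentials
)
conn.autocommit = True

with conn.cursor() as cur:
    while True:
        # Insert a fake order, then a shipment that references it.
        cur.execute(
            "INSERT INTO orders (customer_name, quantity) VALUES (%s, %s) RETURNING id",
            (fake.name(), random.randint(1, 5)),
        )
        order_id = cur.fetchone()[0]
        cur.execute(
            "INSERT INTO shipments (order_id, address) VALUES (%s, %s)",
            (order_id, fake.address()),
        )
        time.sleep(10)  # one fake order/shipment every 10 seconds, as described above
```

Every row inserted here is picked up by Debezium and lands in the corresponding Kafka topic, where the Flink job consumes it.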

Dashboards

  • Debezium: http://localhost:8081
  • Kafka Control Center: http://localhost:9021
  • Flink: http://localhost:8086
