Thesis (text mostly in Greek, with an abstract in English): link
- Go
- Docker
- Docker Compose
- kind - Kubernetes in Docker
- Install the project's requirements, preferably in a Python virtual env:

```shell
pip install -r requirements.txt
```
- Create a swarm cluster and clone the repo to a manager node
- Create a Kubernetes cluster and clone the repo to the master node
- Install the project's requirements in each cluster, preferably in a Python virtual env:

```shell
pip install -r requirements_production.txt
```
| Folder/File | Setting to adjust |
|---|---|
| /serrano_app/docker-compose.yaml | environment/KAFKA_ADVERTISED_LISTENERS |
| /serrano_app/flask-start.sh | flask run |
| /serrano_app/config/ | kafka_cfg/kubeconfig_yaml |
| /kubernetes_driver_app/config/ | kafka_cfg/bootstrap.servers |
| /kubernetes_driver_app/config/ | kafka_cfg/kubeconfig_yaml |
| /kubernetes_driver_app/config/ | kafka_cfg/broker |
| /swarm_driver_app/config/ | kafka_cfg/bootstrap.servers |
| /swarm_driver_app/config/ | kafka_cfg/broker |
- Run Docker Compose on a terminal:

```shell
cd serrano_app/
docker-compose up -d
```
- Check that everything is up and running:

```shell
docker-compose ps
```
- Establish the MongoDB Sink connection
- To start the serrano_app, from the /serrano_app folder:
  - Start the Faust agents for the Dispatcher, Resource Optimization Toolkit and Orchestrator components:

    ```shell
    <python3_path> ./src/__main__.py worker --loglevel=INFO
    ```

  - Start the Flask app to receive requests for an app deployment or termination:

    ```shell
    ./flask-start.sh
    ```
- Create an application request:
  - Deployment:

    ```shell
    <python3_path> ./deploy.py -yamlSpec <absolute path to a YAML file> -env <local or prod>
    ```

  - Termination:

    ```shell
    <python3_path> ./remove.py -requestUUID <requestUUID> -env <local or prod>
    ```
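Under the hood, deploy.py presumably posts the request to the Flask app. A stdlib-only sketch of building such a request is shown below; the endpoint path and payload field names are assumptions mirroring the CLI flags above, not the actual deploy.py code:

```python
import json
import urllib.request


def build_deploy_payload(yaml_spec_path: str, env: str) -> dict:
    """Mirror the deploy.py flags as a JSON body (assumed schema)."""
    with open(yaml_spec_path) as f:
        return {"yamlSpec": f.read(), "env": env}


def submit(payload: dict, url: str = "http://localhost:5000/deploy") -> bytes:
    """POST the request to the Flask app; the URL is a hypothetical endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```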
- Activate the Faust agent of the swarm or the Kubernetes driver to process the request:
  - For the swarm driver, connect to the manager node and run, from the swarm_driver_app folder:

    ```shell
    <python3_path> ./src/__main__.py worker --loglevel=INFO
    ```

  - For the Kubernetes driver, connect to the master node and run, from the kubernetes_driver_app folder:

    ```shell
    <python3_path> ./src/__main__.py worker --loglevel=INFO
    ```
- To check that data is published to the respective topics:

```shell
kafkacat -b localhost:9092 -t dispatcher
kafkacat -b localhost:9092 -t resource_optimization_toolkit
kafkacat -b localhost:9092 -t orchestrator
kafkacat -b localhost:9092 -t swarm
kafkacat -b localhost:9092 -t kubernetes
```
- Check created namespace, services and deployments in Kubernetes:

```shell
kubectl get namespaces
kubectl get services
kubectl get deployments
```
- Check stack creation in the swarm:

```shell
docker stack ls
```
| Folder/File | Description |
|---|---|
| /serrano_app | implementation of the Dispatcher, Resource Optimization Toolkit and Orchestrator components |
| /swarm_driver_app | implementation of the swarm driver component |
| /kubernetes_driver_app | implementation of the Kubernetes driver component |
| flask-start.sh | script to run the Flask app (make executable with chmod +x) |
| `src/__main__.py` | Faust application entrypoint |
| src/faust_app.py | creates an instance of the Faust application for stream processing |
| src/flask_app.py | creates an instance of the Flask application |
| models.py | Faust models describing the data in the streams |
| helpers.py | helper functions |
| agents.py | Faust async stream processors of the Kafka topics |
| swarm_driver.py | SwarmDriver class to connect to and interact with a swarm |
| kubernetes_driver.py | K8sDriver class to connect to and interact with a Kubernetes cluster |
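The agents in agents.py follow Faust's pattern of async processors chained through Kafka topics. Independently of Faust, the shape of one hop (e.g. the Dispatcher forwarding to the next stage) can be pictured with plain asyncio queues standing in for topics; the names and the enrichment step here are purely illustrative, not the project's actual agents:

```python
import asyncio
import json


async def dispatcher(in_q: asyncio.Queue, out_q: asyncio.Queue) -> None:
    """Toy stand-in for a Faust agent: consume from one 'topic', forward to the next."""
    while True:
        raw = await in_q.get()
        msg = json.loads(raw)
        msg["stage"] = "dispatched"  # illustrative processing step
        await out_q.put(json.dumps(msg))
        in_q.task_done()


async def demo() -> dict:
    in_q, out_q = asyncio.Queue(), asyncio.Queue()
    worker = asyncio.create_task(dispatcher(in_q, out_q))
    await in_q.put(json.dumps({"requestUUID": "42", "action": "deploy"}))
    result = json.loads(await out_q.get())
    worker.cancel()
    return result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the real app, Faust replaces the queues with Kafka topics and runs each agent as part of the worker started with `worker --loglevel=INFO`.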
In this example, the MongoDB instance is hosted on MongoDB Atlas.
```shell
curl -X PUT http://localhost:8083/connectors/sink-mongodb-users/config -H "Content-Type: application/json" -d '{
  "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
  "tasks.max": "1",
  "topics": "db_consumer",
  "connection.uri": "mongodb://athina_kyriakou:[email protected]:27017,ntua-thesis-cluster-shard-00-01.xcgej.mongodb.net:27017,ntua-thesis-cluster-shard-00-02.xcgej.mongodb.net:27017/myFirstDatabase?ssl=true&replicaSet=atlas-xirr44-shard-0&authSource=admin&retryWrites=true&w=majority",
  "database": "thesisdb",
  "collection": "db_consumer",
  "key.converter": "org.apache.kafka.connect.json.JsonConverter",
  "key.converter.schemas.enable": false,
  "value.converter": "org.apache.kafka.connect.json.JsonConverter",
  "value.converter.schemas.enable": false,
  "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
  "document.id.strategy.partial.value.projection.list": "requestUUID",
  "document.id.strategy.partial.value.projection.type": "AllowList",
  "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
}'
```
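The same registration can be driven from Python with only the standard library; this sketch simply rebuilds the PUT request against Kafka Connect's REST API, with the host, port and connector name taken from the curl call above:

```python
import json
import urllib.request

CONNECT_URL = "http://localhost:8083"


def connector_config_request(name: str, config: dict) -> urllib.request.Request:
    """Build the PUT that (re)registers a connector on Kafka Connect's REST API."""
    return urllib.request.Request(
        f"{CONNECT_URL}/connectors/{name}/config",
        data=json.dumps(config).encode(),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )

# To actually register, pass the full config dict from the curl call above:
# urllib.request.urlopen(connector_config_request("sink-mongodb-users", config))
```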
Detailed info here
Create a topic:

```shell
docker-compose exec broker kafka-topics \
  --create \
  --bootstrap-server localhost:9092 \
  --replication-factor 1 \
  --partitions 1 \
  --topic kdriver
```
kafkacat is a CLI used for debugging & testing. Documentation
Check a topic's content:

```shell
kafkacat -b localhost:9092 -t <topic_name>
```

Write to the orchestrator topic:

```shell
kafkacat -b localhost:9092 -t orchestrator -P
{"requestUUID": XX, "action": XXX, "yamlSpec": XXXX}
```
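The message written to the topic must be valid JSON for the downstream Faust models to deserialize it. A small helper for producing well-formed values is sketched below; the field names follow the placeholder above, while the authoritative schema lives in models.py:

```python
import json
import uuid


def make_orchestrator_message(action: str, yaml_spec: str) -> str:
    """Serialize one record for the orchestrator topic (assumed field set)."""
    return json.dumps({
        "requestUUID": str(uuid.uuid4()),  # fresh id for this request
        "action": action,                  # e.g. "deploy" or "remove"
        "yamlSpec": yaml_spec,
    })
```

The resulting line can be pasted into (or piped to) `kafkacat -b localhost:9092 -t orchestrator -P`.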
CLI Documentation
Get created stacks:

```shell
docker stack ls
```

Delete a stack:

```shell
docker stack rm <stack name>
```

Check nodes' information (engine version, hostname, status, availability, manager status):

```shell
docker node ls
```

Check the labels of a node (and other info):

```shell
docker node inspect <node hostname> --pretty
```

Add a label to a node:

```shell
docker node update --label-add <label name>=<value> <node hostname>
```

Check the node that a service is running on:

```shell
docker service ps <service name>
```

Create a swarm stack based on a YAML file:

```shell
docker stack deploy --compose-file <yaml file path> <stack name>
```
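When the swarm commands above need to be scripted (e.g. from tests), a thin stdlib wrapper over the docker CLI is enough. `stack_deploy_cmd` and `deploy_stack` are hypothetical helpers for illustration, not part of swarm_driver.py:

```python
import subprocess


def stack_deploy_cmd(compose_file: str, stack: str) -> list:
    """Compose the `docker stack deploy` invocation shown above."""
    return ["docker", "stack", "deploy", "--compose-file", compose_file, stack]


def deploy_stack(compose_file: str, stack: str) -> None:
    """Run the command; must be executed on a manager node with the docker CLI."""
    subprocess.run(stack_deploy_cmd(compose_file, stack), check=True)
```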
Create a cluster:

```shell
kind create cluster
```

List the created clusters (the default name for a created cluster is kind):

```shell
kind get clusters
```
To have terminal interaction with the created Kubernetes objects, mainly for debugging, install kubectl.
Get deployments in a namespace:

```shell
kubectl get deployments --namespace=<name>
```

Check in which nodes a namespace's pods are running:

```shell
kubectl get pod -o wide --namespace=<name>
```

Show the nodes' labels:

```shell
kubectl get nodes --show-labels
```

Add a label to a node:

```shell
kubectl label nodes <node name> <label name>=<value>
```

Create Kubernetes resources based on a YAML file:

```shell
kubectl apply -f <yaml file path>
```
- Kafka CheatSheet
- Getting Started with Apache Kafka in Python
- Basic stream processing using Kafka and Faust
- Kafka: Consumer API vs Streams API
- Kubectl Cheat Sheet
- How to Access a Kubernetes Cluster
- Cluster Access Configuration
- Configure kubectl to Access Remote Kubernetes Cluster
- Kubeconfig tips with Kubectl
- The App - Define your Faust project: see the suggested structure of medium/large projects here
- Example of medium project structure implementation
- Models, Serialization, and Codecs
- Stream processing with Python Faust: Part II – Streaming pipeline
- MongoDB Kafka Connector
- Create a Database in MongoDB Using the CLI with MongoDB Atlas: here & here
- MongoDB Write Model Strategies
- Kafka Connector used strategy for upsert: ReplaceOneBusinessKeyStrategy
- DeleteOne write model strategy