This repository provides everything required to deploy a distributed data processing environment on Kubernetes across 4 virtual machines.
It was developed by ECE students Luka, Cléa & Mathias as a project for the Big Data Ecosystem course.
It aims to deploy Trino on Kubernetes, relying on MinIO for object storage.
- Clone this repository
- Install Vagrant, VirtualBox, and Ansible on your host machine
- Adjust the machines' resources in the `Vagrantfile` to match your host computer's limitations (a worker node ideally needs 6 GB of RAM, and MinIO requires at least 4 hard disks to function)
- Run `vagrant up` in the root of the repository
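
As a rough illustration of the steps above, the following sketch checks that the required host tools are installed before starting the VMs. The helper name `missing_tools` is hypothetical and not part of this repository; `vagrant`, `VBoxManage`, and `ansible-playbook` are assumed to be the command names under which Vagrant, VirtualBox, and Ansible are available on your host.

```shell
# Hypothetical helper (not part of this repository): print each required
# command that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example usage, run from the root of the cloned repository:
#   missing=$(missing_tools vagrant VBoxManage ansible-playbook)
#   if [ -z "$missing" ]; then vagrant up; else echo "Missing: $missing"; fi
```

If the helper prints nothing, all three tools were found and `vagrant up` can be run; otherwise, install the listed tools first.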
We used a virtual machine hosted on a friend's server, running Ubuntu 22.04. The specs are as follows:
- CPU: 16 cores
- RAM: 94 GB
- Storage: 256 GB
We previously attempted to run this project in a Linux Container (LXC) with the same specs, but the container supported neither VirtualBox nor KVM/QEMU. We therefore switched to a traditional VM, which supports VirtualBox.
We have created a dedicated project report for our evaluation at ECE.
We have documented all of our attempts for each part of the project, organized as follows:
- Creating & provisioning VMs with Vagrant
- Deploying Kubernetes with kubeadm, kubelet, kubectl
- Deploying basic Kubernetes services (CNI)
- Deploying MinIO to provide object storage
- Deploying Trino to perform distributed computation
Name | Email |
---|---|
Luka BIGOT | [email protected] |
Cléa DEDUIT | [email protected] |
Mathias SERICOLA | [email protected] |