- Introduction
- Configuration used for this tutorial
- Install Kubernetes using Kubespray
- Still missing in your cluster
- Install JupyterHub
This tutorial is about running a JupyterHub instance on a Kubernetes cluster deployed on bare metal.
For this purpose, and after several attempts with Minikube and kubeadm, with and without VMs, I chose Kubespray, which uses Ansible to deploy Kubernetes. It offers the performance of a bare metal cluster as well as the scalability of a production-ready cluster.
- Hardware:
- CPU: 2 preferable (no check)
- RAM: 1024MB/1500MB minimum for worker/master nodes enforced in Kubespray (configurable)
- O/S: Ubuntu 19.10 Eoan
- Kubespray: 2.12.5
- Python: 3.7
- Helm: 3.1.2
Note that Ubuntu 19.10 Eoan is not a Linux distribution supported by Kubespray. It requires a patch described here.
Please follow these steps to fulfill the Kubespray requirements.
It's always good practice to start with a system update, especially before installing new packages.
sudo apt-get update && \
sudo apt-get upgrade
Do this on your localhost (used to run Kubespray). Kubespray will take care of system updates on the declared nodes.
- Install SSH server
If a node does not have an SSH server installed by default, you have to install one to connect to this machine remotely.
Ubuntu Server O/Ss already have an SSH server installed.
sudo apt-get install openssh-server
- Create SSH key pair
You have to generate one or multiple SSH key pair(s) to allow Kubespray/Ansible automatic login using SSH. You can use a different key pair for each node or use the same for all nodes.
ssh-keygen -b 2048 -t rsa -f /home/<local-user>/.ssh/id_rsa -q -N ""
- Copy your public key(s) on nodes
Copy your public key(s) to the ~/.ssh/authorized_keys file of the user accounts you will use on each node for deployment. You will be prompted twice for the password of the account: first to upload the public key using SSH, then to add the public key to the authorized keys file.
for ip in <node1-ip> <node2-ip> ...; do
  # Upload the public key, then append it (>>) so existing authorized keys are kept
  scp /home/<local-user>/.ssh/id_rsa.pub <node-user>@$ip:/home/<node-user>/.ssh
  ssh <node-user>@$ip "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
done
You will not be prompted for a password again when using SSH: the key will be used to authenticate you!
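Re-running the key copy step should neither duplicate nor clobber entries in the authorized keys file. As a minimal sketch, the helper below (`add_key_once` is a hypothetical name, not part of Kubespray or OpenSSH) appends a public key to a file only when that exact line is not already there.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a public key to an authorized_keys file
# only if that exact line is not already present.
add_key_once() {
  local pubkey_file="$1" auth_file="$2" key
  key="$(cat "$pubkey_file")" || return 1
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # -q: quiet, -x: match the whole line, -F: treat the key as a fixed string
  if ! grep -qxF -- "$key" "$auth_file"; then
    printf '%s\n' "$key" >> "$auth_file"
  fi
  # sshd ignores authorized_keys files with overly permissive modes
  chmod 600 "$auth_file"
}
```

On a node, this would be called as `add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys`. The `ssh-copy-id -i <public-key-file> <node-user>@<node-ip>` command shipped with OpenSSH covers the same need in a single step.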
Kubespray requires IPv4 forwarding to be turned on. This should be done automatically by Kubespray.
To do it manually, run the following command:
for ip in <node1-ip> <node2-ip> ...; do
ssh <node-user>@$ip "echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward"
done
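To confirm that forwarding is actually on, the kernel flag can be read back. The sketch below uses a hypothetical helper, `ip_forward_enabled`, which takes the flag file as an optional argument so it can also be tried against a test file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the given file (default: the kernel's
# ip_forward switch) contains "1", 1 if not, and 2 if it cannot be read.
ip_forward_enabled() {
  local flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ -r "$flag_file" ] || return 2
  [ "$(tr -d '[:space:]' < "$flag_file")" = "1" ]
}
```

A remote check would look like `ssh <node-user>@<node-ip> "cat /proc/sys/net/ipv4/ip_forward"`. Note that a value written to /proc does not survive a reboot; setting `net.ipv4.ip_forward=1` in /etc/sysctl.conf is the usual way to make it persistent.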
Turning swap off is required by Kubernetes. See this issue for more information.
for ip in <node1-ip> <node2-ip> ...; do
ssh <node-user>@$ip "sudo swapoff -a && sudo sed -i '/ swap / s/^/#/' /etc/fstab"
done
This step can also be done using the prepare-cluster.yaml playbook available in this repo.
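Since the swap step edits /etc/fstab in place, it can help to preview the change first. The following sketch prints the modified content without touching the file; `comment_out_swap` is a hypothetical helper, and its pattern is a slightly broader assumption than the one used above because it also accepts tab-separated fields.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an fstab-like file with every active swap entry
# commented out. The input file itself is left untouched.
comment_out_swap() {
  local fstab="${1:-/etc/fstab}"
  # Skip lines that are already comments, then prefix lines containing
  # the word "swap" surrounded by spaces or tabs with "#".
  sed -E '/^[[:space:]]*#/! s/^(.*[[:space:]]swap[[:space:]].*)$/#\1/' "$fstab"
}
```

Running `comment_out_swap /etc/fstab | diff /etc/fstab -` on a node shows exactly which lines would change.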
Start by installing curl.
sudo apt-get install curl
Get the latest Kubespray source code from its repo.
The latest release at the time of writing, v2.12.5, throws errors not encountered in the master version.
This is probably because Ubuntu 19.10 is not supported, and it will be fixed in 20.04!
mkdir -p ~/projects/ && \
cd ~/projects/ && \
curl -LJO https://github.com/kubernetes-sigs/kubespray/archive/master.zip && \
unzip kubespray-master.zip && \
mv kubespray-master kubespray && \
rm kubespray-master.zip && \
cd kubespray
Kubespray uses Python 3 and requires several dependencies to be installed.
- Install Python 3
Install Python 3 as well as pip (the package installer for Python) and venv to create virtual environments (see below).
sudo apt-get install python3.7 python3-pip python3-venv
- Create a virtual env
Using a virtual env (or a conda env for conda users) is a best practice for isolation in Python.
python3 -m venv ~/projects/kubespray-venv
source ~/projects/kubespray-venv/bin/activate
- Install Kubespray dependencies
pip install -r requirements.txt
Start by creating a copy of the default settings from the sample cluster.
cp -rfp inventory/sample inventory/mycluster
Be sure you are still in the ~/projects/kubespray/ directory before executing this command!
Then customize your new cluster
- Update Ansible inventory file with inventory builder
declare -a IPS=(<node1-ip> <node2-ip> ...)
CONFIG_FILE=inventory/mycluster/hosts.yaml python contrib/inventory_builder/inventory.py ${IPS[@]}
- (optional) Rename your nodes or deactivate hostname renaming
If you skip this step, your cluster hostnames will be renamed node1, node2, etc.
You can either edit the file ~/projects/kubespray/inventory/mycluster/hosts.yaml
sed -e 's/node1/tower/g' -e 's/node2/laptop/g' ... -i inventory/mycluster/hosts.yaml
OR
keep the current hostnames
echo "override_system_hostname: false" >> inventory/mycluster/group_vars/all/all.yml
- Set Docker version to 19.03
Docker 18.09 does not seem to be available in the apt sources, so prefer 19.03.
echo "docker_version: 19.03" >> inventory/mycluster/group_vars/all/docker.yml
- Set resolv.conf
There is more than one resolv.conf file on an Ubuntu 18+ O/S, so use the right one!
A fix for Ubuntu 18.* has been merged in Kubespray, but it does not apply to the unsupported 19.* versions.
echo 'kube_resolv_conf: "/run/systemd/resolve/resolv.conf"' >> inventory/mycluster/group_vars/all/all.yml
- Check localhost vs nodes usernames
If your localhost username differs from a node username (the one that owns your SSH public key), you must specify it to Ansible by manually editing the hosts.yaml file.
Example:
localhost username | node1 username |
---|---|
foo | bar |
> cat inventory/mycluster/hosts.yaml
all:
  hosts:
    node1:
      ansible_ssh_user: bar
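The inventory builder step above takes raw IP addresses, so a typo may only surface later during the Ansible run. As a rough sketch, the hypothetical helpers below (`is_ipv4` and `check_ips` are not part of Kubespray) validate the addresses before the inventory is generated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the argument is a dotted-quad IPv4
# address with four octets in the 0-255 range.
is_ipv4() {
  local ip="$1" octet
  case "$ip" in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;  # empty, bad chars, or stray dots
  esac
  local IFS=.
  set -- $ip                              # split the address on dots
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
}

# Hypothetical helper: validate every address, reporting the first bad one.
check_ips() {
  local ip
  for ip in "$@"; do
    is_ipv4 "$ip" || { echo "invalid IPv4 address: $ip" >&2; return 1; }
  done
}
```

With the `IPS` array declared as above, `check_ips "${IPS[@]}"` can be run before calling the inventory builder.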
If you have not turned on IPv4 forwarding and turned off swap manually, you can use:
curl -LJO https://raw.githubusercontent.com/adriendelsalle/jhub-on-kubespray/master/kubespray/prepare-cluster.yaml
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root prepare-cluster.yaml
It's time to deploy Kubernetes by running the Ansible playbook command.
ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml
The cluster is created, but you currently have no access to its API for configuration purposes.
kubectl has been installed by Kubespray on the master nodes of your cluster, and its configuration files are saved in the root home directory.
If you want to access the cluster API from another computer on your network, install kubectl first.
curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl
In all cases, start by copying the configuration files from the root home directory to the user account used to deploy Kubernetes.
Remember, it owns your SSH public key!
ssh <node-user>@<master-node-ip> "sudo cp -R /root/.kube ~ && sudo chown -R <node-user>:<node-user> ~/.kube"
If you plan to handle the API from another computer, download those files and update ownership.
scp -r <node-user>@<master-node-ip>:~/.kube ~
sudo chown -R <local-user>:<local-user> ~/.kube
ssh <node-user>@<master-node-ip> "rm -r ~/.kube"
The last command removes the configuration files from the master node user account to keep secrets protected.
For sanity, use autocompletion!
echo 'source <(kubectl completion bash)' >>~/.bashrc
JupyterHub will expose a Service of the LoadBalancer type. On a bare metal cluster, you don't have a load balancer, since it's usually part of your cloud provider infrastructure.
For more details, refer to the official documentation.
Fortunately, MetalLB is an open-source implementation of a load balancer for bare metal deployments!
- Install MetalLB
Follow the official installation guide.
kubectl edit configmap -n kube-system kube-proxy
and set:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
Then apply the manifests:
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
# On first install only
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
- Set MetalLB configuration
To allow the load balancer to hand out external IPs, you must specify in its configuration which IP range is allocated to it.
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - <your-ip-range>
EOF
Don't forget to set <your-ip-range> to the IP range you want to use!
That's it!
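Because indentation matters in this ConfigMap, it can be convenient to generate it from the address range alone. The function below is a sketch under that assumption: `render_metallb_config` is a hypothetical name, and the range in the usage note is only an example value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the layer 2 ConfigMap used in this tutorial
# for a single address range.
render_metallb_config() {
  local range="$1"
  case "$range" in
    *-*|*/*) ;;  # accept "<first-ip>-<last-ip>" or CIDR notation
    *) echo "expected <first-ip>-<last-ip> or a CIDR, got: $range" >&2
       return 1 ;;
  esac
  cat << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - $range
EOF
}
```

For example, `render_metallb_config 192.168.1.240-192.168.1.250 | kubectl apply -f -` would apply it, assuming that range is free on your network and outside your DHCP pool.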
Deployments usually require storage in order to persist data, since pods are designed to be ephemeral.
Kubernetes introduced several concepts around this:
- Persistent Volume (PV): a declaration of an available volume
- Persistent Volume Claim (PVC): a claim for a Persistent Volume
- etc.
For more detailed information, please refer to the official Kubernetes documentation about storage that covers volumes/PV/PVC/provisioning/etc.
In this tutorial, you will use an nfs volume type for its simplicity, accessibility between nodes, and capability to be dynamically provisioned.
Based on the Vitux tutorial.
- Install NFS server
Just pick a machine on the same network as your cluster nodes (it can be one of them), and run:
sudo apt-get install -y nfs-kernel-server
- Choose export directory
Choose or create a directory and share it:
sudo mkdir -p /mnt/my-nfs
As we want all clients to have access, change its permissions
sudo chown nobody:nogroup /mnt/my-nfs
sudo chmod 777 /mnt/my-nfs
- Do the NFS export
echo "/mnt/my-nfs <subnetIP/24>(rw,sync,no_subtree_check)" | sudo tee /etc/exports
sudo exportfs -a
sudo systemctl restart nfs-kernel-server
You have to replace subnetIP/24 with a correct CIDR.
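A malformed line in /etc/exports is easy to miss, so the entry can be built and checked before it is written. This is only a sketch: `nfs_export_line` is a hypothetical helper, and it checks just the prefix length of the CIDR, not the address part.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an /etc/exports entry for a directory and a
# client network given in CIDR notation.
nfs_export_line() {
  local dir="$1" cidr="$2"
  case "$cidr" in
    */[0-9]|*/[12][0-9]|*/3[0-2]) ;;  # prefix length between 0 and 32
    *) echo "not a CIDR (expected <network>/<prefix>): $cidr" >&2
       return 1 ;;
  esac
  printf '%s %s(rw,sync,no_subtree_check)\n' "$dir" "$cidr"
}
```

For instance, `nfs_export_line /mnt/my-nfs 192.168.1.0/24` prints the line to review, where 192.168.1.0/24 is an assumed example network.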
Based on the Yolanda tutorial.
We will use the [external-storage](https://github.com/kubernetes-incubator/external-storage) template.
- Set authorizations
kubectl apply -f https://raw.githubusercontent.com/kubernetes-incubator/external-storage/master/nfs-client/deploy/rbac.yaml
- Set StorageClass
cat << EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs-provisioner
parameters:
  archiveOnDelete: "false"
EOF
We declare the StorageClass as the default one so that it is automatically selected by PVCs.
- Set provisioner
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-provisioner
            - name: NFS_SERVER
              value: <nfs-server-ip>
            - name: NFS_PATH
              value: /mnt/my-nfs
      volumes:
        - name: nfs-client-root
          nfs:
            server: <nfs-server-ip>
            path: /mnt/my-nfs
EOF
Don't forget to set <nfs-server-ip> to your NFS server IP!
- Check everything is OK
kubectl get deployments.apps,pods,sc -n default
You should see the Deployment of the provisioner, the corresponding Pod, and the StorageClass marked as the default one.
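Listing the objects does not prove that dynamic provisioning works end to end. One way to test it, sketched below, is to create a tiny claim against the StorageClass and see whether it gets bound. `render_test_pvc` and the claim name `test-claim` are hypothetical; the class name defaults to the one declared in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a small test PVC that targets the tutorial's
# StorageClass (both names can be overridden).
render_test_pvc() {
  local name="${1:-test-claim}" class="${2:-managed-nfs-storage}"
  cat << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $name
spec:
  storageClassName: $class
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
}
```

Apply it with `render_test_pvc | kubectl apply -f -`, then run `kubectl get pvc test-claim`: if the provisioner works, the claim should reach the Bound status and a matching directory should appear under the NFS export. Clean up with `kubectl delete pvc test-claim`.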
Just run:
sudo snap install helm --classic
You can now follow the zero-to-jupyterhub tutorial
- Add Helm repo
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
- Create your configuration file
cat << EOF > config.yaml
proxy:
  secretToken: "<RANDOM_HEX>"
EOF
# Replace the placeholder with 32 random bytes written as hex
sed -i "s/<RANDOM_HEX>/$(openssl rand -hex 32)/g" config.yaml
If you don't implement the StorageClass and provisioner part of this tutorial, you have to modify your configuration file to store information in memory. In that case you will lose all your data when the cluster reboots, etc.
From JHub doc:
Use an in-memory sqlite database. This should only be used for testing, since the database is erased whenever the hub pod restarts - causing the hub to lose all memory of users who had logged in before.
When using this for testing, make sure you delete all other objects that the hub has created (such as user pods, user PVCs, etc) every time the hub restarts. Otherwise you might run into errors about duplicate resources.
cat << EOF >> config.yaml
hub:
  db:
    type: sqlite-memory
singleuser:
  storage:
    # no persistent storage for user pods
    type: none
EOF
- Deploy JupyterHub
RELEASE=jhub
NAMESPACE=jhub
kubectl create namespace $NAMESPACE
helm upgrade --install $RELEASE jupyterhub/jupyterhub \
--namespace $NAMESPACE \
--version=0.9.0 \
--values config.yaml
Don't forget that the Helm chart version differs from the JupyterHub version! See the jupyterhub/helm-chart repo.
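The `proxy.secretToken` value in the configuration file is meant to be 32 random bytes written as 64 hexadecimal characters. If the placeholder substitution failed silently, the deployment would run with a weak or literal value, so a quick check may be worthwhile. The helpers below are hypothetical names and only a sketch; the `od` fallback is an assumption for machines without openssl.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 32 random bytes as 64 hex characters.
random_hex_32() {
  if command -v openssl > /dev/null 2>&1; then
    openssl rand -hex 32
  else
    # Fallback without openssl: read 32 bytes from the kernel RNG
    head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
  fi
}

# Hypothetical helper: return 0 if the argument is exactly 64 hex characters.
is_hex_64() {
  case "$1" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  [ "${#1}" -eq 64 ]
}
```

For example, `is_hex_64 "$(random_hex_32)" && echo ok` exercises both helpers, and passing the token found in your configuration file to `is_hex_64` confirms that the placeholder was really replaced.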
Here we are, I hope this tutorial was helpful! Do not hesitate to open a PR, and have a great time on JHub.