A tutorial for a JupyterHub instance on a kube cluster deployed with kubespray

adriendelsalle/jhub-on-kubespray

Table of Contents

  1. Introduction
  2. Configuration used for this tutorial
  3. Install Kubernetes using Kubespray
    1. System update
    2. SSH access
    3. IPv4 forwarding
    4. Turn off swap
    5. Get Kubespray
    6. Install Kubespray requirements
    7. Create a new cluster configuration
    8. Deploy your cluster!
    9. Access your cluster API
  4. Still missing in your cluster
    1. Set a LoadBalancer
    2. Set a StorageClass and a provisioner
  5. Install JupyterHub
    1. Install Helm
    2. Deploy JupyterHub from Helm chart

Introduction

This tutorial is about running a JupyterHub instance on a Kubernetes cluster deployed on bare metal.

For this purpose, and after several attempts with Minikube and kubeadm, with and without VMs, I chose Kubespray, which uses Ansible to deploy Kubernetes. It offers the performance of a bare metal cluster as well as a scalable, production-ready cluster.

[Top]


Configuration

  • Hardware:
    • CPU: 2 cores preferable (not checked by Kubespray)
    • RAM: 1024MB/1500MB minimum for worker/master nodes enforced in Kubespray (configurable)
  • O/S: Ubuntu 19.10 Eoan
  • Kubespray: 2.12.5
  • Python: 3.7
  • Helm: 3.1.2

Note that Ubuntu 19.10 Eoan is not a Linux distribution supported by Kubespray. It requires a patch described here.

[Top]


Install Kubernetes using Kubespray

Please follow these steps to fulfill the Kubespray requirements.

System update

It's always good practice to start with a system update, especially before installing new packages.

sudo apt-get update && \
sudo apt-get upgrade

Do this on your localhost (used to run Kubespray). Kubespray will take care of system updates on the declared nodes.

[Top]

SSH access

  • Install SSH server

If a node does not have an SSH server installed by default, you have to install one to connect to this machine remotely. Ubuntu Server already has an SSH server installed.

sudo apt-get install openssh-server
  • Create SSH key pair

You have to generate one or multiple SSH key pair(s) to allow Kubespray/Ansible automatic login using SSH. You can use a different key pair for each node or use the same for all nodes.

ssh-keygen -b 2048 -t rsa -f /home/<local-user>/.ssh/id_rsa -q -N ""
  • Copy your public key(s) on nodes

Copy your public key(s) into the ~/.ssh/authorized_keys file of the user accounts you will use on each node for deployment. You will be prompted twice for the account password: the first time to upload the public key using SSH, and the second time to add the public key to the authorized keys file.

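for ip in <node1-ip> <node2-ip> ...; do
   # Upload the public key to the node
   scp /home/<local-user>/.ssh/id_rsa.pub <node-user>@$ip:/home/<node-user>/.ssh
   # Append (>>) rather than overwrite (>) so existing authorized keys are kept
   ssh <node-user>@$ip "cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
done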

From now on, SSH will authenticate you with the key and will no longer prompt for a password!
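To confirm that key-based login works before handing over to Ansible, a check along these lines may help. It is a minimal sketch: the function name `check_key_login` is hypothetical, and it relies on the standard OpenSSH options `BatchMode` and `ConnectTimeout` so that `ssh` fails instead of prompting for a password.

```shell
# Hypothetical helper: report whether each node accepts key-based login.
# Usage: check_key_login <node-user> <node1-ip> <node2-ip> ...
check_key_login() {
  user="$1"
  shift
  failed=0
  for ip in "$@"; do
    # BatchMode=yes disables password prompts, so a missing key is an error.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$ip" true; then
      echo "$ip: key login OK"
    else
      echo "$ip: key login FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

The function returns a non-zero status if at least one node refused the key, so it can gate the next steps of a script.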

[Top]

IPv4 forwarding

Kubespray requires IPv4 forwarding to be turned on. This should be done automatically by Kubespray.

To do it manually, run the following command:

for ip in <node1-ip> <node2-ip> ...; do
   ssh <node-user>@$ip "echo 1 | sudo tee /proc/sys/net/ipv4/ip_forward"
done
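To check the result, the flag can simply be read back. The sketch below assumes a hypothetical helper name, `ip_forward_enabled`, and takes the file to read as an optional argument so it can be tried on any file.

```shell
# Hypothetical helper: succeed if the given file contains "1".
# Defaults to the kernel's live IPv4 forwarding flag.
ip_forward_enabled() {
  flag_file="${1:-/proc/sys/net/ipv4/ip_forward}"
  [ "$(cat "$flag_file")" = "1" ]
}
```

On a node, `ip_forward_enabled && echo "forwarding is on"` reports the live state. Note that a value written through /proc does not survive a reboot; a persistent setting is usually made with `net.ipv4.ip_forward = 1` in the sysctl configuration.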

[Top]

Turn off swap

Turning swap off is required by Kubernetes. See this issue for more information.

for ip in <node1-ip> <node2-ip> ...; do
   ssh <node-user>@$ip "sudo swapoff -a && sudo sed -i '/ swap / s/^/#/' /etc/fstab"
done

This step can also be done using the prepare-cluster.yaml playbook available in this repo.
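To see what the sed expression does before touching a real /etc/fstab, it can be tried on a throwaway copy. The entries below are made-up samples, not taken from any real system.

```shell
# Sample fstab written to a temporary file (hypothetical entries).
fstab_demo="$(mktemp)"
cat > "$fstab_demo" << 'EOF'
UUID=aaaa-bbbb / ext4 defaults 0 1
/swapfile none swap sw 0 0
EOF

# Same sed expression as in the swap step, applied to the copy only.
sed -i '/ swap / s/^/#/' "$fstab_demo"
cat "$fstab_demo"
```

Only the line containing ` swap ` gets a leading `#`. The pattern expects a space on each side of the word, so an fstab that separates its fields with tabs would not be matched and may need a manual edit.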

[Top]

Get Kubespray

Start by installing curl.

sudo apt-get install curl

Get the latest Kubespray source code from its repo.

The latest release when writing this tutorial, v2.12.5, throws errors not encountered in the master version.

This is probably due to Ubuntu 19.10 not being supported, and will be fixed in 20.04!

mkdir -p ~/projects/ && \
cd ~/projects/ && \
curl -LJO https://github.com/kubernetes-sigs/kubespray/archive/master.zip && \
unzip kubespray-master.zip && \
mv kubespray-master kubespray && \
rm kubespray-master.zip && \
cd kubespray

[Top]

Install Kubespray requirements

Kubespray requires Python 3 and several dependencies to be installed.

  • Install Python 3

Install Python 3, as well as pip (the package installer for Python) and venv to create virtual environments (see below).

sudo apt-get install python3.7 python3-pip python3-venv
  • Create a virtual env

Using a virtual env (or a conda env for conda users) is a good practice to isolate Python dependencies.

python3 -m venv ~/projects/kubespray-venv
source ~/projects/kubespray-venv/bin/activate
  • Install Kubespray dependencies
pip install -r requirements.txt
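Installing the requirements outside the virtual env is an easy mistake to make in a new terminal. A small guard such as the following can catch it; the helper name `in_virtualenv` is hypothetical, and the check relies on the `VIRTUAL_ENV` variable exported by the `activate` script.

```shell
# Hypothetical helper: succeed only when a virtual env is active.
# "activate" exports VIRTUAL_ENV; "deactivate" unsets it.
in_virtualenv() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
  echo "virtual env active: $VIRTUAL_ENV"
else
  echo "no virtual env active"
fi
```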

[Top]

Create a new cluster configuration

Start by creating a copy of the default settings from the sample cluster.

cp -rfp inventory/sample inventory/mycluster

Be sure you are still in the ~/projects/kubespray/ directory before executing this command!

Then customize your new cluster:

  • Update Ansible inventory file with inventory builder
declare -a IPS=(<node1-ip> <node2-ip> ...)
CONFIG_FILE=inventory/mycluster/hosts.yaml python contrib/inventory_builder/inventory.py ${IPS[@]}
  • (optional) Rename your nodes or deactivate hostname renaming

If you skip this step, your cluster hostnames will be renamed node1, node2, etc.

You can either rename them by editing the file ~/projects/kubespray/inventory/mycluster/hosts.yaml:

sed -e 's/node1/tower/g' -e 's/node2/laptop/g' ... -i inventory/mycluster/hosts.yaml

OR

keep the current hostnames:

echo "override_system_hostname: false" >>  inventory/mycluster/group_vars/all/all.yml
  • Set Docker version to 19.03

Docker 18.09 does not seem to be available in the apt sources, so prefer 19.03.

echo "docker_version: 19.03"  >> inventory/mycluster/group_vars/all/docker.yml
  • Set resolv.conf

There is more than one resolv.conf file on Ubuntu 18+, so use the right one!

A fix for Ubuntu 18.* has been merged in Kubespray, but it does not apply to the unsupported 19.* versions.

echo 'kube_resolv_conf: "/run/systemd/resolve/resolv.conf"' >> inventory/mycluster/group_vars/all/all.yml
  • Check localhost vs nodes usernames

If your localhost username differs from a node username (the one that owns your SSH public key), you must specify it to Ansible by manually editing the hosts.yaml file.

Example:

| localhost username | node1 username |
| --- | --- |
| foo | bar |

> cat inventory/mycluster/hosts.yaml
all:
  hosts:
    node1:
      ansible_ssh_user: bar
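Before deploying, it can be worth checking that Ansible reaches every host of the inventory with the usernames configured above. The sketch below wraps Ansible's `ping` module in a hypothetical helper named `ping_inventory`; the module logs in over SSH and runs a trivial Python check on each node, it does not send ICMP pings.

```shell
# Hypothetical helper: run Ansible's ping module against every host
# of an inventory to confirm SSH login and Python are usable.
# Usage: ping_inventory inventory/mycluster/hosts.yaml
ping_inventory() {
  ansible -i "$1" all -m ping
}
```

Run it from the Kubespray directory with the virtual env activated, so that the `ansible` command installed from requirements.txt is the one being used.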

[Top]

Deploy your cluster!

If you have not turned on IPv4 forwarding and turned off swap manually, you can use:

curl -LJO https://raw.githubusercontent.com/adriendelsalle/jhub-on-kubespray/master/kubespray/prepare-cluster.yaml
ansible-playbook -i inventory/mycluster/hosts.yaml  --become --become-user=root prepare-cluster.yaml

It's time to deploy Kubernetes by running the Ansible playbook command.

ansible-playbook -i inventory/mycluster/hosts.yaml  --become --become-user=root cluster.yml

[Top]

Access your cluster API

The cluster is now created, but you do not yet have access to its API to configure it.

Kubespray has installed kubectl on the master nodes of your cluster and saved the configuration files in the root home directory.

If you want to access the cluster API from another computer on your network, install kubectl there first.

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
sudo mv ./kubectl /usr/local/bin/kubectl

In all cases, start by copying the configuration files from the root home directory to the user account used to deploy Kubernetes.

Remember, this is the account that holds your SSH public key!

ssh <node-user>@<master-node-ip> "sudo cp -R /root/.kube ~ && sudo chown -R <node-user>:<node-user> ~/.kube" 

If you plan to handle the API from another computer, download those files and update ownership.

scp -r <node-user>@<master-node-ip>:~/.kube ~
sudo chown -R <local-user>:<local-user> ~/.kube
ssh <node-user>@<master-node-ip> "rm -r ~/.kube"

The last command removes the configuration files from the master node user account to keep secrets protected.

For sanity, use autocompletion!

echo 'source <(kubectl completion bash)' >>~/.bashrc
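Once kubectl can reach the API, a quick look at the node status tells whether the deployment succeeded. The following sketch counts nodes that are not ready; the helper name `count_not_ready_nodes` is hypothetical, and it assumes the default column layout of `kubectl get nodes`, where STATUS is the second column.

```shell
# Hypothetical helper: count nodes whose STATUS column is not exactly "Ready".
# Note: a cordoned node ("Ready,SchedulingDisabled") is also counted.
count_not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 != "Ready" { n++ } END { print n + 0 }'
}
```

A result of 0 means every node reported `Ready`. Nodes can take a short while to become ready right after the playbook finishes.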

[Top]


Still missing in your cluster

Set a LoadBalancer

JupyterHub will expose a Service of type LoadBalancer. On a bare metal cluster, you don't have a load balancer, since it is usually provided by your cloud provider's infrastructure.

For more details, refer to the official documentation.

Fortunately, MetalLB is an open-source implementation of a load balancer for bare metal deployments!

  • Install MetalLB

Follow the official installation guide. If kube-proxy runs in IPVS mode, first enable strict ARP by editing its ConfigMap:

kubectl edit configmap -n kube-system kube-proxy

and set:

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true

Then apply the manifests:

kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.3/manifests/metallb.yaml
# On first install only
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
  • Set MetalLB configuration

To allow the load balancer to hand out external IPs, you must specify in its configuration the IP range allocated to it.

cat << EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - <your-ip-range>
EOF

Don't forget to set your-ip-range to the IP range you want to use!

That's it!
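MetalLB accepts an address pool either as a CIDR block or as a `first-last` range. The sketch below is a rough shape check for the value substituted for your-ip-range; the helper name `looks_like_ip_range` and the sample addresses are assumptions, and the check does not verify that each octet is at most 255.

```shell
# Hypothetical helper: check that a value *looks like* a CIDR block or
# a "first-last" IPv4 range. It does not verify that octets are <= 255.
looks_like_ip_range() {
  printf '%s\n' "$1" | grep -Eq \
    '^([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2}|-([0-9]{1,3}\.){3}[0-9]{1,3})$'
}

# Example values are assumptions; pick addresses that are free on your network.
for range in "192.168.1.240-192.168.1.250" "192.168.1.240/28" "192.168.1.240"; do
  if looks_like_ip_range "$range"; then
    echo "$range: accepted"
  else
    echo "$range: rejected"
  fi
done
```

The last sample is rejected on purpose: a single address needs a `/32` suffix or a range to be a valid pool entry.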

[Top]

Set a StorageClass and a provisioner

Deployments usually require storage in order to persist data, since pods are designed to be ephemeral.

Kubernetes introduced several concepts around this:

  • Persistent Volume (PV): a declaration of an available volume
  • Persistent Volume Claim (PVC): a claim for a Persistent Volume
  • etc.

For more detailed information, please refer to the official Kubernetes documentation about storage that covers volumes/PV/PVC/provisioning/etc.

In this tutorial, you will use an NFS volume type for its simplicity, its accessibility between nodes and its ability to be dynamically provisioned.

[Top]

Set up the NFS server

Based on the Vitux tutorial.

  • Install NFS server

Just pick a machine on the same network as your cluster nodes (it can be one of them), and run:

sudo apt-get install -y nfs-kernel-server
  • Choose export directory

Choose or create a directory and share it:

sudo mkdir -p /mnt/my-nfs

As we want all clients to have access, change its permissions:

sudo chown nobody:nogroup /mnt/my-nfs
sudo chmod 777 /mnt/my-nfs
  • Do the NFS export
echo "/mnt/my-nfs <subnetIP/24>(rw,sync,no_subtree_check)" | sudo tee -a /etc/exports
sudo exportfs -a
sudo systemctl restart nfs-kernel-server

You have to replace subnetIP/24 with the correct CIDR of your network.
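As an illustration of the substitution, the sketch below builds the export entry from a directory and a client CIDR. The helper name `nfs_export_line` is hypothetical, and 192.168.1.0/24 is an assumed example subnet, not a value from this tutorial.

```shell
# Hypothetical helper: build an /etc/exports entry with the options
# used in this tutorial (rw,sync,no_subtree_check).
# Usage: nfs_export_line <directory> <client-cidr>
nfs_export_line() {
  printf '%s %s(rw,sync,no_subtree_check)\n' "$1" "$2"
}

# 192.168.1.0/24 is an assumed example subnet.
nfs_export_line /mnt/my-nfs 192.168.1.0/24
```

There must be no space between the CIDR and the opening parenthesis: in the exports syntax, a space there would apply the options to all hosts instead of the listed subnet.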

[Top]

Define the StorageClass and the provisioner

Based on the Yolanda tutorial.

We will use the [external-storage](https://github.com/kubernetes-incubator/external-storage) template.

  • Set authorizations
kubectl apply -f https://raw.githubusercontent.com/kubernetes-incubator/external-storage/master/nfs-client/deploy/rbac.yaml
  • Set StorageClass
cat << EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  annotations: 
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: nfs-provisioner
parameters:
  archiveOnDelete: "false"
EOF

We declare the StorageClass as the default one so that it is automatically selected by PVCs.

  • Set provisioner
cat << EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: nfs-provisioner
            - name: NFS_SERVER
              value: <nfs-server-ip>
            - name: NFS_PATH
              value: /mnt/my-nfs
      volumes:
        - name: nfs-client-root
          nfs:
            server: <nfs-server-ip>
            path: /mnt/my-nfs
EOF

Don't forget to set nfs-server-ip to your NFS server IP!

  • Check everything is OK
kubectl get deployments.apps,pods,sc -n default

You should see the Deployment of the provisioner, the corresponding Pod and the StorageClass marked as the default one.
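To go one step further and confirm that dynamic provisioning actually works, a throwaway claim can be created and inspected. This is a minimal sketch: the helper `test_pvc_manifest` and the claim name `test-claim` are hypothetical, and the manifest sets no storageClassName on the assumption that a default StorageClass exists.

```shell
# Hypothetical helper: print a small PVC manifest. No storageClassName is
# set, so the cluster's default StorageClass should be picked up.
test_pvc_manifest() {
  cat << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF
}

# On a machine with cluster access, one could then run:
#   test_pvc_manifest | kubectl apply -f -
#   kubectl get pvc test-claim      # STATUS should become "Bound"
#   test_pvc_manifest | kubectl delete -f -
test_pvc_manifest
```

If the claim stays `Pending`, the provisioner Pod logs and the NFS export settings are the first places to look.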

[Top]


Install JupyterHub

Install Helm

Just run:

sudo snap install helm --classic

[Top]

Deploy JupyterHub from Helm chart

You can now follow the zero-to-jupyterhub tutorial

  • Add Helm repo
helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
helm repo update
  • Create your configuration file
cat << EOF > config.yaml
proxy:
  secretToken: "<RANDOM_HEX>"
EOF
sed -i "s/<RANDOM_HEX>/$(openssl rand -hex 32)/g" config.yaml
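A leftover placeholder in the configuration file is easy to miss. The sketch below checks that the token line holds 64 hexadecimal characters, which is the output size of `openssl rand -hex 32`. The helper name `secret_token_is_set` is hypothetical, and the pattern assumes the indented, double-quoted layout written by the commands above.

```shell
# Hypothetical helper: succeed if the file sets secretToken to
# 64 hexadecimal characters, i.e. the output size of "openssl rand -hex 32".
# Usage: secret_token_is_set <configuration-file>
secret_token_is_set() {
  grep -Eq '^[[:space:]]+secretToken: "[0-9a-f]{64}"$' "$1"
}
```

Pass it the configuration file you created and check the exit status before deploying.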

If you did not implement the StorageClass and provisioner part of this tutorial, you have to modify your configuration file to store information in memory. In that case, you will lose all your data in case of a cluster reboot, etc.

From JHub doc:

Use an in-memory sqlite database. This should only be used for testing, since the database is erased whenever the hub pod restarts - causing the hub to lose all memory of users who had logged in before.

When using this for testing, make sure you delete all other objects that the hub has created (such as user pods, user PVCs, etc) every time the hub restarts. Otherwise you might run into errors about duplicate resources.

cat << EOF >> config.yaml

hub:
  db:
    type: sqlite-memory

singleuser:
  storage:
    type: none
EOF
  • Deploy JupyterHub
RELEASE=jhub
NAMESPACE=jhub

kubectl create namespace $NAMESPACE
helm upgrade --install $RELEASE jupyterhub/jupyterhub \
  --namespace $NAMESPACE  \
  --version=0.9.0 \
  --values config.yaml

Don't forget that the Helm chart version differs from the JupyterHub version! See the jupyterhub/helm-chart repo.
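Once the release is deployed, the hub is reached through the chart's `proxy-public` Service, which should receive an external IP from the load balancer. The sketch below reads that IP with a jsonpath query; the helper name `jhub_external_ip` is hypothetical.

```shell
# Hypothetical helper: print the external IP of the chart's public proxy Service.
# Usage: jhub_external_ip <namespace>
jhub_external_ip() {
  kubectl get service proxy-public --namespace "$1" \
    --output jsonpath='{.status.loadBalancer.ingress[0].ip}'
}
```

An empty result usually means the Service is still waiting for an address, for example because no load balancer configuration is in place or the address pool is exhausted.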

Here we are, I hope this tutorial was helpful! Do not hesitate to open a PR, and have a great time on JHub.

[Top]
