This repository has been archived by the owner on Nov 13, 2023. It is now read-only.

A step-by-step guide how to configure AKS auto-scaling for GitHub self-hosted runners on Azure


nicklegan/aks-auto-scaling-github-self-hosted-runners


AKS auto-scaling for GitHub self-hosted runners on Azure

This documentation provides a step-by-step guide on how to configure AKS auto-scaling for GitHub self-hosted runners. Auto-scaling is achieved by deploying and configuring the actions-runner-controller.

The auto-scaling setup described below has the following characteristics:

  • Optimized for Azure Kubernetes Service
  • Compatible with GitHub Enterprise Server and GitHub Enterprise Cloud
  • Organization-level runners
  • Ephemeral runners
  • Auto-scaling with workflow_job webhooks
  • Webhook secret
  • Ingress TLS termination
  • Automatically provisioned Let's Encrypt SSL/TLS certificate
  • GitHub App API authentication


Prerequisites

Reference architecture

[Figure: reference architecture diagram]

Set up AKS cluster

# Install Azure CLI - https://docs.microsoft.com/en-us/cli/azure/install-azure-cli
az login

# Install kubectl - https://docs.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az_aks_install_cli
az aks install-cli

# Create resource group
az group create -n <your-resource-group> --location <your-location>

# Create AKS cluster
az aks create -n <your-cluster-name> -g <your-resource-group> --node-resource-group <your-node-resource-group-name> --enable-managed-identity 

# Get AKS access credentials
az aks get-credentials -n <your-cluster-name> -g <your-resource-group>
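az aks get-credentials adds the cluster to your kubeconfig and switches to a context named after the cluster. Because every later helm and kubectl command acts on the current context, a guard such as the following minimal sketch can catch a wrong context early. The function name assert_kube_context is hypothetical, and the check assumes the default context name (no --context override and no --admin flag).

```shell
# Hypothetical guard: succeed only if kubectl's current context matches the
# expected AKS cluster name. Assumes the default context name created by
# `az aks get-credentials -n <your-cluster-name> ...` (no --context/--admin).
assert_kube_context() {
  expected="$1"
  current="$(kubectl config current-context 2>/dev/null)"
  if [ "$current" != "$expected" ]; then
    echo "kubectl context is '$current', expected '$expected'" >&2
    return 1
  fi
  echo "kubectl context OK: $current"
}
```

Example: assert_kube_context <your-cluster-name> && kubectl get nodes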

Set up Helm client

# Install Helm - https://helm.sh/docs/intro/install/
brew install helm # macOS
choco install kubernetes-helm # Windows
sudo snap install helm --classic # Debian/Ubuntu
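The Helm commands in this guide use Helm 3 syntax, for example helm install <release> <chart> and the --create-namespace flag. A minimal sketch of a version check; require_helm3 is a hypothetical helper that parses the output of helm version --short.

```shell
# Hypothetical check: require a Helm 3 (or later) client, since this guide
# relies on Helm 3 command syntax. `helm version --short` prints e.g.
# "v3.13.1+g3547a4b"; Helm 2 prints "Client: v2..." and fails the match.
require_helm3() {
  version="$(helm version --short 2>/dev/null)"
  major="$(printf '%s\n' "$version" | sed -n 's/^v\([0-9][0-9]*\)\..*/\1/p')"
  if [ -z "$major" ] || [ "$major" -lt 3 ]; then
    echo "Helm 3 or later is required, found: ${version:-nothing}" >&2
    return 1
  fi
  echo "Using Helm $version"
}
```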

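Add cert-manager, NGINX ingress, and actions-runner-controller repositories

# Add repositories
helm repo add jetstack https://charts.jetstack.io
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add actions-runner-controller https://actions-runner-controller.github.io/actions-runner-controller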

# Update repositories
helm repo update
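The installs in this guide pull charts from the jetstack and ingress-nginx repositories, and the actions-runner-controller install step pulls actions-runner-controller/actions-runner-controller. The sketch below uses a hypothetical require_helm_repos helper to report any repository name that helm repo list does not show, so a missing repository is caught before an install fails.

```shell
# Hypothetical helper: report every given repository name that is missing
# from `helm repo list` (first column, header row skipped).
require_helm_repos() {
  repos="$(helm repo list 2>/dev/null | awk 'NR > 1 { print $1 }')"
  missing=0
  for name in "$@"; do
    if ! printf '%s\n' "$repos" | grep -Fqx "$name"; then
      echo "missing Helm repository: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: require_helm_repos jetstack ingress-nginx actions-runner-controller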

Install cert-manager

# Install cert-manager - https://cert-manager.io/docs/installation/helm/
helm install --wait --create-namespace --namespace cert-manager cert-manager jetstack/cert-manager --version v1.6.1 --set installCRDs=true

Apply Let's Encrypt ClusterIssuer config for cert-manager

kubectl apply -f clusterissuer.yaml

clusterissuer.yaml

  • email: replace with your own email address (used for ACME account registration)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: your-email@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            class: nginx
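Once the manifest is applied, cert-manager registers an ACME account with Let's Encrypt and marks the ClusterIssuer Ready. The minimal sketch below, with clusterissuer_ready as a hypothetical helper, reads that condition with a kubectl JSONPath filter; kubectl describe clusterissuer letsencrypt-prod shows the reason if it does not become ready.

```shell
# Hypothetical helper: succeed when the named ClusterIssuer reports
# Ready=True (cert-manager sets this after ACME account registration).
clusterissuer_ready() {
  ready="$(kubectl get clusterissuer "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}' 2>/dev/null)"
  [ "$ready" = "True" ]
}
```

Example: clusterissuer_ready letsencrypt-prod && echo "ClusterIssuer ready"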

Install NGINX ingress controller

# Install NGINX Ingress controller
helm install ingress-nginx ingress-nginx/ingress-nginx --namespace actions-runner-system --create-namespace

# Retrieve public load balancer IP from ingress controller
kubectl -n actions-runner-system get svc
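Azure can take a short while to assign the public IP, during which the EXTERNAL-IP column shows <pending>. A polling sketch: the helper name ingress_public_ip, the default of 30 attempts and the 10-second interval are arbitrary choices, and it assumes the default Service name ingress-nginx-controller that the chart creates for a release named ingress-nginx.

```shell
# Hypothetical helper: poll the ingress-nginx controller Service until a
# public IP is assigned, then print it.
# Usage: ingress_public_ip [NAMESPACE] [ATTEMPTS]
ingress_public_ip() {
  ns="${1:-actions-runner-system}"
  tries="${2:-30}"
  while :; do
    ip="$(kubectl -n "$ns" get svc ingress-nginx-controller \
      -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null)"
    if [ -n "$ip" ]; then
      echo "$ip"
      return 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] || break
    sleep 10   # wait before asking again
  done
  echo "no external IP assigned to ingress-nginx-controller yet" >&2
  return 1
}
```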

Set up domain A record

At your domain registrar or DNS provider, create an A record for a subdomain of your domain (e.g. webhook.tld.com) that points to the ingress load balancer IP retrieved above.
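cert-manager's HTTP-01 challenge can only succeed once the subdomain resolves to the ingress IP, so it can help to confirm DNS propagation first. A minimal sketch, assuming dig is installed; dns_points_to is a hypothetical helper.

```shell
# Hypothetical helper: succeed if HOST has an A record equal to EXPECTED_IP.
# `dig +short` prints one record per line; a fixed-string, whole-line match
# avoids treating the dots in the IP as regex wildcards.
dns_points_to() {
  host="$1"
  expected_ip="$2"
  dig +short "$host" A | grep -Fqx "$expected_ip"
}
```

Example, with 20.50.60.70 standing in for your load balancer IP: dns_points_to webhook.tld.com 20.50.60.70 && echo "DNS OK"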

Create a GitHub App and configure GitHub App authentication

Configure workflow_job webhooks

  • Activate the GitHub App webhook feature and set the Webhook URL to the subdomain created earlier (e.g. https://webhook.tld.com)
  • Under Permissions & events, subscribe to Workflow job events

Generate and set a GitHub App webhook secret

Generate a random webhook secret, set it as github_webhook_secret_token in values.yaml, and configure the same value as the webhook secret of the GitHub App created above.

# Generate random webhook secret
ruby -rsecurerandom -e 'puts SecureRandom.hex(20)'
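If Ruby is not available, openssl can generate an equivalent 40-character hex secret. The same secret is what GitHub uses to sign each webhook delivery: the X-Hub-Signature-256 header carries "sha256=" followed by the hex HMAC-SHA256 of the raw payload, keyed with the secret, and the receiving webhook server recomputes it to validate the request. Both helper names below are hypothetical; the sketch is mainly useful for understanding or debugging signature mismatches.

```shell
# Hypothetical helpers around the webhook secret (both require openssl).

# Print 20 random bytes as 40 hex characters (alternative to the Ruby one-liner).
gen_webhook_secret() {
  openssl rand -hex 20
}

# Print the X-Hub-Signature-256 value for BODY signed with SECRET:
# "sha256=" + hex HMAC-SHA256(secret, body).
github_signature() {
  secret="$1"
  body="$2"
  # awk keeps only the digest; the prefix differs between OpenSSL versions
  # ("(stdin)=" vs "SHA2-256(stdin)=").
  digest="$(printf '%s' "$body" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')"
  echo "sha256=$digest"
}
```

GitHub signs the exact bytes it delivers, so a signature computed over a reformatted or re-serialized payload will not match.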

Prepare Actions Runner Controller configuration

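Export the chart's default values.yaml and modify it with your own values, as shown below.

# Export the chart's default values
helm show values actions-runner-controller/actions-runner-controller > values.yaml

# Configure values.yaml
vim values.yaml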

values.yaml

  • githubEnterpriseServerURL: only needed when using GitHub Enterprise Server (GHES); remove it for GitHub.com
  • authSecret:
  • githubWebhookServer:
    • ingress:
    • github_webhook_secret_token
# The URL of your GitHub Enterprise server, if you're using one.
githubEnterpriseServerURL: https://github.example.com

# Only 1 authentication method can be deployed at a time
# Uncomment the configuration you are applying and fill in the details
authSecret:
  create: true
  name: "controller-manager"
  annotations: {}
  ### GitHub Apps Configuration
  ## NOTE: IDs MUST be strings, use quotes
  github_app_id: "3"
  github_app_installation_id: "1"
  github_app_private_key: |-
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA2zl6z+uMcS4D+D9f1ENLJY2w/9lLPajs/wA2gnt74/7bcB1f
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    0000000000000000000000000000000000000000000000000000000000000000
    2x/9kVAWKQ2UJGxqupGqV14vLaNpmA2uILBxc5jKXHu1nNkgUwU=
    -----END RSA PRIVATE KEY-----
  ### GitHub PAT Configuration
  #github_token: ""
githubWebhookServer:
  enabled: true
  replicaCount: 1
  syncPeriod: 10m
  secret:
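    # true: the chart creates the secret from github_webhook_secret_token below
    create: true
    name: "github-webhook-server"
    ### GitHub Webhook Configuration (paste the webhook secret generated earlier)
    github_webhook_secret_token: ""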
  imagePullSecrets: []
  nameOverride: ""
  fullnameOverride: ""
  serviceAccount:
    # Specifies whether a service account should be created
    create: true
    # Annotations to add to the service account
    annotations: {}
    # The name of the service account to use.
    # If not set and create is true, a name is generated using the fullname template
    name: ""
  podAnnotations: {}
  podLabels: {}
  podSecurityContext: {}
  # fsGroup: 2000
  securityContext: {}
  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  service:
    type: ClusterIP
    annotations: {}
    ports:
      - port: 80
        targetPort: http
        protocol: TCP
        name: http
        #nodePort: someFixedPortForUseWithTerraformCdkCfnEtc
  ingress:
    enabled: true
    annotations:
      kubernetes.io/ingress.class: nginx
      cert-manager.io/cluster-issuer: "letsencrypt-prod"
    hosts:
      - host: webhook.tld.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - secretName: letsencrypt-prod
        hosts:
          - webhook.tld.com
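Before installing, a quick grep can catch example values from the file above that were left in place. A minimal sketch: check_values is a hypothetical helper, and the patterns are assumptions taken from this guide's example file (the empty webhook token, the example hostnames, and the zero-filled dummy private key); adjust them to your own placeholders.

```shell
# Hypothetical pre-flight check: report example values from this guide
# that are still present in values.yaml. Returns non-zero if any remain.
check_values() {
  file="${1:-values.yaml}"
  rc=0
  for pattern in \
    'github_webhook_secret_token: ""' \
    'webhook.tld.com' \
    'github.example.com' \
    '0000000000000000'; do
    if grep -Fq -- "$pattern" "$file"; then
      echo "$file still contains example value: $pattern" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Example: check_values values.yaml && echo "values.yaml looks customized"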

Install Actions Runner Controller

# Install actions-runner-controller
helm upgrade --install -f values.yaml --wait --namespace actions-runner-system actions-runner-controller actions-runner-controller/actions-runner-controller

Verify installation and SSL certificate

# View all namespace resources
kubectl --namespace actions-runner-system get all

# Verify certificaterequest status
kubectl get certificaterequest --namespace actions-runner-system

# Verify certificate status
kubectl describe certificate letsencrypt-prod --namespace actions-runner-system

# Verify that the TLS certificate is served correctly
curl -v https://webhook.tld.com

Deploy runner manifest

# Create a new namespace
kubectl create namespace self-hosted-runners

# Edit runnerdeployment yaml
vim runnerdeployment.yaml

# Apply runnerdeployment manifest
kubectl apply -f runnerdeployment.yaml

runnerdeployment.yaml

The manifest below deploys organization-level, auto-scaling ephemeral runners and keeps a minimum of 1 runner alive. Each incoming workflow_job webhook event adds one runner (amount: 1), up to a maximum of 5 replicas. Each scale-up expires after 5 minutes (duration: "5m"), after which the deployment scales back down toward the single minimum runner.

  • organization: replace with the name of your GitHub organization
apiVersion: actions.summerwind.dev/v1alpha1
kind: RunnerDeployment
metadata:
  name: org-runner
  namespace: self-hosted-runners
spec:
  template:
    metadata:
      labels:
        app: org-runner
    spec:
      organization: your-github-organization
      labels:
        - self-hosted
      ephemeral: true
---
apiVersion: actions.summerwind.dev/v1alpha1
kind: HorizontalRunnerAutoscaler
metadata:
  name: org-runner
  namespace: self-hosted-runners
spec:
  scaleTargetRef:
    name: org-runner
  scaleUpTriggers:
    - githubEvent: {}
      amount: 1
      duration: "5m"
  minReplicas: 1
  maxReplicas: 5
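To reason about the resulting replica count, consider a simplified model (an assumption that omits details of the controller such as caching and scale-down delays): each workflow_job event adds a capacity reservation of amount runners that expires after duration, and the desired count is minReplicas plus the unexpired reservations, clamped to the range [minReplicas, maxReplicas]. The hypothetical desired_replicas function computes that for given values.

```shell
# Simplified model (assumption, not the controller's exact logic):
#   desired = minReplicas + sum of unexpired reservation amounts,
#   clamped to [minReplicas, maxReplicas].
# Usage: desired_replicas MIN MAX [AMOUNT ...]
desired_replicas() {
  min="$1"
  max="$2"
  shift 2
  desired="$min"
  for amount in "$@"; do
    desired=$((desired + amount))
  done
  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
}
```

With minReplicas 1 and maxReplicas 5, three active reservations of 1 give desired_replicas 1 5 1 1 1, i.e. 4 runners, while six reservations are capped at 5.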

Verify status of runners and pods

# List running pods
kubectl get pods -n self-hosted-runners

# List active runners
kubectl get runners -n self-hosted-runners
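To script this check, the hypothetical runner_count helper below counts the Runner resources in the namespace; with minReplicas set to 1, expect at least 1 while idle. It counts objects only and does not inspect their status.

```shell
# Hypothetical helper: print how many Runner resources exist in a namespace
# (default: self-hosted-runners). Counts objects only, not their status.
runner_count() {
  kubectl get runners -n "${1:-self-hosted-runners}" --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}
```

Example: [ "$(runner_count)" -ge 1 ] && echo "runners registered"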

Verify deployment of all cluster services

kubectl get all -A 

Resources
