Support Runner inside of Docker Container #406

Open
jpb opened this issue Apr 3, 2020 · 46 comments

@jpb

jpb commented Apr 3, 2020

Describe the enhancement

Fully support all features when runner is within a Docker container.

Not all features are currently supported when the runner is within a Docker container, specifically those features that use Docker like Docker-based Actions and services. Running self-hosted runners using Docker is an easy way to scale out runners on some sort of Docker-based cluster and an easy way to provide clean workspaces for each run (with ./run.sh --once).

Code Snippet

Possible implementation that I am using now.

Additional information

There are a few areas of concern when the runner executes in a Docker container:

  1. Filesystem access for other containers needed as part of the job. This can be resolved by using a volume mount from the host which uses a matching host and container path (for example: docker run -v /home/github:/home/github, although it doesn't have to be this exact directory) and telling the runner to use a directory within that for the work directory (./config.sh --work /home/github/work). This works with the current volume mounting behaviour for containers created by the runner. This would need to be documented as part of the setup process for a Docker-based runner.
  2. Network between runner and other containers needed as part of the job. This could be resolved by not creating a network as part of the run and instead optionally accepting an existing network to be used. I have found that it works well to use --network container:<container ID of the runner> to reuse the network from the runner container without having to orchestrate a network created via docker network create. There is no straightforward way to discover the network or ID of a container from within it, so it would likely need to be the responsibility of the user to pass this information to the runner. (I currently do something like "container:$(cat /proc/self/cgroup | grep "cpu" | head -n 1 | rev | cut -d/ -f 1 | rev)" from within the runner container to find the ID and pass this to the runner, although this isn't guaranteed to work in all cases.)
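
The two workarounds above can be sketched in shell roughly as follows. This is only an illustration, not something the runner does today: the helper names `container_id_from_cgroup` and `job_container_args` are hypothetical, /home/github is just the example directory from the text, and the parsing assumes a cgroup v1 layout in which the container ID is the last path segment (which, as noted, is not guaranteed on every host).

```shell
#!/bin/sh
# Hypothetical helper: read /proc/self/cgroup-style text on stdin and print
# the last path segment of the first line that mentions "cpu".
# Assumes a cgroup v1 layout such as "4:cpu,cpuacct:/docker/<id>".
container_id_from_cgroup() {
    grep 'cpu' | head -n 1 | awk -F/ '{ print $NF }'
}

# Hypothetical helper: print docker arguments for a job container that
# reuses the runner's network and mounts the shared directory at the same
# path on the host and in the container.
# $1 = runner container ID, $2 = shared directory (e.g. /home/github)
job_container_args() {
    printf '%s\n' "--network container:$1 -v $2:$2"
}

# Usage inside the runner container (not executed here):
#   id="$(container_id_from_cgroup < /proc/self/cgroup)"
#   docker run $(job_container_args "$id" /home/github) some-image
```

Keeping the host path and the container path identical is what lets the paths the runner hands to the Docker daemon resolve correctly on the host.
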
@jpb
Author

jpb commented Apr 7, 2020

There appear to be a couple more things that need to be done to account for multiple runners on the same host concurrently:

  1. docker network prune cannot run concurrently and should likely be retried if such an error is received:
    /usr/local/bin/docker network prune --force --filter "label=898d1dec6adc"
    Error response from daemon: a prune operation is already running
    ##[warning]Delete stale container networks failed, docker network prune fail with exit code 1
    
  2. The docker label is not sufficient for isolating separate runners on the same host. The current hash of the root directory will result in the same label being used for all runners with the exact same version. In my testing I've switched this to use the hostname, but perhaps something like the runner name or run ID could be used.
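
A retry for the first point could look something like the sketch below. It is an assumption about how one might wrap the call, not how the runner is implemented; `retry_prune` is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: run a command and retry it while its output contains
# the daemon's "a prune operation is already running" message.
# $1 = max attempts, $2 = seconds to sleep between attempts, rest = command
retry_prune() {
    max_attempts="$1"; delay="$2"; shift 2
    attempt=1
    while :; do
        # Capture stdout and stderr so the error text can be inspected.
        if output="$("$@" 2>&1)"; then
            printf '%s\n' "$output"
            return 0
        fi
        case "$output" in
            *"a prune operation is already running"*)
                if [ "$attempt" -ge "$max_attempts" ]; then
                    printf '%s\n' "$output" >&2
                    return 1
                fi
                attempt=$((attempt + 1))
                sleep "$delay"
                ;;
            *)
                # Any other failure is reported immediately, without retrying.
                printf '%s\n' "$output" >&2
                return 1
                ;;
        esac
    done
}

# Usage (not executed here), with the label from the log above:
#   retry_prune 5 2 docker network prune --force --filter "label=898d1dec6adc"
```
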

@jpb
Author

jpb commented May 13, 2020

@TingluoHuang @bryanmacfarlane I'm hoping to get your feedback on this – getting official support for this would be a huge help for me. I'm happy to work on an implementation if that is helpful.

@SonicGD

SonicGD commented Jun 8, 2020

This is a big problem for us. We want to run the GitHub runner in Docker for easier scaling and isolation, but we also need to run services for tests. Our workaround for now is to run multiple runners on the host, but scaling containers with docker-compose is so much easier and more convenient.

@npalm

npalm commented Jun 8, 2020

> This is a big problem for us. We want to run the GitHub runner in Docker for easier scaling and isolation, but we also need to run services for tests. Our workaround for now is to run multiple runners on the host, but scaling containers with docker-compose is so much easier and more convenient.

Fully agree. For the time being we have built a scalable solution on AWS spot instances to serve our Docker builds. A detailed blog post and a reference to the code: https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners

@jupe

jupe commented Feb 16, 2021

> This is a big problem for us. We want to run the GitHub runner in Docker for easier scaling and isolation, but we also need to run services for tests. Our workaround for now is to run multiple runners on the host, but scaling containers with docker-compose is so much easier and more convenient.

Currently we have to create workarounds using non-optimal solutions to deploy tens of runners, or workarounds for container usage in jobs, which is rather ugly. How can we raise the priority of this?

Just curious, how do others manage scaling the runners? This is probably one of the most interesting approaches I've seen so far. I guess many of us have faced this same challenge when scaling gh-runners. "Official" scaling proposals from GitHub would be more than welcome :)

@vincentbrison

> This is a big problem for us. We want to run the GitHub runner in Docker for easier scaling and isolation, but we also need to run services for tests. Our workaround for now is to run multiple runners on the host, but scaling containers with docker-compose is so much easier and more convenient.

Big kudos to @npalm and their solution on AWS. We also built a similar solution for GCP, allowing us to scale our self-hosted runners for a whole GitHub organization: https://github.com/faberNovel/terraform-gcp-github-runner

@callum-tait-pbx

@jupe we use https://github.com/summerwind/actions-runner-controller which has worked really well for us so far

@pratikbin

waiting for this one so bad

@uwehdaub

uwehdaub commented Jul 2, 2021

@jpb Is there any possibility to get a higher priority on this?

@uwehdaub

uwehdaub commented Jul 8, 2021

For now we will use a workaround based on docker-compose.
We have the following docker-compose.yaml file in the repo to set up the services.

version: "3.3"
services:
  nginx:
    image: nginx
  redis:
    image: redis

We then connect the self-hosted runner, which is also running inside Docker, to the created network.
This is an example workflow:

name: Start docker compose
on:
  workflow_dispatch:
jobs:
  start-docker-compose:
    # should run as docker container with connection to the dockerd of the host
    runs-on: [self-hosted] 
    steps:
      - name: Checkout code
        uses: actions/checkout@v2
        with:
          fetch-depth: 1
      - name: Start docker compose
        id: start-docker-compose
        run: |
          project_prefix=my-project
          project_name="${project_prefix}-$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c 13 | tr '[:upper:]' '[:lower:]')"
          my_container_id=$(grep docker /proc/self/cgroup | head -n 1 | sed "s|^.*/docker/\(.*\)|\\1|")

          docker-compose -p "${project_name}" up -d
          while ! docker network inspect "${project_name}_default" > /dev/null ; do
            sleep 1
          done

          docker network connect "${project_name}_default" "${my_container_id}"

          echo "::set-output name=my_container_id::$my_container_id"
          echo "::set-output name=project_name::$project_name"

      - name: Check output
        run: |
          echo "Project name: ${{ steps.start-docker-compose.outputs.project_name}}"
          echo "Container ID: ${{ steps.start-docker-compose.outputs.my_container_id}}"

      - name: Use the started docker compose services
        run: |
          # Install netcat to check redis
          apt-get update
          apt-get install -y netcat
          echo '--------------------'
          ping -c 1 nginx
          curl nginx
          echo '--------------------'
          ping -c 1 redis
          echo ping | netcat -w 2 redis 6379
      - name: Cleanup started docker compose services
        if: always()
        run: |
          docker network disconnect ${{ steps.start-docker-compose.outputs.project_name}}_default ${{ steps.start-docker-compose.outputs.my_container_id}}
          docker-compose -p ${{ steps.start-docker-compose.outputs.project_name}} down
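
The `::set-output` workflow command used above has since been deprecated by GitHub in favour of appending `name=value` lines to the file named by the `GITHUB_OUTPUT` environment variable. A minimal sketch of writing the same two outputs that way follows; the helper name `set_step_output` is hypothetical, and it assumes single-line values.

```shell
#!/bin/sh
# Hypothetical helper: record a step output by appending "name=value" to the
# file the runner exposes through the GITHUB_OUTPUT environment variable.
# Only suitable for single-line values.
set_step_output() {
    printf '%s=%s\n' "$1" "$2" >> "$GITHUB_OUTPUT"
}

# Usage inside the "Start docker compose" step (not executed here):
#   set_step_output my_container_id "$my_container_id"
#   set_step_output project_name "$project_name"
```
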

@bryanmacfarlane
Member

@jpb, since you asked me, I'm ➕ on this, but adding @hross to weigh in since he's driving the runner area now. 🚀

@hross
Contributor

hross commented Jul 12, 2021

We still want to do this and it's on our list but we don't have a date or schedule for shipping this type of feature right now.

@brandonschabell

Would love to see this prioritized. Can't really run docker-in-docker on Kubernetes self-hosted runners without this.

@nehagargSeequent

Any update on this issue?

@myoung34

myoung34 commented Oct 7, 2021

Ping @bryanmacfarlane =)

@na-jakobs

Plus one here. Any update or ETA? @hross

@pl4nty

pl4nty commented Dec 18, 2021

Another +1 here, for me this is blocking some 3rd-party deployment workflows with private AKS clusters

@ecout

ecout commented Sep 7, 2022

> > This is a big problem for us. We want to run the GitHub runner in Docker for easier scaling and isolation, but we also need to run services for tests. Our workaround for now is to run multiple runners on the host, but scaling containers with docker-compose is so much easier and more convenient.
>
> Fully agree. For the time being we have built a scalable solution on AWS spot instances to serve our Docker builds. A detailed blog post and a reference to the code: https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners

Makes sense, I did something similar for DNS Resolver ENIs with CloudWatch as inputs.

@ecout

ecout commented Sep 7, 2022

The main issue I see with all this is access to docker.sock...the whole docker in docker with root access scenario.
myoung34/docker-github-actions-runner#61
From the examples mentioned here:
#367 (comment)
You can try rootless,
https://docs.docker.com/engine/security/rootless/#rootless-docker-in-docker

But then you run into limitations. With a Docker container "runner" running inside another rootless Docker container, inside your typical rootful Docker, can you even do docker build then?
Buildkit: https://www.containiq.com/post/docker-alternatives

And apparently some things have stopped working:
#2103

And then again, at the end of the day, you'll want container orchestration to bring your runners up and down.

Can your actions consider docker alternatives to build images with a container runner?

https://snyk.io/blog/building-docker-images-kubernetes/

For our team specifically we do want container runners that are also able to run containers.

@alexjoeyyong

Any update on this or new news?

@AJMcKane

Also need to plus one this issue. I've tried every workaround, including the latest changes to https://github.com/actions/runner/blob/main/images/Dockerfile by @TingluoHuang and the team, but having the CLI isn't really of much use unless we can run docker pull xxx. More specifically, anyone developing actions has to be hyper-aware of what the action is written in.

@AllanOricil

AllanOricil commented May 3, 2023

I really want to run my jobs using that container feature :(

...
jobs:
  job:
    runs-on:
      labels:
        - self-hosted
        - linux
        - ${{ inputs.RUNNER_LABEL }}
    container:
      image: ${{ inputs.DOCKER_IMAGE }}
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    steps:
      - run: |
          sfdx version --json

Since I can't execute jobs that run inside containers, and my workflows don't need any other dockerized services, I can work around this limitation by creating PODs from a Docker image that contains the runner plus everything else from my Docker image that is needed to run the job. The downside is that I need to build a new image. The following image shows exactly what I'm thinking about:

image

Note: I don't need more than one node. If an entire node goes down, I can just wait for EKS to recreate it, along with its PODs.

With the workaround architecture in place, I can then remove the container configuration from the job manifest.

...
jobs:
  job:
    runs-on:
      labels:
        - self-hosted
        - linux
        - ${{ inputs.RUNNER_LABEL }}

    steps:
      - run: |
          sfdx version --json

Note: I'm just not sure whether the storage is somehow going to be shared by the PODs or whether each POD gets its own, even when using the same name. If the storage is shared between PODs, then one job could impact another if both PODs run on the same Node.

@AllanOricil

AllanOricil commented May 3, 2023

I have a POD running a github-runner and a dind container. When a job that runs on a container is taken by the github-runner container, the job can't execute a simple inline script such as echo hello. Am I doing something wrong, or is it a problem caused by this issue as stated in this other issue?

In the following image you can see that the Container is created by the dind container without a problem, but the inline script can't be executed inside the container

image

This is my kubernetes deployment manifest:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner-public
  namespace: github-runners
  labels:
    app: github-runner-public
spec:
  replicas: 1
  selector:
    matchLabels:
      app: github-runner-public
  template:
    metadata:
      labels:
        app: github-runner-public
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: public-nodegroup
      containers:
        - name: github-runner
          image: 225537886698.dkr.ecr.eu-west-1.amazonaws.com/github-runner-test:v1.1.3
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2375
            - name: DOCKER_API_VERSION
              value: "1.42"
          volumeMounts:
            - name: runner-workspace
              mountPath: /actions-runner/_work
        - name: dind-daemon
          image: docker:23.0.5-dind
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-graph-storage
              mountPath: /var/lib/docker
      volumes:
        - name: docker-graph-storage
          emptyDir: {}
        - name: runner-workspace
          emptyDir: {}

Why can't the container that runs inside dind access /actions-runner/_work/_temp from github-runner? I don't get it.
After reading this post, I understood that the directories of the container inside dind would be mapped to the directories inside github-runner. So, if the container created by dind maps /actions-runner/_work from the github-runner container to /__w inside the container, as shown below, why isn't /__w/_temp/<bla>.sh available?

/actions-runner/_work (volume in the node) <- github-runner [/actions-runner/_work] -> (dind) -> my-container [/__w]

Shouldn't /actions-runner/_work/_temp/<bla>.sh be inside the volume?

image

This is the content of the /actions-runner/_work/_temp directory inside the github-runner container. For some reason it is empty. Does this mean that the controller can't create the inline script inside the runner when it is running in a container?
image

@ChristopherHX
Contributor

Your Kubernetes manifest has a problem: actions/runner performs Docker bind mounts, and you use DOCKER_HOST=tcp:// (the same applies to DOCKER_HOST=ssh://) to reach a different system with its own filesystem, so the mounted folder doesn't exist in dind-daemon.
Mounting the _work folder in both containers might help to use run steps, but because the externals folder is empty on the Docker machine, it won't be able to find externals/node16/bin/node. So you would also need to download or mount the externals into the dind-daemon container before starting the container job.

You probably don't need to share any runner credentials with the dind container, but it would be easier to install actions/runner into the dind image rather than using two isolated containers.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner-public
  namespace: github-runners
  labels:
    app: github-runner-public
spec:
  replicas: 1
  selector:
    matchLabels:
      app: github-runner-public
  template:
    metadata:
      labels:
        app: github-runner-public
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: public-nodegroup
      containers:
        - name: github-runner
          image: 225537886698.dkr.ecr.eu-west-1.amazonaws.com/github-runner-test:v1.1.3
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2375
            - name: DOCKER_API_VERSION
              value: "1.42"
          volumeMounts:
            - name: runner-workspace
              mountPath: /actions-runner/_work
        - name: dind-daemon
          image: docker:23.0.5-dind
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-graph-storage
              mountPath: /var/lib/docker
            - name: runner-workspace # You need to share this, docker bind mounts only work if the docker daemon can find the path locally
              mountPath: /actions-runner/_work
            # TODO download the external tools of the actions/runner to `/actions-runner/externals` to be able to use `actions/checkout@v3` and all other nodejs actions.
      volumes:
        - name: docker-graph-storage
          emptyDir: {}
        - name: runner-workspace
          emptyDir: {}
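
The rule behind this fix is that every path the runner asks the daemon to bind-mount must also exist, at the same location, in the container where dockerd runs. A minimal sketch of that check as a shell helper follows; `is_under_shared_mount` is a hypothetical name, and the mount list is simply the two directories discussed above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if path $1 equals, or lies below, one of the
# mount points given as the remaining arguments.
is_under_shared_mount() {
    path="$1"; shift
    for mount_point in "$@"; do
        case "$path" in
            "$mount_point"|"$mount_point"/*) return 0 ;;
        esac
    done
    return 1
}

# Usage (not executed here): a path the runner will bind-mount into a job
# container, checked against the volumes mounted into the dind container.
#   is_under_shared_mount /actions-runner/_work/_temp \
#       /actions-runner/_work /actions-runner/externals || echo "not shared"
```
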

@AllanOricil

@ChristopherHX thank you for helping again 😄

@AllanOricil

@ChristopherHX you are a god! Thank you! it worked :D

image

@AllanOricil

AllanOricil commented May 5, 2023

before sharing /actions-runner/externals directory with the dind container

image

after sharing /actions-runner/externals directory with the dind container

image

This is my final deployment manifest. With this deployment, I was able to run a github actions job inside a Kubernetes POD

apiVersion: apps/v1
kind: Deployment
metadata:
  name: github-runner-public
  namespace: github-runners
  labels:
    app: github-runner-public
spec:
  replicas: 1
  selector:
    matchLabels:
      app: github-runner-public
  template:
    metadata:
      labels:
        app: github-runner-public
    spec:
      nodeSelector:
        eks.amazonaws.com/nodegroup: public-nodegroup
      containers:
        - name: github-runner
          image: 225537886698.dkr.ecr.eu-west-1.amazonaws.com/github-runner-test:v1.1.3
          env:
            - name: DOCKER_HOST
              value: tcp://localhost:2375
            - name: DOCKER_API_VERSION
              value: "1.42"
          volumeMounts:
            - name: runner-workspace
              mountPath: /actions-runner/_work
            - name: runner-externals
              mountPath: /externals
          lifecycle:
            postStart:
              exec:
                command:
                  ["/bin/sh", "-c", "cp -a /actions-runner/externals/. /externals"]
        - name: dind-daemon
          image: docker:23.0.5-dind
          command: ["dockerd", "--host", "tcp://127.0.0.1:2375"]
          securityContext:
            privileged: true
          volumeMounts:
            - name: docker-graph-storage
              mountPath: /var/lib/docker
            - name: runner-workspace
              mountPath: /actions-runner/_work
            - name: runner-externals
              mountPath: /actions-runner/externals
      volumes:
        - name: docker-graph-storage
          emptyDir: {}
        - name: runner-workspace
          emptyDir: {}
        - name: runner-externals
          emptyDir: {}

and this is the workflow that has a job that runs inside a container (dind)

name: Test Self-hosted Runners Docker

on:
  workflow_dispatch:
    inputs:
      RUNNER_LABEL:
        type: string
        description: choose the runner using a label
      DOCKER_IMAGE:
        type: string
        description: docker image
        default: ghcr.io/vodafoneis/salesforce-build-image:v3.x

env:
  HOME: /root

jobs:
  job:
    runs-on:
      labels:
        - self-hosted
        - linux
        - ${{ inputs.RUNNER_LABEL }}
    container:
      image: ${{ inputs.DOCKER_IMAGE }}
      credentials:
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}

    steps:
      - run: |
          echo $HOME
          echo $PATH
          sfdx version --json

      - uses: actions/checkout@v3

my github-runner image is using v2.303.0

Thanks @ChristopherHX for the tips

@AllanOricil

AllanOricil commented May 5, 2023

Many jobs running on the same Node

image

image

image

@AllanOricil

As an enhancement to avoid having each dind downloading the same image over and over again, which stresses the disk a lot, I'm going to follow these steps: https://blog.argoproj.io/storage-considerations-for-docker-in-docker-on-kubernetes-ed928a83331c

@AllanOricil

I have also verified that my implementation enables service containers. Below you can see the job execution for the redis example from this repository.

image

@AllanOricil

@TingluoHuang @bryanmacfarlane @nikola-jokic I think this issue can be closed. If not, could you provide an example of a workflow manifest that won't work on my Kubernetes cluster?

@Sebastian-0

@AllanOricil I disagree, this issue is not only about Kubernetes. We host our own runners in-house using docker (without Kubernetes) and this limitation is causing problems. The temporary solution for us is to install the runners directly on the machines rather than inside containers, but that's not the way I would like it configured.

@AllanOricil

AllanOricil commented May 8, 2023

@Sebastian-0 I understand that there are limitations. But the way this issue is currently written does not say what these limitations are exactly. The way it is currently written, especially its title, can lead people to believe that it is not possible to run the runner inside a container in any way, which, after some trials, I discovered is not true.
I know that there is a solution that allows the runner to run inside a Docker container, because I was able to do it. However, this evidence alone does not prove that there isn't an edge case that won't work. So that is why I'm asking for an example of a workflow that won't work with this solution. This can help other people better understand what the real problem is.

@nbrugger-tgm

@AllanOricil, I do not think that this issue can be closed as long as there is no official guide/info on this such as:

  • docker will not be supported
  • to run in docker you need to ...image... entry point ... network … mounts ... etc that is kept up to date by the gh-actions team

@timnolte

> @Sebastian-0 I understand that there are limitations. But the way this issue is currently written does not say what these limitations are exactly. The way it is currently written, especially its title, can lead people to believe that it is not possible to run the runner inside a container in any way, which, after some trials, I discovered is not true. I know that there is a solution that allows the runner to run inside a Docker container, because I was able to do it. However, this evidence alone does not prove that there isn't an edge case that won't work. So that is why I'm asking for an example of a workflow that won't work with this solution. This can help other people better understand what the real problem is.

@AllanOricil basically these items are not working: myoung34/docker-github-actions-runner#98

Meaning that in GitHub Actions you can't really get container services working/running that might be needed. I was getting failures when attempting to use the "Local Registry" solution for GitHub Actions when building Docker images in order to test them.

/usr/bin/docker create --name fc2ec27c43d44dc68880bee45a7b12fb_registry2_b6e2e5 --label c3f261 --network github_network_6886f8441d2f4f3489b19978db60872b --network-alias registry -p 5000:5000  -e GITHUB_ACTIONS=true -e CI=true registry:2
  83b0180533c3ec325b50cff90428487490e74aaa907a46ac8ae45f08bc866755
  /usr/bin/docker start 83b0180533c3ec325b50cff90428487490e74aaa907a46ac8ae45f08bc866755
  Error response from daemon: network github_network_6886f8441d2f4f3489b19978db60872b not found
  Error: failed to start containers: 83b0180533c3ec325b50cff90428487490e74aaa907a46ac8ae45f08bc866755

I was trying to use the following Local Registry GitHub Actions setup for Docker image builds.

@AllanOricil

Interesting. Now it makes sense to me why this should not be closed. Thanks @timnolte

@fabio-s-franco

I have set up dind using Terraform and a custom image based on dind. That all ran in my local WSL2, so I assume it wouldn't be far-fetched to do it for the other scenarios. In an approach similar to @AllanOricil's, I did have to set up a bridge network properly, which took a lot of trial and error, but in the end it worked, both with a shared socket and with an independent socket. I will share what I did as is, and maybe someone can pick it up from there.

This is not a solution, but whoever is having problems may pick up some ideas from what I did:

Terraform code

(Should be straightforward to convert it to anything else)

Network

resource "docker_network" "rke_network" {

  name = local.docker_network_name

  driver     = "bridge"
  attachable = true
  internal   = false

  ipam_config {
    ip_range = local.network_subnet
    subnet   = local.network_subnet
    gateway  = local.network_gateway
  }

  options = {
    "com.docker.network.bridge.enable_icc"           = "true"
    "com.docker.network.bridge.enable_ip_masquerade" = "true"
    "com.docker.network.bridge.host_binding_ipv4"    = "0.0.0.0"
    "com.docker.network.bridge.name"                 = local.iface_name
    "com.docker.network.driver.mtu"                  = "65000"
    "com.docker.network.driver.txqueuelen"           = "10000"
  }

  provisioner "local-exec" {
    command = "sudo ip link set dev eth0 txqueuelen 10000 && sudo ip link set dev eth0 mtu 65000 && sudo sysctl -w net/netfilter/nf_conntrack_max=393216"
  }
}

Docker container node definition (where the agent would be)

resource "docker_container" "node" {
  count = 4

  name  = "rke-dind-${local.nodes[count.index].ipv4_address}"
  image = "docker.local/dind-ssh:latest"

  privileged        = true
  publish_all_ports = true

  user          = "root"
  cgroupns_mode = "host"
  ipc_mode      = "shareable"
  stdin_open    = true
  tty           = true
  runtime       = "runc"

  network_mode = "bridge"

  env = [ "DOCKER_TLS_CERTDIR=\"\"", "AUTH_PUBKEY=${tls_private_key.ssh_key.public_key_openssh}", "AUTH_PRVKEY=${tls_private_key.ssh_key.private_key_openssh}" ]

  networks_advanced {
    name         = local.docker_network_name
    ipv4_address = local.nodes[count.index].ipv4_address
  }

  ports {
    internal = 22
    ip = "0.0.0.0"
    protocol = "tcp"
  }

  ports {
    internal = 2375
    ip = "0.0.0.0"
    protocol = "tcp"
  }

  ports {
    internal = 2376
    ip = "0.0.0.0"
    protocol = "tcp"
  }

  ports {
    internal = 2379
    ip = "0.0.0.0"
    protocol = "tcp"
  }
}

Building a custom dind image

I did this to enable SSH into it, but the same approach can also be used to add an agent, for example.
You should generate the RSA keys yourself, as they are used in the image build process (they could also be generated during the build instead). My use case required that I had these keys pre-made elsewhere.

Dockerfile

FROM docker:dind AS dindssh

# Install SSH server
RUN apk add openssh-server

# Generate SSH host keys
RUN ssh-keygen -A

# Copy sshd_config file
COPY sshd_config /etc/ssh/sshd_config
COPY daemon.json /etc/docker/daemon.json
# Replace the stock entrypoint with the customized one and make it executable
COPY dockerd-entrypoint.sh /usr/local/bin/dockerd-entrypoint.sh
RUN chmod +x /usr/local/bin/dockerd-entrypoint.sh

# Add authorized keys for root user
RUN mkdir -p /root/.ssh
RUN touch /root/.ssh/authorized_keys
RUN touch /root/.ssh/id_rsa
RUN chmod 600 /root/.ssh/authorized_keys
RUN chmod 600 /root/.ssh/id_rsa
RUN chown root:root /root/.ssh/authorized_keys
RUN chown root:root /root/.ssh/id_rsa

# Expose SSH, Docker and etcd ports
EXPOSE 22 2375 2379

# Start SSH daemon and Docker daemon
ENTRYPOINT (/usr/sbin/sshd -D &) && dockerd-entrypoint.sh
CMD []

Entrypoint file

Don't remember where I got the base file from, but it is highly customized to work with SSH + DIND

#!/bin/sh
set -eu

_tls_ensure_private() {
	local f="$1"; shift
	[ -s "$f" ] || openssl genrsa -out "$f" 4096
}
_tls_san() {
	{
		ip -oneline address | awk '{ gsub(/\/.+$/, "", $4); print "IP:" $4 }'
		{
			cat /etc/hostname
			echo 'docker'
			echo 'localhost'
			hostname -f
			hostname -s
		} | sed 's/^/DNS:/'
		[ -z "${DOCKER_TLS_SAN:-}" ] || echo "$DOCKER_TLS_SAN"
	} | sort -u | xargs printf '%s,' | sed "s/,\$//"
}
_tls_generate_certs() {
	local dir="$1"; shift

	# if server/{ca,key,cert}.pem && !ca/key.pem, do NOTHING except verify (user likely managing CA themselves)
	# if ca/key.pem || !ca/cert.pem, generate CA public if necessary
	# if ca/key.pem, generate server public
	# if ca/key.pem, generate client public
	# (regenerating public certs every startup to account for SAN/IP changes and/or expiration)

	if [ -s "$dir/server/ca.pem" ] && [ -s "$dir/server/cert.pem" ] && [ -s "$dir/server/key.pem" ] && [ ! -s "$dir/ca/key.pem" ]; then
		openssl verify -CAfile "$dir/server/ca.pem" "$dir/server/cert.pem"
		return 0
	fi

	# https://github.com/FiloSottile/mkcert/issues/174
	local certValidDays='825'

	if [ -s "$dir/ca/key.pem" ] || [ ! -s "$dir/ca/cert.pem" ]; then
		# if we either have a CA private key or do *not* have a CA public key, then we should create/manage the CA
		mkdir -p "$dir/ca"
		_tls_ensure_private "$dir/ca/key.pem"
		openssl req -new -key "$dir/ca/key.pem" \
			-out "$dir/ca/cert.pem" \
			-subj '/CN=docker:dind CA' -x509 -days "$certValidDays"
	fi

	if [ -s "$dir/ca/key.pem" ]; then
		# if we have a CA private key, we should create/manage a server key
		mkdir -p "$dir/server"
		_tls_ensure_private "$dir/server/key.pem"
		openssl req -new -key "$dir/server/key.pem" \
			-out "$dir/server/csr.pem" \
			-subj '/CN=docker:dind server'
		cat > "$dir/server/openssl.cnf" <<-EOF
			[ x509_exts ]
			subjectAltName = $(_tls_san)
		EOF
		openssl x509 -req \
				-in "$dir/server/csr.pem" \
				-CA "$dir/ca/cert.pem" \
				-CAkey "$dir/ca/key.pem" \
				-CAcreateserial \
				-out "$dir/server/cert.pem" \
				-days "$certValidDays" \
				-extfile "$dir/server/openssl.cnf" \
				-extensions x509_exts
		cp "$dir/ca/cert.pem" "$dir/server/ca.pem"
		openssl verify -CAfile "$dir/server/ca.pem" "$dir/server/cert.pem"
	fi

	if [ -s "$dir/ca/key.pem" ]; then
		# if we have a CA private key, we should create/manage a client key
		mkdir -p "$dir/client"
		_tls_ensure_private "$dir/client/key.pem"
		chmod 0644 "$dir/client/key.pem" # openssl defaults to 0600 for the private key, but this one needs to be shared with arbitrary client contexts
		openssl req -new \
				-key "$dir/client/key.pem" \
				-out "$dir/client/csr.pem" \
				-subj '/CN=docker:dind client'
		cat > "$dir/client/openssl.cnf" <<-'EOF'
			[ x509_exts ]
			extendedKeyUsage = clientAuth
		EOF
		openssl x509 -req \
				-in "$dir/client/csr.pem" \
				-CA "$dir/ca/cert.pem" \
				-CAkey "$dir/ca/key.pem" \
				-CAcreateserial \
				-out "$dir/client/cert.pem" \
				-days "$certValidDays" \
				-extfile "$dir/client/openssl.cnf" \
				-extensions x509_exts
		cp "$dir/ca/cert.pem" "$dir/client/ca.pem"
		openssl verify -CAfile "$dir/client/ca.pem" "$dir/client/cert.pem"
	fi
}

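# the scheme is required: the "unix://*" case below (and the docker CLI) will not treat a bare path as a unix socket
export DOCKER_HOST=unix:///var/run/docker.sock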
# no arguments passed
# or first arg is `-f` or `--some-option`
if [ "$#" -eq 0 ] || [ "${1#-}" != "$1" ]; then
	# set "dockerSocket" to the default "--host" *unix socket* value (for both standard or rootless)
	uid="$(id -u)"
	if [ "$uid" = '0' ]; then
		dockerSocket='unix:///var/run/docker.sock'
	else
		# if we're not root, we must be trying to run rootless
		: "${XDG_RUNTIME_DIR:=/run/user/$uid}"
		dockerSocket="unix://$XDG_RUNTIME_DIR/docker.sock"
	fi
	case "${DOCKER_HOST:-}" in
		unix://*)
			dockerSocket="$DOCKER_HOST"
			;;
	esac

	# add our default arguments
	if [ -n "${DOCKER_TLS_CERTDIR:-}" ]; then
		_tls_generate_certs "$DOCKER_TLS_CERTDIR"
		# generate certs and use TLS if requested/possible (default in 19.03+)
		set -- dockerd \
			--tlsverify \
			--tlscacert "$DOCKER_TLS_CERTDIR/server/ca.pem" \
			--tlscert "$DOCKER_TLS_CERTDIR/server/cert.pem" \
			--tlskey "$DOCKER_TLS_CERTDIR/server/key.pem" \
			"$@"
		DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="${DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS:-} -p 0.0.0.0:2376:2376/tcp"
	else
		# TLS disabled (-e DOCKER_TLS_CERTDIR='') or missing certs
		set -- dockerd \
			"$@"
		DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS="${DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS:-} -p 0.0.0.0:2375:2375/tcp"
	fi
fi

if [ "$1" = 'dockerd' ]; then
	# explicitly remove Docker's default PID file to ensure that it can start properly if it was stopped uncleanly (and thus didn't clean up the PID file)
	find /run /var/run -iname 'docker*.pid' -delete || :

	if dockerd --version | grep -qF ' 20.10.'; then
		# XXX inject "docker-init" (tini) as pid1 to workaround https://github.com/docker-library/docker/issues/318 (zombie container-shim processes)
		set -- docker-init -- "$@"
	fi

	if ! iptables -nL > /dev/null 2>&1; then
		# if iptables fails to run, chances are high the necessary kernel modules aren't loaded (perhaps the host is using nftables with the translating "iptables" wrappers, for example)
		# https://github.com/docker-library/docker/issues/350
		# https://github.com/moby/moby/issues/26824
		modprobe ip_tables || :
	fi

	uid="$(id -u)"
	if [ "$uid" != '0' ]; then
		# if we're not root, we must be trying to run rootless
		if ! command -v rootlesskit > /dev/null; then
			echo >&2 "error: attempting to run rootless dockerd but missing 'rootlesskit' (perhaps the 'docker:dind-rootless' image variant is intended?)"
			exit 1
		fi
		user="$(id -un 2>/dev/null || :)"
		if ! grep -qE "^($uid${user:+|$user}):" /etc/subuid || ! grep -qE "^($uid${user:+|$user}):" /etc/subgid; then
			echo >&2 "error: attempting to run rootless dockerd but missing necessary entries in /etc/subuid and/or /etc/subgid for $uid"
			exit 1
		fi
		: "${XDG_RUNTIME_DIR:=/run/user/$uid}"
		export XDG_RUNTIME_DIR
		if ! mkdir -p "$XDG_RUNTIME_DIR" || [ ! -w "$XDG_RUNTIME_DIR" ] || ! mkdir -p "$HOME/.local/share/docker" || [ ! -w "$HOME/.local/share/docker" ]; then
			echo >&2 "error: attempting to run rootless dockerd but need writable HOME ($HOME) and XDG_RUNTIME_DIR ($XDG_RUNTIME_DIR) for user $uid"
			exit 1
		fi
		if [ -f /proc/sys/kernel/unprivileged_userns_clone ] && unprivClone="$(cat /proc/sys/kernel/unprivileged_userns_clone)" && [ "$unprivClone" != '1' ]; then
			echo >&2 "error: attempting to run rootless dockerd but need 'kernel.unprivileged_userns_clone' (/proc/sys/kernel/unprivileged_userns_clone) set to 1"
			exit 1
		fi
		if [ -f /proc/sys/user/max_user_namespaces ] && maxUserns="$(cat /proc/sys/user/max_user_namespaces)" && [ "$maxUserns" = '0' ]; then
			echo >&2 "error: attempting to run rootless dockerd but need 'user.max_user_namespaces' (/proc/sys/user/max_user_namespaces) set to a sufficiently large value"
			exit 1
		fi
		# TODO overlay support detection?
		exec rootlesskit \
			--net="${DOCKERD_ROOTLESS_ROOTLESSKIT_NET:-vpnkit}" \
			--mtu="${DOCKERD_ROOTLESS_ROOTLESSKIT_MTU:-1500}" \
			--disable-host-loopback \
			--port-driver=builtin \
			--copy-up=/etc \
			--copy-up=/run \
			${DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS:-} \
			"$@"
	elif [ -x '/usr/local/bin/dind' ]; then
		# if we have the (mostly defunct now) Docker-in-Docker wrapper script, use it
		set -- '/usr/local/bin/dind' "$@"
	fi
else
	# if it isn't `dockerd` we're trying to run, pass it through `docker-entrypoint.sh` so it gets `DOCKER_HOST` set appropriately too
	set -- docker-entrypoint.sh "$@"
fi

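if [ -n "${AUTH_PUBKEY:-}" ]; then
	# install the SSH key pair passed in via the environment
	mkdir -p /root/.ssh && chmod 0700 /root/.ssh
	echo "$AUTH_PUBKEY" > /root/.ssh/authorized_keys
	echo "${AUTH_PRVKEY:-}" > /root/.ssh/id_rsa
	chmod 0600 /root/.ssh/authorized_keys /root/.ssh/id_rsa # ssh rejects private keys with loose permissions
fi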

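# mark these mounts as shared (mount propagation)
mount --make-shared / && mount --make-shared /sys && mount --make-shared /var/lib/docker
# NOTE: this discards the argument list built above (TLS flags, rootless handling, docker-init) and starts a plain non-TLS daemon
exec dockerd --tls=false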
#exec "$@"
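
The socket selection in the script above only honors `DOCKER_HOST` when it starts with `unix://`; any other value falls back to the per-user default. The sketch below isolates that logic so it can be tried outside the container. The function name `pick_docker_socket` and its argument order are hypothetical and not part of the script.

```shell
#!/bin/sh
# Hypothetical helper (not in the entrypoint): mirrors its socket selection.
#   $1 = numeric uid
#   $2 = DOCKER_HOST value (may be empty)
#   $3 = XDG_RUNTIME_DIR (optional, defaults to /run/user/<uid>)
# The body runs in a subshell so its variables do not leak to the caller.
pick_docker_socket() (
	uid="$1"
	host="$2"
	runtime_dir="${3:-/run/user/$uid}"

	if [ "$uid" = '0' ]; then
		socket='unix:///var/run/docker.sock'
	else
		# non-root is assumed to mean rootless, as in the entrypoint
		socket="unix://$runtime_dir/docker.sock"
	fi

	# only a value with the unix:// scheme overrides the default
	case "$host" in
		unix://*)
			socket="$host"
			;;
	esac

	printf '%s\n' "$socket"
)
```

For example, `pick_docker_socket 0 /var/run/docker.sock` prints the default `unix:///var/run/docker.sock` because the bare path is ignored, while `pick_docker_socket 0 unix:///tmp/d.sock` prints `unix:///tmp/d.sock`.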

daemon.json file

{
    "debug": false,
    "hosts": [
        "tcp://0.0.0.0:2375",
        "unix:///var/run/docker.sock"
    ],
    "runtimes": {
        "sysbox-runc": {
            "path": "/usr/bin/sysbox-runc"
        }
    },
    "dns": ["1.1.1.1"],
    "userland-proxy": false
}
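
With this configuration the daemon listens on TCP port 2375 without TLS, whereas the TLS branch of the script publishes 2376. A client therefore has to pick the matching port. A minimal sketch of that choice follows; the helper name `dind_client_host` and the hostname `docker` in the usage line are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: build the client-side DOCKER_HOST for a dind daemon.
#   $1 = hostname of the daemon container
#   $2 = TLS cert dir (empty means TLS is disabled)
dind_client_host() (
	if [ -n "$2" ]; then
		# TLS enabled: conventional TLS port
		printf 'tcp://%s:2376\n' "$1"
	else
		# TLS disabled: conventional plaintext port, as in the daemon.json above
		printf 'tcp://%s:2375\n' "$1"
	fi
)
```

Usage, assuming the daemon container is reachable as `docker`: `export DOCKER_HOST="$(dind_client_host docker "${DOCKER_TLS_CERTDIR:-}")"`.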

@juannavalonribas

Hello, any update about this feature?

@sergii-rybin-tfs

Hello, we need this feature too.
