Lean, secure, automated, zero downtime*, poor man's infra for services running in docker.
Running a home network? Then you may already have a custom setup, probably using docker compose. You might enjoy all the maintenance and tinkering, but you are surely aware of the pitfalls and potential downtime. If you think that is ok, or if you don't want automation, then this stack is probably not for you. Still interested? Then read on...
Table of contents:
- Key concepts
- Apps included
- Prerequisites
- Dev/ops tools
- Howto
- Questions one might have
- Disclaimer
One file (`db.yml`) is used for all the infra and workloads it creates and manages, to ensure a predictable and reliable automated workflow.
This implies abstractions, which trade some flexibility for reliability, but the stack is easy to modify and extend to meet your needs. We strive to mirror docker compose functionality, so no concessions are necessary from a docker compose enthusiast's perspective.
itsUP generates and manages `proxy/docker-compose.yml`, which operates Traefik to do everything one wants from a routing solution:
- Terminate TLS and forward tcp/udp traffic over an encrypted network to listening endpoints.
- Pass through TLS to endpoints (most people have secure Home Assistant setups already).
- Open host ports when needed (the openvpn service does exactly that).
itsUP generates and manages `upstream/{project}/docker-compose.yml` files to deploy the container workloads defined as services in `db.yml`.
This centralizes and abstracts away the plethora of custom docker compose setups, which are mostly uniform in their approach anyway, so controlling their artifacts from one source of truth makes a lot of sense.
Like all docker orchestration platforms (even Kubernetes), this depends on the containers behaving well:
- Are healthchecks correctly implemented?
- Is SIGTERM respected so containers shut down within an acceptable time frame?
- Are the containers stateless?
itsUP rolls out changes by:
- bringing up a new container and waiting until it is healthy (max 60s if it has a healthcheck, otherwise it is assumed healthy after 10s)
- killing the old container, waiting for it to drain, and then removing it (a rough manual equivalent is sketched below)
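For reference, a rough manual equivalent of this rollout with plain docker compose might look like the sketch below (a hypothetical service named `web`; itsUP automates these steps for you):

```sh
# Sketch only: approximate the rollout by hand for a hypothetical service "web".
# Bring up a second replica next to the running container without recreating it.
docker compose up -d --no-deps --no-recreate --scale web=2 web

# Give the new container time to become healthy (itsUP waits up to 60s with a
# healthcheck, otherwise it assumes healthy after 10s).
sleep 10

# Pick the old container and remove it (the ordering of `ps -q` output is not
# guaranteed; shown for illustration only).
old_container=$(docker compose ps -q web | head -n 1)
docker stop "$old_container" && docker rm "$old_container"

# Settle back on a single replica.
docker compose up -d --no-deps --scale web=1 web
```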
What about stateful services?
It is certainly possible to deploy stateful services, but beware that they might not be good candidates for the docker rollout automation. To update such services it is strongly advised to first read the upgrade documentation for the newer version and follow the prescribed steps. More mature databases may have integrated these steps into the runtime, but expect that to be the exception. So, to get correct results you are on your own and will have to read up on your chosen solutions.
- traefik/traefik: the famous L7 routing proxy that manages letsencrypt certificates
- minio/minio: S3 storage
- nubacuk/docker-openvpn: vpn access to the host running this stack
- traefik/whoami: to demonstrate that headers are correctly passed along
Tools:
Infra:
- Port forwarding of ports `80` and `443` to the machine running this stack. This stack MUST take over whatever routing you have now, but don't worry: it supports your Home Assistant setup and forwards any traffic it expects to it (if you finish the pre-configured `home-assistant` project in `db.yml`).
- A wildcard DNS domain like `*.itsup.example.com` that points to your home IP (see the check below). This lets you choose whatever subdomain you want for your services. You may of course choose and manage any domain in a similar fashion for a public service, but I suggest not going through such trouble for anything private.
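A quick way to check that the wildcard record resolves (any name under the wildcard should return your home IP; `anything` is just an arbitrary label):

```sh
dig +short anything.itsup.example.com
```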
Source `lib/functions.sh` to get:

- `dcp`: run a `docker compose` command targeting the proxy stack (`proxy` + `terminate` services): `dcp logs -f`
- `dcu`: run a `docker compose` command targeting a specific upstream: `dcu test up`
- `dca`: run a `docker compose` command targeting all upstreams: `dca ps`
- `dcpx`: execute a command in one of the proxy containers: `dcpx traefik 'rm -rf /etc/acme/acme.json && shutdown' && dcp up`
- `dcux`: execute a command in one of the upstream containers: `dcux test test-informant env`

In effect these wrapper commands achieve the same as going into an `upstream/*` folder and running `docker compose` there.
I don't want to switch folders/terminals all the time and want to keep a "project root" history of my commands, so I chose this approach.
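A typical session might look like this (reusing the `test` project from the examples above):

```sh
# Load the wrapper functions into the current shell.
source lib/functions.sh

# Proxy stack: follow the traefik logs.
dcp logs -f

# One upstream: bring up the "test" project and check its status.
dcu test up -d
dcu test ps

# All upstreams at once.
dca ps
```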
- `bin/update-certs.py` (obsolete since the migration to Traefik): pull certs and reload the proxy if any certs were created or updated. You could run this in a crontab every week if you want to stay up to date.
- `bin/write-artifacts.py`: after updating `db.yml` you can run this script to generate new artifacts.
- `bin/validate-db.py`: also run from `bin/write-artifacts.py`.
- `bin/requirements-update.sh`: you may want to update requirements once in a while ;)
These are the scripts to install everything and start the proxy and api so that we can receive incoming challenge webhooks:
- `bin/install.sh`: creates a local `.venv` and installs all python project deps.
- `bin/start-all.sh`: starts the proxy (docker compose) and the api server (uvicorn).
- `bin/apply.py`: applies all of `db.yml`.
- `bin/api-logs.sh`: tails the output of the api server.
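Put together, a first run could look like this (from the repo root, after the configuration step described next):

```sh
# Install python deps into a local .venv, start the proxy + API, apply db.yml, tail the API logs.
bin/install.sh
bin/start-all.sh
bin/apply.py
bin/api-logs.sh
```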
But before doing so please configure your stuff:
- Copy `.env.sample` to `.env` and set the correct info (comments should be self explanatory).
- Copy `db.yml.sample` to `db.yml` and edit your projects and their services (see explanations below).
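Which boils down to:

```sh
cp .env.sample .env      # then fill in your values
cp db.yml.sample db.yml  # then describe your projects and services
```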
Project and service configuration is explained below with the following scenarios. Please also check `db.yml.sample` as it contains more examples.
Edit `db.yml` and add your projects with their service(s). Any service that is given an `image:` prop will be deployed with `docker compose`.
Example:
projects:
  ...
  - description: whoami service
    name: whoami
    services:
      - image: traefik/whoami:latest
        ingress:
          - domain: whoami.example.com
        host: web
Run `bin/apply.py` to write all artifacts and deploy/update relevant docker stacks.
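After it finishes you can verify the result with the compose wrappers, for example:

```sh
# Regenerate artifacts and roll out any changes, then check all upstream containers.
bin/apply.py
dca ps
```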
Add a service with ingress and set `passthrough: true`.
Example:
projects:
  ...
  - description: Home Assistant passthrough
    enabled: true
    name: home-assistant
    services:
      - ingress:
          - domain: home.example.com
            passthrough: true
            port: 443
        host: 192.168.1.111
If you also need port 80 to listen for http challenges for your endpoint (home-assistant may do its own), then you may also add:
...
      - ingress:
          ...
          - domain: home.example.com
            passthrough: true
            path_prefix: /.well-known/acme-challenge/
            port: 80
(Port 80 is disallowed for any other cases.)
Add a service with ingress and set `router: tcp`.
Example:
projects:
  ...
  - description: Minio service
    name: minio
    services:
      - command: server --console-address ":9001" /data
        env:
          MINIO_ROOT_USER: root
          MINIO_ROOT_PASSWORD: xx
        host: app
        image: minio/minio:latest
        ingress:
          - domain: minio-api.example.com
            port: 9000
            router: tcp
          - domain: minio-ui.example.com
            port: 9001
        volumes:
          - /data
You can expose an existing service that is already running on the host by creating a service:
- without an `image` prop
- targeting the host from within docker
- configuring its ingress
Example:
projects:
  ...
  - description: itsUP API running on the host
    name: itsUP
    services:
      - ingress:
          - domain: itsup.example.com
            port: 8888
        host: 172.17.0.1 # change this to host.docker.internal when on Docker Desktop
One can add additional docker properties to a service by adding them to the `additional_properties` dictionary:

additional_properties:
  cpus: 0.1
The following docker service properties exist at the service root level and MUST NOT be added via `additional_properties`:
- command
- depends_on
- env
- image
- port
- name
- restart
- volumes
(Also see `lib/models.py`.)
You can enable and configure plugins in `db.yml`. Right now we support the following:
CrowdSec can run as a container via the `crowdsec-bouncer-traefik-plugin` plugin.
Step 1: generate api key
First set `enable: true`, run `bin/write-artifacts.py`, and bring up the `crowdsec` container:
docker compose up -d crowdsec
Now we can execute the command to get the key:
docker compose exec crowdsec cscli bouncers add crowdsecBouncer
Put the resulting api key in the `plugins.crowdsec.apikey` configuration in `db.yml` and apply with `bin/apply.py`.
CrowdSec is now running and wired up, but does not use any blocklists yet. Those can be managed manually, but it is preferable to join the community by creating an account with CrowdSec, which gives you access to (and lets you contribute to) the community blocklists, as well as view results in your account's dashboards.
Step 2: connect your instance with the CrowdSec console
After creating an account, create a machine instance in the console and register the enrollment key in your stack:
docker compose exec crowdsec cscli console enroll ${enrollment key}
Step 3: subscribe to 3rd party blocklists
In the security-engines section select the "Blocklists" of your engine and choose some blocklists of interest. Example:
- Free proxies list
- Firehol SSL proxies list
- Firehol cruzit.com list
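To check that the bouncer is wired up and see what is currently being blocked, these `cscli` commands can help (following the same `docker compose exec` pattern as above):

```sh
# List registered bouncers and their status.
docker compose exec crowdsec cscli bouncers list

# Show the decisions (bans) currently in effect.
docker compose exec crowdsec cscli decisions list
```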
The API allows OpenAPI-compatible clients to manage this stack (ChatGPT works wonders).
Generate the spec with `api/extract-openapi.py`.
All endpoints require auth and expect an incoming Bearer token matching `.env/API_KEY`.
Exception: only github webhook endpoints (check for the annotation `@app.hooks.register(...`) get it from the `github_secret` header.
Webhooks are used for the following:
- to receive updates to this repo, which result in a `git pull` and `bin/apply.py` to roll out any changes in the code. The provided project with `name: itsUP` is used for that, so DON'T delete it if you care about automated updates to this repo.
- to receive incoming github webhooks (or GET requests to `/update-upstream?project=bla&service=dida`) that result in a rolling update of a project or a specific service only (see the example below).
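For example, a manual trigger with curl might look like this (`itsup.example.com`, `bla` and `dida` are placeholders for your own api host, project and service):

```sh
# Trigger a rolling update of one service; the token must match .env/API_KEY.
curl -H "Authorization: Bearer $API_KEY" \
  "https://itsup.example.com/update-upstream?project=bla&service=dida"
```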
One GitHub webhook listening to `workflow_job` events is provided, which needs:
- the hook you register in the github project to end with `/hook?project=bla&service=dida` (`service` optional)
- the `github_secret` set to `.env/API_KEY`
I mainly use GitHub workflows and created webhooks for my individual projects, so I can just manage all webhooks in one place.
NOTE: when using crowdsec this webhook will probably not come in, as it exits from the Azure cloud (public IP ranges), which also hosts many malicious actors that spin up ephemeral intrusion tools. To still receive signals from github you can use a vpn setup like the one used in this repo (check `.github/workflows/test.yml`).
This setup contains a project called "vpn" which runs an openvpn service that gives ssh access. To bootstrap it:
dcu vpn run vpn-openvpn ovpn_genconfig -u udp4://vpn.itsup.example.com
dcu vpn run vpn-openvpn ovpn_initpki
Save the signing passphrase you created.
export CLIENTNAME='github'
dcu vpn run vpn-openvpn easyrsa build-client-full $CLIENTNAME
Save the client passphrase you created as it will be used for `OVPN_PASSWORD` below.
dcu vpn run vpn-openvpn ovpn_getclient $CLIENTNAME combined > .github/workflows/client.ovpn
IMPORTANT: now change `udp` to `udp4` in the `remote: ...` line to target UDP over IPv4, as docker's IPv6 support is not there yet.
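A one-liner can do this edit, assuming the generated config ends the relevant line with `udp` (verify the result before using it):

```sh
# Switch the protocol suffix from udp to udp4 in the generated client config (keeps a .bak copy).
sed -i.bak 's/ udp$/ udp4/' .github/workflows/client.ovpn
```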
Test access (expects local `openvpn` installed):
sudo openvpn .github/workflows/client.ovpn
Now save the `$OVPN_USER_KEY` from `client.ovpn`'s `<key>$OVPN_USER_KEY</key>` and remove the `<key>...</key>`.
Also save the `$OVPN_TLS_AUTH_KEY` from the `<tls-auth>...` section and remove it.
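A quick way to print those sections before removing them (a sketch using awk; double-check the output matches what you paste into the secrets):

```sh
# Print the <key> block; its contents go into OVPN_USER_KEY.
awk '/<key>/,/<\/key>/' .github/workflows/client.ovpn

# Print the <tls-auth> block; its contents go into OVPN_TLS_AUTH_KEY.
awk '/<tls-auth>/,/<\/tls-auth>/' .github/workflows/client.ovpn
```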
Add the secrets to your github repo:
- `OVPN_USERNAME`: `github`
- `OVPN_PASSWORD`: the client passphrase
- `OVPN_USER_KEY`
- `OVPN_TLS_AUTH_KEY`
To allow ssh access from github, create a private key and add the public part to `authorized_keys` on the host:
ssh-keygen -t ed25519 -C "[email protected]"
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
Add the secrets to GitHub:
- `SERVER_HOST`: the hostname of this repo's api server
- `SERVER_USERNAME`: the username that has access to your host's ssh server
- `SSH_PRIVATE_KEY`: the private key of the user
Now we can start the server and expect all to work ok.
If you wish to revoke a cert or do something else, please visit this page: kylemanna/docker-openvpn/blob/master/docs/docker-compose.md
As you may have noted, there is a lot of functionality based on Nginx in this repo. I started out using their proxy, but later ran into the problem of their engine not picking up upstream changes, learning that only the paid Nginx+ does that. I relied heavily on Kubernetes in past years and this was not an issue in its ingress-NGINX controller. When I found that Traefik does not suffer from this, AND manages letsencrypt certs gracefully, AND gives us label-based L7 functionality (like in Kubernetes), I decided to integrate that instead. Wary about its performance though, I intended to keep both approaches side by side. The Nginx part is not working anymore, but I left the code for others to see how one can overcome certain problems in that ecosystem. If you would like to use Nginx for some reason (it is about 40% faster), it is very easy to switch back. But be aware that implies hooking up the hacky `bin/update-certs.py` script to a crontab for automatic cert rotation.
In the future we might consider expanding this setup to use docker swarm, as it should be easy to do. For now we like to keep it simple.
Don't blame this infra automation tooling for anything going wrong inside your containers!
I suggest you repeat that mantra now and then and question yourself when things go wrong: where lies the problem?