diff --git a/docs/maintenance/holodeck/update-mcp-awareness.md b/docs/maintenance/holodeck/update-mcp-awareness.md
index 51c9e628..94ad491d 100644
--- a/docs/maintenance/holodeck/update-mcp-awareness.md
+++ b/docs/maintenance/holodeck/update-mcp-awareness.md
@@ -1,53 +1,65 @@
 <!-- SPDX-License-Identifier: AGPL-3.0-or-later | Copyright (C) 2026 Chris Means -->
 # Update MCP Awareness on Holodeck
 
-Manual deployment steps for updating the mcp-awareness service on the holodeck Proxmox host (CT 201 — `awareness-app`).
+The mcp-awareness service runs on two app nodes (CT 210, CT 211) behind an HAProxy load balancer (CT 203). Updates are deployed with `scripts/holodeck/deploy.sh`, which offers a zero-downtime rolling mode (`hot`) and a `maintenance` mode for migrations.
 
 ## Prerequisites
 
-- SSH access to holodeck (`192.168.200.70`)
-- Root access on CT 201 (`awareness-app`, `192.168.200.101`)
+- SSH access to holodeck and all CTs (via `~/.ssh/config` aliases)
+- The deploy script at `scripts/holodeck/deploy.sh`
 
-## Steps
+## Deploying Updates
 
-### 1. SSH into the container
+### Code-only updates (zero-downtime)
 
 ```bash
-ssh root@192.168.200.101
+scripts/holodeck/deploy.sh hot
 ```
 
-### 2. Pull latest code
+This performs a rolling update. For each node in turn, the script drains the node from HAProxy, pulls the latest code, reinstalls the package, restarts the service, waits for the health check to pass, and then re-enables the node. One node is always serving traffic.
 
-```bash
-git config --global --add safe.directory /opt/mcp-awareness
-cd /opt/mcp-awareness
-git pull origin main
-```
+**Note:** Active MCP sessions on the restarting node will get "Session terminated" errors. Clients need to reconnect. See issues #161–#163 for planned improvements.
 
-### 3. Install updated package
+### Updates with migrations or config changes
 
 ```bash
-/opt/mcp-awareness/venv/bin/pip install -e .
+scripts/holodeck/deploy.sh maintenance
 ```
 
-### 4. Add any new environment variables
+This drains all nodes, updates the first node and runs the Alembic migrations from it, then updates and restarts the remaining nodes. Expect a brief service interruption while the nodes are drained and the migration runs.
+
+### Adding new environment variables
 
-If the release includes new env vars, append them to the env file:
+If a release requires new env vars, update the env file on both app nodes before deploying:
 
 ```bash
-nano /etc/awareness/env
+ssh -t awareness-app-a 'nano /etc/awareness/env'  # -t allocates a TTY, which nano needs
+ssh -t awareness-app-b 'nano /etc/awareness/env'
 ```
 
-### 5. Restart the service
+## Verification
+
+After deploy, verify via HAProxy:
 
 ```bash
-systemctl restart mcp-awareness
+curl -s http://192.168.200.103:8420/health | python3 -m json.tool
 ```
 
-### 6. Verify
+Or check both backends directly:
 
 ```bash
-curl -s localhost:8420/health | python3 -m json.tool
+curl -s http://192.168.200.110:8420/health | python3 -m json.tool
+curl -s http://192.168.200.111:8420/health | python3 -m json.tool
 ```
 
-Confirm `status: ok` and expected uptime (should be a few seconds).
+## Architecture
+
+See `docs/superpowers/specs/2026-04-02-zero-downtime-deployment-design.md` for the full design spec.
+
+| Component | Host | IP |
+|-----------|------|----|
+| HAProxy (load balancer) | CT 203 `awareness-lb` | 192.168.200.103 |
+| App node A | CT 210 `awareness-app-a` | 192.168.200.110 |
+| App node B | CT 211 `awareness-app-b` | 192.168.200.111 |
+| Postgres | CT 200 `awareness-pg` | 192.168.200.100 |
+| Cloudflare tunnel | CT 202 `awareness-tunnel` | 192.168.200.102 |
diff --git a/docs/superpowers/plans/2026-04-02-zero-downtime-deployment.md b/docs/superpowers/plans/2026-04-02-zero-downtime-deployment.md
index 54d20e96..61de0a2f 100644
--- a/docs/superpowers/plans/2026-04-02-zero-downtime-deployment.md
+++ b/docs/superpowers/plans/2026-04-02-zero-downtime-deployment.md
@@ -71,7 +71,7 @@ each CT. The provisioning script (Task 2) handles this for new CTs.
 
 **Where:** `[holodeck]`
 
-- [ ] **Step 1: Identify the Debian 12 template**
+- [x] **Step 1: Identify the Debian 12 template**
 
 ```bash
 pveam list local | grep debian-12
@@ -79,39 +79,39 @@ pveam list local | grep debian-12
 
 Expected: Shows a `debian-12-standard_*.tar.zst` template. Note the exact filename.
 
-- [ ] **Step 2: Create the LXC**
+- [x] **Step 2: Create the LXC**
 
 ```bash
-pct create 203 local:vztmpl/debian-12-standard_12.12-1_amd64.tar.zst --hostname awareness-lb --cores 1 --memory 256 --swap 128 --rootfs local-lvm:4 --net0 name=eth0,bridge=vmbr0,ip=192.168.200.103/24,gw=192.168.200.1 --nameserver 192.168.200.1 --unprivileged 1 --features nesting=0 --start 0 --password
+pct create 203 local:vztmpl/debian-12-standard_12.12-1_amd64.tar.zst --hostname awareness-lb --cores 1 --memory 256 --swap 128 --rootfs local-lvm:4 --net0 name=eth0,bridge=vmbr0,ip=192.168.200.103/24,gw=192.168.200.1 --nameserver 192.168.200.10 --unprivileged 1 --features nesting=0 --start 0 --password
 ```
 
 **[USER]** Set root password, store in KeePass.
 
 Adjust the template filename if it differs from step 1.
 
-- [ ] **Step 3: Start and enter CT 203**
+- [x] **Step 3: Start and enter CT 203**
 
 ```bash
 pct start 203
 pct enter 203
 ```
 
-- [ ] **Step 4: Update base system**
+- [x] **Step 4: Update base system**
 
 ```bash
 apt update && apt upgrade -y
 ```
 
-- [ ] **Step 5: Install HAProxy and socat**
+- [x] **Step 5: Install HAProxy, socat, and curl**
 
 ```bash
-apt install -y haproxy socat
+apt install -y haproxy socat curl
 haproxy -v
 ```
 
 Expected: HAProxy version 2.6+ (Debian 12 ships 2.6.x).
 
-- [ ] **Step 6: Install openssh-server and push SSH key**
+- [x] **Step 6: Install openssh-server and push SSH key**
 
 ```bash
 apt install -y openssh-server
@@ -131,7 +131,7 @@ ssh root@192.168.200.103 hostname
 
 Expected: `awareness-lb`
 
-- [ ] **Step 7: Configure HAProxy**
+- [x] **Step 7: Configure HAProxy**
 
 Create `/etc/haproxy/haproxy.cfg`:
 
@@ -167,6 +167,7 @@ backend awareness-backend
     http-check expect status 200
     stick-table type string len 64 size 10k expire 30m
     stick on req.hdr(mcp-session-id) if { req.hdr(mcp-session-id) -m found }
+    stick store-response res.hdr(mcp-session-id) if { res.hdr(mcp-session-id) -m found }
     server app-a 192.168.200.110:8420 check inter 5s fall 3 rise 2
     server app-b 192.168.200.111:8420 check inter 5s fall 3 rise 2
 
@@ -181,14 +182,14 @@ EOF
 
 **[USER]** Change the stats password (`admin:haproxy-stats`) to something from KeePass.
 
-- [ ] **Step 8: Create runtime socket directory**
+- [x] **Step 8: Create runtime socket directory**
 
 ```bash
 mkdir -p /var/run/haproxy
 chown haproxy:haproxy /var/run/haproxy
 ```
 
-- [ ] **Step 9: Validate config and restart**
+- [x] **Step 9: Validate config and restart**
 
 ```bash
 haproxy -c -f /etc/haproxy/haproxy.cfg
@@ -199,7 +200,7 @@ systemctl status haproxy
 
 Expected: Config valid, service active. Backends will show as DOWN until app LXCs are provisioned.
 
-- [ ] **Step 10: Verify stats page**
+- [x] **Step 10: Verify stats page**
 
 From workstation:
 ```bash
@@ -216,7 +217,7 @@ Expected: Non-zero (stats page is serving, shows backend names).
 
 This script automates creating new app LXCs with all operational fixes applied.
 
-- [ ] **Step 1: Create the provisioning script**
+- [x] **Step 1: Create the provisioning script**
 
 Create `scripts/holodeck/create-app-ct.sh`:
 
@@ -273,7 +274,7 @@ pct create "$CT_ID" "$TEMPLATE" \
     --swap 256 \
     --rootfs local-lvm:8 \
     --net0 "name=eth0,bridge=vmbr0,ip=${IP}/24,gw=192.168.200.1" \
-    --nameserver 192.168.200.1 \
+    --nameserver 192.168.200.10 \
     --unprivileged 1 \
     --features nesting=0 \
     --start 1 \
@@ -283,7 +284,7 @@ echo "Waiting for container to start..."
 sleep 5
 
 echo "Installing base packages..."
-pct exec "$CT_ID" -- bash -c "apt update -qq && apt install -y -qq openssh-server python3 python3-pip python3-venv python3-dev git build-essential libpq-dev > /dev/null 2>&1"
+pct exec "$CT_ID" -- bash -c "apt update -qq && apt install -y -qq openssh-server sudo python3 python3-pip python3-venv python3-dev git build-essential libpq-dev curl > /dev/null 2>&1"
 
 echo "Configuring SSH..."
 pct exec "$CT_ID" -- bash -c "mkdir -p /root/.ssh && chmod 700 /root/.ssh"
@@ -341,13 +342,13 @@ echo "  2. Start service: pct exec ${CT_ID} -- systemctl start mcp-awareness"
 echo "  3. Verify health: curl -s http://${IP}:8420/health | python3 -m json.tool"
 ```
 
-- [ ] **Step 2: Make executable**
+- [x] **Step 2: Make executable**
 
 ```bash
 chmod +x scripts/holodeck/create-app-ct.sh
 ```
 
-- [ ] **Step 3: Commit**
+- [x] **Step 3: Commit**
 
 ```bash
 git add scripts/holodeck/create-app-ct.sh
@@ -360,21 +361,21 @@ git commit -m "infra: add app LXC provisioning script for holodeck"
 
 **Where:** `[holodeck]`
 
-- [ ] **Step 1: Copy SSH key to holodeck**
+- [x] **Step 1: Copy SSH key to holodeck**
 
 From workstation:
 ```bash
 scp ~/.ssh/id_ed25519.pub root@192.168.200.70:/tmp/awareness-ssh-key.pub
 ```
 
-- [ ] **Step 2: Copy provisioning script to holodeck**
+- [x] **Step 2: Copy provisioning script to holodeck**
 
 From workstation:
 ```bash
 scp scripts/holodeck/create-app-ct.sh root@192.168.200.70:/tmp/create-app-ct.sh
 ```
 
-- [ ] **Step 3: Run provisioning script**
+- [x] **Step 3: Run provisioning script**
 
 From holodeck:
 ```bash
@@ -385,14 +386,14 @@ bash /tmp/create-app-ct.sh 210 110 awareness-app-a
 
 Expected: Script completes with "CT 210 (awareness-app-a) provisioned at 192.168.200.110."
 
-- [ ] **Step 4: Copy env file from CT 201**
+- [x] **Step 4: Copy env file from CT 201**
 
 From holodeck:
 ```bash
 pct exec 201 -- cat /etc/awareness/env | pct exec 210 -- bash -c 'cat > /etc/awareness/env && chmod 600 /etc/awareness/env'
 ```
 
-- [ ] **Step 5: Start the service**
+- [x] **Step 5: Start the service**
 
 ```bash
 pct exec 210 -- systemctl start mcp-awareness
@@ -401,7 +402,7 @@ pct exec 210 -- systemctl status mcp-awareness
 
 Expected: Active (running).
 
-- [ ] **Step 6: Verify health**
+- [x] **Step 6: Verify health**
 
 From holodeck:
 ```bash
@@ -410,7 +411,7 @@ curl -s http://192.168.200.110:8420/health | python3 -m json.tool
 
 Expected: `{"status": "ok", ...}`
 
-- [ ] **Step 7: Verify SSH from workstation**
+- [x] **Step 7: Verify SSH from workstation**
 
 From workstation:
 ```bash
@@ -419,7 +420,7 @@ ssh root@192.168.200.110 hostname
 
 Expected: `awareness-app-a`
 
-- [ ] **Step 8: Verify CLI tools**
+- [x] **Step 8: Verify CLI tools**
 
 ```bash
 ssh root@192.168.200.110 mcp-awareness-user list
@@ -435,7 +436,7 @@ Expected: Shows user list (may fail if env isn't sourced — the CLI tools need
 
 Repeat Task 3 with different parameters.
 
-- [ ] **Step 1: Run provisioning script**
+- [x] **Step 1: Run provisioning script**
 
 From holodeck:
 ```bash
@@ -444,13 +445,13 @@ bash /tmp/create-app-ct.sh 211 111 awareness-app-b
 
 **[USER]** Set root password when prompted, store in KeePass.
 
-- [ ] **Step 2: Copy env file from CT 201**
+- [x] **Step 2: Copy env file from CT 201**
 
 ```bash
 pct exec 201 -- cat /etc/awareness/env | pct exec 211 -- bash -c 'cat > /etc/awareness/env && chmod 600 /etc/awareness/env'
 ```
 
-- [ ] **Step 3: Start and verify**
+- [x] **Step 3: Start and verify**
 
 ```bash
 pct exec 211 -- systemctl start mcp-awareness
@@ -459,7 +460,7 @@ curl -s http://192.168.200.111:8420/health | python3 -m json.tool
 
 Expected: `{"status": "ok", ...}`
 
-- [ ] **Step 4: Verify SSH from workstation**
+- [x] **Step 4: Verify SSH from workstation**
 
 ```bash
 ssh root@192.168.200.111 hostname
@@ -475,7 +476,7 @@ Expected: `awareness-app-b`
 
 Both app nodes should now be visible and healthy in HAProxy.
 
-- [ ] **Step 1: Check HAProxy stats**
+- [x] **Step 1: Check HAProxy stats**
 
 ```bash
 curl -s -u admin:haproxy-stats http://192.168.200.103:8421/\;csv | grep -E "app-a|app-b" | cut -d, -f1,2,18
@@ -483,7 +484,7 @@ curl -s -u admin:haproxy-stats http://192.168.200.103:8421/\;csv | grep -E "app-
 
 Expected: Both `app-a` and `app-b` show status `UP`.
 
-- [ ] **Step 2: Test traffic routing**
+- [x] **Step 2: Test traffic routing**
 
 Send a request through HAProxy and verify it reaches an app node:
 
@@ -493,7 +494,7 @@ curl -s http://192.168.200.103:8420/health | python3 -m json.tool
 
 Expected: `{"status": "ok", ...}` — response came from one of the app nodes via HAProxy.
 
-- [ ] **Step 3: Test session stickiness**
+- [x] **Step 3: Test session stickiness**
 
 Initialize an MCP session through HAProxy and verify subsequent requests go to the same backend:
 
@@ -511,7 +512,7 @@ curl -s http://192.168.200.103:8420/mcp -X POST -H "Content-Type: application/js
 
 Expected: Returns data (the session was routed to the same backend that created it).
 
-- [ ] **Step 4: Test connection draining**
+- [x] **Step 4: Test connection draining**
 
 Set app-a to drain and verify new requests go to app-b:
 
@@ -537,7 +538,7 @@ ssh root@192.168.200.103 'echo "set server awareness-backend/app-a state ready"
 
 **Where:** `[laptop]`
 
-- [ ] **Step 1: Create the deploy script**
+- [x] **Step 1: Create the deploy script**
 
 Create `scripts/holodeck/deploy.sh`:
 
@@ -669,16 +670,17 @@ maintenance_deploy() {
     done
 
     echo ""
-    echo "Step 2: All nodes drained. Running migration..."
+    echo "Step 2: Updating first node and running migration..."
     local first_ip
     first_ip=$(node_ip "${APP_NODES[0]}")
-    ssh "root@${first_ip}" 'cd /opt/mcp-awareness && git pull origin main && venv/bin/pip install -e . -q'
-    ssh "root@${first_ip}" 'cd /opt/mcp-awareness && set -a && source /etc/awareness/env && set +a && venv/bin/mcp-awareness-migrate upgrade head'
+    update_node "$first_ip"
+    ssh "root@${first_ip}" 'cd /opt/mcp-awareness && sudo -u awareness bash -c "set -a && source /etc/awareness/env && set +a && /opt/mcp-awareness/venv/bin/mcp-awareness-migrate upgrade head"'
     echo "  Migration complete on ${first_ip}"
+    wait_healthy "$first_ip" || echo "  WARNING: ${first_ip} not healthy after migration"
 
     echo ""
-    echo "Step 3: Updating and restarting all nodes..."
-    for entry in "${APP_NODES[@]}"; do
+    echo "Step 3: Updating remaining nodes..."
+    for entry in "${APP_NODES[@]:1}"; do
         local ip
         ip=$(node_ip "$entry")
         update_node "$ip"
@@ -714,13 +716,13 @@ case "$MODE" in
 esac
 ```
 
-- [ ] **Step 2: Make executable**
+- [x] **Step 2: Make executable**
 
 ```bash
 chmod +x scripts/holodeck/deploy.sh
 ```
 
-- [ ] **Step 3: Commit**
+- [x] **Step 3: Commit**
 
 ```bash
 git add scripts/holodeck/deploy.sh
@@ -735,7 +737,7 @@ git commit -m "infra: add zero-downtime deploy script (hot + maintenance modes)"
 
 This is the cutover — traffic starts flowing through HAProxy.
 
-- [ ] **Step 1: Verify CT 201 is still serving (fallback ready)**
+- [x] **Step 1: Verify CT 201 is still serving (fallback ready)**
 
 ```bash
 curl -s http://192.168.200.101:8420/health | python3 -m json.tool
@@ -743,7 +745,7 @@ curl -s http://192.168.200.101:8420/health | python3 -m json.tool
 
 Expected: `{"status": "ok", ...}`
 
-- [ ] **Step 2: Update tunnel config**
+- [x] **Step 2: Update tunnel config**
 
 SSH to CT 202 and update the cloudflared config:
 
@@ -765,7 +767,7 @@ ingress:
   - service: http_status:404
 ```
 
-- [ ] **Step 3: Restart cloudflared**
+- [x] **Step 3: Restart cloudflared**
 
 ```bash
 systemctl restart cloudflared
@@ -775,7 +777,7 @@ journalctl -u cloudflared -n 10 --no-pager
 
 Expected: Active, "Connection established", "Registered tunnel connection".
 
-- [ ] **Step 4: Verify end-to-end**
+- [x] **Step 4: Verify end-to-end**
 
 From workstation, test via the public URL:
 
@@ -796,14 +798,14 @@ Expected: Health response (or auth challenge, depending on mount path config).
 
 **Where:** `[holodeck]`
 
-- [ ] **Step 1: Create resource pool**
+- [x] **Step 1: Create resource pool**
 
 ```bash
 pvesh create /pools --poolid awareness
 pvesh set /pools/awareness --vms 200,202,203,210,211
 ```
 
-- [ ] **Step 2: Set boot order for new containers**
+- [x] **Step 2: Set boot order for new containers**
 
 ```bash
 pct set 203 --onboot 1 --startup order=2,up=5
@@ -813,7 +815,7 @@ pct set 211 --onboot 1 --startup order=3,up=15
 
 Boot order: CT 200 (Postgres, order=1) → CT 203 (HAProxy, order=2, 5s delay) → CT 210+211 (apps, order=3, 15s delay for Postgres) → CT 202 (tunnel, order=4).
 
-- [ ] **Step 2b: Verify CT 202 boot order**
+- [x] **Step 2b: Verify CT 202 boot order**
 
 CT 202 (tunnel) must boot after HAProxy (CT 203) and the app nodes, otherwise cloudflared starts before its upstream is available.
 
@@ -827,27 +829,29 @@ If order is not set or is lower than 4, fix it:
 pct set 202 --onboot 1 --startup order=4,up=5
 ```
 
-- [ ] **Step 3: Update snapshot script**
-
-Edit `/usr/local/bin/awareness-snapshots.sh` on holodeck:
+- [x] **Step 3: Create backup script**
 
-Change:
-```bash
-for ct in 200 201 202; do
-```
+The plan originally called for `pct snapshot`, but local-lvm storage doesn't support
+container snapshots. Created `/usr/local/bin/awareness-snapshots.sh` using `vzdump`
+instead:
 
-To:
 ```bash
+#!/usr/bin/env bash
+set -euo pipefail
 for ct in 200 202 203 210 211; do
+    echo "Backing up CT ${ct}..."
+    vzdump "$ct" --mode snapshot --compress zstd --storage local
+done
+echo "Done."
 ```
 
-- [ ] **Step 4: Verify snapshot script**
+- [x] **Step 4: Verify backup script**
 
 ```bash
 bash /usr/local/bin/awareness-snapshots.sh
 ```
 
-Expected: Creates snapshots for all 5 containers.
+Expected: Creates vzdump backups for all 5 containers. Verified — ~1.9GB total (600MB pg, 214MB tunnel, 199MB lb, 436MB app-a, 435MB app-b).
 
 ---
 
@@ -855,7 +859,7 @@ Expected: Creates snapshots for all 5 containers.
 
 **Where:** `[laptop]`
 
-- [ ] **Step 1: Update SSH config**
+- [x] **Step 1: Update SSH config**
 
 Replace the existing `holodeck` entry and add the new CT aliases in `~/.ssh/config`.
 Use the config from the **Remote access** section in Conventions above — it uses
@@ -866,7 +870,7 @@ Key changes:
 - `holodeck` IdentityFile changes from `id_ed25519_github` to `id_ed25519`
 - New entries for `awareness-lb`, `awareness-app-a`, `awareness-app-b` with ProxyJump
 
-- [ ] **Step 2: Verify from workstation**
+- [x] **Step 2: Verify from workstation**
 
 ```bash
 ssh holodeck hostname
@@ -883,7 +887,7 @@ Expected: `holodeck`, `awareness-lb`, `awareness-app-a`, `awareness-app-b`
 
 **Where:** `[laptop]`
 
-- [ ] **Step 1: Run a hot deploy**
+- [x] **Step 1: Run a hot deploy**
 
 ```bash
 scripts/holodeck/deploy.sh hot
@@ -910,7 +914,7 @@ Expected output:
 === Hot deploy complete ===
 ```
 
-- [ ] **Step 2: Verify service is healthy after deploy**
+- [x] **Step 2: Verify service is healthy after deploy**
 
 ```bash
 curl -s http://192.168.200.103:8420/health | python3 -m json.tool
@@ -918,7 +922,7 @@ curl -s http://192.168.200.103:8420/health | python3 -m json.tool
 
 Expected: `{"status": "ok", ...}`
 
-- [ ] **Step 3: Verify from Claude Desktop**
+- [x] **Step 3: Verify from Claude Desktop**
 
 **[USER]** Call `get_briefing` from Claude Desktop. Should work without reconnecting — existing sessions should have survived (they were drained, not killed).
 
@@ -960,7 +964,7 @@ Remove CT 201 from any snapshot scripts or resource pools if it was added.
 
 **Where:** `[laptop]`
 
-- [ ] **Step 1: Update maintenance guide**
+- [x] **Step 1: Update maintenance guide**
 
 Update `docs/maintenance/holodeck/update-mcp-awareness.md` to reference the deploy script instead of manual steps:
 
@@ -984,11 +988,11 @@ scripts/holodeck/deploy.sh maintenance
 See `docs/superpowers/specs/2026-04-02-zero-downtime-deployment-design.md` for details.
 ```
 
-- [ ] **Step 2: Update deployment design spec topology diagram**
+- [x] **Step 2: Update deployment design spec topology diagram**
 
 Update `docs/superpowers/specs/2026-04-01-holodeck-deployment-design.md` topology section to reflect the new architecture (HAProxy + app pool instead of single CT 201).
 
-- [ ] **Step 3: Commit**
+- [x] **Step 3: Commit**
 
 ```bash
 git add docs/
diff --git a/docs/superpowers/specs/2026-04-01-holodeck-deployment-design.md b/docs/superpowers/specs/2026-04-01-holodeck-deployment-design.md
index 6c0d7d1e..2ff350a1 100644
--- a/docs/superpowers/specs/2026-04-01-holodeck-deployment-design.md
+++ b/docs/superpowers/specs/2026-04-01-holodeck-deployment-design.md
@@ -31,31 +31,46 @@ Holodeck is a Proxmox VE host with 40 Xeon threads, 128GB RAM, 2x Quadro P4000 G
 ## Topology
 
 ```
-┌──────────────────────────────────────────────────────────────┐
-│  holodeck (192.168.200.70)  ·  Proxmox VE 8.4.1             │
-│                                                              │
-│  ┌──────────────────┐  ┌──────────────────┐  ┌────────────┐ │
-│  │ CT 200           │  │ CT 201           │  │ CT 202     │ │
-│  │ awareness-pg     │  │ awareness-app    │  │ awareness- │ │
-│  │ 192.168.200.100  │  │ 192.168.200.101  │  │ tunnel     │ │
-│  │                  │  │                  │  │ 192.168.   │ │
-│  │ Postgres 17      │  │ mcp-awareness    │  │ 200.102    │ │
-│  │ pgvector         │  │ (pip from git)   │  │            │ │
-│  │ PostGIS          │  │ systemd service  │  │ cloudflared│ │
-│  │ pg_stat_stmts    │  │                  │  │ systemd    │ │
-│  │                  │  │ OAuth enabled    │  │            │ │
-│  │ :5432 LAN        │  │ :8420            │  │ → CF tunnel│ │
-│  └──────────────────┘  └──────────────────┘  └────────────┘ │
-│         ↑                    ↑        ↑            │        │
-│         │ pg connect         │        │ embed      │        │
-│         └────────────────────┘        │            │        │
-│                                       │            │        │
-│  Ollama (bare metal, 2x P4000)  ◄─────┘            │        │
-│  :11434                                             │        │
-│                                                     │        │
-│  Internet → Cloudflare → staging.mcpawareness.com ──┘        │
-│             → CT 202 tunnel → CT 201 awareness               │
-└──────────────────────────────────────────────────────────────┘
+┌─────────────────────────────────────────────────────────────────────────┐
+│  holodeck (192.168.200.70)  ·  Proxmox VE 8.4.1                        │
+│                                                                         │
+│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐      │
+│  │ CT 200           │  │ CT 203           │  │ CT 202           │      │
+│  │ awareness-pg     │  │ awareness-lb     │  │ awareness-tunnel │      │
+│  │ 192.168.200.100  │  │ 192.168.200.103  │  │ 192.168.200.102  │      │
+│  │                  │  │                  │  │                  │      │
+│  │ Postgres 17      │  │ HAProxy 2.6      │  │ cloudflared      │      │
+│  │ pgvector         │  │ session sticky   │  │ systemd          │      │
+│  │ PostGIS          │  │ :8420 frontend   │  │ → CF tunnel      │      │
+│  │ pg_stat_stmts    │  │ :8421 stats      │  │                  │      │
+│  │ :5432 LAN        │  │                  │  │                  │      │
+│  └──────────────────┘  └──────────────────┘  └──────────────────┘      │
+│         ↑                  │          │              │                  │
+│         │ pg connect       │          │              │                  │
+│         │        ┌─────────┘          └─────────┐    │                  │
+│         │        ↓                              ↓    │                  │
+│  ┌──────────────────┐              ┌──────────────────┐                │
+│  │ CT 210           │              │ CT 211           │                │
+│  │ awareness-app-a  │              │ awareness-app-b  │                │
+│  │ 192.168.200.110  │              │ 192.168.200.111  │                │
+│  │                  │              │                  │                │
+│  │ mcp-awareness    │              │ mcp-awareness    │                │
+│  │ (pip from git)   │              │ (pip from git)   │                │
+│  │ systemd service  │              │ systemd service  │                │
+│  │ OAuth enabled    │              │ OAuth enabled    │                │
+│  │ :8420            │              │ :8420            │                │
+│  └──────────────────┘              └──────────────────┘                │
+│         ↑        ↑                        ↑        ↑                   │
+│         │        │ embed                  │        │ embed             │
+│         │        │                        │        │                   │
+│  Ollama (bare metal, 2x P4000)  ◄─────────┘────────┘                   │
+│  :11434                                                                 │
+│                                                                         │
+│  Internet → Cloudflare → staging.mcpawareness.com ──→ CT 202 tunnel    │
+│             → CT 203 HAProxy → CT 210/211 (round-robin, sticky)        │
+└─────────────────────────────────────────────────────────────────────────┘
+
+CT 201 (awareness-app, 192.168.200.101) — decommissioned, replaced by CT 210/211.
 
 Synology NAS "Seska" (192.168.200.52)
   └─ /volume1/awareness-backups (encrypted, NFS, 10GB quota)
@@ -93,20 +108,40 @@ Synology NAS "Seska" (192.168.200.52)
 192.168.200.52:/volume1/awareness-backups /mnt/backup nfs rw,hard,intr 0 0
 ```
 
-## CT 201 — Awareness App (`awareness-app`)
+## CT 203 — HAProxy Load Balancer (`awareness-lb`)
 
 ### Provisioning
-- CT ID: 201
-- Hostname: `awareness-app`
+- CT ID: 203
+- Hostname: `awareness-lb`
 - Template: Debian 12
-- Static IP: `192.168.200.101/24`, gateway `192.168.200.1`
-- Resources: 1 CPU core, 512MB RAM, 8GB disk (`local-lvm`)
+- Static IP: `192.168.200.103/24`, gateway `192.168.200.1`
+- Resources: 1 CPU core, 256MB RAM, 4GB disk (`local-lvm`)
+
+### Software
+- HAProxy 2.6 (Debian 12 repos), socat, curl
+
+### Configuration
+- Frontend: `:8420` → backend pool
+- Backend: round-robin with `mcp-session-id` header stickiness (request + response)
+- Health checks: `GET /health` every 5s
+- Stats: `:8421` (admin-only)
+- Session stick table: captures `mcp-session-id` from both request headers and server response headers to maintain MCP session affinity
+
+## CT 210/211 — Awareness App Pool (`awareness-app-a`, `awareness-app-b`)
+
+### Provisioning
+- CT IDs: 210, 211
+- Hostnames: `awareness-app-a`, `awareness-app-b`
+- Template: Debian 12
+- Static IPs: `192.168.200.110/24`, `192.168.200.111/24`, gateway `192.168.200.1`
+- Resources: 1 CPU core, 512MB RAM, 8GB disk (`local-lvm`) each
+- Provisioned via `scripts/holodeck/create-app-ct.sh`
 
 ### Software
-- Python 3.12 (Debian repos or deadsnakes PPA)
+- Python 3.11 (Debian repos)
 - Clone `cmeans/mcp-awareness` from GitHub
 - `pip install -e .` (editable install from source)
-- Alembic migrations run on first deploy
+- Alembic migrations run on first deploy (from one node only)
 
 ### Runtime — systemd service (`mcp-awareness.service`)
 ```ini
@@ -148,12 +183,9 @@ AWARENESS_OLLAMA_URL=http://192.168.200.70:11434
 ```
 
 ### Updates
-```bash
-cd /opt/mcp-awareness
-git pull
-pip install -e .
-sudo systemctl restart mcp-awareness
-```
+Deployed via `scripts/holodeck/deploy.sh`:
+- `deploy.sh hot` — rolling zero-downtime update (drain → update → health check → re-enable, one node at a time)
+- `deploy.sh maintenance` — full stop, migrate, restart (brief downtime for schema changes)
 
 ### User provisioning
 Pre-provision Chris's user before first OAuth login:
@@ -176,17 +208,16 @@ This tests the flow where OAuth login finds an existing user by email match.
 
 ### Runtime
 - Install as system service: `cloudflared service install`
-- Tunnel config points to `http://192.168.200.101:8420`
+- Tunnel config points to `http://192.168.200.103:8420` (HAProxy)
 - Credentials file copied from laptop (`~/.cloudflared/staging-config.yml` and tunnel JSON)
 
-### Tunnel config update
-The existing staging tunnel config references `http://awareness-oauth:8421` (Docker network). Must update to:
+### Tunnel config
 ```yaml
 tunnel: <tunnel-id>
 credentials-file: /etc/cloudflared/credentials.json
 ingress:
   - hostname: staging.mcpawareness.com
-    service: http://192.168.200.101:8420
+    service: http://192.168.200.103:8420
   - service: http_status:404
 ```
 
@@ -237,7 +268,8 @@ Each LXC has cgroup-enforced resource limits that Proxmox tracks. These metrics
 | LXC | Metric | Cloud equivalent |
 |-----|--------|-----------------|
 | CT 200 (Postgres) | CPU, RAM, disk I/O | RDS/Cloud SQL instance tier |
-| CT 201 (Awareness) | CPU, RAM, request rate | Cloud Run instance sizing |
+| CT 203 (HAProxy) | CPU, RAM, connections | Managed load balancer (ALB/Cloud Load Balancing) |
+| CT 210/211 (Awareness) | CPU, RAM, request rate | Cloud Run instance sizing |
 | CT 202 (Tunnel) | CPU, RAM, bandwidth | Cloudflare handles this in cloud (free) |
 | Host (Ollama) | GPU util, VRAM, latency | GPU instance or API costs |
 
@@ -264,8 +296,10 @@ The Docker image, Postgres config, and load profiling data all carry forward at
 | holodeck | 192.168.200.70 | Proxmox host, Ollama bare metal |
 | Seska (Synology) | 192.168.200.52 | NAS, backup storage |
 | CT 200 | 192.168.200.100 | Postgres |
-| CT 201 | 192.168.200.101 | Awareness app |
 | CT 202 | 192.168.200.102 | Cloudflare tunnel |
+| CT 203 | 192.168.200.103 | HAProxy load balancer |
+| CT 210 | 192.168.200.110 | Awareness app-a |
+| CT 211 | 192.168.200.111 | Awareness app-b |
 | Laptop | (DHCP) | Fallback production stack |
 
 ## Implementation Order
diff --git a/scripts/holodeck/create-app-ct.sh b/scripts/holodeck/create-app-ct.sh
new file mode 100755
index 00000000..11597142
--- /dev/null
+++ b/scripts/holodeck/create-app-ct.sh
@@ -0,0 +1,119 @@
+#!/usr/bin/env bash
+# mcp-awareness — ambient system awareness for AI agents
+# Copyright (C) 2026 Chris Means
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+# Provision an awareness app LXC on holodeck.
+# Usage: create-app-ct.sh <ct-id> <ip-suffix> <hostname>
+# Example: create-app-ct.sh 210 110 awareness-app-a
+#
+# Run from holodeck host. Requires: pct, a Debian 12 template, and the
+# workstation SSH public key at /tmp/awareness-ssh-key.pub on holodeck.
+set -euo pipefail
+
+CT_ID="${1:?Usage: create-app-ct.sh <ct-id> <ip-suffix> <hostname>}"
+IP_SUFFIX="${2:?Usage: create-app-ct.sh <ct-id> <ip-suffix> <hostname>}"
+HOSTNAME="${3:?Usage: create-app-ct.sh <ct-id> <ip-suffix> <hostname>}"
+IP="192.168.200.${IP_SUFFIX}"
+
+TEMPLATE=$(pveam list local | grep "debian-12-standard" | awk '{print $1}' | head -1)
+if [[ -z "$TEMPLATE" ]]; then
+    echo "Error: No Debian 12 template found. Run: pveam download local debian-12-standard_12.12-1_amd64.tar.zst" >&2
+    exit 1
+fi
+
+SSH_KEY="/tmp/awareness-ssh-key.pub"
+if [[ ! -f "$SSH_KEY" ]]; then
+    echo "Error: SSH public key not found at $SSH_KEY" >&2
+    echo "Copy your workstation key: scp ~/.ssh/id_ed25519.pub holodeck:/tmp/awareness-ssh-key.pub" >&2
+    exit 1
+fi
+
+echo "Creating CT ${CT_ID} (${HOSTNAME}) at ${IP}..."
+
+echo "You will be prompted to set a root password for the container."
+pct create "$CT_ID" "$TEMPLATE" \
+    --hostname "$HOSTNAME" \
+    --cores 1 \
+    --memory 512 \
+    --swap 256 \
+    --rootfs local-lvm:8 \
+    --net0 "name=eth0,bridge=vmbr0,ip=${IP}/24,gw=192.168.200.1" \
+    --nameserver 192.168.200.10 \
+    --unprivileged 1 \
+    --features nesting=0 \
+    --start 1 \
+    --password
+
+echo "Waiting for container to start..."
+sleep 5
+
+echo "Installing base packages..."
+pct exec "$CT_ID" -- bash -c "export DEBIAN_FRONTEND=noninteractive; apt-get update -qq && apt-get install -y -qq openssh-server sudo python3 python3-pip python3-venv python3-dev git build-essential libpq-dev curl > /dev/null 2>&1"
+
+echo "Configuring SSH..."
+pct exec "$CT_ID" -- bash -c "mkdir -p /root/.ssh && chmod 700 /root/.ssh"
+pct push "$CT_ID" "$SSH_KEY" /root/.ssh/authorized_keys
+pct exec "$CT_ID" -- bash -c "chmod 600 /root/.ssh/authorized_keys"
+
+echo "Creating awareness user..."
+pct exec "$CT_ID" -- bash -c "useradd --system --create-home --shell /bin/bash awareness"
+
+echo "Cloning repo and installing..."
+# NOTE: HTTPS clone requires the repo to be public, or a deploy key / credential
+# helper configured on the container. If the repo is private, set up a read-only
+# deploy key on each app node before running this script.
+pct exec "$CT_ID" -- bash -c "mkdir -p /opt/mcp-awareness && chown awareness:awareness /opt/mcp-awareness"
+pct exec "$CT_ID" -- bash -c "sudo -u awareness git clone https://github.com/cmeans/mcp-awareness.git /opt/mcp-awareness"
+pct exec "$CT_ID" -- bash -c "sudo -u awareness python3 -m venv /opt/mcp-awareness/venv"
+pct exec "$CT_ID" -- bash -c "sudo -u awareness /opt/mcp-awareness/venv/bin/pip install -e /opt/mcp-awareness"
+
+echo "Creating CLI symlinks..."
+pct exec "$CT_ID" -- bash -c "ln -sf /opt/mcp-awareness/venv/bin/mcp-awareness-token /usr/local/bin/"
+pct exec "$CT_ID" -- bash -c "ln -sf /opt/mcp-awareness/venv/bin/mcp-awareness-user /usr/local/bin/"
+pct exec "$CT_ID" -- bash -c "ln -sf /opt/mcp-awareness/venv/bin/mcp-awareness-secret /usr/local/bin/"
+pct exec "$CT_ID" -- bash -c "ln -sf /opt/mcp-awareness/venv/bin/mcp-awareness-migrate /usr/local/bin/"
+
+echo "Installing systemd service..."
+pct exec "$CT_ID" -- bash -c 'cat > /etc/systemd/system/mcp-awareness.service << SVC
+[Unit]
+Description=MCP Awareness Server
+After=network.target
+
+[Service]
+Type=simple
+User=awareness
+EnvironmentFile=/etc/awareness/env
+ExecStart=/opt/mcp-awareness/venv/bin/mcp-awareness
+Restart=on-failure
+RestartSec=5
+WorkingDirectory=/opt/mcp-awareness
+
+[Install]
+WantedBy=multi-user.target
+SVC'
+pct exec "$CT_ID" -- bash -c "systemctl daemon-reload && systemctl enable mcp-awareness"
+
+echo "Creating env directory (env file must be copied separately)..."
+pct exec "$CT_ID" -- bash -c "mkdir -p /etc/awareness && chmod 700 /etc/awareness"
+
+echo ""
+echo "CT ${CT_ID} (${HOSTNAME}) provisioned at ${IP}."
+echo ""
+echo "Next steps:"
+echo "  1. Copy env file: pct exec ${CT_ID} -- bash -c 'cat > /etc/awareness/env << EOF'"
+echo "     (paste contents from an existing app node or KeePass)"
+echo "  2. Start service: pct exec ${CT_ID} -- systemctl start mcp-awareness"
+echo "  3. Verify health: curl -s http://${IP}:8420/health | python3 -m json.tool"
diff --git a/scripts/holodeck/create-ct200.sh b/scripts/holodeck/create-ct200.sh
new file mode 100755
index 00000000..65274a54
--- /dev/null
+++ b/scripts/holodeck/create-ct200.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+# mcp-awareness — ambient system awareness for AI agents
+# Copyright (C) 2026 Chris Means
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+# Create CT 200 — Postgres LXC for awareness.
+# No arguments — creates CT 200 with fixed parameters (IP .100, 2 cores, 2GB RAM).
+# Run on holodeck host: bash create-ct200.sh
+set -euo pipefail
+
+echo "Creating CT 200 (awareness-pg)..."
+pct create 200 local:vztmpl/debian-12-standard_12.12-1_amd64.tar.zst \
+  --hostname awareness-pg \
+  --cores 2 \
+  --memory 2048 \
+  --swap 512 \
+  --rootfs local-lvm:20 \
+  --net0 name=eth0,bridge=vmbr0,ip=192.168.200.100/24,gw=192.168.200.1 \
+  --nameserver 192.168.200.10 \
+  --unprivileged 1 \
+  --features nesting=0 \
+  --start 0 \
+  --password
+
+echo "CT 200 created. Starting..."
+pct start 200
+echo "CT 200 running."
diff --git a/scripts/holodeck/deploy.sh b/scripts/holodeck/deploy.sh
new file mode 100755
index 00000000..4f904a6c
--- /dev/null
+++ b/scripts/holodeck/deploy.sh
@@ -0,0 +1,172 @@
+#!/usr/bin/env bash
+# mcp-awareness — ambient system awareness for AI agents
+# Copyright (C) 2026 Chris Means
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see <https://www.gnu.org/licenses/>.
+
+# Deploy mcp-awareness to the holodeck app nodes behind HAProxy.
+# Run from any host with SSH access to app nodes and HAProxy.
+# Usage: deploy.sh hot          (rolling code update, zero-downtime)
+#        deploy.sh maintenance  (drain all nodes, migrate, restart; brief outage)
+set -euo pipefail
+
+HAPROXY_HOST="192.168.200.103"
+HAPROXY_SOCK="/var/run/haproxy/admin.sock"
+APP_NODES=("192.168.200.110:app-a" "192.168.200.111:app-b")
+DRAIN_TIMEOUT=60
+HEALTH_TIMEOUT=30
+HEALTH_INTERVAL=2
+
+MODE="${1:?Usage: deploy.sh <hot|maintenance>}"
+
+# --- Helpers ---
+
+haproxy_cmd() {
+    ssh "root@${HAPROXY_HOST}" "echo '$1' | socat stdio ${HAPROXY_SOCK}"
+}
+
+node_ip() { echo "${1%%:*}"; }
+node_name() { echo "${1##*:}"; }
+
+drain_node() {
+    local name="$1"
+    echo "  Draining ${name}..."
+    haproxy_cmd "set server awareness-backend/${name} state drain"
+
+    local waited=0
+    while (( waited < DRAIN_TIMEOUT )); do
+        local conns
+            conns=$(haproxy_cmd "show stat" | awk -F, -v srv="$name" '$1 == "awareness-backend" && $2 == srv { print $5 }')
+        if [[ "${conns:-0}" == "0" ]]; then
+            echo "  ${name}: all connections drained"
+            return 0
+        fi
+        echo "  ${name}: ${conns} active connections, waiting..."
+        sleep 5
+        waited=$((waited + 5))
+    done
+    echo "  WARNING: ${name} drain timeout (${DRAIN_TIMEOUT}s), proceeding anyway"
+}
+
+enable_node() {
+    local name="$1"
+    haproxy_cmd "set server awareness-backend/${name} state ready"
+    echo "  ${name}: re-enabled"
+}
+
+update_node() {
+    local ip="$1"
+    echo "  Updating ${ip}..."
+    ssh "root@${ip}" 'cd /opt/mcp-awareness && sudo -u awareness git pull origin main && sudo -u awareness venv/bin/pip install -e . -q && systemctl restart mcp-awareness'
+}
+
+wait_healthy() {
+    local ip="$1"
+    local waited=0
+    while (( waited < HEALTH_TIMEOUT )); do
+        if curl -sf --max-time 5 "http://${ip}:8420/health" > /dev/null 2>&1; then
+            echo "  ${ip}: healthy"
+            return 0
+        fi
+        sleep "$HEALTH_INTERVAL"
+        waited=$((waited + HEALTH_INTERVAL))
+    done
+    echo "  ERROR: ${ip} failed health check after ${HEALTH_TIMEOUT}s"
+    return 1
+}
+
+# --- Hot deploy (rolling, zero-downtime) ---
+
+hot_deploy() {
+    echo "=== Hot deploy (zero-downtime) ==="
+    for entry in "${APP_NODES[@]}"; do
+        local ip name
+        ip=$(node_ip "$entry")
+        name=$(node_name "$entry")
+
+        echo ""
+        echo "--- ${name} (${ip}) ---"
+        drain_node "$name"
+        update_node "$ip"
+
+        if wait_healthy "$ip"; then
+            enable_node "$name"
+        else
+            echo "  ALERT: ${name} failed health check, leaving it drained."
+            echo "  Aborting so the remaining nodes keep serving. Manual intervention required."
+            return 1  # Draining the next node now could leave no healthy backend
+        fi
+    done
+
+    echo ""
+    echo "=== Hot deploy complete ==="
+}
+
+# --- Maintenance deploy (full stop, migrate, restart) ---
+
+maintenance_deploy() {
+    echo "=== Maintenance deploy ==="
+    echo ""
+
+    # Drain all nodes
+    echo "Step 1: Draining all nodes..."
+    for entry in "${APP_NODES[@]}"; do
+        drain_node "$(node_name "$entry")"
+    done
+
+    echo ""
+    echo "Step 2: Updating first node and running migration..."
+    local first_ip
+    first_ip=$(node_ip "${APP_NODES[0]}")
+    update_node "$first_ip"
+    ssh "root@${first_ip}" 'cd /opt/mcp-awareness && sudo -u awareness bash -c "set -a && source /etc/awareness/env && set +a && /opt/mcp-awareness/venv/bin/mcp-awareness-migrate upgrade head"'
+    echo "  Migration complete on ${first_ip}"
+    wait_healthy "$first_ip" || echo "  WARNING: ${first_ip} not healthy after migration"
+
+    echo ""
+    echo "Step 3: Updating remaining nodes..."
+    for entry in "${APP_NODES[@]:1}"; do
+        local ip
+        ip=$(node_ip "$entry")
+        update_node "$ip"
+        wait_healthy "$ip" || echo "  WARNING: ${ip} not healthy yet"
+    done
+
+    echo ""
+    echo "Step 4: Re-enabling all nodes..."
+    for entry in "${APP_NODES[@]}"; do
+        enable_node "$(node_name "$entry")"
+    done
+
+    echo ""
+    echo "=== Maintenance deploy complete ==="
+}
+
+# --- Main ---
+
+case "$MODE" in
+    hot)
+        hot_deploy
+        ;;
+    maintenance)
+        echo "This will briefly take the service offline for migrations."
+        read -p "Continue? [y/N] " -r
+        [[ $REPLY =~ ^[Yy]$ ]] || exit 0
+        maintenance_deploy
+        ;;
+    *)
+        echo "Usage: deploy.sh <hot|maintenance>" >&2
+        exit 1
+        ;;
+esac