Merged
24 commits
c1662c9
feat(environments): daemon vellumRoot() honors BASE_DATA_DIR per-inst…
clopen-set Apr 14, 2026
5584220
feat(environments): Swift-side VellumPaths env-aware helpers (#25457)
clopen-set Apr 14, 2026
6590ca1
feat(environments): route CLI lockfile R/W and allocator through envi…
clopen-set Apr 14, 2026
e477710
feat(environments): route CLI platform token, guardian token, and dev…
clopen-set Apr 14, 2026
5226584
fix(cli): delete unused LOCKFILE_NAMES export (#25488)
clopen-set Apr 14, 2026
143d8f8
fix(environments): route daemon protected/ callers through platform h…
clopen-set Apr 14, 2026
4e58ce5
fix(environments): orphan-detection and recover find daemons across a…
clopen-set Apr 14, 2026
7dd7355
refactor(environments): route Swift client path sites through VellumP…
clopen-set Apr 14, 2026
4571ea3
fix(environments): make daemon XDG platform-token and device-id env-a…
clopen-set Apr 14, 2026
b959bdc
refactor(device-id): inline base-dir helper into migration 003
clopen-set Apr 14, 2026
307548b
docs(environments): update AGENTS.md and ARCHITECTURE.md for per-assi…
clopen-set Apr 14, 2026
81a04bf
fix(environments): CLI falls back to production on unknown VELLUM_ENV…
clopen-set Apr 14, 2026
4f397d2
fix(permissions): restore legacy signing-key path in risk classificat…
clopen-set Apr 14, 2026
27b0a72
fix(config-watcher): use || for GATEWAY_SECURITY_DIR fallback to matc…
clopen-set Apr 14, 2026
729f276
fix(cli): read platformBaseUrl from lockfile instead of legacy worksp…
clopen-set Apr 14, 2026
66d71fc
fix(chrome-ext): native host reads env-aware lockfile path (#25547)
clopen-set Apr 14, 2026
d69be06
fix(environments): VellumPaths accepts relative XDG_CONFIG_HOME to ma…
clopen-set Apr 14, 2026
8f1ac0a
refactor(recover): drop unreachable legacy fallback in collision chec…
clopen-set Apr 14, 2026
ca50c93
fix(cli): sync platformBaseUrl to lockfile on vellum use / vellum wak…
clopen-set Apr 14, 2026
0901ebb
fix(environments): env-seed fallback for getPlatformUrl, revert H1 sy…
clopen-set Apr 14, 2026
8b789c5
test(environments): drift-guard for KNOWN_ENVIRONMENTS across TS site…
clopen-set Apr 14, 2026
6ad1239
fix(environments): forward VELLUM_ENVIRONMENT across desktop→CLI→daem…
clopen-set Apr 14, 2026
35277c7
default to dev environment
clopen-set Apr 15, 2026
93235fd
remove duplicate env var
clopen-set Apr 15, 2026
12 changes: 9 additions & 3 deletions AGENTS.md
@@ -158,11 +158,17 @@ CES tools are the only approved exception — see `assistant/src/tools/AGENTS.md

## Multi-Instance Path Invariant

The assistant daemon resolves its root directory as `join(homedir(), ".vellum")` via the internal `vellumRoot()` helper. Root-level paths (PID file, platform token, daemon stderr log, protected directory) always resolve under `~/.vellum/`. Remaining root-level files are being migrated to the workspace directory or removed entirely — see the phase plan in the repo for details.
The assistant daemon's root directory is **per-instance**. `vellumRoot()` in `assistant/src/util/platform.ts` reads `BASE_DATA_DIR` — set by the CLI on every daemon and gateway spawn — and returns `join(BASE_DATA_DIR, ".vellum")`. When `BASE_DATA_DIR` is unset (containerized deployments, manual test invocations outside the CLI-spawn lifecycle), it falls back to `join(homedir(), ".vellum")`. Every root-level path (PID file, `.env`, `runtime-port`, `protected/` — with its `keys.enc`, `trust.json`, `capability-token-secret`, `credentials/`, etc. — daemon stderr log) and the workspace directory itself derive from this helper, so a single fix cascades through every consumer.

The CLI (`cli/src/lib/local.ts`) still sets `BASE_DATA_DIR` when spawning named local instances. This is a legacy mechanism slated for removal — the CLI should be migrated to pass `VELLUM_WORKSPACE_DIR` (and any future per-instance env vars) instead of `BASE_DATA_DIR`. Until that migration is complete, the CLI constructs instance-scoped paths directly (e.g. `join(instanceDir, ".vellum", ...)`) rather than relying on the daemon's path helpers.
New local hatches allocate `instanceDir` under `$XDG_DATA_HOME/vellum/assistants/<name>/` (or `$XDG_DATA_HOME/vellum-<env>/assistants/<name>/` in non-production environments), and the daemon for that instance writes everything under `<instanceDir>/.vellum/`. Existing production lockfile entries that were created before this change may have `instanceDir = homedir()`; the read path honors whatever is stored in `resources.instanceDir`, so those assistants continue to live at `~/.vellum/` with no on-disk migration.

In Docker mode, `VELLUM_WORKSPACE_DIR` overrides the workspace location (e.g. `/workspace`). Code that needs the workspace path must use the resolved workspace directory rather than assuming it lives under `vellumRoot()`. The workspace volume is shared between the assistant and gateway containers.
`BASE_DATA_DIR` is the **canonical per-instance signal** the daemon consumes — it is not legacy and is not slated for removal. The CLI sets it in `cli/src/lib/local.ts` on every local daemon/gateway spawn, sourced from `resources.instanceDir` in the lockfile entry. The gateway (`gateway/src/paths.ts:getRootDir`) reads the same variable, so the daemon and gateway always agree on the root.

The CLI still writes the authoritative `vellum.pid` (along with `gateway.pid` and `ngrok.pid`) for each instance from its own process during spawn, at `<instanceDir>/.vellum/vellum.pid`. The daemon also writes its own `vellumRoot()/vellum.pid` via `getPidPath()`. Under `BASE_DATA_DIR` the two paths are identical, so they do not diverge in practice; lifecycle commands consult the CLI-written PID.

**XDG-shared config files** (platform-token, device-id, guardian tokens) are environment-scoped on both the CLI and daemon sides via `getXdgVellumConfigDirName()` in `assistant/src/util/platform.ts`: production resolves to `$XDG_CONFIG_HOME/vellum/`, non-production to `$XDG_CONFIG_HOME/vellum-<env>/` (for the seed environments `dev`, `staging`, `test`, `local`; unknown values fall back to `vellum` for safety). The Swift client's `VellumPaths.current.configDir` mirrors the same convention so every writer of these files agrees on the location.

In Docker mode, `VELLUM_WORKSPACE_DIR` overrides the workspace location (e.g. `/workspace`) and `GATEWAY_SECURITY_DIR` points at the gateway's security volume. Docker mode does not set `BASE_DATA_DIR`, so the two resolution paths do not conflict — containerized deployments keep using the workspace/security-volume conventions and ignore the per-instance override. Code that needs the workspace path must use `getWorkspaceDir()` rather than assuming it lives under `vellumRoot()`.
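
The resolution order described in this section can be sketched as follows. This is a minimal illustration, not the code in `assistant/src/util/platform.ts`: the function names `resolveVellumRoot` and `resolveWorkspaceDir` are hypothetical stand-ins for `vellumRoot()` and `getWorkspaceDir()`, the environment is passed in as a parameter only to make the sketch testable, and treating an empty string as unset is an assumption.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Hypothetical stand-in for vellumRoot(): BASE_DATA_DIR wins when set,
// otherwise fall back to the user's home directory.
// Assumption: an empty BASE_DATA_DIR is treated the same as an unset one.
export function resolveVellumRoot(env: Env = process.env): string {
  const base = env.BASE_DATA_DIR;
  return join(base ? base : homedir(), ".vellum");
}

// Hypothetical stand-in for getWorkspaceDir(): VELLUM_WORKSPACE_DIR (Docker
// mode) overrides the location; otherwise the workspace lives under the root.
export function resolveWorkspaceDir(env: Env = process.env): string {
  const override = env.VELLUM_WORKSPACE_DIR;
  return override ? override : join(resolveVellumRoot(env), "workspace");
}
```

With `BASE_DATA_DIR=/tmp/inst` the root resolves to `/tmp/inst/.vellum`; with only `VELLUM_WORKSPACE_DIR=/workspace` set, the workspace is `/workspace` regardless of where the root lands, which is why callers should not derive the workspace path from the root themselves.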

## Qdrant Port Override

127 changes: 83 additions & 44 deletions ARCHITECTURE.md
@@ -17,6 +17,8 @@ This file is the cross-system architecture index. Detailed designs live in domai
| Trusted contact access design | [`assistant/docs/trusted-contact-access.md`](assistant/docs/trusted-contact-access.md) |
| Trusted contacts operator runbook | [`assistant/docs/runbook-trusted-contacts.md`](assistant/docs/runbook-trusted-contacts.md) |
| Credential Execution Service (CES) | [`assistant/docs/credential-execution-service.md`](assistant/docs/credential-execution-service.md) |
| Environment and data layout | [Environment and Data Layout](#environment-and-data-layout) (this file) |
| Multi-local instance isolation | [Multi-Local Instance Isolation](#multi-local-instance-isolation) (this file) |
| Docker volume architecture | [Docker Volume Architecture](#docker-volume-architecture) (this file) |

## Cross-Cutting Invariants
@@ -34,49 +36,99 @@ This file is the cross-system architecture index. Detailed designs live in domai
- **Permission controls v2** removes deterministic tool-by-tool approval friction for assistant-owned actions. Under `permission-controls-v2`, the only built-in deterministic approval surface is conversation-scoped host computer access for `host_*` / host-target tools. All other assistant-owned tool usage relies on model-mediated consent, not temporary approvals, wildcard scopes, per-tool persistence, or network/side-effect approval cards. Cross-principal identity checks (for example unknown actors) still fail closed deterministically.
- **Context overflow resilience**: The session loop implements a deterministic overflow convergence pipeline that recovers from context-too-large failures without surfacing errors to users. A preflight budget check catches overflow before provider calls; a tiered reducer (forced compaction, tool-result truncation, media stubbing, injection downgrade) iteratively shrinks the payload; and an overflow policy resolver gates latest-turn compression behind user approval for interactive sessions. Non-interactive sessions auto-compress; denied compression produces a graceful assistant explanation message (not a `conversation_error`). Config lives under `contextWindow.overflowRecovery`. See [`assistant/ARCHITECTURE.md`](assistant/ARCHITECTURE.md#context-overflow-recovery) for the full design and [`assistant/docs/architecture/memory.md`](assistant/docs/architecture/memory.md#context-compaction-and-overflow-recovery-interaction) for compaction interaction details.

## Environment and Data Layout

Environments are **namespaces**, not containers. `VELLUM_ENVIRONMENT` selects a path prefix (`vellum` for `production`, `vellum-<env>` for the non-production seeds `dev`, `staging`, `test`, `local`). It does not own data. Data directories are always per-assistant, and the lockfile's `resources.instanceDir` field is the source of truth for any given assistant's on-disk location.

### Per-assistant data directories

Every local assistant's daemon root is `<resources.instanceDir>/.vellum/`. The daemon receives `instanceDir` via the `BASE_DATA_DIR` environment variable set by the CLI on every spawn (`cli/src/lib/local.ts`), and the gateway reads the same variable in `gateway/src/paths.ts:getRootDir`. `assistant/src/util/platform.ts:vellumRoot` returns `join(BASE_DATA_DIR, ".vellum")` when the variable is set, and falls back to `join(homedir(), ".vellum")` otherwise. All root-level state (PID file, `.env`, `runtime-port`, `protected/` with its encrypted keys, trust rules, credentials, capability token, approved-devices list, etc.) and the workspace directory (`getWorkspaceDir()` = `vellumRoot()/workspace` when `VELLUM_WORKSPACE_DIR` is unset) derive from this helper.

Allocation of `instanceDir` for new hatches:

| Environment | `instanceDir` path |
|---|---|
| `production` | `$XDG_DATA_HOME/vellum/assistants/<name>/` |
| non-production (`vellum-<env>`) | `$XDG_DATA_HOME/vellum-<env>/assistants/<name>/` |

There is no "first local" special case — every new hatch goes through the same allocator (`cli/src/lib/assistant-config.ts:allocateLocalResources`) and lands under the XDG multi-instance tree. `~/.vellum/` is never an allocation target; it is only reached via existing lockfile entries whose `instanceDir = homedir()` was recorded before this change.
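
As an illustration of the allocation table above, the following sketch derives an `instanceDir` from the environment name and assistant name. It is not the real `allocateLocalResources`; `envPrefix` and `instanceDirFor` are hypothetical names. Two assumptions are labeled in the comments: an unset `VELLUM_ENVIRONMENT` is treated as production, and an unset `XDG_DATA_HOME` falls back to the XDG default `~/.local/share`.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type Env = Record<string, string | undefined>;

// Non-production seed environments named in this document.
const NON_PRODUCTION_SEEDS = new Set(["dev", "staging", "test", "local"]);

// Hypothetical helper: map VELLUM_ENVIRONMENT to a path prefix.
// Unknown values resolve to "vellum" instead of a made-up "vellum-<garbage>".
// Assumption: an unset variable is treated as production.
function envPrefix(environment: string | undefined): string {
  return environment !== undefined && NON_PRODUCTION_SEEDS.has(environment)
    ? `vellum-${environment}`
    : "vellum";
}

// Hypothetical helper: compute the per-assistant instanceDir for a new hatch.
// Assumption: XDG_DATA_HOME defaults to ~/.local/share when unset or empty.
function instanceDirFor(name: string, env: Env = process.env): string {
  const dataHome = env.XDG_DATA_HOME || join(homedir(), ".local", "share");
  return join(dataHome, envPrefix(env.VELLUM_ENVIRONMENT), "assistants", name);
}
```

Note that `~/.vellum/` never appears as an output of this function, matching the rule that it is reachable only through older lockfile entries.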

### Lockfile

| Environment | Canonical path | Read fallback |
|---|---|---|
| `production` | `~/.vellum.lock.json` | `~/.vellum.lockfile.json` (legacy rename) |
| non-production | `$XDG_CONFIG_HOME/vellum-<env>/lockfile.json` | (none — new path) |

The CLI routes all lockfile reads/writes through `cli/src/lib/environments/paths.ts:getLockfilePath` / `getLockfilePaths` so non-production environments land in the env-scoped XDG config tree. The parent directory is created on first write.
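
The lockfile table can be expressed as a small lookup. This is a sketch under assumptions, not the contents of `cli/src/lib/environments/paths.ts`: `lockfilePathsFor` and the `LockfilePaths` shape are hypothetical, the caller is assumed to pass an already-validated environment name, and `XDG_CONFIG_HOME` is assumed to default to `~/.config`.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical return shape: one write target plus ordered read fallbacks.
interface LockfilePaths {
  canonical: string;
  readFallbacks: string[];
}

// Hypothetical helper mirroring the table above.
// Assumption: `environment` is "production" or one of the known seeds.
function lockfilePathsFor(
  environment: string,
  configHome: string = process.env.XDG_CONFIG_HOME || join(homedir(), ".config"),
  home: string = homedir(),
): LockfilePaths {
  if (environment === "production") {
    return {
      canonical: join(home, ".vellum.lock.json"),
      // Rename-era filename, accepted for reads only.
      readFallbacks: [join(home, ".vellum.lockfile.json")],
    };
  }
  // Non-production environments use a new path, so nothing to fall back to.
  return {
    canonical: join(configHome, `vellum-${environment}`, "lockfile.json"),
    readFallbacks: [],
  };
}
```

A reader would try `canonical` first and then each entry in `readFallbacks`; a writer would only ever touch `canonical`, creating its parent directory if needed.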

### Config directory (XDG-shared auth state)

| Environment | Config dir |
|---|---|
| `production` | `$XDG_CONFIG_HOME/vellum/` |
| non-production | `$XDG_CONFIG_HOME/vellum-<env>/` |

Platform tokens (`platform-token`), device IDs (`device-id`), and guardian tokens (`assistants/<id>/guardian-token.json`) live under the env-scoped config dir. The CLI (`cli/src/lib/platform-client.ts`, `cli/src/lib/guardian-token.ts`), the daemon (`assistant/src/util/platform.ts:getXdgPlatformTokenPath`, `getXdgVellumConfigDirName`), and the Swift client (`clients/shared/Utilities/VellumPaths.swift:configDir`) all agree on the same env-scoped path, so `vellum login`, guardian leasing, persisted device IDs, and desktop session state never bleed between environments.

### Backwards compatibility

Backwards compatibility lives entirely in the read path — no on-disk migration is performed.

- Existing production lockfile entries with `instanceDir = homedir()` continue to work: the daemon receives `BASE_DATA_DIR = homedir()` and resolves to `~/.vellum/` exactly as before.
- Production writes still go to the legacy `~/.vellum.lock.json` filename; the rename-era `~/.vellum.lockfile.json` is accepted as a read fallback.
- Unknown values of `VELLUM_ENVIRONMENT` (anything outside the seed table) resolve to `vellum` rather than a fabricated `vellum-<garbage>` directory, so misconfiguration degrades gracefully to the production path.

### Mixed local/remote and targeting

The lockfile can contain both local and remote entries side-by-side. Remote entries (`cloud: "gcp"`, `"aws"`, `"vellum"`, `"custom"`) carry connection metadata (`runtimeUrl`, `bearerToken`, etc.) but no `resources` block. `wake` and `sleep` only operate on local entries. `retire` works on both and dispatches per-cloud teardown for remote entries. CLI commands resolve which instance to target via `resolveTargetAssistant()` in the order: explicit name argument → `activeAssistant` field (set by `vellum use`) → sole local assistant.
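
The targeting order above can be sketched as a pure function. This is illustrative only and is not the real `resolveTargetAssistant()`: the `name` and `assistants` field names are hypothetical (the lockfile excerpt in this document does not show them), and falling through to the sole-local rule when `activeAssistant` names a missing entry is an assumption.

```typescript
// Hypothetical minimal shapes; only `cloud` and `activeAssistant` are named in this document.
interface Entry {
  name: string; // hypothetical field name
  cloud: string; // "local", "gcp", "aws", "vellum", "custom"
}

interface Lockfile {
  assistants: Entry[]; // hypothetical field name
  activeAssistant?: string;
}

// Resolution order: explicit name, then activeAssistant, then the sole local entry.
function pickTarget(lockfile: Lockfile, explicitName?: string): Entry | undefined {
  if (explicitName !== undefined) {
    return lockfile.assistants.find((e) => e.name === explicitName);
  }
  if (lockfile.activeAssistant !== undefined) {
    const active = lockfile.assistants.find((e) => e.name === lockfile.activeAssistant);
    // Assumption: a stale activeAssistant falls through to the next rule.
    if (active !== undefined) return active;
  }
  const locals = lockfile.assistants.filter((e) => e.cloud === "local");
  return locals.length === 1 ? locals[0] : undefined;
}
```

Commands such as `wake` and `sleep` would additionally reject a resolved entry whose `cloud` is not `"local"`.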

## Multi-Local Instance Isolation

Multiple local assistant instances can run side-by-side on the same machine, each fully isolated. This enables development, testing, or running multiple assistants concurrently without conflicts.

### Instance Directory Layout
### Instance directory layout

Each named instance gets its own directory tree under `~/.vellum/instances/<name>/`:
Each named instance gets its own directory tree. The exact location depends on environment and whether the lockfile entry predates the env-aware allocator (see [Environment and Data Layout](#environment-and-data-layout) for allocation rules). For a production install of two new assistants `alice` and `bob`:

```
~/.vellum.lock.json # Global lockfile (all entries + activeAssistant)
~/.vellum/
├── instances/
│ ├── alice/ # Instance root (= BASE_DATA_DIR for this daemon)
│ │ └── .vellum/ # Runtime dir (getRootDir() resolves here)
│ │ ├── vellum.pid # Daemon PID
│ │ ├── gateway.pid # Gateway PID
│ │ ├── outbound-proxy.pid
│ │ ├── session-token
│ │ └── workspace/
│ │ ├── config.json
│ │ ├── data/
│ │ │ ├── db/assistant.db
│ │ │ ├── qdrant/
│ │ │ └── logs/
│ │ └── skills/
│ └── bob/
│ └── .vellum/
│ └── ... # Same structure as alice
~/.vellum.lock.json # Global lockfile
~/.local/share/vellum/assistants/
├── alice/ # instanceDir for alice (= BASE_DATA_DIR)
│ └── .vellum/ # Daemon root (vellumRoot())
│ ├── vellum.pid # Daemon PID (duplicated by the CLI on spawn)
│ ├── gateway.pid
│ ├── ngrok.pid
│ ├── runtime-port
│ ├── .env
│ ├── protected/ # keys.enc, trust.json, credentials/, ...
│ └── workspace/
│ ├── config.json
│ ├── data/
│ │ ├── db/assistant.db
│ │ ├── qdrant/
│ │ └── logs/
│ └── skills/
└── bob/
└── .vellum/
└── ... # Same structure as alice
```

An existing production lockfile entry created before env-aware allocation may still have `instanceDir = ~` and all of its state under `~/.vellum/`. That path is preserved via the lockfile read path — no data is moved. Non-production (`vellum-<env>`) hatches use the same layout under `$XDG_DATA_HOME/vellum-<env>/assistants/<name>/`.

All instances are created with explicit names via `vellum hatch --name <name>`.

### Isolation Model
### Isolation model

Each instance gets its own:
- **`BASE_DATA_DIR`**: Set to the instance directory (e.g. `~/.vellum/instances/alice/`). The daemon appends `.vellum` to this to derive `getRootDir()`, so all runtime files land under the instance.
- **`BASE_DATA_DIR`**: Set to `resources.instanceDir`. The daemon and gateway each append `.vellum` to this value to derive the root directory, so all runtime files land under the instance.
- **Daemon port** (`RUNTIME_HTTP_PORT`): Allocated by scanning from base port 7821.
- **Gateway port** (`GATEWAY_PORT`): Allocated by scanning from base port 7830.
- **Qdrant port** (`QDRANT_HTTP_PORT`): Allocated by scanning from base port 6333.
- **PID file**: `<instanceDir>/.vellum/vellum.pid`
- **SQLite database, logs, memory indices**: All under `<instanceDir>/.vellum/workspace/data/`
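
To make the isolation model concrete, the sketch below builds the environment a spawner might hand to a daemon from one lockfile entry's `resources` block. The environment variable names and `resources` field names come from this document; the `spawnEnvFor` function itself is hypothetical and is not the code in `cli/src/lib/local.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Field names as they appear in the lockfile `resources` block.
interface LocalInstanceResources {
  instanceDir: string;
  daemonPort: number;
  gatewayPort: number;
  qdrantPort: number;
  pidFile: string;
}

// Hypothetical helper: overlay the per-instance variables on the parent
// environment, so unrelated variables are inherited unchanged.
function spawnEnvFor(resources: LocalInstanceResources, parentEnv: Env = process.env): Env {
  return {
    ...parentEnv,
    BASE_DATA_DIR: resources.instanceDir,
    RUNTIME_HTTP_PORT: String(resources.daemonPort),
    GATEWAY_PORT: String(resources.gatewayPort),
    QDRANT_HTTP_PORT: String(resources.qdrantPort),
  };
}
```

Because every value is derived from the stored `resources` block, restarting an instance reuses the same directory and ports it was hatched with.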

### Port Allocation
### Port allocation

Ports are allocated sequentially by `allocateLocalResources()` in `cli/src/lib/assistant-config.ts`:

@@ -86,9 +138,9 @@ Ports are allocated sequentially by `allocateLocalResources()` in `cli/src/lib/a

Availability is checked via TCP connect probe. Each scan range spans up to 100 ports. Allocated ports are persisted in the lockfile `resources` field so `wake`/`sleep` can restart instances on the same ports.
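
A bounded scan with a TCP connect probe might look like the sketch below. It is an illustration of the technique only, not the allocator in `cli/src/lib/assistant-config.ts`: `isPortInUse` and `findFreePort` are hypothetical names, and the probe host, the 200 ms timeout, and treating a timeout as "free" are assumptions.

```typescript
import { connect } from "node:net";

// Hypothetical probe: a successful TCP connect means something is listening.
// Assumptions: probe 127.0.0.1 only, and treat errors or timeouts as "free".
function isPortInUse(port: number, host = "127.0.0.1", timeoutMs = 200): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = connect({ port, host });
    const finish = (inUse: boolean) => {
      socket.destroy();
      resolve(inUse);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Hypothetical scan: walk upward from the base port, at most `span` ports,
// and return the first one the probe reports as free.
async function findFreePort(
  basePort: number,
  probe: (port: number) => Promise<boolean> = isPortInUse,
  span = 100,
): Promise<number> {
  for (let port = basePort; port < basePort + span; port++) {
    if (!(await probe(port))) return port;
  }
  throw new Error(`no free port in [${basePort}, ${basePort + span - 1}]`);
}
```

The probe is injectable so the scan logic can be exercised without opening sockets. A caller would run one scan per service (for example from 7821, 7830, and 6333) and persist the results in the lockfile.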

### Lockfile Schema
### Lockfile schema

The global lockfile (`~/.vellum.lock.json`) tracks all instances:
The production lockfile (`~/.vellum.lock.json`) tracks all instances:

```jsonc
{
@@ -98,12 +150,12 @@ The global lockfile (`~/.vellum.lock.json`) tracks all instances:
"runtimeUrl": "http://localhost:7821",
"cloud": "local",
"hatchedAt": "2026-03-04T...",
"resources": { // Present for local multi-instance entries
"instanceDir": "~/.vellum/instances/alice",
"resources": { // Present for local entries
"instanceDir": "~/.local/share/vellum/assistants/alice",
"daemonPort": 7821,
"gatewayPort": 7830,
"qdrantPort": 6333,
"pidFile": "~/.vellum/instances/alice/.vellum/vellum.pid"
"pidFile": "~/.local/share/vellum/assistants/alice/.vellum/vellum.pid"
}
},
{
@@ -119,21 +171,8 @@ The global lockfile (`~/.vellum.lock.json`) tracks all instances:

- `resources` (`LocalInstanceResources`): Present on all local entries. Contains per-instance ports and paths.
- `activeAssistant`: Determines which instance CLI commands target by default.
- Remote assistants (`cloud: "gcp"`, `"aws"`, etc.) are unaffected and have no `resources` field.

### Active Assistant Targeting

CLI commands resolve which instance to target via `resolveTargetAssistant()`:

1. **Explicit name argument** — `vellum sleep alice`
2. **Active assistant** — set via `vellum use <name>`, stored as `activeAssistant` in lockfile
3. **Sole local assistant** — when exactly one local instance exists

`wake` and `sleep` guard against targeting remote assistants (they exit with an error for non-`local` entries).

### Mixed Local/Remote

The lockfile can contain both local and remote entries. Remote entries (cloud providers) carry connection metadata (`runtimeUrl`, `bearerToken`, etc.) but no `resources`. `wake` and `sleep` only operate on local instances (they error for remote entries). `retire` works on both local and remote instances, using cloud-specific teardown for GCP/AWS/custom entries.
- Remote assistants (`cloud: "gcp"`, `"aws"`, `"vellum"`, etc.) are unaffected and have no `resources` field.
- Non-production environments use `$XDG_CONFIG_HOME/vellum-<env>/lockfile.json` with the same schema.

## Docker Volume Architecture

19 changes: 19 additions & 0 deletions assistant/src/__tests__/checker.test.ts
@@ -231,6 +231,25 @@ describe("Permission Checker", () => {
});
expect(risk).toBe(RiskLevel.High);
});

test("file_read of legacy signing key is high risk even when BASE_DATA_DIR relocates getProtectedDir()", async () => {
const savedBaseDataDir = process.env.BASE_DATA_DIR;
process.env.BASE_DATA_DIR = "/tmp/fake-instance-signing-key-test";
try {
const risk = await classifyRisk("file_read", {
path: join(
homedir(),
".vellum",
"protected",
"actor-token-signing-key",
),
});
expect(risk).toBe(RiskLevel.High);
} finally {
if (savedBaseDataDir === undefined) delete process.env.BASE_DATA_DIR;
else process.env.BASE_DATA_DIR = savedBaseDataDir;
}
});
});

// file_write is always low (sandboxed)