
refactor(environments): env-aware data and config path layout #25499

Merged
clopen-set merged 24 commits into main from clopen-set/env-data-layout on Apr 16, 2026

Collaborator

@clopen-set clopen-set commented Apr 14, 2026

Summary

Routes all data- and config-dir path sites through environment-aware helpers so per-environment instances (dev / staging / prod / per-user) get isolated lockfiles, tokens, device IDs, and protected paths. Previously many call sites hardcoded ~/.vellum or used getDataDir(), which broke multi-env workflows.

  • CLI: lockfile R/W, allocator, orphan detection, assistant config, guardian token, platform token, device ID, and hatch-local now resolve paths via getConfigDir(env) / getDataDir(env); getDataDir and LOCKFILE_NAMES are deleted.
  • Daemon: vellumRoot() honors per-instance BASE_DATA_DIR overrides; XDG platform-token and device-id paths are env-aware; protected/ callers route through platform helpers; orphan-detection and recover find daemons across all lockfile entries.
  • Swift clients: new VellumPaths helper; DeviceIdStore, GuardianTokenFileReader, SessionTokenManager, SigningIdentityManager, FileCredentialStorage, and LockfilePaths all route through it.
  • Chrome extension native host: lockfile path routed through env-aware helper.
  • Adds unit tests for CLI assistant-config, guardian-token, orphan-detection, platform-client, teleport, multi-local, device-id, platform, and a new VellumPathsTests suite on the Swift side.

Test plan

  • cd cli && pnpm test
  • cd assistant && pnpm test
  • Swift: run VellumPathsTests in Xcode (vellum-assistantTests target)
  • Smoke test: spin up two envs with different BASE_DATA_DIR and confirm lockfiles/tokens/device-ids are isolated
  • Smoke test: vellum recover locates daemons across all lockfile entries
  • Chrome extension native host launches and finds its lockfile under the expected env dir

🤖 Generated with Claude Code



@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dd8a5e465b



function writeLockfile(data: LockfileData): void {
-  const lockfilePath = join(getLockfileDir(), LOCKFILE_NAMES[0]);
+  const lockfilePath = getLockfilePath(getCurrentEnvironment());

P2: Mirror env-scoped lockfile path in native host

writeLockfile() now persists to getLockfilePath(getCurrentEnvironment()), which places non-production lockfiles under $XDG_CONFIG_HOME/vellum-<env>/lockfile.json. The Chrome native host still only reads ~/.vellum.lock.json / ~/.vellum.lockfile.json (clients/chrome-extension/native-host/src/lockfile.ts), so in local/dev/test/staging environments list_assistants and assistant-scoped request_token lookups will fail to discover assistants unless VELLUM_LOCKFILE_DIR is manually provided.
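
A minimal sketch of the env-scoped lookup the native host would need to mirror. The `$XDG_CONFIG_HOME/vellum-<env>/lockfile.json` layout and the legacy `~/.vellum.lock.json` name come from this comment; the function name, the `~/.config` default, and the assumption that production stays at the legacy path are illustrative only, not the repository's actual helper.

```typescript
// Hypothetical helper: names and defaults are illustrative, not the repo's API.
type EnvName = "production" | "staging" | "test" | "dev" | "local";

interface PathInputs {
  home: string; // the user's home directory
  xdgConfigHome?: string; // value of XDG_CONFIG_HOME, if set
}

function sketchLockfilePath(env: EnvName, inputs: PathInputs): string {
  if (env === "production") {
    // Assumption: production keeps the legacy location named in this thread.
    return `${inputs.home}/.vellum.lock.json`;
  }
  // Fall back to the conventional XDG default when the variable is unset or empty.
  const configHome = inputs.xdgConfigHome || `${inputs.home}/.config`;
  return `${configHome}/vellum-${env}/lockfile.json`;
}
```

If the writer and every reader resolve the path through one function of this shape, a dev or staging lockfile cannot end up where the native host never looks.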

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 new potential issues.

Comment thread cli/src/commands/recover.ts Outdated
Comment on lines 59 to 74
  const target = entry.resources?.instanceDir
    ? join(entry.resources.instanceDir, ".vellum")
    : join(homedir(), ".vellum");
  if (existsSync(target)) {
    console.error(
-     "Error: ~/.vellum already exists. Retire the current assistant first.",
+     `Error: ${target} already exists (owned by ${entry.assistantId}). ` +
+       `Retire the current assistant first.`,
    );
    process.exit(1);
  }

  // 4. Extract archive
  // TODO: extraction target is hardcoded to homedir(); multi-instance entries
  // whose instanceDir differs from homedir will extract to the wrong
  // location. Tracked separately from the collision-check regression.
  await exec("tar", ["xzf", archivePath, "-C", homedir()]);

Contributor

@devin-ai-integration devin-ai-integration Bot Apr 14, 2026

🚩 recover.ts: archive extraction target diverges from collision check

The collision check at cli/src/commands/recover.ts:59-61 was updated to check the per-instance directory (entry.resources.instanceDir + "/.vellum") instead of the global ~/.vellum. However, the actual archive extraction at line 74 still targets homedir(). For multi-instance entries whose instanceDir differs from homedir(), the archive would be extracted to the wrong location. This is explicitly acknowledged with a TODO at lines 71-73, but worth noting that the collision check and extraction target are now structurally misaligned — the check guards one path while the extraction writes to another. A future multi-instance retire+recover cycle would silently extract files to the wrong tree.
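
One way to keep the two paths from drifting apart is to derive both from a single root. The sketch below is hypothetical: the `RecoverEntry` shape is reduced from the snippet under review, `resolveRecoverPaths` is an invented name, and it assumes the archive holds a top-level `.vellum` directory.

```typescript
// Hypothetical shapes, reduced from the snippet under review.
interface RecoverEntry {
  assistantId: string;
  resources?: { instanceDir?: string };
}

interface RecoverPaths {
  extractDir: string; // directory that would be handed to `tar -C`
  collisionTarget: string; // path that must not exist before extracting
}

// Derive both paths from one root so the check guards the tree tar writes to.
function resolveRecoverPaths(entry: RecoverEntry, home: string): RecoverPaths {
  const extractDir = entry.resources?.instanceDir || home;
  return { extractDir, collisionTarget: `${extractDir}/.vellum` };
}
```

A caller would check `collisionTarget` for existence and then pass `extractDir` to the extraction step, so the guard and the write can no longer point at different trees.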

Comment thread assistant/src/util/platform.ts Outdated
Comment on lines +157 to +166
// Kept in sync with `cli/src/lib/environments/seeds.ts`. The daemon does not
// import from the CLI package, so the list is duplicated here. If a new
// environment is added to the seed table, add it here too.
const KNOWN_ENVIRONMENTS: ReadonlySet<string> = new Set([
  "production",
  "staging",
  "test",
  "dev",
  "local",
]);

Contributor

@devin-ai-integration devin-ai-integration Bot Apr 14, 2026

📝 Info: Four independent KNOWN_ENVIRONMENTS lists must stay in sync

The set of recognized environment names is duplicated in four locations that must stay in sync:

  1. assistant/src/util/platform.ts:160-166, KNOWN_ENVIRONMENTS (includes production)
  2. cli/src/lib/environments/seeds.ts, the SEEDS table keys
  3. clients/chrome-extension/native-host/src/lockfile.ts:40-45, NON_PRODUCTION_ENVIRONMENTS (excludes production)
  4. clients/shared/App/VellumEnvironment.swift:9-14, Swift enum cases

The comment at platform.ts:157-158 says "Kept in sync with cli/src/lib/environments/seeds.ts... If a new environment is added to the seed table, add it here too." The native host and Swift client also need updating. Today all four agree on dev, staging, test, local (plus production). A future addition to the seed table that doesn't propagate to all four sites would cause path disagreements.

Collaborator Author

known issue that we're tracking. in the future, would like to move to a context file like kubectl (with some seeds), but it's overengineered for current needs

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

Comment on lines +329 to +347
process.env.BASE_DATA_DIR = resources.instanceDir;
try {
  const ngrokChild = await maybeStartNgrokTunnel(resources.gatewayPort);
  if (ngrokChild?.pid) {
    const ngrokPidFile = join(resources.instanceDir, ".vellum", "ngrok.pid");
    writeFileSync(ngrokPidFile, String(ngrokChild.pid));
  }

  emitProgress(7, 7, "Saving configuration...");
  saveAssistantEntry(localEntry);
  setActiveAssistant(instanceName);
  syncConfigToLockfile();
} finally {
  if (prevBaseDataDir !== undefined) {
    process.env.BASE_DATA_DIR = prevBaseDataDir;
  } else {
    delete process.env.BASE_DATA_DIR;
  }
}

Contributor

🚩 hatch-local.ts restructured to keep syncConfigToLockfile inside BASE_DATA_DIR scope

The try/finally block at hatch-local.ts:329-347 now wraps ngrok startup, lockfile save, AND syncConfigToLockfile() inside the BASE_DATA_DIR = resources.instanceDir scope. This is intentional and correct — syncConfigToLockfile() calls getBaseDir() which reads BASE_DATA_DIR, so it needs the env var set to find the right config.json. In the old code, syncConfigToLockfile() was called AFTER BASE_DATA_DIR was restored to its previous value, meaning it would read config from homedir()/.vellum/workspace/config.json rather than the per-instance path. A subtle consequence: if maybeStartNgrokTunnel() or the ngrok PID write throws, saveAssistantEntry is now skipped. The daemon and gateway are already running at that point (started at lines 282-299), so a throw here would leave orphaned processes without a lockfile entry. This is unlikely in practice since maybeStartNgrokTunnel is designed as a best-effort operation, but the coupling is tighter than before.
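
The set-then-restore pattern described here can be factored into a small scoped helper. This is a sketch only: `withEnvVar` is a hypothetical name, and it takes the environment map as a parameter (a caller would pass `process.env`) rather than reflecting how hatch-local.ts is actually written.

```typescript
type EnvMap = Record<string, string | undefined>;

// Set env[key] for the duration of fn, then restore the previous state
// (including "unset") even if fn throws or rejects.
async function withEnvVar<T>(
  env: EnvMap,
  key: string,
  value: string,
  fn: () => Promise<T>,
): Promise<T> {
  const previous = env[key];
  env[key] = value;
  try {
    return await fn();
  } finally {
    if (previous !== undefined) {
      env[key] = previous;
    } else {
      delete env[key];
    }
  }
}
```

Everything that must see the per-instance value, including a call like `syncConfigToLockfile()`, would go inside `fn`. Catching errors from a best-effort step such as the tunnel startup inside `fn` would keep a failure there from skipping the save that follows.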


@clopen-set clopen-set force-pushed the clopen-set/env-data-layout branch from 5d62367 to ef6e046 Compare April 15, 2026 12:47

Contributor

@dvargasfuertes dvargasfuertes left a comment

pulling it down now to give it a whirl

} from "../util/platform.js";

const originalWorkspaceDir = process.env.VELLUM_WORKSPACE_DIR;
const originalBaseDataDir = process.env.BASE_DATA_DIR;

Contributor

In general, I'm trying to clear out all usages of this env var. What are the remaining high-risk places on the assistant side that depend on it? It should now be gutted on the gateway side.

Collaborator Author

Yeah, I saw that chain of PRs a few weeks back trying to deprecate this. I brought this back to minimize behavioral changes for this assistants work. We should tackle full deprecation as part of a separate refactor.

IIRC, remaining use cases were for local assistants, centered around this vellumRoot helper:

function vellumRoot(): string {

  • isTCPEnabled
  • getPlatformTokenPath (I think this is a fallback and can be deprecated now? Unclear)
  • getPidPath for vellum.pid
  • getRuntimePortFilePath for runtime-port
  • getDotEnvPath for local .env files

@@ -33,6 +33,10 @@ const inflight = new Map<string, Promise<string | null>>();

/** Where all cached packages live on disk. */
export function getCacheDir(): string {

Contributor

woah what do we use this for? can imagine this centralizing in XDG too

Comment on lines +53 to +58
// VELLUM_ROOT_DIR is kept at the legacy `~/.vellum` value even when
// vellumRoot() resolves per-instance via BASE_DATA_DIR. User hook
// scripts written against this env var expected the legacy path;
// changing it would be a silent contract break. Hooks that need the
// per-instance root should read BASE_DATA_DIR themselves or use the
// new env vars the environment-layout plan adds.

Contributor

redundant with the above comment yea?

*/
function buildCesProtectedPaths(): string[] {
const securityDir =
process.env.GATEWAY_SECURITY_DIR || join(homedir(), ".vellum", "protected");

Contributor

I'm pretty sure these GATEWAY_SECURITY_DIR reads from assistant are silly since we don't ever define it for the assistant daemon

Comment on lines +24 to +25
* currently-active assistant target" — reading the workspace
* `config.json` directly is incorrect for multi-instance and

Contributor

the cli should additionally have no access to the assistant workspace at all

beforeEach(() => {
// Save env vars we may mutate so each test starts from a clean slate.
savedVellumEnvironment = process.env.VELLUM_ENVIRONMENT;
savedXdgConfigHome = process.env.XDG_CONFIG_HOME;

Contributor

Do chrome extensions have access to the file system?

Collaborator Author

the chrome extension communicates with a native program running on the user's machine that does have file access

Comment on lines 181 to 188
const DEFAULT_ASSISTANT_PORT = 7821;
// NOTE: `~/.vellum/runtime-port` is the legacy single-instance fallback and
// is not env-aware. This is a known limitation for new production
// multi-local users — tracked separately from the env-data-layout fix.
// The authoritative source for per-assistant routing is the lockfile's
// `resources.daemonPort`, resolved via `resolveDaemonPort()` in
// `./lockfile.ts` using an env-aware path.
const RUNTIME_PORT_FILE = join(homedir(), ".vellum", "runtime-port");

Contributor

cc: @noanflaherty I would not expect the chrome extension to:

  • have access to the assistant's workspace (homedir(), ".vellum")
  • invoke the daemon instead of the gateway (7821)

Contributor

  1. The goal was to auto detect whether the assistant was local or hosted in the cloud in order to influence the pairing and auto strat. I viewed reading the lock file as the easiest way to do that. Is that valid in your mind? If not, what alternative would you suggest?
  2. Agree. I can make this change

Contributor

@dvargasfuertes dvargasfuertes Apr 15, 2026

> I viewed reading the lock file as the easiest way to do that

agreed, but this isn't reading the lockfile

  • Bare-Metal Assistant workspace -> ~/.vellum
  • Client-side Lockfile -> ~/.vellum.lock.json

@clopen-set clopen-set force-pushed the clopen-set/env-data-layout branch from ef6e046 to e1be6d9 Compare April 15, 2026 15:01

clopen-set and others added 16 commits April 15, 2026 20:02
…ice-id paths through getConfigDir(env) (#25456)

* feat(environments): route CLI platform token, guardian token, and device-id paths through getConfigDir(env)

* fix(cli): use spyOn for platform-client and guardian-token in teleport tests

`mock.module()` in bun:test replaces a module globally in the process and
provides no way to unmock. `teleport.test.ts` was using it to stub both
`../lib/platform-client.js` and `../lib/guardian-token.js`, so those mocks
leaked into `platform-client.test.ts` and `guardian-token.test.ts` when they
ran in the same bun test process — every call to `readPlatformToken()` in
the platform-client tests returned the literal string "platform-token" from
the stale teleport mock, and `loadGuardianToken()` in the guardian-token
tests returned a minimal fake object missing the `guardianPrincipalId`
field the tests assert on.

Mirror the existing `assistant-config` pattern (already using `spyOn` for
the same reason per its inline comment) for `platform-client` and
`guardian-token`. `spyOn()` mutates the imported module namespace object
only, and `mockRestore()` in `afterAll` fully reverts the stubs so other
test files see the real implementations.
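
The leak-free behavior described above comes down to swapping one property on a shared object and being able to put the original back. The sketch below illustrates that idea without any test framework; it is not bun's implementation, and `platformClient` and `stub` are hypothetical stand-ins for the imported module namespace and for `spyOn()` / `mockRestore()`.

```typescript
// Minimal stand-in for an imported module namespace object (hypothetical).
const platformClient = {
  readPlatformToken: (): string | null => null,
};

// Replace one property and return a function that restores the original.
function stub<T extends object, K extends keyof T>(
  target: T,
  key: K,
  fake: T[K],
): () => void {
  const original = target[key];
  target[key] = fake;
  return () => {
    target[key] = original;
  };
}
```

Calling the returned function in a teardown hook plays the role the commit message gives to `mockRestore()` in `afterAll`: later test files see the real implementation again.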

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ll lockfile entries (#25481)

* fix(environments): orphan-detection and recover find daemons across all lockfile entries

* fix(recover): scope collision check to recovering entry's own target path

Addresses Codex P1 and Devin P1 on #25481: the iterate-all-entries loop
blocked recovery whenever any unrelated local assistant was still installed.
…aths (#25483)

* refactor(environments): route Swift client path sites through VellumPaths

* fix(environments): remove dead xdgDataHome field + reject relative XDG paths

Addresses Devin and Codex P2 findings on PR #25457:
- xdgDataHome was stored but never read; remove the field and its resolver helper
- resolveXdgConfigHome() no longer rewrites relative XDG_CONFIG_HOME values against cwd — relative values are rejected for parity with the TypeScript env package
Removes the stale getDeviceIdBaseDir() export from device-id.ts —
getDeviceId() no longer uses it, so it was a maintenance trap whose
return value diverged from where device.json actually resolves in
non-production envs. Its sole remaining caller was workspace migration
003, so inline the 2-line containerized-vs-homedir branch there.

Brings migration 003 closer to the self-containment rule in
assistant/src/workspace/migrations/AGENTS.md (no external imports
beyond types/logger).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
clopen-set and others added 8 commits April 15, 2026 20:08
…nc-on-switch (#25595)

Under our invariant "each env has its own lockfile, and all assistants in
that lockfile share a platform URL", the H1 workspace-config→lockfile sync
on `vellum use`/`vellum wake` was load-bearing for nothing: switching the
active assistant within a single env cannot change the platform URL.
Revert that sync. When no lockfile is seeded yet, fall back to the current
environment's seed URL instead of the hardcoded production default so
`VELLUM_ENVIRONMENT=dev vellum …` targets `dev-platform.vellum.ai` out of
the box.

- Revert H1: `vellum use` no longer calls `syncActiveAssistantConfigToLockfile`,
  `vellum wake` no longer re-runs `syncConfigToLockfile`, and the helper
  itself is deleted (the hatch-time sync is still done by `syncConfigToLockfile`,
  unchanged).
- `getPlatformUrl()` fallback: prefer `getCurrentEnvironment().platformUrl`
  over the hardcoded prod URL so non-prod CLI users get the right tenant
  before any assistant is registered.
- Tests: drop the H1 sync-on-switch suite, add a dev-env seed fallback test,
  keep the existing prod fallback test.
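
The fallback order described in this commit can be sketched as below. The `platformBaseUrl` field and the `platformUrl` seed property are named in this thread; the interface shapes, the function name, and the `https://` scheme are assumptions for illustration.

```typescript
// Hypothetical shapes; only the field names come from this thread.
interface EnvironmentSeed {
  name: string;
  platformUrl: string;
}

interface LockfileLike {
  platformBaseUrl?: string;
}

// Prefer the URL recorded in the lockfile; otherwise use the current
// environment's seed URL instead of a hardcoded production default.
function sketchGetPlatformUrl(
  lockfile: LockfileLike | null,
  currentEnv: EnvironmentSeed,
): string {
  return lockfile?.platformBaseUrl || currentEnv.platformUrl;
}
```

With no lockfile seeded yet, a dev seed makes the result the dev platform URL, which matches the out-of-the-box behavior the commit describes.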

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
#25617)

The set of recognized environment names is duplicated in three TS
locations: `cli/src/lib/environments/seeds.ts` (SEEDS), the daemon's
`assistant/src/util/platform.ts` (KNOWN_ENVIRONMENTS), and the Chrome
native host's `native-host/src/lockfile.ts` (NON_PRODUCTION_ENVIRONMENTS).
Cross-package imports don't work today — assistant's tsconfig restricts
`include` to its own src tree, and the native host is a standalone TS
project with `rootDir: ./src`.

Add a drift-guard test in cli that parses the literal Set bodies from
both external files and asserts they agree with CLI's SEEDS (minus
`production` for the native host set). Catches any future addition to
the seed table that fails to propagate to the other two sites.

Also refresh the comments on all three declarations to point at the
drift-guard test and the fast-follow plan: hoist the shared name list
into a `packages/environments` package (mirroring `packages/ces-contracts`
etc.) so this check becomes a compile-time import instead of a runtime
regex. That refactor is planned alongside CLI-driven context support.
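
A drift guard of this kind might look roughly like the sketch below. It is not the test added in the PR: `parseSetLiteral` and `sameMembers` are hypothetical helpers, and the regex assumes the declaration has the plain `NAME = new Set([ ... ])` shape shown earlier in this thread.

```typescript
// Extract the quoted string members of `new Set([ ... ])` assigned to `name`.
function parseSetLiteral(source: string, name: string): string[] {
  const decl = new RegExp(`${name}[^=]*=\\s*new Set\\(\\[([^\\]]*)\\]`);
  const body = decl.exec(source);
  if (!body) throw new Error(`could not find Set literal for ${name}`);
  const members = body[1].match(/["'][^"']+["']/g) ?? [];
  // Strip the surrounding quotes and sort for order-independent comparison.
  return members.map((m) => m.slice(1, -1)).sort();
}

// Compare two sorted name lists element by element.
function sameMembers(a: string[], b: string[]): boolean {
  return a.length === b.length && a.every((v, i) => v === b[i]);
}
```

A test would read each external file as text, parse its set, and compare against the sorted seed keys, filtering out `production` before comparing with the native host's set.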

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on handoffs (#25633)

Two call sites were stripping VELLUM_ENVIRONMENT from spawn whitelists,
breaking environment isolation for the main desktop launch path:

1. macOS `VellumCli.makeBaseEnvironment()` — `forwardedEnvKeys` did not
   include `VELLUM_ENVIRONMENT`, so every bundled-CLI command launched
   from the app (hatch, wake, sleep, retire, …) ran as production even
   when the app itself was built for a non-production environment.
   The app's Info.plist sets `VELLUM_ENVIRONMENT` at build time
   (`build.sh:1054`), so forwarding it is sufficient.

2. `cli/src/lib/local.ts` compiled-daemon spawn — the `daemonEnv`
   whitelist used when `bun run` is unavailable (packaged desktop
   builds) also omitted `VELLUM_ENVIRONMENT`. Even when the CLI
   process itself had the variable set, the spawned daemon fell back
   to production path/env behavior, so assistant-side env-scoped state
   (device ID, XDG-backed tokens and config reads) bled into prod.

Note: the source/watch daemon spawn path in `local.ts:281` is
unaffected — it uses `{...process.env}` and inherits everything.
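
The whitelist behavior at issue can be sketched as follows. `buildChildEnv` is a hypothetical name, not the function in `local.ts` or the Swift client; the point is only that a key absent from the forwarded list never reaches the child process.

```typescript
type ParentEnv = Record<string, string | undefined>;

// Copy only whitelisted keys that are actually set in the parent environment.
function buildChildEnv(
  parent: ParentEnv,
  forwardedKeys: readonly string[],
): Record<string, string> {
  const child: Record<string, string> = {};
  for (const key of forwardedKeys) {
    const value = parent[key];
    if (value !== undefined) child[key] = value;
  }
  return child;
}
```

Under this model the fix amounts to adding `VELLUM_ENVIRONMENT` to the forwarded keys; a spawn that spreads the whole parent environment needs no such entry.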

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clopen-set clopen-set force-pushed the clopen-set/env-data-layout branch from e1be6d9 to 93235fd Compare April 16, 2026 00:09
@clopen-set clopen-set merged commit 813c37c into main Apr 16, 2026
15 checks passed
@clopen-set clopen-set deleted the clopen-set/env-data-layout branch April 16, 2026 00:15
asharma53 pushed a commit that referenced this pull request Apr 16, 2026
* feat(environments): daemon vellumRoot() honors BASE_DATA_DIR per-instance override (#25455)

* feat(environments): Swift-side VellumPaths env-aware helpers (#25457)

* feat(environments): route CLI lockfile R/W and allocator through environment helpers; delete getDataDir (#25458)

* feat(environments): route CLI platform token, guardian token, and device-id paths through getConfigDir(env) (#25456)

* fix(cli): delete unused LOCKFILE_NAMES export (#25488)

* fix(environments): route daemon protected/ callers through platform helpers (#25493)

* fix(environments): orphan-detection and recover find daemons across all lockfile entries (#25481)

* refactor(environments): route Swift client path sites through VellumPaths (#25483)

* fix(environments): make daemon XDG platform-token and device-id env-aware (#25497)

* refactor(device-id): inline base-dir helper into migration 003

* docs(environments): update AGENTS.md and ARCHITECTURE.md for per-assistant data layout (#25504)

* fix(environments): CLI falls back to production on unknown VELLUM_ENVIRONMENT (parity with daemon and Swift) (#25541)

* fix(permissions): restore legacy signing-key path in risk classification (#25542)

* fix(config-watcher): use || for GATEWAY_SECURITY_DIR fallback to match sibling convention (#25543)

* fix(cli): read platformBaseUrl from lockfile instead of legacy workspace config path (#25544)

* fix(chrome-ext): native host reads env-aware lockfile path (#25547)

* fix(environments): VellumPaths accepts relative XDG_CONFIG_HOME to match TS/daemon (#25550)

* refactor(recover): drop unreachable legacy fallback in collision check (#25575)

* fix(cli): sync platformBaseUrl to lockfile on vellum use / vellum wake (#25578)

* fix(environments): env-seed fallback for getPlatformUrl, revert H1 sync-on-switch (#25595)

* test(environments): drift-guard for KNOWN_ENVIRONMENTS across TS sites (#25617)

* fix(environments): forward VELLUM_ENVIRONMENT across desktop→CLI→daemon handoffs (#25633)

* default to dev environment

* remove duplicate env var

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
asharma53 added a commit that referenced this pull request Apr 16, 2026
…25977)

* fix(gmail): make retry sleeps signal-aware in Gmail client

The abort controller from sender-digest can now interrupt retry sleep
delays, preventing Promise.allSettled from hanging past the deadline
when batchGetMessages enters exponential backoff.
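
A signal-aware sleep of the kind described might look like the sketch below. It is an illustration using the standard `AbortSignal` and timer APIs, not the Gmail client's code, and `abortableSleep` is a hypothetical name.

```typescript
// Resolve after `ms`, or reject as soon as `signal` aborts.
function abortableSleep(ms: number, signal?: AbortSignal): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    if (signal?.aborted) {
      reject(new Error("aborted"));
      return;
    }
    const timer = setTimeout(() => {
      signal?.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    // Cancel the pending timer so an aborted sleep does not keep waiting.
    function onAbort(): void {
      clearTimeout(timer);
      reject(new Error("aborted"));
    }
    signal?.addEventListener("abort", onAbort, { once: true });
  });
}
```

A retry loop that awaits this between attempts stops waiting as soon as the controller aborts, instead of sitting out the full backoff delay.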

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(macos): gate thinking-anchor reset to toolRunning only (#25968)

Resetting the thinking anchor on .streamingCode can erase valid post-tool
thinking intervals when a late code preview fires after tools complete.
Restrict the reset to .toolRunning so the thinking duration is accurate.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gmail): safe default for has_prior_reply, time-budget enrichment (#25967)

- Default has_prior_reply to true on API errors (safe direction per SKILL.md)
- Skip reply checks when already rate-limited to avoid wasting quota
- Add time budget to enrichment step using remaining TIME_BUDGET_MS
- Over-fetch sender candidates before capping to max_senders

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(gmail): restrict archive fallback to expired scans, sanitize query (#25966)

* fix(gmail): restrict archive fallback to expired scans, sanitize query

- Only fall back to query-based archiving when scan is truly expired (null),
  not when sender IDs don't match (empty array).
- Quote emails in fallback query to prevent Gmail query injection.
- Update SKILL.md to reflect new fallback behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(gmail): clarify SKILL.md scan expiration fallback behavior

Document the distinction between expired scan (null, falls back to
query) vs sender ID mismatch (empty array, returns error).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(oauth): add gmail.settings.basic scope to Google OAuth defaults (#25970)

The Gmail settings scope is required for filter creation/deletion, label
management, and other settings-level operations. Without it, the
gmail_filters tool fails with 403 ACCESS_TOKEN_SCOPE_INSUFFICIENT.

Existing tokens will be flagged by the credential health service's scope
drift detection, prompting users to re-authenticate.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add contacts + contact_channels tables to gateway SQLite (#25951)

* feat: add contacts + contact_channels tables to gateway SQLite

Gateway cutover step 1: declare contacts and contact_channels tables
in the gateway DB schema. This is the foundation for moving contact
auth/authz ownership from the assistant daemon to the gateway.

- contacts: mirrors assistant's contacts table (auth/authz fields only)
- contact_channels: mirrors assistant's contact_channels table with
  same indexes (type+external_user_id, type+external_chat_id)
- m0002-seed-contacts: one-time data migration that seeds both tables
  from assistant.db on first startup (INSERT OR IGNORE, transactional)
- ContactStore: read-only store with prepared-statement queries
  (getContact, listContacts, getContactByChannel, getChannelsForContact)
- IPC handlers: list_contacts, get_contact, get_contact_by_channel,
  get_channels_for_contact — wired into the gateway IPC server
- Tests: ContactStore unit tests + IPC round-trip tests

* review: address Vargas feedback on PR #25951

- Strip contacts table to auth/authz-only: remove notes, user_file,
  contact_type columns (not needed for actor validation)
- Remove m0002-seed-contacts data migration; hold off until endpoints
  have cut over and we're dual-writing
- Move test imports to top level (no more inline await import())
- Use fake channel IDs in tests instead of real ones
- Clean test state between runs (DELETE before seed)
- Update ARCHITECTURE.md + gateway/ARCHITECTURE.md to document the
  contacts ownership migration direction
- Add Drizzle migration + test preload env var cleanup tasks to
  workstream Up Next

---------

Co-authored-by: root <root@assistant-89f9b42a-2563-4bbe-96b7-b2840c145b37-0.assistant-89f9b42a-2563-4bbe-96b7-b2840c145b37.warm-pool.svc.cluster.local>

* fix(runtime): wake adapter drains queue, persists with metadata, broadcasts to all clients (#25972)

* meet-join: SKILL.md guidance for voice participation (#25973)

* fix(gmail): persist blocklist only after archive succeeds (#25971)

Move addToBlocklist() call from before the batch archive operation to
after it succeeds. Prevents corrupted cleanup state when archiving fails.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor(environments): env-aware data and config path layout (#25499)

* feat(environments): daemon vellumRoot() honors BASE_DATA_DIR per-instance override (#25455)

* feat(environments): Swift-side VellumPaths env-aware helpers (#25457)

* feat(environments): route CLI lockfile R/W and allocator through environment helpers; delete getDataDir (#25458)

* feat(environments): route CLI platform token, guardian token, and device-id paths through getConfigDir(env) (#25456)

* feat(environments): route CLI platform token, guardian token, and device-id paths through getConfigDir(env)

* fix(cli): use spyOn for platform-client and guardian-token in teleport tests

`mock.module()` in bun:test replaces a module globally in the process and
provides no way to unmock. `teleport.test.ts` was using it to stub both
`../lib/platform-client.js` and `../lib/guardian-token.js`, so those mocks
leaked into `platform-client.test.ts` and `guardian-token.test.ts` when they
ran in the same bun test process. Every call to `readPlatformToken()` in
the platform-client tests returned the literal string "platform-token" from
the stale teleport mock, and `loadGuardianToken()` in the guardian-token
tests returned a minimal fake object missing the `guardianPrincipalId`
field the tests assert on.

Mirror the existing `assistant-config` pattern (already using `spyOn` for
the same reason per its inline comment) for `platform-client` and
`guardian-token`. `spyOn()` mutates the imported module namespace object
only, and `mockRestore()` in `afterAll` fully reverts the stubs so other
test files see the real implementations.
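
The stub-then-restore mechanism can be sketched without a test runner. This is a hand-rolled stand-in, not bun:test's `spyOn` implementation; `spyOnMethod` and the `platformClient` object are hypothetical and only illustrate why a restorable stub does not leak.

```typescript
interface Spy {
  restore(): void;
}

// Replace one property on `target` and remember the original so it can
// be put back later.
function spyOnMethod<T extends object, K extends keyof T>(
  target: T,
  key: K,
  replacement: T[K],
): Spy {
  const original = target[key];
  target[key] = replacement;
  return {
    restore: (): void => {
      target[key] = original;
    },
  };
}

// Hypothetical module-like object standing in for the real
// platform-client module namespace.
const platformClient = {
  readPlatformToken: (): string => "real-token",
};

// Stub, observe the stubbed value, then restore.
const spy = spyOnMethod(
  platformClient,
  "readPlatformToken",
  () => "platform-token",
);
const whileStubbed = platformClient.readPlatformToken();
spy.restore();
const afterRestore = platformClient.readPlatformToken();
```

After `restore()`, later callers see the original function again, which is the property a global module replacement with no unmock step lacks.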

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(cli): delete unused LOCKFILE_NAMES export (#25488)

* fix(environments): route daemon protected/ callers through platform helpers (#25493)

* fix(environments): orphan-detection and recover find daemons across all lockfile entries (#25481)

* fix(environments): orphan-detection and recover find daemons across all lockfile entries

* fix(recover): scope collision check to recovering entry's own target path

Addresses Codex P1 and Devin P1 on #25481: the iterate-all-entries loop
blocked recovery whenever any unrelated local assistant was still installed.

* refactor(environments): route Swift client path sites through VellumPaths (#25483)

* refactor(environments): route Swift client path sites through VellumPaths

* fix(environments): remove dead xdgDataHome field + reject relative XDG paths

Addresses Devin and Codex P2 findings on PR #25457:
- xdgDataHome was stored but never read; remove the field and its resolver helper
- resolveXdgConfigHome() no longer rewrites relative XDG_CONFIG_HOME values against cwd — relative values are rejected for parity with the TypeScript env package

* fix(environments): make daemon XDG platform-token and device-id env-aware (#25497)

* refactor(device-id): inline base-dir helper into migration 003

Removes the stale getDeviceIdBaseDir() export from device-id.ts —
getDeviceId() no longer uses it, so it was a maintenance trap whose
return value diverged from where device.json actually resolves in
non-production envs. Its sole remaining caller was workspace migration
003, so inline the 2-line containerized-vs-homedir branch there.

Brings migration 003 closer to the self-containment rule in
assistant/src/workspace/migrations/AGENTS.md (no external imports
beyond types/logger).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(environments): update AGENTS.md and ARCHITECTURE.md for per-assistant data layout (#25504)

* fix(environments): CLI falls back to production on unknown VELLUM_ENVIRONMENT (parity with daemon and Swift) (#25541)

* fix(permissions): restore legacy signing-key path in risk classification (#25542)

* fix(config-watcher): use || for GATEWAY_SECURITY_DIR fallback to match sibling convention (#25543)

* fix(cli): read platformBaseUrl from lockfile instead of legacy workspace config path (#25544)

* fix(chrome-ext): native host reads env-aware lockfile path (#25547)

* fix(environments): VellumPaths accepts relative XDG_CONFIG_HOME to match TS/daemon (#25550)

* refactor(recover): drop unreachable legacy fallback in collision check (#25575)

* fix(cli): sync platformBaseUrl to lockfile on vellum use / vellum wake (#25578)

* fix(environments): env-seed fallback for getPlatformUrl, revert H1 sync-on-switch (#25595)

Under our invariant "each env has its own lockfile, and all assistants in
that lockfile share a platform URL", the H1 workspace-config→lockfile sync
on `vellum use`/`vellum wake` was load-bearing for nothing: switching the
active assistant within a single env cannot change the platform URL.
Revert that sync. When no lockfile is seeded yet, fall back to the current
environment's seed URL instead of the hardcoded production default so
`VELLUM_ENVIRONMENT=dev vellum …` targets `dev-platform.vellum.ai` out of
the box.

- Revert H1: `vellum use` no longer calls `syncActiveAssistantConfigToLockfile`,
  `vellum wake` no longer re-runs `syncConfigToLockfile`, and the helper
  itself is deleted (the hatch-time sync is still done by `syncConfigToLockfile`,
  unchanged).
- `getPlatformUrl()` fallback: prefer `getCurrentEnvironment().platformUrl`
  over the hardcoded prod URL so non-prod CLI users get the right tenant
  before any assistant is registered.
- Tests: drop the H1 sync-on-switch suite, add a dev-env seed fallback test,
  keep the existing prod fallback test.
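
A minimal sketch of the fallback order described above, under stated assumptions. The seed table here is a placeholder: only the `dev-platform.vellum.ai` hostname appears in this message, so the production URL, the `https` scheme, and the treatment of an unset environment as production are all assumptions. The function signatures are simplified for illustration.

```typescript
interface EnvironmentSeed {
  name: string;
  platformUrl: string;
}

// Placeholder production seed; the real URL is not given in this message.
const PRODUCTION: EnvironmentSeed = {
  name: "production",
  platformUrl: "https://platform.example.invalid",
};

// Hypothetical seed table keyed by environment name.
const SEEDS = new Map<string, EnvironmentSeed>([
  ["production", PRODUCTION],
  ["dev", { name: "dev", platformUrl: "https://dev-platform.vellum.ai" }],
]);

// Unknown or unset names resolve to production (assumption, following
// the fall-back-to-production behavior named in #25541).
function resolveEnvironment(name: string | undefined): EnvironmentSeed {
  if (name === undefined) {
    return PRODUCTION;
  }
  return SEEDS.get(name) ?? PRODUCTION;
}

// Prefer the URL recorded in the lockfile; otherwise use the current
// environment's seed URL rather than a hardcoded production default.
function getPlatformUrl(
  lockfileUrl: string | undefined,
  envName: string | undefined,
): string {
  if (lockfileUrl !== undefined && lockfileUrl !== "") {
    return lockfileUrl;
  }
  return resolveEnvironment(envName).platformUrl;
}
```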

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test(environments): drift-guard for KNOWN_ENVIRONMENTS across TS sites (#25617)

The set of recognized environment names is duplicated in three TS
locations: `cli/src/lib/environments/seeds.ts` (SEEDS), the daemon's
`assistant/src/util/platform.ts` (KNOWN_ENVIRONMENTS), and the Chrome
native host's `native-host/src/lockfile.ts` (NON_PRODUCTION_ENVIRONMENTS).
Cross-package imports don't work today — assistant's tsconfig restricts
`include` to its own src tree, and the native host is a standalone TS
project with `rootDir: ./src`.

Add a drift-guard test in cli that parses the literal Set bodies from
both external files and asserts they agree with CLI's SEEDS (minus
`production` for the native host set). Catches any future addition to
the seed table that fails to propagate to the other two sites.

Also refresh the comments on all three declarations to point at the
drift-guard test and the fast-follow plan: hoist the shared name list
into a `packages/environments` package (mirroring `packages/ces-contracts`
etc.) so this check becomes a compile-time import instead of a runtime
regex. That refactor is planned alongside CLI-driven context support.
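
A minimal sketch of the regex-based drift check, assuming each external file declares its names as a `new Set([...])` literal of string constants. `extractSetMembers` and `findDrift` are hypothetical helpers, and the environment names in the sample source are illustrative; the actual test and file contents are not shown here.

```typescript
// Pull the string members out of a `new Set([...])` literal declared
// under `name` in a TypeScript source string.
function extractSetMembers(source: string, name: string): string[] {
  const decl = new RegExp(
    name + "\\s*(?::[^=]+)?=\\s*new Set(?:<[^>]*>)?\\(\\s*\\[([^\\]]*)\\]",
  );
  const body = decl.exec(source);
  if (body === null) {
    throw new Error("could not find Set literal for " + name);
  }
  const members: string[] = [];
  const literal = /["']([^"']+)["']/g;
  let m: RegExpExecArray | null;
  while ((m = literal.exec(body[1])) !== null) {
    members.push(m[1]);
  }
  return members.sort();
}

// Report names present on one side but not the other.
function findDrift(expected: string[], actual: string[]): string[] {
  const missing = expected
    .filter((n) => actual.indexOf(n) === -1)
    .map((n) => "missing: " + n);
  const extra = actual
    .filter((n) => expected.indexOf(n) === -1)
    .map((n) => "extra: " + n);
  return missing.concat(extra);
}

// Illustrative source text standing in for a file read from disk.
const sampleSource =
  'export const KNOWN_ENVIRONMENTS = new Set(["production", "dev", "staging"]);';
const parsed = extractSetMembers(sampleSource, "KNOWN_ENVIRONMENTS");
const drift = findDrift(["dev", "production", "staging", "local"], parsed);
```

An empty `findDrift` result means the two sites agree. Here `drift` reports that `local` is missing from the parsed set, which is the kind of unpropagated addition the guard is meant to catch.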

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(environments): forward VELLUM_ENVIRONMENT across desktop→CLI→daemon handoffs (#25633)

Two call sites were stripping VELLUM_ENVIRONMENT from spawn whitelists,
breaking environment isolation for the main desktop launch path:

1. macOS `VellumCli.makeBaseEnvironment()` — `forwardedEnvKeys` did not
   include `VELLUM_ENVIRONMENT`, so every bundled-CLI command launched
   from the app (hatch, wake, sleep, retire, …) ran as production even
   when the app itself was built for a non-production environment.
   The app's Info.plist sets `VELLUM_ENVIRONMENT` at build time
   (`build.sh:1054`), so forwarding it is sufficient.

2. `cli/src/lib/local.ts` compiled-daemon spawn — the `daemonEnv`
   whitelist used when `bun run` is unavailable (packaged desktop
   builds) also omitted `VELLUM_ENVIRONMENT`. Even when the CLI
   process itself had the variable set, the spawned daemon fell back
   to production path/env behavior, so assistant-side env-scoped state
   (device ID, XDG-backed tokens and config reads) bled into prod.

Note: the source/watch daemon spawn path in `local.ts:281` is
unaffected — it uses `{...process.env}` and inherits everything.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* default to dev environment

* remove duplicate env var

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* meet-bot: implement POST /play_audio streaming endpoint (#25974)

* fix(macos): reset thinking anchor on streamingCode when tools active

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: vellum-apollo-bot[bot] <242025090+vellum-apollo-bot[bot]@users.noreply.github.com>
Co-authored-by: root <root@assistant-89f9b42a-2563-4bbe-96b7-b2840c145b37-0.assistant-89f9b42a-2563-4bbe-96b7-b2840c145b37.warm-pool.svc.cluster.local>
Co-authored-by: siddseethepalli <siddseethepalli@gmail.com>
Co-authored-by: clopen-set <33433326+clopen-set@users.noreply.github.com>

3 participants