Merged
12 changes: 10 additions & 2 deletions cli/src/commands/hatch.ts
@@ -538,6 +538,16 @@ async function hatchLocal(species: Species, name: string | null, daemonOnly: boo
}
}

const baseDataDir = join(process.env.BASE_DATA_DIR?.trim() || (process.env.HOME ?? userInfo().homedir), ".vellum");

if (existsSync(baseDataDir)) {
throw new Error(
Comment on lines +543 to +544

P1: Narrow base-data-dir existence check before local hatch

This check fails hatch local whenever ~/.vellum already exists. That directory is also created by unrelated setup paths: for example, vellum config set creates ~/.vellum/workspace/config.json in cli/src/commands/config.ts. Users can therefore be blocked from hatching even when no local assistant is running, and a documented pre-hatch config step now causes a hard failure. The guard should detect active or conflicting local runtime state rather than treating any preexisting directory as an overwrite risk.
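
The false positive described in this comment can be reproduced in isolation. The following is a minimal sketch, not code from this PR: a temporary directory stands in for the user's home directory, and the `daemon.pid` file name is a hypothetical placeholder for whatever runtime state the daemon actually writes.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Throwaway stand-in for the user's home directory.
const fakeHome = mkdtempSync(join(tmpdir(), "vellum-demo-"));
const baseDataDir = join(fakeHome, ".vellum");

// Simulate a pre-hatch config step that writes workspace/config.json.
mkdirSync(join(baseDataDir, "workspace"), { recursive: true });
writeFileSync(join(baseDataDir, "workspace", "config.json"), "{}");

// The guard as written in the diff: true for any preexisting directory.
const blockedByDirCheck = existsSync(baseDataDir);

// Hypothetical narrower signal: is any runtime state present at all?
// "daemon.pid" is an assumed file name, not taken from the repository.
const hasRuntimeState = existsSync(join(baseDataDir, "daemon.pid"));

console.log({ blockedByDirCheck, hasRuntimeState });

// Clean up the temporary directory.
rmSync(fakeHome, { recursive: true, force: true });
```

Here the directory-level check reports a conflict even though nothing resembling a running assistant exists, which is the gap the comment points at.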


`Base data directory already exists: ${baseDataDir}\n` +
" Another assistant may already be using this directory.\n" +
" To use a different directory, set the BASE_DATA_DIR environment variable.",
);
}
Comment on lines +543 to +549

🔴 existsSync check on baseDataDir blocks re-hatching after any failed hatch or incomplete cleanup

The new existsSync(baseDataDir) check at line 543 throws an error if ~/.vellum already exists, but this directory is routinely created by the daemon during startLocalDaemon() and is NOT removed by stopLocalProcesses(). This leaves vellum hatch broken after any failed hatch attempt until the directory is removed by hand.

Detailed failure scenarios and root cause

Scenario 1: Failed hatch recovery is impossible

If a previous hatchLocal() call succeeds at startLocalDaemon() (line 555) but fails at startGateway() (line 559), the error handler at lines 560-565 calls stopLocalProcesses(), which only removes PID and socket files (cli/src/lib/process.ts:85-89) — it does NOT remove the ~/.vellum directory. On the next vellum hatch attempt, the existsSync(baseDataDir) check at line 543 finds the leftover directory and throws, preventing recovery without manual rm -rf ~/.vellum.

Scenario 2: Contradicts stale cleanup code above

The stale process cleanup at lines 529-539 explicitly handles the case where ~/.vellum exists with PID files but no lock file entries — it cleans up stale processes and continues. But the new check at line 543 then immediately throws because the directory still exists after cleanup. The stale cleanup code becomes dead code in practice.

Scenario 3: Desktop daemon creates the directory

In desktop mode, startLocalDaemon() calls mkdirSync(vellumDir, { recursive: true }) at cli/src/lib/local.ts:181. While this happens after the check, any previous daemon run would have created the directory, and stopLocalProcesses() doesn't remove it.

Impact: vellum hatch --remote local is broken for any user who has previously attempted a local hatch that didn't complete cleanly, or whose ~/.vellum directory exists for any reason other than an active assistant. The only workaround is manual rm -rf ~/.vellum.

Prompt for agents
The existsSync check on the entire baseDataDir (~/.vellum) is too broad. This directory is used for PID files, socket files, and other runtime state that persists across hatch/retire cycles. The check should be more targeted — for example, checking for a specific sentinel file that indicates an active assistant is using the directory (e.g., a running daemon's PID file with a live process, or a specific lock/marker file), rather than checking for the directory's mere existence.

In cli/src/commands/hatch.ts, lines 543-549, replace the existsSync(baseDataDir) check with a more targeted check. Options include:
1. Check if there's an active daemon process using isProcessAlive() on the PID file inside baseDataDir
2. Check for a specific marker file that the daemon creates when actively using the directory
3. Remove the check entirely and rely on the existing stale cleanup logic at lines 529-539

The stale cleanup code at lines 529-539 already handles the case of leftover processes with no lock file entries, so the directory existence check is redundant and harmful.
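
Option 1 above could look roughly like the following. This is a sketch under assumptions: the comment names isProcessAlive() but the PR does not show its signature, so the version here is a local stand-in, and `hasLiveDaemon` and the `daemon.pid` file name are hypothetical.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Local stand-in for the CLI's isProcessAlive(); the real signature is not shown in the PR.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only tests for existence, nothing is delivered
    return true;
  } catch (err) {
    // EPERM means the process exists but we may not signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

// Hypothetical guard: true only when a PID file names a live process.
function hasLiveDaemon(baseDataDir: string, pidFileName = "daemon.pid"): boolean {
  const pidFile = join(baseDataDir, pidFileName);
  if (!existsSync(pidFile)) return false;
  const pid = Number.parseInt(readFileSync(pidFile, "utf8").trim(), 10);
  return Number.isInteger(pid) && pid > 0 && isProcessAlive(pid);
}

// Exercise the guard against a temporary directory.
const demoDir = mkdtempSync(join(tmpdir(), "vellum-guard-"));
const noPidFile = hasLiveDaemon(demoDir); // directory exists, no PID file
writeFileSync(join(demoDir, "daemon.pid"), "not-a-pid");
const garbagePid = hasLiveDaemon(demoDir); // unparseable PID file
writeFileSync(join(demoDir, "daemon.pid"), String(process.pid));
const livePid = hasLiveDaemon(demoDir); // PID of this (running) process
rmSync(demoDir, { recursive: true, force: true });
```

A guard of this shape would throw only when `hasLiveDaemon(baseDataDir)` is true, so a leftover or config-only directory would no longer block hatching. Whether a stale PID file should also be deleted at that point is a design choice this sketch leaves open.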


console.log(`🥚 Hatching local assistant: ${instanceName}`);
console.log(` Species: ${species}`);
console.log("");
@@ -555,8 +565,6 @@ async function hatchLocal(species: Species, name: string | null, daemonOnly: boo
throw error;
}

const baseDataDir = join(process.env.BASE_DATA_DIR?.trim() || (process.env.HOME ?? userInfo().homedir), ".vellum");

// Read the bearer token written by the daemon so the client can authenticate
// with the gateway (which requires auth by default).
let bearerToken: string | undefined;