Remove .vellum writes during remote setup, add Hatch! button with hatching animation #4832
DevinAI resolve conflicts from main
66b8fe8 to 05497a8
await exec("gcloud", [
  "auth",
  "activate-service-account",
  `--key-file=${envKeyPath}`,
]);
const project = process.env.GCP_PROJECT;
if (project) {
  await exec("gcloud", ["config", "set", "project", project]);
}
DevinAI I would love to avoid activating the user's service account, and instead, use environment variables set by the desktop app to drive behavior
Updated - the CLI now skips gcloud auth activate-service-account entirely when GOOGLE_APPLICATION_CREDENTIALS is set (which the desktop app sets). No more explicit activation.
private var continueButton: some View {
    Button(action: { saveAndContinue() }) {
-        Text("Continue")
+        Text(isAws || isCustomHardware ? "Continue" : "Hatch!")
DevinAI always use Hatch!
Fixed - button now always says "Hatch!" for all cloud providers.
const startTime = Date.now();
try {
-  await activateGcpCredentialsFromConfig();
+  await activateGcpCredentials();
DevinAI let's remove this method entirely and use --account= on all gcloud commands based on the email found in the credentials file
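The request above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the PR's implementation: `readAccountEmail` and `withAccount` are hypothetical helper names, and the example assumes the key file contents are already available as a string. The only gcloud detail relied on is the global `--account` flag named in the comment.

```typescript
// Hypothetical helpers; the names are illustrative and not taken from the PR.

/** Extract `client_email` from the contents of a service account key file. */
function readAccountEmail(keyJson: string): string | undefined {
  try {
    const parsed: unknown = JSON.parse(keyJson);
    if (typeof parsed === "object" && parsed !== null) {
      const email = (parsed as { [key: string]: unknown })["client_email"];
      return typeof email === "string" && email.length > 0 ? email : undefined;
    }
  } catch {
    // Malformed JSON: fall through and report that no account is known.
  }
  return undefined;
}

/** Return a copy of the gcloud arguments with `--account=<email>` appended when an email is known. */
function withAccount(args: readonly string[], email: string | undefined): string[] {
  return email ? [...args, `--account=${email}`] : [...args];
}

// Example: build the argument list for a gcloud call without activating any account.
const keyJson = JSON.stringify({
  type: "service_account",
  client_email: "sa@example.iam.gserviceaccount.com", // placeholder address
});
const gcloudArgs = withAccount(["compute", "instances", "list"], readAccountEmail(keyJson));
console.log(gcloudArgs.join(" "));
```

Routing every gcloud invocation through one wrapper like `withAccount` keeps the account selection per command, so the user's active gcloud configuration is never modified.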
let entryFile = FileManager.default.temporaryDirectory
    .appendingPathComponent("vellum-hatch-entry-\(ProcessInfo.processInfo.processIdentifier).json")
env["VELLUM_HATCH_ENTRY_FILE"] = entryFile.path
DevinAI rebase with main, we no longer track this
5d0309f to 8cbabbf
DevinAI rebase this PR with main to resolve conflicts
…ching animation
- Stop writing to .vellum/workspace/config.json during GCP/AWS/self-hosted onboarding
- Store cloud credentials in OnboardingState memory instead
- Change Continue button to Hatch! on GCP credentials page
- Add HatchingStepView with egg animation and live stdout streaming
- Update CLILauncher to support --remote hatch with env var credential passing
- Update CLI hatch.ts to read GCP credentials from env vars (VELLUM_GCP_SA_KEY_PATH)
Co-Authored-By: vargas@vellum.ai <vargas@vellum.ai>
…on, always show Hatch! button Co-Authored-By: vargas@vellum.ai <vargas@vellum.ai>
… and VELLUM_HATCH_ENTRY_FILE Co-Authored-By: vargas@vellum.ai <vargas@vellum.ai>
let apiKey = APIKeyManager.getKey() ?? ""

let config = CLILauncher.RemoteHatchConfig(
    remote: state.cloudProvider,
🔴 cloudProvider "customHardware" doesn't match CLI's expected "custom" remote value
When the user selects "Custom Hardware" in the onboarding flow, HostingMode.customHardware.rawValue is "customHardware", which gets stored as state.cloudProvider. This value flows through to CLILauncher.RemoteHatchConfig.remote and is passed as --remote customHardware to the CLI.
Root Cause and Impact
The CLI's VALID_REMOTE_HOSTS at cli/src/lib/constants.ts:3 is ["local", "gcp", "aws", "custom"] — it expects "custom", not "customHardware". The CLI argument parser at cli/src/commands/hatch.ts:169 will reject "customHardware" with an error.
Additionally, CLILauncher.swift:124 checks config.remote == "custom" to decide whether to set VELLUM_CUSTOM_HOST and VELLUM_SSH_KEY_PATH environment variables. Since config.remote is "customHardware", this branch is never taken, so even if the CLI accepted the value, the required environment variables would be missing.
Impact: The "Custom Hardware" hosting option is completely broken — clicking "Hatch!" will always fail with a CLI validation error.
Prompt for agents
The HostingMode enum in clients/macos/vellum-assistant/Features/Onboarding/APIKeyStepView.swift uses rawValue "customHardware" (line 7), but the CLI expects "custom" (cli/src/lib/constants.ts:3 VALID_REMOTE_HOSTS). The CLILauncher.swift also checks config.remote == "custom" at line 124.
Fix option 1: In HatchingStepView.swift startHatching(), map the cloudProvider value before passing it to RemoteHatchConfig. Change line 183 from:
remote: state.cloudProvider
to:
remote: state.cloudProvider == "customHardware" ? "custom" : state.cloudProvider
Fix option 2 (cleaner): Add a computed property to OnboardingState that maps cloudProvider to the CLI remote value, or change HostingMode.customHardware to have rawValue "custom" (but this would affect the isCustomHardware check in CloudCredentialsStepView which compares against "customHardware").
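For reference, the mismatch and the proposed mapping can be sketched on the CLI side in TypeScript. The `VALID_REMOTE_HOSTS` values are the ones quoted in the comment above; `REMOTE_ALIASES`, `isRemoteHost`, and `normalizeRemote` are hypothetical names, and accepting an alias in the CLI is an assumption made for illustration (the fix options above apply the same mapping in the Swift client instead).

```typescript
// Values quoted in the review comment; everything else in this sketch is hypothetical.
const VALID_REMOTE_HOSTS = ["local", "gcp", "aws", "custom"] as const;
type RemoteHost = (typeof VALID_REMOTE_HOSTS)[number];

// Assumed alias table: UI-side identifiers mapped to the CLI's remote values.
const REMOTE_ALIASES: { [uiValue: string]: RemoteHost } = { customHardware: "custom" };

/** Type guard: true only for values the CLI accepts as --remote. */
function isRemoteHost(value: string): value is RemoteHost {
  return (VALID_REMOTE_HOSTS as readonly string[]).indexOf(value) !== -1;
}

/** Map a UI identifier to a valid remote value, or throw a validation error. */
function normalizeRemote(value: string): RemoteHost {
  const hasAlias = Object.prototype.hasOwnProperty.call(REMOTE_ALIASES, value);
  const alias = hasAlias ? REMOTE_ALIASES[value] : undefined;
  const candidate: string = alias ?? value;
  if (!isRemoteHost(candidate)) {
    throw new Error(
      `Invalid --remote value "${value}"; expected one of: ${VALID_REMOTE_HOSTS.join(", ")}`
    );
  }
  return candidate;
}

console.log(normalizeRemote("customHardware"));
```

Without the alias step, `"customHardware"` fails the `isRemoteHost` check, which is the rejection described in the comment.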
if config.remote == "gcp" {
    if !config.gcpProjectId.isEmpty {
        env["GCP_PROJECT"] = config.gcpProjectId
    }
    if !config.gcpServiceAccountKey.isEmpty {
        let tmpKeyPath = FileManager.default.temporaryDirectory
            .appendingPathComponent("vellum-sa-key-\(ProcessInfo.processInfo.processIdentifier).json")
        try config.gcpServiceAccountKey.write(to: tmpKeyPath, atomically: true, encoding: .utf8)
        env["GOOGLE_APPLICATION_CREDENTIALS"] = tmpKeyPath.path

        if let data = config.gcpServiceAccountKey.data(using: .utf8),
           let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
           let email = json["client_email"] as? String {
            env["GCP_ACCOUNT_EMAIL"] = email
        }
    }
🔴 GCP_DEFAULT_ZONE not set in environment, causing GCP hatch to always fail
The CLI's hatchGcp function requires the GCP_DEFAULT_ZONE environment variable at cli/src/commands/hatch.ts:455-458 and will process.exit(1) if it's not set. However, CLILauncher.runRemoteHatch never sets this variable.
Root Cause and Impact
In CLILauncher.swift:104-119, the GCP branch sets GCP_PROJECT, GOOGLE_APPLICATION_CREDENTIALS, and GCP_ACCOUNT_EMAIL, but there is no GCP_DEFAULT_ZONE being set. The macOS onboarding UI (CloudCredentialsStepView.swift) also has no field for zone selection.
When the CLI runs hatchGcp, it reads process.env.GCP_DEFAULT_ZONE at line 455 of hatch.ts, finds it undefined, prints an error, and exits with code 1.
Impact: GCP hatching from the macOS onboarding flow will always fail with "Error: GCP_DEFAULT_ZONE environment variable is not set." The user has no way to provide this value through the UI.
Prompt for agents
The GCP hatch flow requires GCP_DEFAULT_ZONE but the macOS app never sets it. Two things need to happen:
1. Add a zone field to the GCP credentials UI in clients/macos/vellum-assistant/Features/Onboarding/CloudCredentialsStepView.swift (similar to the gcpProjectId field), and add a corresponding gcpZone property to OnboardingState.swift.
2. In clients/macos/vellum-assistant/App/CLILauncher.swift, inside the `if config.remote == "gcp"` block (around line 104-119), add:
env["GCP_DEFAULT_ZONE"] = config.gcpZone (or a sensible default like "us-central1-a")
Also add gcpZone to the RemoteHatchConfig struct at line 66-75.
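The "sensible default" mentioned in step 2 could alternatively live in the CLI. The following TypeScript sketch is an assumption-laden illustration rather than the actual hatch.ts code: `resolveGcpZone` and `FALLBACK_GCP_ZONE` are hypothetical names, and the fallback value is simply the example zone suggested above.

```typescript
// Hypothetical resolver; the fallback is the example zone suggested in the review.
const FALLBACK_GCP_ZONE = "us-central1-a";

/** Return GCP_DEFAULT_ZONE from the given environment, or the fallback when unset or blank. */
function resolveGcpZone(env: { [name: string]: string | undefined }): string {
  const zone = (env["GCP_DEFAULT_ZONE"] ?? "").trim();
  return zone.length > 0 ? zone : FALLBACK_GCP_ZONE;
}

console.log(resolveGcpZone({ GCP_DEFAULT_ZONE: "europe-west1-b" }));
console.log(resolveGcpZone({}));
```

In the CLI this would be called as `resolveGcpZone(process.env)`. A silent fallback trades the hard failure for a possibly surprising region choice, so logging the resolved zone would be advisable.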
    }
}

proc.waitUntilExit()
🔴 runRemoteHatch blocks the main thread via @MainActor + waitUntilExit()
CLILauncher is annotated @MainActor (line 6-7), so runRemoteHatch executes on the main thread. proc.waitUntilExit() at line 173 is a synchronous blocking call that won't return until the CLI process finishes — which can take many minutes for a GCP hatch.
Root Cause and Impact
In HatchingStepView.swift:193, Task.detached is used to call cliLauncher.runRemoteHatch(...). However, since CLILauncher is @MainActor, Swift's concurrency system will hop to the main actor to execute this method. Once on the main actor, proc.waitUntilExit() at CLILauncher.swift:173 blocks the main thread.
This means the entire UI freezes: the egg wobble animation stops, the log output doesn't update (even though readabilityHandler callbacks fire on background threads, the Task { @MainActor in ... } blocks in the output handler can't execute because the main actor is blocked), and the app becomes unresponsive.
The existing runHatch() method (line 29-63) has the same pattern but is called with try? await from a fire-and-forget Task in the old code, so it was less visible. The new runRemoteHatch is specifically designed to show live output, making the main-thread blocking directly contradictory to its purpose.
Impact: The hatching animation and live log streaming UI will freeze for the entire duration of the hatch process (potentially 10+ minutes), making the app appear hung.
Prompt for agents
The issue is that CLILauncher is @MainActor, so proc.waitUntilExit() blocks the main thread. Fix by wrapping the blocking call in a background context. In clients/macos/vellum-assistant/App/CLILauncher.swift, replace line 173:
    proc.waitUntilExit()
with something like:
    await withCheckedContinuation { continuation in
        DispatchQueue.global().async {
            proc.waitUntilExit()
            continuation.resume()
        }
    }
Alternatively, use proc.terminationHandler to get a callback when the process exits, and bridge that to async/await. This ensures the main thread stays free for UI updates and the readabilityHandler callbacks can dispatch to MainActor.
        let tmpKeyPath = FileManager.default.temporaryDirectory
            .appendingPathComponent("vellum-sa-key-\(ProcessInfo.processInfo.processIdentifier).json")
        try config.gcpServiceAccountKey.write(to: tmpKeyPath, atomically: true, encoding: .utf8)
        env["GOOGLE_APPLICATION_CREDENTIALS"] = tmpKeyPath.path

        if let data = config.gcpServiceAccountKey.data(using: .utf8),
           let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
           let email = json["client_email"] as? String {
            env["GCP_ACCOUNT_EMAIL"] = email
        }
    }
} else if config.remote == "aws" {
    if !config.awsRoleArn.isEmpty {
        env["VELLUM_AWS_ROLE_ARN"] = config.awsRoleArn
    }
} else if config.remote == "custom" {
    if !config.sshHost.isEmpty {
        let hostString = config.sshUser.isEmpty
            ? config.sshHost
            : "\(config.sshUser)@\(config.sshHost)"
        env["VELLUM_CUSTOM_HOST"] = hostString
    }
    if !config.sshPrivateKey.isEmpty {
        let tmpKeyPath = FileManager.default.temporaryDirectory
            .appendingPathComponent("vellum-ssh-key-\(ProcessInfo.processInfo.processIdentifier)")
        try config.sshPrivateKey.write(to: tmpKeyPath, atomically: true, encoding: .utf8)
        try FileManager.default.setAttributes(
            [.posixPermissions: 0o600],
            ofItemAtPath: tmpKeyPath.path
        )
        env["VELLUM_SSH_KEY_PATH"] = tmpKeyPath.path
🔴 GCP service account key and SSH private key temp files are never cleaned up
In CLILauncher.runRemoteHatch, sensitive credential files are written to the temporary directory but never deleted after the process completes.
Details
At CLILauncher.swift:109-112, the GCP service account key JSON is written to vellum-sa-key-<pid>.json in the temp directory. At CLILauncher.swift:132-134, the SSH private key is written to vellum-ssh-key-<pid>. Neither file is cleaned up after proc.waitUntilExit() returns.
The old activateGcpCredentialsFromConfig in hatch.ts (now removed) had a finally block that called unlinkSync(keyPath) to clean up the temp key file. The new Swift code has no equivalent cleanup.
Impact: Sensitive credentials (GCP service account keys, SSH private keys) remain on disk in the temp directory after the hatch completes. FileManager.default.temporaryDirectory maps to a per-user directory that macOS cleans only periodically, so the files can persist long after they are needed.
Prompt for agents
In clients/macos/vellum-assistant/App/CLILauncher.swift, the runRemoteHatch method writes sensitive credentials to temp files but never cleans them up. Add cleanup using defer or a finally-style pattern.
After line 141 (end of the env setup block), track the temp file paths that were created. Then after proc.waitUntilExit() (line 173) and before checking the termination status, add cleanup code like:
    // Clean up temp credential files
    for path in tempFilesToCleanup {
        try? FileManager.default.removeItem(at: path)
    }
Alternatively, use a defer block right after writing each temp file to ensure cleanup happens even if an error is thrown.
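The finally-style pattern referred to above (and used by the removed hatch.ts helper, per the comment) can be sketched in TypeScript as follows. `withTempSecretFile` is a hypothetical name and the synchronous callback is a simplification; the actual fix belongs in the Swift launcher, where `defer` plays the role of `finally`.

```typescript
import { existsSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

/**
 * Write `secret` to a file in a fresh temp directory, pass the file path to
 * `use`, and always remove the directory afterwards, even if `use` throws.
 * Hypothetical helper for illustration only.
 */
function withTempSecretFile<T>(secret: string, use: (path: string) => T): T {
  // A uniquely named directory avoids collisions between concurrent runs.
  const dir = mkdtempSync(join(tmpdir(), "hatch-secret-"));
  const keyPath = join(dir, "key");
  try {
    // mode 0o600: readable and writable by the owner only.
    writeFileSync(keyPath, secret, { encoding: "utf8", mode: 0o600 });
    return use(keyPath);
  } finally {
    // Runs on both success and failure, so the credential never outlives the call.
    rmSync(dir, { recursive: true, force: true });
  }
}

// Example: the file exists while the callback runs and is gone afterwards.
let usedPath = "";
const secretLength = withTempSecretFile('{"client_email":"sa@example.com"}', (path) => {
  usedPath = path;
  return readFileSync(path, "utf8").length;
});
console.log(secretLength, existsSync(usedPath));
```

Scoping the credential file to a callback makes the cleanup impossible to forget at call sites, which is the property the review asks for.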
8cbabbf to a1cb20f
Rebased onto main and resolved conflicts. Force pushed.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8cbabbfb4d
    }
}
state.isHatching = true
state.advance()
Defer onboarding completion until hatching actually finishes
Calling state.advance() here moves currentStep from 2 to 3 immediately after isHatching is set, which still satisfies the existing completion condition in OnboardingFlowView (newStep > maxOnboardingStep, where max is 2). In the remote-host flow this can trigger onComplete and close onboarding before the new hatching UI has a chance to run to completion, so users are taken into the app without reliable hatch progress/error handling.
let apiKey = APIKeyManager.getKey() ?? ""

let config = CLILauncher.RemoteHatchConfig(
    remote: state.cloudProvider,
Normalize custom hardware provider before passing --remote
This forwards state.cloudProvider verbatim to the CLI, but onboarding uses customHardware while the CLI parser only accepts custom (VALID_REMOTE_HOSTS in cli/src/lib/constants.ts). Selecting Custom Hardware therefore invokes hatch --remote customHardware, which is rejected during argument parsing and causes hatching to fail immediately for that path.
    }
}

proc.waitUntilExit()
Move process waiting off the MainActor
CLILauncher is @MainActor, so this synchronous waitUntilExit() blocks the UI thread for the full remote hatch duration. While blocked, the wobble animation and log updates queued back to @MainActor cannot render in real time, which breaks the intended live hatching experience.
Stops the macOS onboarding flow from writing to ~/.vellum/ during GCP/AWS/self-hosted setups by holding credentials in memory on OnboardingState and passing them to the CLI via environment variables. Adds a "Hatch!" button that triggers vellum-cli hatch --remote <provider> with a new HatchingStepView showing an egg wobble animation and live stdout streaming, and replaces gcloud auth activate-service-account with --account=<client_email> on all gcloud commands.