Skip to content

QA Video Automation: Recording, Finalization & File-backed Attachments#6935

Closed
Jasonnnz wants to merge 74 commits into
mainfrom
feature/qa-video-automation
Closed

QA Video Automation: Recording, Finalization & File-backed Attachments#6935
Jasonnnz wants to merge 74 commits into
mainfrom
feature/qa-video-automation

Conversation

@Jasonnnz
Copy link
Copy Markdown
Contributor

@Jasonnnz Jasonnnz commented Feb 23, 2026

Summary

Enables Velly to run QA/test workflows through computer-use sessions, record the screen, and return a playable + draggable video in chat. Adds QA intent detection, IPC contract extensions, daemon-side finalization handling with summary + attachment injection, macOS screen recording lifecycle, file-backed attachment storage with HTTP range-request streaming, chat video playback with drag-to-Finder support, and configurable retention cleanup.

Changes

  • QA Intent Detection: detectQaIntent() pattern-matches QA/test commands and natural-language requests (narrowed to avoid false positives on common words like "check")
  • IPC Contract Extensions: reportToSessionId, qaMode on CuSessionCreate and TaskRouted; conversationId on TaskSubmit; retentionDays, captureScope, includeAudio on TaskRouted; new CuSessionFinalized message with recording metadata
  • Daemon Routing: Task routing and escalation handlers detect QA intent, set qaMode + reportToSessionId, pass retention/capture config
  • Client QA Wiring: AppDelegate+Sessions.swift reads qaMode from task_routed, creates ScreenRecorder, passes QA params + retention + capture config to ComputerUseSession; sends conversationId in task_submit
  • Daemon Finalization Handler: handleCuSessionFinalized creates file-backed attachment (always, for cleanup), injects summary + attachment into source chat (when reportToSessionId present). Uses recordingTracked flag to catch all edge cases.
  • File-backed Attachment Schema: New storageKind, filePath, sha256, expiresAt columns; createFileBackedAttachment() for large files; backward-compatible
  • Content Streaming Endpoint: GET /v1/attachments/:id/content with RFC 7233 range request support
  • macOS Recording Lifecycle: ScreenRecorder using ScreenCaptureKit; integrated into Session.swift; configurable retention + capture scope + audio from daemon config
  • Capture Scope Resolution: When captureScope is "window", resolves topmost non-self CGWindowID (excludes own PID + filters to layer 0); when "display", passes CGMainDisplayID(). Falls back gracefully.
  • Chat Video UX: InlineVideoAttachmentView fetches from /content for file-backed attachments; .onDrag for drag-to-Finder
  • Retention Cleanup: Configurable qaRecording.defaultRetentionDays; background cleanup task; orphan recordings tracked in DB
  • Cancel Safety: Deferred abort with safety-net timer; timer properly disarmed via Task.isCancelled guard
  • Metadata Lifecycle: Socket close handler owns socketToCuSession cleanup; targeted cleanup on session replacement
  • Recording Fallback: recordingTracked flag ensures all untracked recordings get DB entries (missing conversation, empty summary, etc.)

Milestone PRs

Fix PRs

Known limitations (deferred to M3/M4)

  • QA intent detection is regex-based — Works for direct commands but misses nuanced requests ("smoke test onboarding"). LLM-based semantic classification planned for M3.
  • Capture config is daemon-config-only, not per-requestcaptureScope and includeAudio always come from daemon config. Per-request overrides planned for M3.
  • Chat injection requires conversationId — If absent on task_submit, recording is tracked for cleanup but not injected into chat. Product decision deferred.
  • Saved QA scripts (M3) and scheduled self-QA (M4) remain unimplemented.

Project issue

Closes #6899

Test plan

  • Verify IPC contract snapshot test passes
  • Verify file-backed attachment CRUD tests pass
  • Verify attachment content route tests (range requests, 206/416 responses)
  • Verify recording cleanup tests
  • Verify cu_session_finalized handler tests (7 tests including edge cases)
  • Manual: send "test the onboarding flow" and verify QA intent triggers recording
  • Manual: send "check my email" and verify QA mode does NOT trigger
  • Manual: verify recording appears as playable video in chat with drag support
  • Manual: verify retention cleanup removes expired recordings
  • Manual: set qaRecording.captureScope to "window" and verify target app window is captured (not Vellum)

Generated with Claude Code

@Jasonnnz Jasonnnz self-assigned this Feb 23, 2026
devin-ai-integration[bot]

This comment was marked as resolved.

chatgpt-codex-connector[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@Jasonnnz
Copy link
Copy Markdown
Contributor Author

Addressed review feedback in #6978:

@Jasonnnz Jasonnnz force-pushed the feature/qa-video-automation branch 5 times, most recently from 57ff36f to cf0992c Compare February 23, 2026 21:44
Jasonnnz and others added 14 commits February 23, 2026 18:43
…Create (#6907)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
…bing (#6909)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
…#6910)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…nt (#6911)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
…PCMessages (#6912)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
…r RFC 7233 (#6913)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
… drag support (#6916)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… QA video (#6919)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
…eanup (#6920)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
… to storage_kind (#6924)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
…tion (#6926)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
…metadata leak (#6931)

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Jasonnnz and others added 11 commits February 23, 2026 21:35
…7456)

Two fixes in mdfindApp:

1. Register process.terminationHandler BEFORE calling process.run()
   to prevent a race where mdfind exits before the handler is set,
   causing withCheckedContinuation to never resume and hang forever.

2. Escape single quotes in app names before interpolating into the
   mdfind Spotlight query string, preventing query breakage for names
   like "Kate's App".

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add word boundary (\b) before action verb group in contextPattern to
  prevent partial matches like "login" matching "in" or "domain" matching "in"
- Switch Cursor from simple \bcursor\b to contextPattern('cursor') so
  "move cursor to the submit button" no longer triggers a match
- Switch Zoom from simple \bzoom\b to contextPattern('zoom') so
  "zoom in on the chart" no longer triggers a match
- Reorder iTerm before Terminal in APP_HINTS array so "open terminal in
  iterm2" resolves to iTerm instead of Terminal
- Add regression tests for all five false-positive scenarios

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When qaMode is true and the frontmost guard hits its 3-consecutive-block
limit, finalizeQARecording() and CuSessionAbortMessage were sent inside
handleAction, then sent AGAIN by the post-loop code in run(). This caused
duplicate cu_session_finalized and cu_session_abort messages to the daemon.

Add a didFinalizeQARecording boolean flag that is:
- Declared as a private var on the session
- Reset to false at the start of run()
- Set to true at the end of finalizeQARecording()
- Checked in the post-loop to skip redundant finalization and abort

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move QA intent/opt-out detection in handleUserMessage to after the
secret-ingress and queue-rejection checks. Previously, the latch was
mutated before those guards, so a message blocked by checkIngressForSecrets
or rejected by enqueueMessage would still flip the QA latch — causing
incorrect qaMode/requiresRecording state for subsequent escalations.

Also adds two contract tests documenting the invariant that rejected and
blocked messages must not mutate the latch.

Addresses review feedback from #7424.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The canonical mapping ['gmail', 'mail'] incorrectly collapsed Gmail and
Mail into the same canonical key. This caused taskExplicitlyRequestsCrossApp()
to record only one app mention for tasks involving both Gmail and Mail,
preventing the cross-app escape path from activating.

Change the mapping to ['gmail', 'gmail'] so Gmail retains its own
canonical identity, allowing cross-app workflows like "copy from Gmail
into Mail" to be detected correctly.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
In cross-app workflows (e.g., "copy from Chrome and paste into Safari"),
the targetAppBundleId was unconditionally injected when the LLM didn't
provide one. This caused the target app's bundle ID to be attached to
requests for a different app, leading to wrong app activation since
bundle ID takes priority in openApp resolution.

Now only injects targetAppBundleId when the requested app matches the
target app (or no app name is specified).

Addresses review feedback from #7434.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The constructor log was deriving qaMode from requiresRecording, but
these are independent IPC fields — a session can be in QA mode without
requiring recording. Pass qaMode as a separate constructor parameter
from msg.qaMode and log it directly.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add Bundle.main.bundleIdentifier fallback in Session.swift's inline
isExternalTarget check so SPM builds (where bundleIdentifier is nil)
correctly identify Vellum-targeting sessions instead of always treating
them as external.

Extend isExternalAppTarget in AppDelegate+Sessions.swift to accept a
name parameter alongside bundleId. When bundleId is nil but targetAppName
is set (e.g., Slack, Chrome), the name-based check prevents Vellum from
stealing focus. Both call sites now pass routed.targetAppName.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Process.arguments bypasses the shell, so the shell-style single-quote
escaping ('\'') was passed raw to mdfind, breaking Spotlight query
parsing. Switch to double-quoted query values with proper double-quote
escaping instead.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…7465)

The didFinalizeQARecording flag was only set at the very end of
finalizeQARecording(), missing the early return path when recording
was required but no artifact was produced. This could cause the
post-loop code to call finalizeQARecording() again redundantly.

Use a defer block at the top of the method to guarantee the flag
is always set regardless of which return path is taken.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…7466)

When targetAppBundleId is nil but targetAppName is set (e.g. "Chrome"),
the inline isExternalTarget check incorrectly evaluated to false, causing
the Chrome accessibility NSAlert to steal focus from the app under test.
Now falls back to targetAppName matching, consistent with the external
app detection logic in AppDelegate+Sessions.swift.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

Jasonnnz and others added 3 commits February 23, 2026 22:25
Addresses review feedback on PR #7464: the sanitization only escaped
double quotes but not backslashes. Since Spotlight's query parser treats
`\` as an escape character inside double-quoted strings, an app name
containing a backslash would produce a malformed query. Escape
backslashes before double quotes to ensure correct nesting.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add strictVisualQa flag to CU session, IPC contract, and client
- Block AppleScript (except target app activation) in strict mode
- Block focus-stealing keys (Cmd+H/M/Tab) during focus-lock sessions
- Reject completion if target app was never observed frontmost
- Add first-frame handshake to ScreenRecorder for capture pipeline validation
- Validate salvaged recordings for playability before tracking
- Add screen recording permission gate with polling for QA sessions
- Suppress title-bar minimize during focus-lock targeting Vellum
- Improve activation: use activateIgnoringOtherApps, unhide, deminiaturize
- Add requireExactAppMatch to ActionExecutor to prevent fuzzy matching
- Tighten QA routing: require positive GUI cue, narrow overly broad patterns
- Add clearAllQaLatches and call on session reset to prevent unbounded growth
- Respect QA opt-out for latch-based activation in escalation handler
- Add Restart menu item to menu bar
- Broaden Vellum target-app hint to match bare "vellum" mention
- Fix schema migration: storage_kind column to NOT NULL

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 38 additional findings in Devin Review.

Open in Devin Review

export const APP_HINTS: AppHintEntry[] = [
// Vellum (our app — highest priority)
{
patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, /\bvellum\b/i],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Overly broad /\bvellum\b/i catch-all pattern causes false-positive target-app locking

The third pattern in the Vellum APP_HINTS entry (/\bvellum\b/i) matches any task text containing the standalone word "vellum", regardless of context. Because the Vellum entry is the first in the APP_HINTS array, it takes priority over all other app entries.

Root Cause & Impact

The function's own docstring states "This is intentionally conservative: only high-confidence patterns should lock the CU session to an app." However, /\bvellum\b/i is a bare-word match that fires on any mention of "vellum" — including non-app-reference contexts like:

  • "test Chrome's rendering of vellum.ai pages" → locks to Vellum instead of Chrome
  • "QA the login flow at vellum.ai in Safari" → locks to Vellum instead of Safari
  • "search for vellum paper types" → locks to Vellum (no app intended)

When the session is locked to the wrong app, the daemon-side isTargetAppMatch in assistant/src/daemon/computer-use-session.ts:685-706 blocks opening the intended app, and the client-side frontmost-app guard blocks actions when the correct app is in front. In strictVisualQa mode, this causes the session to fail immediately.

The other two patterns in the Vellum entry are properly contextualized (/\b(vellum|velly)\s+(desktop\s+)?app\b/ requires "app" after the name, and /\b(vellum|velly)\s+assistant\b/ requires "assistant"). Only the third pattern is problematic.

Suggested change
patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, /\bvellum\b/i],
patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, contextPattern('vellum')],
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Vellum Assistant and others added 6 commits February 24, 2026 10:30
…mation

# Conflicts:
#	assistant/src/config/schema.ts
#	assistant/src/daemon/ipc-contract-inventory.json
#	assistant/src/daemon/ipc-contract.ts
#	assistant/src/tools/assets/search.ts
#	clients/macos/vellum-assistant/ComputerUse/ActionExecutor.swift
- Add AXElementRegistry to map element IDs to AXUIElements during AX tree enumeration
- Add AXActionExecutor with AX-native click/type/focus via kAXPressAction
- Add FocusManager with multi-strategy retry (unhide, activate, AX raise)
- Wire AX-first path in Session execute loop with CGEvent fallback
- Make openApp fail-closed in strict QA mode via focusAcquireFailed error
- Replace inline frontmost guard with FocusManager.acquireVerifiedFocus
- Add post-open_app and post-action focus drift detection (terminal in strict mode)
- Add FocusReliabilityTests covering registry, focus, executor, and strict QA

Co-Authored-By: Claude <noreply@anthropic.com>
Add a bundled screen-recording SKILL.md that teaches the model about
screen recording capabilities, lifecycle, retention, and limitations.
Awareness-only — no tools, just contextual knowledge for the model.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a bundled qa-testing skill that composes screen-recording and
media-processing via includes. This is an awareness-only skill (no
TOOLS.json) that teaches the model about QA workflows, strict focus
management, recording lifecycle, and post-session video analysis.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Instantiate ScreenRecorder when either qaMode or requiresRecording is
true, allowing screen recording to be used independently of QA mode.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…8021)

Add optional `requiresRecording` field to `TaskSubmit` IPC message type
so callers can explicitly request screen recording without triggering QA
mode. The daemon routing in misc.ts now uses `msg.requiresRecording` as
an override, falling back to the existing QA-based computation when the
field is not set.

Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
devin-ai-integration[bot]

This comment was marked as resolved.

- Add RUNTIME_HTTP_PORT to daemon env forwarding list in cli/local.ts
- Remove localHttpEnabled feature flag gate on RUNTIME_HTTP_PORT in AppDelegate.swift
- Always include RUNTIME_HTTP_PORT with default 7821 in AssistantCli.swift env

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 48 additional findings in Devin Review.

Open in Devin Review

Comment on lines +270 to +271
sizeBytes: msg.recording.sizeBytes,
filePath: msg.recording.localPath,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 cu_session_finalized allows arbitrary file-path attachment creation and file exfiltration

handleCuSessionFinalized persists file-backed attachments using the client-supplied msg.recording.localPath directly, without validating that the path is inside the recordings directory or otherwise trusted. The /v1/attachments/:id/content route later serves file-backed attachments by reading attachment.filePath from disk (assistant/src/runtime/routes/attachment-routes.ts:168-232).

Root Cause

The handler writes attacker-controlled paths into attachment metadata:

  • Injection path: filePath: msg.recording.localPath when creating linked attachment (assistant/src/daemon/handlers/computer-use.ts:270-271)
  • Fallback path: same unvalidated assignment in orphan-tracking path (assistant/src/daemon/handlers/computer-use.ts:353-354)

Because content serving trusts DB filePath, any authenticated client that can send cu_session_finalized can point an attachment at arbitrary readable files (for example, /etc/passwd or user secrets) and then fetch them via the new content endpoint.

Actual behavior: daemon accepts arbitrary localPath and later streams that file.

Expected behavior: daemon should only accept canonicalized paths within the managed recordings directory (or require a server-generated handle), rejecting out-of-root or non-recording paths.

Impact: local file disclosure / privilege boundary bypass through the attachment-content API.

Prompt for agents
In assistant/src/daemon/handlers/computer-use.ts, harden handleCuSessionFinalized so msg.recording.localPath cannot reference arbitrary files. In both createFileBackedAttachment call sites (around lines 266-274 and 349-356), canonicalize and validate the path before use: resolve realpath (or normalized absolute path), ensure it is inside the expected recordings root (from platform/data-dir config, e.g. <AppSupport>/vellum-assistant/recordings), reject symlink escapes and traversal, and verify the file exists and is a regular file. If validation fails, log a security warning and skip attachment creation. Also consider adding a helper function in this file for path validation to avoid duplication and keep behavior consistent between the primary and fallback attachment paths. Finally, add tests covering path traversal and out-of-root rejection in assistant/src/__tests__/cu-session-finalized.test.ts and/or attachment-content route tests.
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@Jasonnnz
Copy link
Copy Markdown
Contributor Author

This feature branch/PR has been quite the hill climb. Havent been able to get QA use case across due to CU not detecting. And trying to extract features out of this is proving to be extremely hard lol. Will cut my losses, extract the larger features out and do this smaller scale and just merge the branches sooner.

Still need to tweak the final QA usecase to see if we can get CU to properly handle it, but will work on bringing in screen recording capabilities first (as well as in chat viewing of recordings)

- Add recording-intent detector (recording-intent.ts) to trigger recording
  without requiring QA intent — "record my screen" now works standalone
- Route recording-intent tasks to computer_use interaction type
- Decouple reportToSessionId from qaMode so video attaches back to chat
  for both QA and standalone recording sessions
- Only apply strictVisualQa guardrails when actually in QA mode
- Add allowSelfEnumeration to AccessibilityTreeProviding so CU sessions
  targeting Vellum itself enumerate the correct window
- Auto-detect self-targeted sessions in Session.swift init and configure
  the enumerator accordingly
- Add self-target focus drift detection — warn the agent when frontmost
  app doesn't match the self-targeted session
- Rename finalizeQARecording → finalizeRecording and use
  shouldFinalizeRecording computed property for clarity
- Add conversationId to TaskSubmit IPC contract
- Add channel_readiness IPC message types and snapshot tests
- Update IPC generated Swift types and snapshot expectations

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 49 additional findings in Devin Review.

Open in Devin Review

// Vellum (our app — highest priority)
{
patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, /\bvellum\b/i],
appName: 'Vellum',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 APP_HINTS returns appName 'Vellum' but tests and macOS app expect 'Vellum Assistant'

The Vellum entry in APP_HINTS at assistant/src/daemon/target-app-hints.ts:46 sets appName: 'Vellum', but the corresponding tests at assistant/src/__tests__/target-app-hints.test.ts:9,14,19 all expect appName: 'Vellum Assistant'. The function resolveComputerUseTargetAppHint returns { appName: entry.appName, bundleId: entry.bundleId } directly from the table, so the tests will fail.

Root Cause and Impact

The APP_HINTS table entry for Vellum uses appName: 'Vellum' but the actual macOS application name is "Vellum Assistant" (bundle ID com.vellum.vellum-assistant). Three tests explicitly assert appName: 'Vellum Assistant'.

Beyond failing tests, appName propagates through the system:

  • It's sent as targetAppName in CuSessionCreate and TaskRouted messages
  • It's injected into the CU system prompt via buildComputerUseSystemPrompt (assistant/src/config/computer-use-prompt.ts:26)
  • The Swift client's isExternalAppTarget at AppDelegate+Sessions.swift:79 checks both "vellum" and "vellum assistant" — returning 'Vellum' instead of 'Vellum Assistant' could cause incorrect external/internal classification
  • The ActionExecutor.appAliases maps "vellum""Vellum", but the actual .app bundle may be named "Vellum Assistant.app", causing openApp to fail to find the app in filesystem search paths
Suggested change
appName: 'Vellum',
appName: 'Vellum Assistant',
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

@Jasonnnz Jasonnnz closed this Feb 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

QA Video Automation: Ad Hoc Recording + File-backed Attachments

1 participant