QA Video Automation: Recording, Finalization & File-backed Attachments#6935
QA Video Automation: Recording, Finalization & File-backed Attachments#6935Jasonnnz wants to merge 74 commits into
Conversation
|
Addressed review feedback in #6978:
|
57ff36f to
cf0992c
Compare
…Create (#6907) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…bing (#6909) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…#6910) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…nt (#6911) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…PCMessages (#6912) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…r RFC 7233 (#6913) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
… drag support (#6916) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
… QA video (#6919) Co-authored-by: Vellum Assistant <assistant@vellum.ai>
…eanup (#6920) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
… to storage_kind (#6924) Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…tion (#6926) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…metadata leak (#6931) Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude <noreply@anthropic.com>
…7456) Two fixes in mdfindApp: 1. Register process.terminationHandler BEFORE calling process.run() to prevent a race where mdfind exits before the handler is set, causing withCheckedContinuation to never resume and hang forever. 2. Escape single quotes in app names before interpolating into the mdfind Spotlight query string, preventing query breakage for names like "Kate's App". Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add word boundary (\b) before action verb group in contextPattern to
prevent partial matches like "login" matching "in" or "domain" matching "in"
- Switch Cursor from simple \bcursor\b to contextPattern('cursor') so
"move cursor to the submit button" no longer triggers a match
- Switch Zoom from simple \bzoom\b to contextPattern('zoom') so
"zoom in on the chart" no longer triggers a match
- Reorder iTerm before Terminal in APP_HINTS array so "open terminal in
iterm2" resolves to iTerm instead of Terminal
- Add regression tests for all five false-positive scenarios
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When qaMode is true and the frontmost guard hits its 3-consecutive-block limit, finalizeQARecording() and CuSessionAbortMessage were sent inside handleAction, then sent AGAIN by the post-loop code in run(). This caused duplicate cu_session_finalized and cu_session_abort messages to the daemon. Add a didFinalizeQARecording boolean flag that is: - Declared as a private var on the session - Reset to false at the start of run() - Set to true at the end of finalizeQARecording() - Checked in the post-loop to skip redundant finalization and abort Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Move QA intent/opt-out detection in handleUserMessage to after the secret-ingress and queue-rejection checks. Previously, the latch was mutated before those guards, so a message blocked by checkIngressForSecrets or rejected by enqueueMessage would still flip the QA latch — causing incorrect qaMode/requiresRecording state for subsequent escalations. Also adds two contract tests documenting the invariant that rejected and blocked messages must not mutate the latch. Addresses review feedback from #7424. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The canonical mapping ['gmail', 'mail'] incorrectly collapsed Gmail and Mail into the same canonical key. This caused taskExplicitlyRequestsCrossApp() to record only one app mention for tasks involving both Gmail and Mail, preventing the cross-app escape path from activating. Change the mapping to ['gmail', 'gmail'] so Gmail retains its own canonical identity, allowing cross-app workflows like "copy from Gmail into Mail" to be detected correctly. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
In cross-app workflows (e.g., "copy from Chrome and paste into Safari"), the targetAppBundleId was unconditionally injected when the LLM didn't provide one. This caused the target app's bundle ID to be attached to requests for a different app, leading to wrong app activation since bundle ID takes priority in openApp resolution. Now only injects targetAppBundleId when the requested app matches the target app (or no app name is specified). Addresses review feedback from #7434. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The constructor log was deriving qaMode from requiresRecording, but these are independent IPC fields — a session can be in QA mode without requiring recording. Pass qaMode as a separate constructor parameter from msg.qaMode and log it directly. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add Bundle.main.bundleIdentifier fallback in Session.swift's inline isExternalTarget check so SPM builds (where bundleIdentifier is nil) correctly identify Vellum-targeting sessions instead of always treating them as external. Extend isExternalAppTarget in AppDelegate+Sessions.swift to accept a name parameter alongside bundleId. When bundleId is nil but targetAppName is set (e.g., Slack, Chrome), the name-based check prevents Vellum from stealing focus. Both call sites now pass routed.targetAppName. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Process.arguments bypasses the shell, so the shell-style single-quote
escaping ('\'') was passed raw to mdfind, breaking Spotlight query
parsing. Switch to double-quoted query values with proper double-quote
escaping instead.
Co-authored-by: Vellum Assistant <assistant@vellum.ai>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…7465) The didFinalizeQARecording flag was only set at the very end of finalizeQARecording(), missing the early return path when recording was required but no artifact was produced. This could cause the post-loop code to call finalizeQARecording() again redundantly. Use a defer block at the top of the method to guarantee the flag is always set regardless of which return path is taken. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…7466) When targetAppBundleId is nil but targetAppName is set (e.g. "Chrome"), the inline isExternalTarget check incorrectly evaluated to false, causing the Chrome accessibility NSAlert to steal focus from the app under test. Now falls back to targetAppName matching, consistent with the external app detection logic in AppDelegate+Sessions.swift. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Addresses review feedback on PR #7464: the sanitization only escaped double quotes but not backslashes. Since Spotlight's query parser treats `\` as an escape character inside double-quoted strings, an app name containing a backslash would produce a malformed query. Escape backslashes before double quotes to ensure correct nesting. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add strictVisualQa flag to CU session, IPC contract, and client - Block AppleScript (except target app activation) in strict mode - Block focus-stealing keys (Cmd+H/M/Tab) during focus-lock sessions - Reject completion if target app was never observed frontmost - Add first-frame handshake to ScreenRecorder for capture pipeline validation - Validate salvaged recordings for playability before tracking - Add screen recording permission gate with polling for QA sessions - Suppress title-bar minimize during focus-lock targeting Vellum - Improve activation: use activateIgnoringOtherApps, unhide, deminiaturize - Add requireExactAppMatch to ActionExecutor to prevent fuzzy matching - Tighten QA routing: require positive GUI cue, narrow overly broad patterns - Add clearAllQaLatches and call on session reset to prevent unbounded growth - Respect QA opt-out for latch-based activation in escalation handler - Add Restart menu item to menu bar - Broaden Vellum target-app hint to match bare "vellum" mention - Fix schema migration: storage_kind column to NOT NULL Co-Authored-By: Claude <noreply@anthropic.com>
| export const APP_HINTS: AppHintEntry[] = [ | ||
| // Vellum (our app — highest priority) | ||
| { | ||
| patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, /\bvellum\b/i], |
There was a problem hiding this comment.
🟡 Overly broad /\bvellum\b/i catch-all pattern causes false-positive target-app locking
The third pattern in the Vellum APP_HINTS entry (/\bvellum\b/i) matches any task text containing the standalone word "vellum", regardless of context. Because the Vellum entry is the first in the APP_HINTS array, it takes priority over all other app entries.
Root Cause & Impact
The function's own docstring states "This is intentionally conservative: only high-confidence patterns should lock the CU session to an app." However, /\bvellum\b/i is a bare-word match that fires on any mention of "vellum" — including non-app-reference contexts like:
"test Chrome's rendering of vellum.ai pages"→ locks to Vellum instead of Chrome"QA the login flow at vellum.ai in Safari"→ locks to Vellum instead of Safari"search for vellum paper types"→ locks to Vellum (no app intended)
When the session is locked to the wrong app, the daemon-side isTargetAppMatch in assistant/src/daemon/computer-use-session.ts:685-706 blocks opening the intended app, and the client-side frontmost-app guard blocks actions when the correct app is in front. In strictVisualQa mode, this causes the session to fail immediately.
The other two patterns in the Vellum entry are properly contextualized (/\b(vellum|velly)\s+(desktop\s+)?app\b/ requires "app" after the name, and /\b(vellum|velly)\s+assistant\b/ requires "assistant"). Only the third pattern is problematic.
| patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, /\bvellum\b/i], | |
| patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, contextPattern('vellum')], |
Was this helpful? React with 👍 or 👎 to provide feedback.
…mation # Conflicts: # assistant/src/config/schema.ts # assistant/src/daemon/ipc-contract-inventory.json # assistant/src/daemon/ipc-contract.ts # assistant/src/tools/assets/search.ts # clients/macos/vellum-assistant/ComputerUse/ActionExecutor.swift
- Add AXElementRegistry to map element IDs to AXUIElements during AX tree enumeration - Add AXActionExecutor with AX-native click/type/focus via kAXPressAction - Add FocusManager with multi-strategy retry (unhide, activate, AX raise) - Wire AX-first path in Session execute loop with CGEvent fallback - Make openApp fail-closed in strict QA mode via focusAcquireFailed error - Replace inline frontmost guard with FocusManager.acquireVerifiedFocus - Add post-open_app and post-action focus drift detection (terminal in strict mode) - Add FocusReliabilityTests covering registry, focus, executor, and strict QA Co-Authored-By: Claude <noreply@anthropic.com>
Add a bundled screen-recording SKILL.md that teaches the model about screen recording capabilities, lifecycle, retention, and limitations. Awareness-only — no tools, just contextual knowledge for the model. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Add a bundled qa-testing skill that composes screen-recording and media-processing via includes. This is an awareness-only skill (no TOOLS.json) that teaches the model about QA workflows, strict focus management, recording lifecycle, and post-session video analysis. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Instantiate ScreenRecorder when either qaMode or requiresRecording is true, allowing screen recording to be used independently of QA mode. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…8021) Add optional `requiresRecording` field to `TaskSubmit` IPC message type so callers can explicitly request screen recording without triggering QA mode. The daemon routing in misc.ts now uses `msg.requiresRecording` as an override, falling back to the existing QA-based computation when the field is not set. Co-authored-by: Vellum Assistant <assistant@vellum.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
- Add RUNTIME_HTTP_PORT to daemon env forwarding list in cli/local.ts - Remove localHttpEnabled feature flag gate on RUNTIME_HTTP_PORT in AppDelegate.swift - Always include RUNTIME_HTTP_PORT with default 7821 in AssistantCli.swift env Co-Authored-By: Claude <noreply@anthropic.com>
| sizeBytes: msg.recording.sizeBytes, | ||
| filePath: msg.recording.localPath, |
There was a problem hiding this comment.
🔴 cu_session_finalized allows arbitrary file-path attachment creation and file exfiltration
handleCuSessionFinalized persists file-backed attachments using the client-supplied msg.recording.localPath directly, without validating that the path is inside the recordings directory or otherwise trusted. The /v1/attachments/:id/content route later serves file-backed attachments by reading attachment.filePath from disk (assistant/src/runtime/routes/attachment-routes.ts:168-232).
Root Cause
The handler writes attacker-controlled paths into attachment metadata:
- Injection path:
filePath: msg.recording.localPathwhen creating linked attachment (assistant/src/daemon/handlers/computer-use.ts:270-271) - Fallback path: same unvalidated assignment in orphan-tracking path (
assistant/src/daemon/handlers/computer-use.ts:353-354)
Because content serving trusts DB filePath, any authenticated client that can send cu_session_finalized can point an attachment at arbitrary readable files (for example, /etc/passwd or user secrets) and then fetch them via the new content endpoint.
Actual behavior: daemon accepts arbitrary localPath and later streams that file.
Expected behavior: daemon should only accept canonicalized paths within the managed recordings directory (or require a server-generated handle), rejecting out-of-root or non-recording paths.
Impact: local file disclosure / privilege boundary bypass through the attachment-content API.
Prompt for agents
In assistant/src/daemon/handlers/computer-use.ts, harden handleCuSessionFinalized so msg.recording.localPath cannot reference arbitrary files. In both createFileBackedAttachment call sites (around lines 266-274 and 349-356), canonicalize and validate the path before use: resolve realpath (or normalized absolute path), ensure it is inside the expected recordings root (from platform/data-dir config, e.g. <AppSupport>/vellum-assistant/recordings), reject symlink escapes and traversal, and verify the file exists and is a regular file. If validation fails, log a security warning and skip attachment creation. Also consider adding a helper function in this file for path validation to avoid duplication and keep behavior consistent between the primary and fallback attachment paths. Finally, add tests covering path traversal and out-of-root rejection in assistant/src/__tests__/cu-session-finalized.test.ts and/or attachment-content route tests.
Was this helpful? React with 👍 or 👎 to provide feedback.
|
This feature branch/PR has been quite the hill climb. Havent been able to get QA use case across due to CU not detecting. And trying to extract features out of this is proving to be extremely hard lol. Will cut my losses, extract the larger features out and do this smaller scale and just merge the branches sooner. Still need to tweak the final QA usecase to see if we can get CU to properly handle it, but will work on bringing in screen recording capabilities first (as well as in chat viewing of recordings) |
- Add recording-intent detector (recording-intent.ts) to trigger recording without requiring QA intent — "record my screen" now works standalone - Route recording-intent tasks to computer_use interaction type - Decouple reportToSessionId from qaMode so video attaches back to chat for both QA and standalone recording sessions - Only apply strictVisualQa guardrails when actually in QA mode - Add allowSelfEnumeration to AccessibilityTreeProviding so CU sessions targeting Vellum itself enumerate the correct window - Auto-detect self-targeted sessions in Session.swift init and configure the enumerator accordingly - Add self-target focus drift detection — warn the agent when frontmost app doesn't match the self-targeted session - Rename finalizeQARecording → finalizeRecording and use shouldFinalizeRecording computed property for clarity - Add conversationId to TaskSubmit IPC contract - Add channel_readiness IPC message types and snapshot tests - Update IPC generated Swift types and snapshot expectations Co-Authored-By: Claude <noreply@anthropic.com>
| // Vellum (our app — highest priority) | ||
| { | ||
| patterns: [/\b(vellum|velly)\s+(desktop\s+)?app\b/, /\b(vellum|velly)\s+assistant\b/, /\bvellum\b/i], | ||
| appName: 'Vellum', |
There was a problem hiding this comment.
🟡 APP_HINTS returns appName 'Vellum' but tests and macOS app expect 'Vellum Assistant'
The Vellum entry in APP_HINTS at assistant/src/daemon/target-app-hints.ts:46 sets appName: 'Vellum', but the corresponding tests at assistant/src/__tests__/target-app-hints.test.ts:9,14,19 all expect appName: 'Vellum Assistant'. The function resolveComputerUseTargetAppHint returns { appName: entry.appName, bundleId: entry.bundleId } directly from the table, so the tests will fail.
Root Cause and Impact
The APP_HINTS table entry for Vellum uses appName: 'Vellum' but the actual macOS application name is "Vellum Assistant" (bundle ID com.vellum.vellum-assistant). Three tests explicitly assert appName: 'Vellum Assistant'.
Beyond failing tests, appName propagates through the system:
- It's sent as
targetAppNameinCuSessionCreateandTaskRoutedmessages - It's injected into the CU system prompt via
buildComputerUseSystemPrompt(assistant/src/config/computer-use-prompt.ts:26) - The Swift client's
isExternalAppTargetatAppDelegate+Sessions.swift:79checks both"vellum"and"vellum assistant"— returning'Vellum'instead of'Vellum Assistant'could cause incorrect external/internal classification - The
ActionExecutor.appAliasesmaps"vellum"→"Vellum", but the actual.appbundle may be named "Vellum Assistant.app", causingopenAppto fail to find the app in filesystem search paths
| appName: 'Vellum', | |
| appName: 'Vellum Assistant', |
Was this helpful? React with 👍 or 👎 to provide feedback.
Summary
Enables Velly to run QA/test workflows through computer-use sessions, record the screen, and return a playable + draggable video in chat. Adds QA intent detection, IPC contract extensions, daemon-side finalization handling with summary + attachment injection, macOS screen recording lifecycle, file-backed attachment storage with HTTP range-request streaming, chat video playback with drag-to-Finder support, and configurable retention cleanup.
Changes
detectQaIntent()pattern-matches QA/test commands and natural-language requests (narrowed to avoid false positives on common words like "check")reportToSessionId,qaModeonCuSessionCreateandTaskRouted;conversationIdonTaskSubmit;retentionDays,captureScope,includeAudioonTaskRouted; newCuSessionFinalizedmessage with recording metadataAppDelegate+Sessions.swiftreads qaMode from task_routed, createsScreenRecorder, passes QA params + retention + capture config toComputerUseSession; sendsconversationIdin task_submithandleCuSessionFinalizedcreates file-backed attachment (always, for cleanup), injects summary + attachment into source chat (when reportToSessionId present). UsesrecordingTrackedflag to catch all edge cases.storageKind,filePath,sha256,expiresAtcolumns;createFileBackedAttachment()for large files; backward-compatibleGET /v1/attachments/:id/contentwith RFC 7233 range request supportScreenRecorderusing ScreenCaptureKit; integrated intoSession.swift; configurable retention + capture scope + audio from daemon configcaptureScopeis"window", resolves topmost non-selfCGWindowID(excludes own PID + filters to layer 0); when"display", passesCGMainDisplayID(). Falls back gracefully.InlineVideoAttachmentViewfetches from/contentfor file-backed attachments;.onDragfor drag-to-FinderqaRecording.defaultRetentionDays; background cleanup task; orphan recordings tracked in DBTask.isCancelledguardrecordingTrackedflag ensures all untracked recordings get DB entries (missing conversation, empty summary, etc.)Milestone PRs
Fix PRs
Known limitations (deferred to M3/M4)
captureScopeandincludeAudioalways come from daemon config. Per-request overrides planned for M3.Project issue
Closes #6899
Test plan
Generated with Claude Code