Merged
13 changes: 12 additions & 1 deletion clients/macos/vellum-assistant/ComputerUse/Session.swift
@@ -48,6 +48,7 @@ final class ComputerUseSession: ObservableObject {
private var isPaused = false
private var confirmationContinuation: CheckedContinuation<Bool, Never>?
private var messageLoopTask: Task<Void, Never>?
private var cancelSafetyNetTask: Task<Void, Never>?

private let enumerator: AccessibilityTreeProviding
private let screenCapture: ScreenCaptureProviding
@@ -182,6 +183,11 @@ final class ComputerUseSession: ObservableObject {
} else {
state = .failed(reason: "No focused window and screen capture failed")
logger.finishSession(result: "failed: no window")
// Disarm the cancel safety net — run() reached the post-loop and will
// handle finalization + abort itself.
cancelSafetyNetTask?.cancel()
cancelSafetyNetTask = nil

// Finalize QA recording BEFORE sending abort — the daemon's handleCuSessionAbort
// deletes cuSessionMetadata, so cu_session_finalized must arrive first for
// summary injection to work.
@@ -276,6 +282,11 @@ final class ComputerUseSession: ObservableObject {
}
}

// Disarm the cancel safety net — run() reached the post-loop and will
// handle finalization + abort itself.
cancelSafetyNetTask?.cancel()
cancelSafetyNetTask = nil

// Finalize QA recording and send cu_session_finalized
if qaMode {
await finalizeQARecording()
@@ -1050,7 +1061,7 @@ final class ComputerUseSession: ObservableObject {
// Deferred abort: give run() a chance to send finalization first,
// but guarantee abort eventually fires as a safety net in case
// run() never reaches the post-loop block (e.g., throws or gets stuck).
- Task { @MainActor in
+ cancelSafetyNetTask = Task { @MainActor in
try? await Task.sleep(nanoseconds: 2_000_000_000) // 2 seconds
guard self.isCancelled else { return } // in case state changed
Comment on lines +1064 to 1066
P1: Return when canceling the safety-net task

Canceling cancelSafetyNetTask in run() does not actually disarm it: the task body discards the cancellation error thrown by Task.sleep (try? await Task.sleep(...)) and continues immediately. Because isCancelled == true at that point, the task still sends cu_session_abort before finalizeQARecording() finishes, so the original metadata-loss race can still occur on slow finalization paths. This is triggered specifically when the new post-loop cancelSafetyNetTask?.cancel() path runs during QA cancellation.
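
A minimal sketch of the behavior described above, outside the real session class. The function name runSafetyNet and the Bool return value are hypothetical stand-ins for illustration; the only assumption carried over from the PR is the try? await Task.sleep(...) pattern.

```swift
import Foundation

// Hypothetical stand-in for the safety-net task. Returns true if execution
// got past the sleep, i.e. the point where the real code would send
// cu_session_abort.
func runSafetyNet(cancelEarly: Bool) async -> Bool {
    let task = Task { () -> Bool in
        // Task.sleep throws CancellationError when the task is cancelled,
        // but try? discards the error and execution simply continues.
        try? await Task.sleep(nanoseconds: 2_000_000_000)
        return true  // stands in for sending the abort message
    }
    if cancelEarly {
        task.cancel()  // what cancelSafetyNetTask?.cancel() does in run()
    }
    return await task.value
}
```

Calling runSafetyNet(cancelEarly: true) returns true without waiting the full two seconds: cancellation only shortens the sleep, it does not stop the body from running.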


try? self.daemonClient.send(CuSessionAbortMessage(sessionId: self.id))
Comment on lines +1064 to 1067
🔴 Safety-net task cancellation is ineffective — abort still fires after cancel() on the Task

The safety-net Task created in cancel() uses try? await Task.sleep(...) which swallows CancellationError. After the sleep is interrupted by cancelSafetyNetTask?.cancel(), execution continues to the guard self.isCancelled check — but self.isCancelled is true (set by cancel()), so the guard passes and cu_session_abort is sent anyway.

Root Cause: Task.isCancelled vs self.isCancelled

When run() reaches line 287 and calls cancelSafetyNetTask?.cancel(), the cooperative cancellation sets the task's cancellation flag. The Task.sleep at Session.swift:1065 throws CancellationError, but try? swallows it. The task then continues to line 1066:

guard self.isCancelled else { return }

Since self.isCancelled was set to true by cancel() at Session.swift:1055, this guard passes. The task then sends CuSessionAbortMessage at line 1067.

Critically, run() then awaits finalizeQARecording() at line 291, which is a suspension point. This gives the cancelled safety-net task a chance to resume on @MainActor and send the abort before finalizeQARecording() completes — exactly the race condition the PR is supposed to fix.

The fix should check Task.isCancelled inside the safety-net task body, e.g.:

cancelSafetyNetTask = Task { @MainActor in
    try? await Task.sleep(nanoseconds: 2_000_000_000)
    guard !Task.isCancelled else { return }  // <-- check cooperative cancellation
    guard self.isCancelled else { return }
    try? self.daemonClient.send(CuSessionAbortMessage(sessionId: self.id))
}
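
An alternative sketch, not taken from the PR: catch the sleep's cancellation error explicitly and return, instead of checking Task.isCancelled afterwards. The helper name safetyNetShouldFire and its parameters are hypothetical; the returned Bool stands in for "would send cu_session_abort".

```swift
import Foundation

// Hypothetical helper: sleeps for the given delay, then reports whether the
// abort should be sent. A cancelled task never reaches the session check.
func safetyNetShouldFire(
    delayNanoseconds: UInt64,
    sessionCancelled: @escaping @Sendable () -> Bool
) -> Task<Bool, Never> {
    Task {
        do {
            try await Task.sleep(nanoseconds: delayNanoseconds)
        } catch {
            // The sleep was interrupted by cancel(): the safety net is disarmed.
            return false
        }
        // Sleep completed normally: fire only if the session is still cancelled.
        return sessionCancelled()
    }
}
```

Either form works; the do/catch version makes the disarm path explicit at the point where cancellation is observed, while the guard !Task.isCancelled version above is a smaller change to the existing code.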

Impact: The cu_session_abort message can still arrive at the daemon before cu_session_finalized, causing the daemon's handleCuSessionAbort to delete cuSessionMetadata before the summary can be injected — the exact bug this PR was meant to prevent.

(Refers to lines 1064-1068)