fix(workflow): prompt user on resume of failed run + allow abandoning failed + add --force flag#1551
fix(workflow): prompt user on resume of failed run + allow abandoning failed + add --force flag#1551ztech-gthb wants to merge 4 commits intocoleam00:devfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR prevents silent auto-resume of failed runs: ChangesAuto-resume hijack fix (single cohort)
Sequence DiagramsequenceDiagram
actor User
participant Handler as Command Handler
participant Orchestrator as Orchestrator Agent
participant DB as Database
participant Executor as Workflow Executor
rect rgba(0,150,0,0.5)
Note over User,Executor: Forced fresh run (skip resume lookup)
User->>Handler: /workflow run my-workflow "task B" --force
Handler->>Handler: extract --force, clean args
Handler->>Orchestrator: dispatchOrchestratorWorkflow(options:{force:true})
Orchestrator->>Orchestrator: skip findResumableRunByParentConversation
Orchestrator->>Executor: dispatchBackgroundWorkflow(fresh run)
Executor-->>Orchestrator: run dispatched
end
rect rgba(100,150,255,0.5)
Note over User,Executor: Paused run auto-resume
User->>Handler: /workflow run my-workflow "continue"
Handler->>Orchestrator: dispatchOrchestratorWorkflow(options:{force:false})
Orchestrator->>DB: findResumableRunByParentConversation()
DB-->>Orchestrator: {status:'paused', working_path:'...'}
Orchestrator->>Executor: executeWorkflow(working_path from prior run)
Executor-->>Orchestrator: run resumed
end
rect rgba(255,100,100,0.5)
Note over User,Orchestrator: Failed run → user prompt
User->>Handler: /workflow run my-workflow "task B"
Handler->>Orchestrator: dispatchOrchestratorWorkflow(options:{force:false})
Orchestrator->>DB: findResumableRunByParentConversation()
DB-->>Orchestrator: {status:'failed', id:'abc123'}
Orchestrator->>User: prompt (resume abc123 / abandon+rerun / start fresh --force)
User-->>Handler: selects action
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 7/8 reviews remaining, refill in 7 minutes and 30 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/operations/workflow-operations.ts (1)
112-127:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winReturn the post-cancel state from
abandonWorkflow.This still returns the pre-update row, so callers that read
run.statuswill continue to seefailed/pausedafter a successful abandon. Now that failed runs are intentionally abandonable, that stale status is much easier to surface in UI/metadata.Suggested fix
export async function abandonWorkflow(runId: string): Promise<WorkflowRun> { const run = await getRunOrThrow(runId, 'operations.workflow_abandon_lookup_failed'); if (run.status === 'completed' || run.status === 'cancelled') { throw new Error(`Cannot abandon run with status '${run.status}'. Run is already terminal.`); } try { await workflowDb.cancelWorkflowRun(runId); } catch (error) { const err = error as Error; getLog().error( { err, errorType: err.constructor.name, runId }, 'operations.workflow_abandon_failed' ); throw new Error(`Failed to abandon workflow run ${runId}: ${err.message}`); } - return run; + return (await workflowDb.getWorkflowRun(runId)) ?? { ...run, status: 'cancelled' }; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/operations/workflow-operations.ts` around lines 112 - 127, abandonWorkflow currently returns the pre-cancel WorkflowRun, so update it to return the post-cancel state: after awaiting workflowDb.cancelWorkflowRun(runId) call get the updated run (e.g. via getRunOrThrow(runId, 'operations.workflow_abandon_lookup_failed') or the appropriate workflowDb fetch method) and return that WorkflowRun instead of the original `run`; keep the existing error logging around workflowDb.cancelWorkflowRun and rethrow as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 281-348: The code is prompting users when a prior run.status ===
'failed' even for explicit resume attempts because resumeWorkflow() doesn't mark
the intent; update the orchestration logic to detect an explicit resume request
(e.g., accept a resumeRunId or resume flag passed into the entry point that
calls orchestrator-agent) and treat resumableRun as allowed to auto-resume when
resumableRun.id === resumeRunId (or when a resume=true flag is present), so the
branch that currently shows prompt will instead call executeWorkflow(...) for
that specific resumableRun; wire the resumeRunId from the /workflow resume <id>
handler (or make that handler dispatch directly to executeWorkflow) so
resumeWorkflow() sets that identifier before re-entering the resumable-run
check.
---
Outside diff comments:
In `@packages/core/src/operations/workflow-operations.ts`:
- Around line 112-127: abandonWorkflow currently returns the pre-cancel
WorkflowRun, so update it to return the post-cancel state: after awaiting
workflowDb.cancelWorkflowRun(runId) call get the updated run (e.g. via
getRunOrThrow(runId, 'operations.workflow_abandon_lookup_failed') or the
appropriate workflowDb fetch method) and return that WorkflowRun instead of the
original `run`; keep the existing error logging around
workflowDb.cancelWorkflowRun and rethrow as before.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 13320894-c3d7-4717-a5cc-d4a2e25a008f
📒 Files selected for processing (6)
packages/core/src/db/workflows.tspackages/core/src/handlers/command-handler.tspackages/core/src/operations/workflow-operations.tspackages/core/src/orchestrator/orchestrator-agent.test.tspackages/core/src/orchestrator/orchestrator-agent.tspackages/core/src/types/index.ts
…V2a prompt for explicit resume (resumeRunId option)
|
Good catch — fixed in latest commit. Added |
|
@ztech-gthb related to #1549 — workflow resume prompt fix. |
|
Thanks @ztech-gthb — the architectural primitives here are right. The three-option failure prompt captures the actual decision space, and the Three items worth tightening before merge: 1.
|
…ks/backslashes in failed-run prompt (review feedback)
|
Thanks @Wirasm — addressed in 332426e: 1. 2. Markdown escaping ( 3. ```ts No status transition, no `runs` row update, no event insert. The actual resume happens later in `dispatchOrchestratorWorkflow` once the workflow YAML has been resolved and validated. The function name is admittedly misleading; happy to rename to `validateResumable` (or similar) in a follow-up if you'd like — but no half-resumed-state risk in current code. Let me know if I'm missing a side-effect somewhere. |
Summary
/workflow run X "task B"silently auto-resumes a prior failed run ofXin the same chat, executing in the failed run's sub-worktree with the failed run's persisteduser_message(= "task A"). The new prompt is discarded with no UI/log indication.--force) instead of silently auto-resuming./workflow abandonaccepts failed runs (transitions them tocancelled). New--forceflag bypasses the resume-detection lookup for callers who explicitly want a fresh run while keeping the failed audit row./workflow resume <id>still accepts failed runs; DB schema unchanged; no migrations.UX Journey
Before
After
Architecture Diagram
Before
After
Connection inventory:
command-handler.ts:case 'run'args.findIndex('--force')command-handler.tsCommandResult.workflow.forceorchestrator-agent.ts:handleWorkflowRunCommanddispatchOrchestratorWorkflowoptionsparamdispatchOrchestratorWorkflowfindResumableRunByParentConversationoptions.forcedispatchOrchestratorWorkflowplatform.sendMessage(3-option prompt)dispatchOrchestratorWorkflowexecuteWorkflow(paused branch)abandonWorkflowTERMINAL_WORKFLOW_STATUSEScheckcompleted/cancelledLabel Snapshot
risk: lowsize: Mcorecore:orchestrator,core:operations,core:dbChange Metadata
bugcoreLinked Issue
archon-assistpersistence; same UX class (silent edit loss), independent mechanismfix: foreground resume for interactive workflows + chat auto-resumeintroduced thefindResumableRunByParentConversationlookup this PR refinesValidation Evidence (required)
End-to-end manual verification on a real chat thread (2026-05-03):
32c786ef…from prior session stayedfailedin DB./workflow run ztech-marimo-edit "..."produced the 3-option prompt. No silent dispatch, noexecuteWorkflowcall observed in workflow logs./workflow run ztech-marimo-edit --force "..."dispatched fresh, completedsuccess: truewith new feature branch (wf/marimo-edit-1777809446). Original32c786ef…row untouched./workflow abandon 32c786ef…transitioned the rowfailed → cancelled,completed_atpopulated. Subsequent/workflow runin the same chat no longer triggered the prompt.Security Impact (required)
The prompt text is constructed via plain string concatenation; the
userMessage's embedded quotes are escaped (\") when interpolated into the suggested re-run command block. No new untrusted-input parsing path.Compatibility / Migration
paused-run auto-resume path (PR 🐛 UserReportedError: Manual bug report #914),/workflow resume <id>for failed runs,/workflow runcallers without--force. The only behavior change is: failed-run-on-fresh-/workflow runnow prompts instead of silently auto-resuming — the prior behavior was silent data loss, so this is the intended fix, not a regression.Human Verification (required)
Verified scenarios:
--forceskips the resume lookup entirely (verified viamockFindResumableRunByParentConversation.not.toHaveBeenCalled()in test, plus end-to-end real run)./workflow abandonon a failed run transitions tocancelled(DB confirmed)./workflow runin the same conversation no longer prompts (cancelled is excluded from the resume lookup).paused-status auto-resume continues to work in the existing test fixture.Edge cases checked:
--forcetoken recognized at any position in args (not just immediately after workflow name).userMessagewith embedded"is escaped in the option-3 suggested command block.What was not verified:
failedrun silently (if such a case exists, this PR makes it require either/workflow resume <id>or--force). No such workflow has been identified in the default workflows.Side Effects / Blast Radius (required)
core:orchestratordispatch path,core:operationsabandon validation,core:handlersworkflow command parsing, types./workflow resume <id>) preserves that capability with one extra step.orchestrator.failed_resume_user_promptedlog event lets operators detect users hitting this path. The existingorchestrator.foreground_resume_detectedlog fires only on the (unchanged) paused branch.Rollback Plan (required)
--forceis opt-in for the override case, and the prompt itself emits one user-visible message).auto_resume_failedflag). Neither has been observed in testing.Risks and Mitigations
/workflow resume <id>as a copy-pasteable command; the retry capability is preserved with one extra user step.--forceis recognized as a token anywhere in args. A user passing--forceas part of literal description text ("... change the --force flag handling ...") could accidentally trigger force.--separator support. If literal collision becomes a real problem, a--separator could be added in a follow-up. Not a regression: the flag didn't exist before, no existing input pattern is broken.failed_resume_user_prompted) asserts the three command strings are present in the prompt; CI catches accidental removal. A future PR could derive the prompt from command-handler metadata, but is premature given the small surface area.Summary by CodeRabbit
New Features
/workflow resume <id>returns workflow details to trigger foreground resumption when appropriate.Bug Fixes