fix(workflow): prompt user on resume of failed run + allow abandoning failed + add --force flag by ztech-gthb · Pull Request #1551 · coleam00/Archon

ztech-gthb · 2026-05-03T12:57:03Z

Summary

Problem: /workflow run X "task B" silently auto-resumes a prior failed run of X in the same chat, executing in the failed run's sub-worktree with the failed run's persisted user_message (= "task A"). The new prompt is discarded with no UI/log indication.
Why it matters: the assistant returns confident-but-wrong reports about work the user didn't ask for. This is silent data loss that corrodes trust: users debug nonexistent problems while the assistant reruns the same stale task.
What changed: failed runs now emit a 3-option prompt (resume / abandon-then-rerun / --force) instead of silently auto-resuming. /workflow abandon accepts failed runs (transitions them to cancelled). New --force flag bypasses the resume-detection lookup for callers who explicitly want a fresh run while keeping the failed audit row.
What did not change: paused-run auto-resume (the PR #914 approval-gate path); /workflow resume <id> still accepts failed runs; DB schema unchanged; no migrations.

UX Journey

Before

```
User                          Archon orchestrator           DB
────                          ───────────────────           ──
runs /workflow run X "A" ───▶ findResumable... → null
                              dispatch fresh run        ───▶ run-A row (failed)

runs /workflow run X "B" ───▶ findResumable... → run-A
                              [silent foreground resume]
                              executeWorkflow(working_path=
                                  thread-of-run-A,
                                  userMessage="B"
                                  ↑ ignored: scripts read
                                  $ARTIFACTS_DIR/.X persisted
                                  from run-A)             ───▶ "completed" on
                                                                task A again

sees positive report ◀──── reports task A as success
   (still confused about
    why task B didn't happen)
```

After

```
User                          Archon orchestrator           DB
────                          ───────────────────           ──
runs /workflow run X "A" ───▶ findResumable... → null
                              dispatch fresh run        ───▶ run-A row (failed)

runs /workflow run X "B" ───▶ findResumable... → run-A
                              status === 'paused'?
                                no → emit 3-option prompt    [no new run]
                                     return (no dispatch)
sees prompt with: ◀────
  /workflow resume <id>           [option 1]
  /workflow abandon <id>          [option 2: works on failed now]
   then re-run command
  /workflow run X --force "B"     [option 3: keep run-A as-is]
```

Architecture Diagram

Before

```
                   ┌─ command-handler.ts (parses /workflow run)
                   │     │ args.slice(2).join(' ') = "..."
                   │     ▼
                   │  CommandResult.workflow = { definition, args }
                   │     │
orchestrator-      │     ▼
agent.ts ──────────┴─ handleWorkflowRunCommand
                         │
                         ▼
                      dispatchOrchestratorWorkflow
                         │
                         ▼
                      findResumableRunByParentConversation
                      (status IN ['failed','paused'])
                         │
                         ▼
                      resumableRun? → executeWorkflow [silent]
                                     (auto-resume regardless of status)


workflow-operations.ts: abandonWorkflow rejects ALL terminal statuses
  (= completed | failed | cancelled) — failed runs un-abandonable
```

After

```
                   ┌─ command-handler.ts (parses /workflow run)
                   │     │ NEW: also parses --force from args
                   │     ▼
                   │  CommandResult.workflow = { definition, args, force? } [~]
                   │     │
orchestrator-      │     ▼
agent.ts ──────────┴─ handleWorkflowRunCommand(..., options={force}) [~]
                         │
                         ▼
                      dispatchOrchestratorWorkflow(..., options) [~]
                         │
                         ▼
                      options.force? null : findResumableRun(...) [+]
                         │
                         ▼
                      resumableRun?
                         ├─ status==='paused' → executeWorkflow [unchanged]
                         └─ else (failed/...)
                              → platform.sendMessage(3-option prompt) [+]
                              → return [+]


workflow-operations.ts: abandonWorkflow rejects only completed | cancelled [~]
  (failed → cancelled is the user's discard action; row stays in DB)
```
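The dispatch decision in the After diagram can be sketched in TypeScript. The sketch also covers the `resumeRunId` bypass added later in this thread. Treat it as an illustration of the branching only, not the orchestrator's actual code:

- `decideDispatch`, `PriorRun`, `DispatchOptions`, and `Decision` are hypothetical names.
- The resumable-run lookup is modeled as a plain callback rather than the real `findResumableRunByParentConversation` query.

```ts
// Hypothetical, simplified model of the orchestrator's dispatch decision.
type RunStatus = 'running' | 'paused' | 'failed' | 'completed' | 'cancelled';

interface PriorRun {
  id: string;
  status: RunStatus;
}

interface DispatchOptions {
  force?: boolean;
  resumeRunId?: string;
}

type Decision =
  | { kind: 'fresh' }
  | { kind: 'resume'; runId: string }
  | { kind: 'prompt'; runId: string };

function decideDispatch(
  lookup: () => PriorRun | null,
  options: DispatchOptions = {}
): Decision {
  // --force skips the resumable-run lookup entirely: the callback is never invoked.
  const prior = options.force ? null : lookup();
  if (prior === null) return { kind: 'fresh' };
  // Paused runs (approval gate) auto-resume, as does the run named by an explicit resume.
  if (prior.status === 'paused' || prior.id === options.resumeRunId) {
    return { kind: 'resume', runId: prior.id };
  }
  // Any other status (e.g. failed): show the 3-option prompt and dispatch nothing.
  return { kind: 'prompt', runId: prior.id };
}
```

Modeling the lookup as a callback makes the `--force` guarantee testable: the lookup is never consulted when `force` is set.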

Connection inventory:

From	To	Status	Notes
`command-handler.ts:case 'run'`	`args.findIndex(a => a === '--force')`	new	flag parsing
`command-handler.ts`	`CommandResult.workflow.force`	new	type-level pass-through
`orchestrator-agent.ts:handleWorkflowRunCommand`	`dispatchOrchestratorWorkflow`	modified	new `options` param
`dispatchOrchestratorWorkflow`	`findResumableRunByParentConversation`	modified	guarded by `options.force`
`dispatchOrchestratorWorkflow`	`platform.sendMessage` (3-option prompt)	new	failed-run user prompt
`dispatchOrchestratorWorkflow`	`executeWorkflow` (paused branch)	unchanged	PR #914 path preserved
`abandonWorkflow`	`TERMINAL_WORKFLOW_STATUSES` check	removed	replaced with explicit `completed/cancelled`

Label Snapshot

Risk: risk: low
Size: size: M
Scope: core
Module: core:orchestrator, core:operations, core:db

Change Metadata

Change type: bug
Primary scope: core

Linked Issue

Closes #1549 (/workflow run silently auto-resumes failed runs with stale args, hijacking fresh requests)
Related: #1546 (archon-assist silently discards edits; workflow has no persistence step). Same UX class (silent edit loss), independent mechanism.
Builds on #914 (fix: foreground resume for interactive workflows + chat auto-resume), which introduced the findResumableRunByParentConversation lookup this PR refines.

Validation Evidence (required)

```bash
bun run type-check    # clean across all 10 packages
bun --filter @archon/core test
# 102 pass / 0 fail
# - foreground_resume_detected: existing test, fixture switched 'failed' → 'paused'
# - failed_resume_user_prompted: new (asserts no executeWorkflow, prompt text contains all 3 commands)
# - --force flag: new (asserts findResumable not consulted, dispatchBackground fires)

bun run lint          # clean for all touched files
bun run format:check  # clean
```

End-to-end manual verification on a real chat thread (2026-05-03):

Existing failed run 32c786ef… from prior session stayed failed in DB.
/workflow run ztech-marimo-edit "..." produced the 3-option prompt. No silent dispatch, no executeWorkflow call observed in workflow logs.
/workflow run ztech-marimo-edit --force "..." dispatched fresh, completed success: true with new feature branch (wf/marimo-edit-1777809446). Original 32c786ef… row untouched.
/workflow abandon 32c786ef… transitioned the row failed → cancelled, completed_at populated. Subsequent /workflow run in the same chat no longer triggered the prompt.
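The abandon-then-rerun path verified above can be modeled in memory. This is a sketch under stated assumptions:

- `abandonRun` and `findResumable` are hypothetical stand-ins for `abandonWorkflow` and `findResumableRunByParentConversation`.
- The lookup takes the first match rather than whatever ordering the real query applies.
- `abandonRun` returns the post-cancel state, as the CodeRabbit review of workflow-operations.ts recommends, rather than the pre-update row.

```ts
// Hypothetical in-memory model; not the real workflowDb layer.
type RunStatus = 'running' | 'paused' | 'failed' | 'completed' | 'cancelled';

interface Run {
  id: string;
  conversationId: string;
  status: RunStatus;
  completedAt: Date | null;
}

// Rejects only completed and cancelled runs, so failed runs become abandonable.
// Returns a new object with the post-cancel state instead of the stale input row.
function abandonRun(run: Run, now: Date = new Date()): Run {
  if (run.status === 'completed' || run.status === 'cancelled') {
    throw new Error(`Cannot abandon run with status '${run.status}'. Run is already terminal.`);
  }
  return { ...run, status: 'cancelled', completedAt: now };
}

// Only failed and paused runs are resume candidates, so a cancelled run
// no longer triggers the prompt. Simplification: the first match wins.
function findResumable(runs: Run[], conversationId: string): Run | null {
  return (
    runs.find(
      r => r.conversationId === conversationId && (r.status === 'failed' || r.status === 'paused')
    ) ?? null
  );
}
```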

Security Impact (required)

New permissions/capabilities? No
New external network calls? No
Secrets/tokens handling changed? No
File system access scope changed? No

The prompt text is constructed via plain string concatenation; the userMessage's embedded quotes are escaped (\") when interpolated into the suggested re-run command block. No new untrusted-input parsing path.
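The following sketch escapes the original message before interpolating it into the option-3 command. `escapeForPrompt` and `forceRerunCommand` are hypothetical names. The escaped character set (backslash, double quote, backtick) follows follow-up commit 332426e, which extended the original quote-only escaping.

```ts
// Hypothetical helper: prefix each backslash, double quote, and backtick with a backslash.
function escapeForPrompt(message: string): string {
  return message.replace(/[\\"`]/g, ch => `\\${ch}`);
}

// Builds the option-3 command shown in the failed-run prompt (format from the UX journey).
function forceRerunCommand(workflowName: string, userMessage: string): string {
  return `/workflow run ${workflowName} --force "${escapeForPrompt(userMessage)}"`;
}
```

The backslash must be escaped too. Otherwise a message ending in `\` would escape the closing quote of the suggested command.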

Compatibility / Migration

Backward compatible? Yes for the paused-run auto-resume path (PR #914), /workflow resume <id> for failed runs, and /workflow run callers without --force. The only behavior change: a fresh /workflow run in a conversation that has a failed run now prompts instead of silently auto-resuming. The prior behavior was silent data loss, so this is the intended fix, not a regression.
Config/env changes? No
Database migration needed? No

Human Verification (required)

Verified scenarios:

Failed-run prompt fires for status='failed' resumable, three commands appear with correct IDs interpolated.
--force skips the resume lookup entirely (verified via `expect(mockFindResumableRunByParentConversation).not.toHaveBeenCalled()` in the test, plus an end-to-end real run).
/workflow abandon on a failed run transitions to cancelled (DB confirmed).
After abandon, /workflow run in the same conversation no longer prompts (cancelled is excluded from the resume lookup).
paused-status auto-resume continues to work in the existing test fixture.

Edge cases checked:

--force token recognized at any position in args (not just immediately after workflow name).
userMessage with embedded " is escaped in the option-3 suggested command block.

What was not verified:

A workflow that legitimately wants to retry a transient failed run silently (if such a case exists, this PR makes it require either /workflow resume <id> or --force). No such workflow has been identified in the default workflows.

Side Effects / Blast Radius (required)

Affected subsystems: core:orchestrator dispatch path, core:operations abandon validation, core:handlers workflow command parsing, types.
Potential unintended effects: a workflow author who depended on the silent failed-run auto-resume as a "retry on flaky steps" mechanism will see the new prompt. The prompt's first option (/workflow resume <id>) preserves that capability with one extra step.
Guardrails: the new orchestrator.failed_resume_user_prompted log event lets operators detect users hitting this path. The existing orchestrator.foreground_resume_detected log fires only on the (unchanged) paused branch.

Rollback Plan (required)

Fast rollback: revert this commit. Behavior returns to silent auto-resume of failed runs and abandon-rejection. No data migration needed.
Feature flags: none. The behavior change is unconditional but additive (--force is opt-in for the override case, and the prompt itself emits one user-visible message).
Observable failure symptoms if rollback is needed: users complaining the prompt is too noisy (would suggest making the prompt suppressible per-workflow), or workflows expecting silent retry-on-failure breaking (would suggest a per-workflow auto_resume_failed flag). Neither has been observed in testing.

Risks and Mitigations

Risk: a workflow author relied on silent auto-resume of failed runs as a retry mechanism.
- Mitigation: the prompt's first option offers /workflow resume <id> as a copy-pasteable command; the retry capability is preserved with one extra user step.
Risk: --force is recognized as a token anywhere in args. A user passing --force as part of literal description text ("... change the --force flag handling ...") could accidentally trigger force.
- Mitigation: known pattern in CLI tools without -- separator support. If literal collision becomes a real problem, a -- separator could be added in a follow-up. Not a regression: the flag didn't exist before, no existing input pattern is broken.
Risk: hand-maintained 3-option prompt drifts from the actual command-handler behavior over time.
- Mitigation: the new test (failed_resume_user_prompted) asserts the three command strings are present in the prompt; CI catches accidental removal. A future PR could derive the prompt from command-handler metadata, but is premature given the small surface area.
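The `--force` collision risk above was later addressed in review: commit 332426e only honors `--force` when `restArgs[0] === '--force'`. Below is a minimal sketch of that rule. It assumes `args` has the shape `parseCommand` produces for `/workflow run`, with quoted text already a single element. `parseWorkflowRun` and `ParsedRun` are hypothetical names, not the handler's actual API.

```ts
// Hypothetical sketch: --force counts only as the first argument after the workflow name.
interface ParsedRun {
  workflowName: string;
  force: boolean;
  message: string;
}

function parseWorkflowRun(args: string[]): ParsedRun {
  // e.g. ['run', 'deploy', '--force', 'task B']
  const [, workflowName = '', ...restArgs] = args;
  const force = restArgs[0] === '--force';
  // Everything after the optional flag is the user's message, kept verbatim.
  const messageArgs = force ? restArgs.slice(1) : restArgs;
  return { workflowName, force, message: messageArgs.join(' ') };
}
```

Under this rule, `/workflow run deploy --force "task B"` forces a fresh run. A `--force` inside the quoted message, or after it, stays literal text.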

Summary by CodeRabbit

New Features
- Global --force option to start a fresh workflow run, skipping resume detection.
- /workflow resume <id> returns workflow details to trigger foreground resumption when appropriate.
Bug Fixes
- Paused runs auto-resume; failed runs prompt users with clear choices (resume, abandon+retry, or start fresh).
- Abandon now accepts running, paused, and failed runs.
- Clear error shown when a resumed workflow definition is missing.


coderabbitai · 2026-05-03T12:57:14Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between f1168a0 and 332426e.

📒 Files selected for processing (2)

packages/core/src/handlers/command-handler.ts
packages/core/src/orchestrator/orchestrator-agent.ts

🚧 Files skipped from review as they are similar to previous changes (2)

packages/core/src/handlers/command-handler.ts
packages/core/src/orchestrator/orchestrator-agent.ts

Walkthrough

This PR prevents silent auto-resume of failed runs: /workflow run can be forced with --force to skip resume detection; only paused runs auto-resume; failed runs prompt the user (or can be explicitly resumed via resumeRunId); abandonWorkflow now allows abandoning failed runs. Command parsing, types, orchestration, and tests updated.

Changes

Auto-resume hijack fix (single cohort)

Layer / File(s)	Summary
Data Shape `packages/core/src/types/index.ts`	`CommandResult.workflow` adds optional `force?: boolean` and `resumeRunId?: string`.
Command Parsing / CLI `packages/core/src/handlers/command-handler.ts`	`/workflow run` extracts `--force` positionally (first bare arg), removes it from workflow args, and returns `workflow.force`; `/workflow resume <id>` reloads discovered workflows from the resumed run's cwd, validates the workflow still exists, and returns a `workflow` payload with `definition`, `args` (from `run.user_message ?? ''`) and `resumeRunId` (or explicit error on discovery/missing definition).
Core Orchestration Logic `packages/core/src/orchestrator/orchestrator-agent.ts`	`dispatchOrchestratorWorkflow(options?: {force?: boolean; resumeRunId?: string})` added. On web path, skips resumable-run lookup when `options.force` is true; otherwise may find a prior run and auto-resume only for `paused` runs or when `resumableRun.id === options.resumeRunId`. For other prior runs (e.g., `failed`) it stops and prompts the user with resume/abandon/force choices. Natural-language approval resume calls `dispatchOrchestratorWorkflow(..., { resumeRunId })`. Deterministic `/workflow` handling forwards `force` and `resumeRunId` into `handleWorkflowRunCommand`.
Operations Layer `packages/core/src/operations/workflow-operations.ts`	`abandonWorkflow()` now rejects only `completed` and `cancelled` statuses; it allows abandoning `failed`, `running`, and `paused` runs. Removed `TERMINAL_WORKFLOW_STATUSES` usage.
DB Docs `packages/core/src/db/workflows.ts`	Expanded doc comment for `findResumableRunByParentConversation` to document orchestrator semantics: `paused` runs auto-resume; `failed` runs cause a prompt to avoid reusing stale persisted `user_message`.
Tests `packages/core/src/orchestrator/orchestrator-agent.test.ts`, `packages/core/src/handlers/command-handler.test.ts`	Reset `mockFindResumableRunByParentConversation` in `beforeEach`. Updated foreground-resume test to use `status: 'paused'`. Added tests covering `failed`-run prompt, explicit resume via `workflow.resumeRunId`, and `--force` bypass; updated `/workflow resume` tests to assert returned `workflow.resumeRunId` and discovery behavior.

Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Handler as Command Handler
    participant Orchestrator as Orchestrator Agent
    participant DB as Database
    participant Executor as Workflow Executor

    rect rgba(0,150,0,0.5)
    Note over User,Executor: Forced fresh run (skip resume lookup)
    User->>Handler: /workflow run my-workflow --force "task B"
    Handler->>Handler: extract --force, clean args
    Handler->>Orchestrator: dispatchOrchestratorWorkflow(options:{force:true})
    Orchestrator->>Orchestrator: skip findResumableRunByParentConversation
    Orchestrator->>Executor: dispatchBackgroundWorkflow(fresh run)
    Executor-->>Orchestrator: run dispatched
    end

    rect rgba(100,150,255,0.5)
    Note over User,Executor: Paused run auto-resume
    User->>Handler: /workflow run my-workflow "continue"
    Handler->>Orchestrator: dispatchOrchestratorWorkflow(options:{force:false})
    Orchestrator->>DB: findResumableRunByParentConversation()
    DB-->>Orchestrator: {status:'paused', working_path:'...'}
    Orchestrator->>Executor: executeWorkflow(working_path from prior run)
    Executor-->>Orchestrator: run resumed
    end

    rect rgba(255,100,100,0.5)
    Note over User,Orchestrator: Failed run → user prompt
    User->>Handler: /workflow run my-workflow "task B"
    Handler->>Orchestrator: dispatchOrchestratorWorkflow(options:{force:false})
    Orchestrator->>DB: findResumableRunByParentConversation()
    DB-->>Orchestrator: {status:'failed', id:'abc123'}
    Orchestrator->>User: prompt (resume abc123 / abandon+rerun / start fresh --force)
    User-->>Handler: selects action
    end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

#1549: Directly addressed — prevents silent auto-resume of failed runs and adds --force/prompt behavior.
archon workflow run auto-resumes failed runs without --resume — docs say opt-in #1392: Related — implements status-aware resume behavior (paused vs failed) matching the reported scope.
Approval/reject auto-resume — deferred review follow-ups (polish + ergonomics) #1350: Related — touches approve/reject auto-resume and orchestrator resume call sites.

Possibly related PRs

coleam00/Archon#1329: Overlapping changes to orchestrator resume logic and tests.
#1065: Related — changes to handleMessage/handleWorkflowRunCommand flows that now propagate force/resumeRunId.
#? (omitted)

Poem

🐰 I hopped through worktrees, found a paused old trail,

Failed runs now knock before they take the sail.
--force clears the path, fresh starts led by light,
Abandon hears the failed, resumes when named just right. 🥕

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	Title accurately summarizes the three main changes: failed-run prompt, abandoning failed runs, and --force flag addition.
Description check	✅ Passed	Description follows template structure with all required sections: Summary, UX Journey, Architecture Diagram, Labels, Validation, Security, Compatibility, Verification, and Risks.
Linked Issues check	✅ Passed	All coding requirements from `#1549` are met: failed runs prompt instead of auto-resume, --force flag skips lookup, /workflow abandon accepts failed, paused auto-resume preserved.
Out of Scope Changes check	✅ Passed	All changes are scoped to the stated objectives: orchestrator dispatch logic, workflow-operations abandon validation, command-handler parsing, and type signatures for workflow control options.
Docstring Coverage	✅ Passed	Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.



coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/core/src/operations/workflow-operations.ts (1)

112-127: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Return the post-cancel state from abandonWorkflow.

This still returns the pre-update row, so callers that read run.status will continue to see failed/paused after a successful abandon. Now that failed runs are intentionally abandonable, that stale status is much easier to surface in UI/metadata.

Suggested fix

```diff
 export async function abandonWorkflow(runId: string): Promise<WorkflowRun> {
   const run = await getRunOrThrow(runId, 'operations.workflow_abandon_lookup_failed');
   if (run.status === 'completed' || run.status === 'cancelled') {
     throw new Error(`Cannot abandon run with status '${run.status}'. Run is already terminal.`);
   }
   try {
     await workflowDb.cancelWorkflowRun(runId);
   } catch (error) {
     const err = error as Error;
     getLog().error(
       { err, errorType: err.constructor.name, runId },
       'operations.workflow_abandon_failed'
     );
     throw new Error(`Failed to abandon workflow run ${runId}: ${err.message}`);
   }
-  return run;
+  return (await workflowDb.getWorkflowRun(runId)) ?? { ...run, status: 'cancelled' };
 }
```


🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/core/src/orchestrator/orchestrator-agent.ts`:
- Around line 281-348: The code is prompting users when a prior run.status ===
'failed' even for explicit resume attempts because resumeWorkflow() doesn't mark
the intent; update the orchestration logic to detect an explicit resume request
(e.g., accept a resumeRunId or resume flag passed into the entry point that
calls orchestrator-agent) and treat resumableRun as allowed to auto-resume when
resumableRun.id === resumeRunId (or when a resume=true flag is present), so the
branch that currently shows prompt will instead call executeWorkflow(...) for
that specific resumableRun; wire the resumeRunId from the /workflow resume <id>
handler (or make that handler dispatch directly to executeWorkflow) so
resumeWorkflow() sets that identifier before re-entering the resumable-run
check.

---

Outside diff comments:
In `@packages/core/src/operations/workflow-operations.ts`:
- Around line 112-127: abandonWorkflow currently returns the pre-cancel
WorkflowRun, so update it to return the post-cancel state: after awaiting
workflowDb.cancelWorkflowRun(runId) call get the updated run (e.g. via
getRunOrThrow(runId, 'operations.workflow_abandon_lookup_failed') or the
appropriate workflowDb fetch method) and return that WorkflowRun instead of the
original `run`; keep the existing error logging around
workflowDb.cancelWorkflowRun and rethrow as before.



📥 Commits

Reviewing files that changed from the base of the PR and between 69b2c89 and 599209c.

📒 Files selected for processing (6)

packages/core/src/db/workflows.ts
packages/core/src/handlers/command-handler.ts
packages/core/src/operations/workflow-operations.ts
packages/core/src/orchestrator/orchestrator-agent.test.ts
packages/core/src/orchestrator/orchestrator-agent.ts
packages/core/src/types/index.ts


ztech-gthb · 2026-05-03T13:33:25Z

Good catch — fixed in latest commit. Added options.resumeRunId to dispatchOrchestratorWorkflow; the V2a check is now status === 'paused' || resumableRun.id === options?.resumeRunId. Two callers wire it: /workflow resume <id> now dispatches directly (no more 'run same workflow again' instruction that was looping users through the prompt) and the natural-language approval handler passes the paused run's id when it transitions status to 'failed' before dispatching. New regression test: resumeRunId option: failed run DOES auto-resume when CommandResult.workflow.resumeRunId matches. 103/103 in orchestrator-agent.test.ts, type-check + lint + format clean.

Wirasm · 2026-05-04T06:59:09Z

@ztech-gthb related to #1549 — workflow resume prompt fix.

Wirasm · 2026-05-04T17:36:33Z

Thanks @ztech-gthb — the architectural primitives here are right. The three-option failure prompt captures the actual decision space, and the force / resumeRunId distinction in the orchestrator avoids the prompt loop cleanly. The doc comment on abandonWorkflow is exemplary.

Three items worth tightening before merge:

1. `--force` parsing matches the flag anywhere in args (`packages/core/src/handlers/command-handler.ts:817-828`)

```ts
const forceIndex = restArgs.findIndex(a => a === '--force');
```

findIndex matches --force anywhere in the user's args. If someone runs:

```
/workflow run deploy "deploy --force to staging"
```

…the parser strips --force from inside the quoted message, sets force=true, and passes deploy to staging as args. Silent corruption of the user's input.

Suggested fix: only strip --force when it's a bareword in a flag-position (e.g., immediately after the workflow name, or at the end before the quoted message), not as a substring of the quoted args. Or use -- as a verbatim-args separator. The existing "be lenient" comment hints at the intent, but lenience here is unsafe.

2. Markdown escaping in the failure prompt is incomplete (`packages/core/src/orchestrator/orchestrator-agent.ts`)

```ts
const escapedMsg = userMessage.replace(/"/g, '\\"');
```

Only " is escaped. If the user's original message contains a backtick, the markdown code-blocks in the prompt close early and the rest renders as plain text. If it contains \, that won't be re-escaped either. Not a security issue, but a UX paper-cut whenever the user's first attempt had backticks in it.

Suggested fix: also escape backticks and backslashes, or render the failed-message as an indented code block (4-space) so backticks inside don't terminate it.

3. `/workflow resume <id>` mutates DB before checking the YAML exists (`packages/core/src/handlers/command-handler.ts`)

Order today:

1. `resumeWorkflow(runId)`: transitions DB row to `running`
2. `discoverWorkflowsWithConfig(...)`: load YAML
3. If YAML missing, error

If step 3 fails, the DB is in running state but no execution happens. The error message ("Restore the YAML and try again") is fine UX-wise, but the run sits in a half-resumed state until the user retries.

Suggested fix: discover the workflow YAML before calling resumeWorkflow, or roll the DB back when discovery fails.

CI is green and the rest of the change looks solid. Happy to merge once these three are addressed (or to merge as-is and track them as a follow-up issue if you'd rather move on — your call).


ztech-gthb · 2026-05-04T18:55:19Z

Thanks @Wirasm — addressed in 332426e:

1. --force flag-position parsing (packages/core/src/handlers/command-handler.ts)
Tightened to restArgs[0] === '--force'. Worth noting: parseCommand already splits quoted strings into a single arg before the handler sees them, so the /workflow run deploy "deploy --force to staging" example resolves to args = ['run', 'deploy', 'deploy --force to staging'] — findIndex would have returned -1 there. But the principle is right: lenient matching invites surprises if parseCommand is ever changed. Strict position-0 match is the safer contract.

2. Markdown escaping (packages/core/src/orchestrator/orchestrator-agent.ts)
Extended the escaped character set from `"` alone to backslash, double quote, and backtick (`` [\\"`] ``), so backticks and backslashes are now escaped as well. The triple-backtick code fence in the failed-run prompt is now safe against arbitrary userMessage content.

3. /workflow resume <id> — DB mutation before YAML check
I read this twice and I believe it's a non-issue. resumeWorkflow in packages/core/src/operations/workflow-operations.ts is a pure validator:

```ts
export async function resumeWorkflow(runId: string): Promise<WorkflowRun> {
  const run = await getRunOrThrow(runId, 'operations.workflow_resume_lookup_failed');
  if (!RESUMABLE_WORKFLOW_STATUSES.includes(run.status)) {
    throw new Error(/* ... message elided ... */);
  }
  // Pure validation: no status transition, no row update, no event insert.
  return run;
}
```

No status transition, no `runs` row update, no event insert. The actual resume happens later in `dispatchOrchestratorWorkflow` once the workflow YAML has been resolved and validated. The function name is admittedly misleading; happy to rename to `validateResumable` (or similar) in a follow-up if you'd like — but no half-resumed-state risk in current code. Let me know if I'm missing a side-effect somewhere.

Commits and timeline:

- 599209c fix(workflow): prompt user on resume of failed run + abandon failed + --force flag
- coderabbitai Bot reviewed May 3, 2026 (comment thread on packages/core/src/orchestrator/orchestrator-agent.ts)
- ztech-gthb pushed a commit to ztech-gthb/Archon that referenced this pull request May 3, 2026: 968a262 fix(orchestrator): drop --force from slash-command catalog (decouple coleam00#1552 from coleam00#1551)
- ztech-gthb mentioned this pull request May 3, 2026 in #1552 fix(orchestrator): catalog /workflow slash commands in system prompt (Open)
- 60ae397 fix(workflow): also dispatch /workflow resume <id> directly + bypass V2a prompt for explicit resume (resumeRunId option)
- ztech-gthb mentioned this pull request May 3, 2026 in #1555 fix(workflows): archon-assist runs in live checkout (closes #1546) (Merged)
- f1168a0 test(handlers): update /workflow resume tests for resumeRunId-pattern
- Referenced May 4, 2026 by #1549 /workflow run silently auto-resumes failed runs with stale args, hijacking fresh requests (Open) and #1569 Pi/Minimax SDK errors cascade under concurrent load; needs throttling, better classification, and richer error surface (Closed)
- 332426e fix(workflow): tighten --force flag-position parsing + escape backticks/backslashes in failed-run prompt (review feedback)

ztech-gthb wants to merge 4 commits into coleam00:dev from ztech-gthb:fix/workflow-resume-prompts-user
