Skip to content

feat(pdf): PDF enhancement with OCR, unified progress UI, and code display fixes#893

Closed
Milofax wants to merge 31 commits intocoleam00:mainfrom
Milofax:feature/pdf-enhancement-upstream
Closed

feat(pdf): PDF enhancement with OCR, unified progress UI, and code display fixes#893
Milofax wants to merge 31 commits intocoleam00:mainfrom
Milofax:feature/pdf-enhancement-upstream

Conversation

@Milofax
Copy link
Copy Markdown

@Milofax Milofax commented Dec 3, 2025

Summary

Complete PDF enhancement feature including:

  • PDF Text Extraction: pymupdf4llm as primary extractor with better markdown and code block detection, pdfplumber/PyPDF2 as fallbacks
  • OCR Support: Tesseract OCR for image-based PDFs with automatic fallback when no text detected
  • Unified Progress UI: PDF upload now shows same live stats experience as web crawl (Pages Extracted, Chunks, Code Blocks)
  • Code Display Fix: Fixed HTML entity rendering in code examples (&lt;<)
  • Re-embed Service: Settings-based batch size and proper cancellation support
  • Embedding Model Warning: Dialog when changing embedding model with existing documents

Key Changes

Backend

  • document_processing.py - pymupdf4llm integration with page-by-page progress
  • ocr_processing.py - Tesseract OCR with per-page progress
  • code_storage_service.py - HTML entity decoding before storage
  • re_embed_service.py - Improved batch processing
  • progress_mapper.py - Extended progress ranges for better UX

Frontend

  • CrawlingProgress.tsx - Dynamic stats grid for upload vs crawl
  • ContentViewer.tsx - Fixed Prism double-escaping issue
  • RAGSettings.tsx - Embedding model change warning dialog

Tests

  • 6 backend tests for HTML entity decoding
  • E2E tests with PDF fixtures (book-sample.pdf, coding-sample.pdf)
  • Platform compatibility tests for OCR on x86_64 and ARM64

Test plan

  • Upload a text-based PDF → verify progress shows pages/chunks/code blocks
  • Upload an image-based PDF → verify OCR fallback works
  • View code examples in Knowledge Base → verify no &lt; entities shown
  • Change embedding model with existing docs → verify warning dialog appears
  • Run E2E tests: npm run test:e2e in archon-ui-main

🤖 Generated with Claude Code

Milofax and others added 30 commits November 21, 2025 19:50
- Implement Ollama integration for LLM and embeddings
- Add OllamaConfigurationPanel for UI configuration
- Add model discovery service for automatic model detection
- Update RAG settings to support Ollama embeddings
- Add documentation (INFRASTRUCTURE.md, PLAN.md, QUICKSTART.md)
- Add E2E tests for Ollama functionality

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
…if block

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
- Add test for Ollama with auth token
- Add test for Ollama without auth token (local setup)
- Update fallback test to expect 'required-but-ignored' placeholder
- Add RAG settings mock to multi-provider sequence test

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
Security fixes:
- Remove hardcoded JWT token from INFRASTRUCTURE.md (replace with placeholder)
- Remove real Ollama auth tokens from PLAN.md (replace with placeholders)
- Fix secrets leak: Change RAG settings logging from info to debug level, log only keys

Functional fixes:
- Fix URL normalization for auth-token mapping (both incoming and configured URLs now use same normalization)

Best practices:
- Add test-results/ to .gitignore
- Remove test-results from version control

All 22 Python tests pass, server healthy.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
PR coleam00#875: Ollama Support with Local Model Discovery
- Security fixes: JWT tokens, auth tokens, RAG settings logging
- Bug fixes: URL normalization, None-handling
- Best practices: test-results gitignore

All 7 items marked as ERLEDIGT (2025-11-21)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add auth_token parameter to validate_provider_instance() function
- Load auth tokens from RAG settings in /validate endpoint
- Match incoming URLs against configured LLM_BASE_URL and OLLAMA_EMBEDDING_URL
- Apply correct auth token (chat or embedding) based on URL matching
- Add tests for auth token flow through all layers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
When "Use same host for embedding instance" is checked, now also copies
useAuth and authToken from LLM config to embedding config, not just the
URL and name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract syncEmbeddingFromLLM() into testable utility function
- Add 6 tests covering: full sync, default name, custom default,
  auth settings copy, empty token handling, immutability
- Refactor RAGSettings to use the utility function

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Merges OpenRouter embedding provider feature from upstream.
Resolves conflict in llm_provider_service.py by keeping auth_token parameter.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses CodeRabbit review: CHAT_MODEL was undefined because
MODEL_CHOICE is the actual key used. Now uses helper functions
that correctly resolve the displayed model name.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add useAuth/authToken to instance config resets (RAGSettings.tsx)
- Clear auth tokens from ragSettings when deleting instances
- Fix type hint: dict[str, any] -> dict[str, Any] (llm_provider_service.py)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- ArchonChatPanel: Remove BookOpen, Search
- BugReportModal: Remove ExternalLink
- ErrorBoundaryWithBugReport: Remove React (use named imports)
- CodeViewerModal: Remove FileText
- DisconnectScreenOverlay: Remove Wifi, WifiOff
- APIKeysSection: Remove Key, Input

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add pymupdf4llm>=0.2.5 as primary PDF extraction library
- Produces clean Markdown with proper word separation
- Falls back to pdfplumber, then PyPDF2 if needed
- Validated: pymupdf4llm extracts "Since 1996, Bloomberg" vs
  pdfplumber's "Since1996,Bloomberg" (no spaces)

Phase0 validated, Phase1 complete. Phase2 (OCR) pending.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…de block protection

- Add heading boundary (\n#) as highest priority breakpoint
- Protect code blocks from being split mid-block
- Add is_inside_code_block() helper to check if position is within ```...```
- Only break at paragraphs/sentences when outside code blocks

This improves RAG quality for Markdown documents (especially from pymupdf4llm)
by preserving semantic structure during chunking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…dels

BREAKING: Search now supports all embedding dimensions (384, 768, 1024, 1536, 3072)

Previously search was hardcoded to use 1536-dim functions, which meant:
- snowflake-arctic-embed2 (1024-dim) → NO RESULTS
- nomic-embed-text (768-dim) → NO RESULTS
- Only OpenAI text-embedding-3-small (1536-dim) worked

Fix:
- Detect dimension from query_embedding: len(query_embedding)
- Use _multi RPC functions with embedding_dimension parameter
- base_search_strategy.py: match_archon_*_multi
- hybrid_search_strategy.py: hybrid_search_*_multi

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds automatic OCR extraction for scanned/image-based PDFs that don't
contain extractable text layers.

New files:
- ocr_processing.py: Tesseract OCR engine wrapper
  - extract_text_with_ocr(): Convert PDF pages to images, run OCR
  - is_ocr_available(): Check for pytesseract/pdf2image
  - get_supported_languages(): List installed Tesseract languages

Changes:
- document_processing.py: OCR fallback when text extraction fails
- pyproject.toml: Add pytesseract>=0.3.10, pdf2image>=1.17.0

Flow: pymupdf4llm → pdfplumber → PyPDF2 → Tesseract OCR

Requires system dependencies:
- macOS: brew install tesseract poppler
- Linux: apt install tesseract-ocr poppler-utils

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add tesseract-ocr and poppler-utils to Dockerfile.server for OCR support
- Add GitHub Actions platform-compatibility job for ARM64 + x86_64 testing
- Native tests on both architectures without QEMU emulation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add pytest tests for document_processing.py (6 tests)
- Add pytest tests for ocr_processing.py (6 tests)
- Add Playwright E2E tests for PDF upload flow
- Add test:e2e scripts to package.json

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- JSON-first batch response parsing with text fallback
- Character-based token estimation (~3-4 chars/token)
- Remove duplicate credential lookups
- Fix ruff linting errors (exception chaining, unused imports)
- Update test for multi-dimensional RPC function names

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add pdf2image for PDF to image conversion
- Add pytesseract for Tesseract OCR integration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Phase 4 of PDF Enhancement: RAG Code Quality + Embedding-Kompatibilität

Backend:
- Add embedding_model_filter parameter through all search layers
  (base_search_strategy, hybrid_search_strategy, agentic_rag, rag_service)
- SQL migration: 012_add_embedding_model_filter.sql with filter functions
- Automatically filter search results by current embedding model

Frontend:
- Warning Dialog when changing embedding model with existing documents
- Uses Radix AlertDialog pattern (doesn't close on overlay click)
- Checks knowledgeService for document existence before showing warning
- Options: Cancel (revert) or Change Anyway (proceed with warning)
- Toast notification on successful model change

Testing:
- 9 new backend tests for embedding_model_filter in test_rag_strategies.py
- 8 E2E tests for dialog behavior (7 passed, 1 conditional skip)
- Tests: dialog display, cancel, change anyway, no-dialog, Ollama modal,
  focus management, escape key, overlay click behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Re-embed Feature:
- Add re_embed_service.py for bulk re-embedding of documents
- Add API endpoints: POST /api/knowledge/re-embed, POST /stop/{id}, GET /stats
- Add frontend service methods: startReEmbed(), stopReEmbed(), getReEmbedStats()
- Add "Re-embed & Change" button in embedding model change warning dialog
- Progress tracking via ProgressTracker (operation_type="re_embed")

TypeScript Fixes:
- Remove unused functions: testConnection, handleDeleteLLMInstance, handleDeleteEmbeddingInstance
- Remove unused refs: lastTestedLLMConfigRef, lastTestedEmbeddingConfigRef
- Add explicit type annotations to fix implicit 'any' errors
- Add missing properties to RAGSettingsProps: OLLAMA_CHAT_AUTH_TOKEN, OLLAMA_EMBEDDING_AUTH_TOKEN, OLLAMA_API_MODE
- Align RagSettings interface in credentialsService.ts with RAGSettings.tsx
- Fix status state type annotation for responseTime: number | null

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add knowledge-base-e2e.spec.ts with 6 tests:
  - Web crawl → search flow
  - PDF upload → search flow
  - RAG search API validation
  - RAG search UI validation
  - Re-embed flow
  - Mixed content search
- Fix ES module __dirname in pdf-upload-flow.spec.ts
- Fix migration 012 to match complete_setup.sql structure:
  - Correct column name (content_search_vector)
  - Add match_type field
  - Correct parameter order for PostgREST compatibility
- Fix UI messages for remote Ollama compatibility
- Add "Checking..." status to prevent layout shift

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…d proper cancellation

- Load EMBEDDING_BATCH_SIZE from settings instead of hardcoded value (20-200 range)
- Add _is_cancelled() method for reliable cancellation checks between batches
- Add cancellation checks at multiple points: before batch, after embedding API call, during DB updates
- Handle asyncio.CancelledError properly to ensure clean task termination
- Fix stop_re_embed() to signal cancellation before task.cancel()
- Add comprehensive unit tests (17 tests covering batch size, cancellation, stats, etc.)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes:
- Card inline progress: Upload cards now show progress with status, percentage, and stats just like Web Crawl cards
- Rate limit display: Move "Rate limited: waiting Xs" to Progress line in parentheses (gray, not yellow)
- Backend source_id sync: Use progress_id as source_id for uploads to enable frontend matching
- Fix duplicate cards: Update optimistic entries with real progressId instead of removing them
- Add upload-specific statuses: uploading, storing, text_extraction, chunking, summarizing
- Add upload stats display: pages_extracted, chunks_created

Key implementation details:
- Backend: Include source_id in all progress tracker updates during upload processing
- Frontend: Update optimistic items/operations with response.progressId in onSuccess
- Matching: KnowledgeList matches cards to operations via source_id (not filename)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove optimistic knowledge item creation for uploads to prevent duplicates
- Fix KnowledgeCard to always use knowledge_type color regardless of active operation
- Add rate limit display in parentheses on progress bar
- Add batch progress display (e.g. Batch 5/10)
- Add comprehensive dual-PDF upload E2E test
- Fix E2E test locators for strict mode compliance

The upload flow now shows correct colors from the start and only creates
one card per uploaded file.

Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…2E test fixtures

- Frontend: Remove manual escaping before Prism (Prism escapes internally)
- Backend: Add iterative html.unescape() before storing code as safeguard
- Tests: Add 6 backend tests for HTML entity decoding (single/double/triple encoded)
- E2E: Add Code Examples Display test to pdf-upload-flow.spec.ts
- Fixtures: Add book-sample.pdf and coding-sample.pdf (4 pages each with code ≥250 chars)

The root cause was double-escaping: manual escaping + Prism's internal escaping.
DB data was already clean - no migration needed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
…support

PDF Text Extraction:
- pymupdf4llm as primary extractor (better markdown, code block detection)
- pdfplumber/PyPDF2 as fallbacks
- Page-by-page progress tracking

OCR Support:
- Tesseract OCR for image-based PDFs
- Automatic fallback when no text detected
- Per-page OCR progress

Unified Progress UI:
- PDF upload now shows same "cinema" experience as web crawl
- Live stats: Pages Extracted, Chunks, Code Blocks
- Batch progress for embeddings

Backend Improvements:
- Async document processing with progress callbacks
- Code extraction with html.unescape for clean storage
- Re-embed service with settings-based batch size

Tests:
- Backend unit tests for document processing, OCR
- E2E tests updated for new progress UI

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Dec 3, 2025

Important

Review skipped

More than 25% of the files skipped due to max files limit. The review is being skipped to prevent a low-quality review.

39 files out of 149 files are above the max files limit of 100. Please upgrade to Pro plan to get higher limits.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@Wirasm
Copy link
Copy Markdown
Collaborator

Wirasm commented Apr 6, 2026

🔄 This repository is being replaced by a new version of Archon.

The original Python/MCP codebase is being archived to the archive/v1-python-mcp branch. The new Archon (TypeScript workflow engine for AI coding agents) is replacing it.

This PR is being closed as part of the migration. Thank you for your contribution!

@Wirasm Wirasm closed this Apr 6, 2026
coleam00 pushed a commit that referenced this pull request Apr 7, 2026
…+ validation node (#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.
coleam00 added a commit that referenced this pull request Apr 7, 2026
* fix: auto-select first DAG node on workflow run load (#781)

The workflow output panel showed "No output available" on initial load
because no DAG node was selected by default. Users had to manually click
a node in the Logs tab to see output.

Changes:
- Add useEffect that auto-selects the first DAG node when workflow data
  loads and no node is selected
- Prefer the currently running node for live workflows

Fixes #781

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting and test fixes from validation

- Fix formatting in workflows.test.ts and api.ts (prettier)
- Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook })
  in api.conversations.test.ts to match route registration requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract selectInitialNode to testable pure function

Extract DAG node auto-selection logic from useEffect into a pure
function with 5 unit tests covering all branches (undefined, empty,
no running, prefer running, multiple running).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (#871)

* feat: add interrupted to WorkflowRunStatus schema

Implements US-001 from PRD.

Changes:
- Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts
- Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts
- Add interrupted: z.number() to dashboardRunsResponseSchema counts object
- Add 'interrupted' to dashboardValidStatuses in API handler
- Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts

* feat: update IWorkflowStore interface & DB query implementations

Implements US-002 from PRD.

Changes:
- IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath)
- IWorkflowStore: drop conversationId from findResumableRun signature
- IWorkflowStore: add interruptOrphanedRuns() method
- db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted')
- db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status
- db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running'
- store-adapter: wire all three new/modified methods
- executor: update call sites to use renamed methods (type-check requirement)
- tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns

* feat: replace staleness guard with path-based lifecycle

Implements US-003 from PRD.

Changes:
- executor.ts: remove STALE_MINUTES staleness auto-kill; replace with
  status-based guard — 'running' blocks, 'interrupted' offers resume/abandon
- server/src/index.ts: replace failStaleWorkflowRuns() with
  createWorkflowStore().interruptOrphanedRuns() on startup
- executor-preamble.test.ts: replace staleness detection tests with
  concurrent run guard tests covering 'running' and 'interrupted' cases

* feat: command handler — /workflow status, resume, and abandon

Implements US-004. Replaces time-based stale heuristics with explicit
lifecycle commands for workflow management.

Changes:
- Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants
- Replace /workflow status with global view: lists all running+interrupted runs
  across all worktrees (ID, name, working path, status, started-at)
- Add /workflow resume <id>: validates state then calls resumeWorkflowRun
- Add /workflow abandon <id>: validates state then calls failWorkflowRun
- Add statuses[] filter to listWorkflowRuns for IN (...) queries
- Update /workflow help text and default case usage string
- Update /status command to remove stale warning that referenced removed constants
- Replace old /workflow status tests with new behavior coverage
- Add /workflow resume and /workflow abandon test coverage

* feat: CLI workflow status, resume, and abandon subcommands

Implements US-005 from PRD.

Changes:
- Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag
- Add workflowResumeCommand: validates run state then calls resumeWorkflowRun
- Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user')
- Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd)
- Wire resume/abandon subcommands in cli.ts
- Update tests: replace "not implemented" test with status/resume/abandon coverage

* feat: Web UI interrupted status badge and dashboard support

Implements US-006 from PRD.

Changes:
- api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts
- api.ts: add interrupted field to DashboardCounts interface
- WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge
- WorkflowRunCard.tsx: add amber dot and badge for interrupted status
- StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list
- DashboardPage.tsx: include interrupted in activeRuns filter and counts default

* refactor: remove dead timer-based workflow staleness code

Implements US-007 from PRD.

Changes:
- Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005)
- Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002)
- Remove IDatabase import from db/workflows.ts (no longer needed)
- Remove failStaleWorkflowRuns tests from db/workflows.test.ts

grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and
grep -r 'failStaleWorkflowRuns' all return zero matches.

* fix: address review feedback — truncated IDs, resume semantics, type safety

- Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8))
- Add completed_at to interruptOrphanedRuns for correct duration metrics
- Fix resume commands: mark interrupted→failed to unblock path guard,
  let next workflow invocation auto-resume via findResumableRun
- Collapse dual status/statuses fields into status?: T | T[]
- Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants
- Add explicit else-if for interrupted status in executor path guard
- Add structured logging to CLI workflow commands
- Restore conversationId to cmd.workflow_status_failed log
- Add tests: listWorkflowRuns statuses filter, interrupted auto-resume,
  DB error handling for resume/abandon
- Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md

* refactor: simplify CLI commands and status filter logic

- Extract getRunOrThrow helper for shared run lookup pattern
- Use WorkflowRun[] instead of Awaited<ReturnType<...>>
- Remove single-item special case in listWorkflowRuns (IN works for all)
- Use ?? instead of || for null-coalescing consistency
- Remove unused ACTIVE_WORKFLOW_STATUSES constant
- Add inline comment on completed_at for interrupted runs

* fix: handle SQLite string dates in formatAge

SQLite returns started_at as a string, not a Date object.
formatAge now accepts Date | string and converts accordingly.
Found during E2E testing against real SQLite database.

* feat: UX improvements — real resume, dashboard actions, cleanup command

- CLI resume now actually re-executes the workflow (calls workflowRunCommand
  with --resume internally instead of just flipping DB status)
- Remove truncated IDs from executor guard messages (full ID in commands only)
- Add Resume/Abandon/Delete buttons to dashboard workflow run cards
- Add Delete button to history table rows
- Add API endpoints: POST resume, POST abandon, DELETE workflow run
- Add CLI workflow cleanup command (deletes terminal runs older than N days)
- Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions

* refactor: simplify API handlers, dashboard actions, and log conventions

- Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers
  (was inline string checks diverging from CLI/command-handler)
- Extract makeRunAction helper in DashboardPage (4 identical handlers → 1)
- Fix log event names to use domain prefix convention (api.workflow_run_*)
- Use Ban icon for Abandon to distinguish from Cancel's XCircle
- Use instanceof Date guard in formatAge for clarity
- Add comment on delete handler's active-status guard

* refactor: simplify workflow lifecycle — remove interrupted, single resume path

Rework the primitives for a clean foundation:

Status model: 5 statuses (pending, running, completed, failed, cancelled).
Remove 'interrupted' entirely — server restart now marks orphaned runs
as 'failed' directly (with metadata.failure_reason = 'server_restart').

Resume model: one path. The executor's implicit findResumableRun
detects prior failed runs and skips completed nodes. The CLI --resume
flag reuses the prior run's worktree but lets the executor handle
node-skipping (no more preCreatedRun bypass). Chat /workflow resume
tells the user to re-invoke (auto-resume kicks in).

Path guard: only blocks 'running' status (was running + interrupted).
Guards always run regardless of preCreatedRun.

Cancellation: generalized from status === 'cancelled' to
status !== 'running' at all 3 check points (streaming, loop iterations,
DAG layers). Ready for future 'paused' status with zero changes.

- interruptOrphanedRuns → failOrphanedRuns
- Remove preCreatedRun bypass from CLI --resume path
- Generalize 3 cancellation check points in dag-executor
- Update all API endpoints, command handlers, UI components
- Update all tests and documentation

* fix: cancel/complete race, abandon semantics, UTC dates

- Fix cancel/complete race condition: dag-executor now checks DB status
  before calling completeWorkflowRun or failWorkflowRun, preventing a
  cancel during the final layer from being overwritten to completed
- Abandon uses cancelWorkflowRun instead of failWorkflowRun, so
  abandoned runs don't get auto-resumed by findResumableRun
- Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC

* fix: address PR review — SQL safety, transactions, error handling, docs, tests

- Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns
- Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun)
- Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors)
- Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort)
- Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files
- Fix default mock getWorkflowRunStatus to return 'running' instead of null
- Add NaN guard to formatAge (returns 'unknown' on unparseable dates)
- Fix stale 'interrupted' references in route summary and delete comment
- Include working path in /workflow resume response
- Align deleteOldWorkflowRuns return type to { count } for consistency
- Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference
- Document new API endpoints (resume, abandon, delete) in CLAUDE.md
- Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand
- Fix workflowAbandonCommand test to assert cancelWorkflowRun call

* refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards

- Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor
- Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601)
- Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard
- Rename cancelError → actionError in DashboardPage (covers 4 actions now)
- Fix merge conflict: add IDatabase import, getRunningWorkflows from dev
- Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono

* fix: address review findings — double-rollback, missing guards, log context, tests

- Fix double-rollback in deleteWorkflowRun by removing inner rollback()
  call and letting the outer catch handle it
- Add terminal-status guard inside deleteWorkflowRun itself, not just in
  the route handler, to prevent deletion of running workflows
- Add rollback failure logging to the rollback() helper
- Add runId to error logs in resume/abandon/delete API route handlers
- Add workingPath to getActiveWorkflowRunByPath error log
- Add workflowRunId to dag-executor status-check warn logs
- Wrap workflowRunCommand in try/catch in workflowResumeCommand with
  structured logging and null guard for user_message
- Clean up stale 'interrupted' references in JSDoc
- Fix missing / prefix on workflow cleanup in commands-reference.md
- Add API route tests for POST /resume, POST /abandon, DELETE /:runId

* refactor: apply code simplifications from review

- Replace fragile startsWith string matching in deleteWorkflowRun catch
  with a typed WorkflowRunGuardError class
- Reorder listWorkflowRuns placeholder generation: capture startIdx
  before pushing values for clarity
- Replace curried makeRunAction factory in DashboardPage with a plain
  runAction helper function
- Move skipIfStatusChanged helper definition before its call sites in
  dag-executor to match reading order

* fix(workflows): idle timeout too aggressive on DAG nodes (#854) (#886)

* fix(workflows): idle timeout too aggressive — break after result, reset on all messages (#854)

The idle timeout (5 min default) caused two problems: (1) after a node's AI
finished (result message), the loop waited for the subprocess to exit, wasting
5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls
(tests, builds) triggered false timeouts on actively working nodes.

Changes:
- Break out of the for-await loop immediately after receiving the result message
  in both command/prompt and loop node paths — no more post-completion waste
- Remove shouldResetTimer predicate so all message types (including tool) reset
  the timer — timeout only fires on complete silence
- Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message
  resetting the timer, this is a deadlock detector, not a work limiter

Fixes #854

* fix(workflows): update withIdleTimeout JSDoc to match new timer behavior

Remove the tool-exclusion example from the shouldResetTimer docs since
that pattern was just removed from all call sites. Clarify that most
callers should omit the parameter.

* fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs

- Log generator cleanup errors in withIdleTimeout instead of silently swallowing
- Add behavioral tests for break-after-result in both command/prompt and loop nodes
- Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes)
- Clarify shouldResetTimer test names and comments (utility API, not executor behavior)
- Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern)
- Remove redundant iterResult alias in withIdleTimeout

* feat: add approval gate node type for human-in-the-loop workflows (#888)

* feat: add approval gate node type for human-in-the-loop workflows

Add `approval` as a fifth DAG node type that pauses workflow execution
until a human approves or rejects. Built on existing resume infrastructure.

- New `approval` node type in YAML workflows with `message` field
- `paused` workflow status (non-terminal, resumable)
- Approve/reject via REST API, CLI, chat commands, and Web UI
- Approval comment available as `$node.output` in downstream nodes
- Dashboard shows amber pulsing badge with approval message for paused runs
- Path guard blocks new workflows on paused worktrees

* fix: resolve 7 bugs in approval gate implementation

Critical:
- Parse metadata JSON on SQLite (returned as TEXT string, not object)
- Suppress spurious "Workflow stopped (paused)" message after approval gates

Medium:
- CLI exits 0 on pause (not error code 1)
- workflow status shows paused runs alongside running
- Docs clarify auto-resume is CLI-only; API/chat mark resumable

Low:
- Skip completed_at when transitioning to failed for approval resume
- Warn on AI fields (model, hooks, etc.) set on approval nodes
- Add rowCount check to updateWorkflowRun

Also: extract ApprovalContext type, add .min(1) to approval.message,
fix stale "Only failed runs" error messages, fix log domain prefix,
add recovery instructions to CLI approve error message.

* feat(workflows): add interactive PRD workflow with approval gates (#891)

Guided PRD creation through 3 approval gates:
1. Foundation questions (who/what/why/why now)
2. Deep dive questions (vision/users/JTBD/constraints)
3. Scope questions (MVP/hypothesis/exclusions)

AI researches market + codebase between gates. Each gate carries
the user's answers via $gate.output to downstream nodes.
Final node generates the PRD from accumulated context.

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (#890)

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (#889)

When workflow resume or approve is called, the worktree path is passed as cwd,
but the codebase was registered under the main checkout path. This caused
findCodebaseByDefaultCwd to return null, triggering auto-registration which
tried to create a source symlink that already existed, crashing the command.

Changes:
- Add codebaseId option to WorkflowRunOptions
- Look up codebase by stored ID before falling back to auto-registration
- Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand
- Add test verifying getCodebase is called with stored codebase_id on resume

Fixes #889

* fix(cli): address review findings for PR #890

- Add domain prefix to 5 log events in codebase-lookup block
  (cli.codebase_lookup_failed, cli.db_connection_hint,
  cli.codebase_id_lookup_failed, cli.codebase_auto_registered,
  cli.codebase_auto_registration_failed)
- Add inline comment clarifying intentional fallthrough in catch block
- Add test for workflowApproveCommand codebase_id forwarding
- Add test for error fallback path when getCodebase throws
- Update pre-existing test assertion to match renamed log event

* fix(workflows): improve interactive PRD — first-principles prompting + validation node (#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
…+ validation node (coleam00#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.
Tyone88 pushed a commit to Tyone88/Archon that referenced this pull request Apr 16, 2026
* fix: auto-select first DAG node on workflow run load (coleam00#781)

The workflow output panel showed "No output available" on initial load
because no DAG node was selected by default. Users had to manually click
a node in the Logs tab to see output.

Changes:
- Add useEffect that auto-selects the first DAG node when workflow data
  loads and no node is selected
- Prefer the currently running node for live workflows

Fixes coleam00#781

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting and test fixes from validation

- Fix formatting in workflows.test.ts and api.ts (prettier)
- Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook })
  in api.conversations.test.ts to match route registration requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract selectInitialNode to testable pure function

Extract DAG node auto-selection logic from useEffect into a pure
function with 5 unit tests covering all branches (undefined, empty,
no running, prefer running, multiple running).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (coleam00#871)

* feat: add interrupted to WorkflowRunStatus schema

Implements US-001 from PRD.

Changes:
- Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts
- Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts
- Add interrupted: z.number() to dashboardRunsResponseSchema counts object
- Add 'interrupted' to dashboardValidStatuses in API handler
- Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts

* feat: update IWorkflowStore interface & DB query implementations

Implements US-002 from PRD.

Changes:
- IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath)
- IWorkflowStore: drop conversationId from findResumableRun signature
- IWorkflowStore: add interruptOrphanedRuns() method
- db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted')
- db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status
- db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running'
- store-adapter: wire all three new/modified methods
- executor: update call sites to use renamed methods (type-check requirement)
- tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns

* feat: replace staleness guard with path-based lifecycle

Implements US-003 from PRD.

Changes:
- executor.ts: remove STALE_MINUTES staleness auto-kill; replace with
  status-based guard — 'running' blocks, 'interrupted' offers resume/abandon
- server/src/index.ts: replace failStaleWorkflowRuns() with
  createWorkflowStore().interruptOrphanedRuns() on startup
- executor-preamble.test.ts: replace staleness detection tests with
  concurrent run guard tests covering 'running' and 'interrupted' cases

* feat: command handler — /workflow status, resume, and abandon

Implements US-004. Replaces time-based stale heuristics with explicit
lifecycle commands for workflow management.

Changes:
- Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants
- Replace /workflow status with global view: lists all running+interrupted runs
  across all worktrees (ID, name, working path, status, started-at)
- Add /workflow resume <id>: validates state then calls resumeWorkflowRun
- Add /workflow abandon <id>: validates state then calls failWorkflowRun
- Add statuses[] filter to listWorkflowRuns for IN (...) queries
- Update /workflow help text and default case usage string
- Update /status command to remove stale warning that referenced removed constants
- Replace old /workflow status tests with new behavior coverage
- Add /workflow resume and /workflow abandon test coverage

* feat: CLI workflow status, resume, and abandon subcommands

Implements US-005 from PRD.

Changes:
- Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag
- Add workflowResumeCommand: validates run state then calls resumeWorkflowRun
- Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user')
- Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd)
- Wire resume/abandon subcommands in cli.ts
- Update tests: replace "not implemented" test with status/resume/abandon coverage

* feat: Web UI interrupted status badge and dashboard support

Implements US-006 from PRD.

Changes:
- api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts
- api.ts: add interrupted field to DashboardCounts interface
- WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge
- WorkflowRunCard.tsx: add amber dot and badge for interrupted status
- StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list
- DashboardPage.tsx: include interrupted in activeRuns filter and counts default

* refactor: remove dead timer-based workflow staleness code

Implements US-007 from PRD.

Changes:
- Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005)
- Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002)
- Remove IDatabase import from db/workflows.ts (no longer needed)
- Remove failStaleWorkflowRuns tests from db/workflows.test.ts

grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and
grep -r 'failStaleWorkflowRuns' all return zero matches.

* fix: address review feedback — truncated IDs, resume semantics, type safety

- Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8))
- Add completed_at to interruptOrphanedRuns for correct duration metrics
- Fix resume commands: mark interrupted→failed to unblock path guard,
  let next workflow invocation auto-resume via findResumableRun
- Collapse dual status/statuses fields into status?: T | T[]
- Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants
- Add explicit else-if for interrupted status in executor path guard
- Add structured logging to CLI workflow commands
- Restore conversationId to cmd.workflow_status_failed log
- Add tests: listWorkflowRuns statuses filter, interrupted auto-resume,
  DB error handling for resume/abandon
- Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md

* refactor: simplify CLI commands and status filter logic

- Extract getRunOrThrow helper for shared run lookup pattern
- Use WorkflowRun[] instead of Awaited<ReturnType<...>>
- Remove single-item special case in listWorkflowRuns (IN works for all)
- Use ?? instead of || for null-coalescing consistency
- Remove unused ACTIVE_WORKFLOW_STATUSES constant
- Add inline comment on completed_at for interrupted runs

* fix: handle SQLite string dates in formatAge

SQLite returns started_at as a string, not a Date object.
formatAge now accepts Date | string and converts accordingly.
Found during E2E testing against real SQLite database.

* feat: UX improvements — real resume, dashboard actions, cleanup command

- CLI resume now actually re-executes the workflow (calls workflowRunCommand
  with --resume internally instead of just flipping DB status)
- Remove truncated IDs from executor guard messages (full ID in commands only)
- Add Resume/Abandon/Delete buttons to dashboard workflow run cards
- Add Delete button to history table rows
- Add API endpoints: POST resume, POST abandon, DELETE workflow run
- Add CLI workflow cleanup command (deletes terminal runs older than N days)
- Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions

* refactor: simplify API handlers, dashboard actions, and log conventions

- Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers
  (was inline string checks diverging from CLI/command-handler)
- Extract makeRunAction helper in DashboardPage (4 identical handlers → 1)
- Fix log event names to use domain prefix convention (api.workflow_run_*)
- Use Ban icon for Abandon to distinguish from Cancel's XCircle
- Use instanceof Date guard in formatAge for clarity
- Add comment on delete handler's active-status guard

* refactor: simplify workflow lifecycle — remove interrupted, single resume path

Rework the primitives for a clean foundation:

Status model: 5 statuses (pending, running, completed, failed, cancelled).
Remove 'interrupted' entirely — server restart now marks orphaned runs
as 'failed' directly (with metadata.failure_reason = 'server_restart').

Resume model: one path. The executor's implicit findResumableRun
detects prior failed runs and skips completed nodes. The CLI --resume
flag reuses the prior run's worktree but lets the executor handle
node-skipping (no more preCreatedRun bypass). Chat /workflow resume
tells the user to re-invoke (auto-resume kicks in).

Path guard: only blocks 'running' status (was running + interrupted).
Guards always run regardless of preCreatedRun.

Cancellation: generalized from status === 'cancelled' to
status !== 'running' at all 3 check points (streaming, loop iterations,
DAG layers). Ready for future 'paused' status with zero changes.

- interruptOrphanedRuns → failOrphanedRuns
- Remove preCreatedRun bypass from CLI --resume path
- Generalize 3 cancellation check points in dag-executor
- Update all API endpoints, command handlers, UI components
- Update all tests and documentation

* fix: cancel/complete race, abandon semantics, UTC dates

- Fix cancel/complete race condition: dag-executor now checks DB status
  before calling completeWorkflowRun or failWorkflowRun, preventing a
  cancel during the final layer from being overwritten to completed
- Abandon uses cancelWorkflowRun instead of failWorkflowRun, so
  abandoned runs don't get auto-resumed by findResumableRun
- Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC

* fix: address PR review — SQL safety, transactions, error handling, docs, tests

- Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns
- Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun)
- Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors)
- Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort)
- Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files
- Fix default mock getWorkflowRunStatus to return 'running' instead of null
- Add NaN guard to formatAge (returns 'unknown' on unparseable dates)
- Fix stale 'interrupted' references in route summary and delete comment
- Include working path in /workflow resume response
- Align deleteOldWorkflowRuns return type to { count } for consistency
- Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference
- Document new API endpoints (resume, abandon, delete) in CLAUDE.md
- Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand
- Fix workflowAbandonCommand test to assert cancelWorkflowRun call

* refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards

- Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor
- Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601)
- Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard
- Rename cancelError → actionError in DashboardPage (covers 4 actions now)
- Fix merge conflict: add IDatabase import, getRunningWorkflows from dev
- Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono

* fix: address review findings — double-rollback, missing guards, log context, tests

- Fix double-rollback in deleteWorkflowRun by removing inner rollback()
  call and letting the outer catch handle it
- Add terminal-status guard inside deleteWorkflowRun itself, not just in
  the route handler, to prevent deletion of running workflows
- Add rollback failure logging to the rollback() helper
- Add runId to error logs in resume/abandon/delete API route handlers
- Add workingPath to getActiveWorkflowRunByPath error log
- Add workflowRunId to dag-executor status-check warn logs
- Wrap workflowRunCommand in try/catch in workflowResumeCommand with
  structured logging and null guard for user_message
- Clean up stale 'interrupted' references in JSDoc
- Fix missing / prefix on workflow cleanup in commands-reference.md
- Add API route tests for POST /resume, POST /abandon, DELETE /:runId

* refactor: apply code simplifications from review

- Replace fragile startsWith string matching in deleteWorkflowRun catch
  with a typed WorkflowRunGuardError class
- Reorder listWorkflowRuns placeholder generation: capture startIdx
  before pushing values for clarity
- Replace curried makeRunAction factory in DashboardPage with a plain
  runAction helper function
- Move skipIfStatusChanged helper definition before its call sites in
  dag-executor to match reading order

* fix(workflows): idle timeout too aggressive on DAG nodes (coleam00#854) (coleam00#886)

* fix(workflows): idle timeout too aggressive — break after result, reset on all messages (coleam00#854)

The idle timeout (5 min default) caused two problems: (1) after a node's AI
finished (result message), the loop waited for the subprocess to exit, wasting
5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls
(tests, builds) triggered false timeouts on actively working nodes.

Changes:
- Break out of the for-await loop immediately after receiving the result message
  in both command/prompt and loop node paths — no more post-completion waste
- Remove shouldResetTimer predicate so all message types (including tool) reset
  the timer — timeout only fires on complete silence
- Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message
  resetting the timer, this is a deadlock detector, not a work limiter

Fixes coleam00#854

* fix(workflows): update withIdleTimeout JSDoc to match new timer behavior

Remove the tool-exclusion example from the shouldResetTimer docs since
that pattern was just removed from all call sites. Clarify that most
callers should omit the parameter.

* fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs

- Log generator cleanup errors in withIdleTimeout instead of silently swallowing
- Add behavioral tests for break-after-result in both command/prompt and loop nodes
- Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes)
- Clarify shouldResetTimer test names and comments (utility API, not executor behavior)
- Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern)
- Remove redundant iterResult alias in withIdleTimeout

* feat: add approval gate node type for human-in-the-loop workflows (coleam00#888)

* feat: add approval gate node type for human-in-the-loop workflows

Add `approval` as a fifth DAG node type that pauses workflow execution
until a human approves or rejects. Built on existing resume infrastructure.

- New `approval` node type in YAML workflows with `message` field
- `paused` workflow status (non-terminal, resumable)
- Approve/reject via REST API, CLI, chat commands, and Web UI
- Approval comment available as `$node.output` in downstream nodes
- Dashboard shows amber pulsing badge with approval message for paused runs
- Path guard blocks new workflows on paused worktrees

* fix: resolve 7 bugs in approval gate implementation

Critical:
- Parse metadata JSON on SQLite (returned as TEXT string, not object)
- Suppress spurious "Workflow stopped (paused)" message after approval gates

Medium:
- CLI exits 0 on pause (not error code 1)
- workflow status shows paused runs alongside running
- Docs clarify auto-resume is CLI-only; API/chat mark resumable

Low:
- Skip completed_at when transitioning to failed for approval resume
- Warn on AI fields (model, hooks, etc.) set on approval nodes
- Add rowCount check to updateWorkflowRun

Also: extract ApprovalContext type, add .min(1) to approval.message,
fix stale "Only failed runs" error messages, fix log domain prefix,
add recovery instructions to CLI approve error message.

* feat(workflows): add interactive PRD workflow with approval gates (coleam00#891)

Guided PRD creation through 3 approval gates:
1. Foundation questions (who/what/why/why now)
2. Deep dive questions (vision/users/JTBD/constraints)
3. Scope questions (MVP/hypothesis/exclusions)

AI researches market + codebase between gates. Each gate carries
the user's answers via $gate.output to downstream nodes.
Final node generates the PRD from accumulated context.

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#890)

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#889)

When workflow resume or approve is called, the worktree path is passed as cwd,
but the codebase was registered under the main checkout path. This caused
findCodebaseByDefaultCwd to return null, triggering auto-registration which
tried to create a source symlink that already existed, crashing the command.

Changes:
- Add codebaseId option to WorkflowRunOptions
- Look up codebase by stored ID before falling back to auto-registration
- Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand
- Add test verifying getCodebase is called with stored codebase_id on resume

Fixes coleam00#889

* fix(cli): address review findings for PR coleam00#890

- Add domain prefix to 5 log events in codebase-lookup block
  (cli.codebase_lookup_failed, cli.db_connection_hint,
  cli.codebase_id_lookup_failed, cli.codebase_auto_registered,
  cli.codebase_auto_registration_failed)
- Add inline comment clarifying intentional fallthrough in catch block
- Add test for workflowApproveCommand codebase_id forwarding
- Add test for error fallback path when getCodebase throws
- Update pre-existing test assertion to match renamed log event

* fix(workflows): improve interactive PRD — first-principles prompting + validation node (coleam00#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
…+ validation node (coleam00#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.
joaobmonteiro pushed a commit to joaobmonteiro/Archon that referenced this pull request Apr 26, 2026
* fix: auto-select first DAG node on workflow run load (coleam00#781)

The workflow output panel showed "No output available" on initial load
because no DAG node was selected by default. Users had to manually click
a node in the Logs tab to see output.

Changes:
- Add useEffect that auto-selects the first DAG node when workflow data
  loads and no node is selected
- Prefer the currently running node for live workflows

Fixes coleam00#781

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: formatting and test fixes from validation

- Fix formatting in workflows.test.ts and api.ts (prettier)
- Replace Hono() with OpenAPIHono({ defaultHook: validationErrorHook })
  in api.conversations.test.ts to match route registration requirements

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: extract selectInitialNode to testable pure function

Extract DAG node auto-selection logic from useEffect into a pure
function with 5 unit tests covering all branches (undefined, empty,
no running, prefer running, multiple running).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: workflow lifecycle overhaul — path-based guards, interrupted status, resume/abandon (coleam00#871)

* feat: add interrupted to WorkflowRunStatus schema

Implements US-001 from PRD.

Changes:
- Add 'interrupted' to workflowRunStatusSchema z.enum in packages/workflows/src/schemas/workflow-run.ts
- Add 'interrupted' to workflowRunStatusSchema in packages/server/src/routes/schemas/workflow.schemas.ts
- Add interrupted: z.number() to dashboardRunsResponseSchema counts object
- Add 'interrupted' to dashboardValidStatuses in API handler
- Add interrupted: 0 to DashboardRunsResult counts interface and runtime object in packages/core/src/db/workflows.ts

* feat: update IWorkflowStore interface & DB query implementations

Implements US-002 from PRD.

Changes:
- IWorkflowStore: rename getActiveWorkflowRun → getActiveWorkflowRunByPath(workingPath)
- IWorkflowStore: drop conversationId from findResumableRun signature
- IWorkflowStore: add interruptOrphanedRuns() method
- db/workflows: add getActiveWorkflowRunByPath querying status IN ('running', 'interrupted')
- db/workflows: update findResumableRun to query by workflow_name + working_path only, include 'interrupted' status
- db/workflows: add interruptOrphanedRuns() UPDATE SET status='interrupted' WHERE status='running'
- store-adapter: wire all three new/modified methods
- executor: update call sites to use renamed methods (type-check requirement)
- tests: update all mock stores and add new tests for getActiveWorkflowRunByPath and interruptOrphanedRuns

* feat: replace staleness guard with path-based lifecycle

Implements US-003 from PRD.

Changes:
- executor.ts: remove STALE_MINUTES staleness auto-kill; replace with
  status-based guard — 'running' blocks, 'interrupted' offers resume/abandon
- server/src/index.ts: replace failStaleWorkflowRuns() with
  createWorkflowStore().interruptOrphanedRuns() on startup
- executor-preamble.test.ts: replace staleness detection tests with
  concurrent run guard tests covering 'running' and 'interrupted' cases

* feat: command handler — /workflow status, resume, and abandon

Implements US-004. Replaces time-based stale heuristics with explicit
lifecycle commands for workflow management.

Changes:
- Remove WORKFLOW_SLOW_THRESHOLD_MS and WORKFLOW_STALE_THRESHOLD_MS constants
- Replace /workflow status with global view: lists all running+interrupted runs
  across all worktrees (ID, name, working path, status, started-at)
- Add /workflow resume <id>: validates state then calls resumeWorkflowRun
- Add /workflow abandon <id>: validates state then calls failWorkflowRun
- Add statuses[] filter to listWorkflowRuns for IN (...) queries
- Update /workflow help text and default case usage string
- Update /status command to remove stale warning that referenced removed constants
- Replace old /workflow status tests with new behavior coverage
- Add /workflow resume and /workflow abandon test coverage

* feat: CLI workflow status, resume, and abandon subcommands

Implements US-005 from PRD.

Changes:
- Implement workflowStatusCommand: lists all running+interrupted runs with ID, name, path, status, age; supports --json flag
- Add workflowResumeCommand: validates run state then calls resumeWorkflowRun
- Add workflowAbandonCommand: validates run state then calls failWorkflowRun('Abandoned by user')
- Replace findLastFailedRun usage in --resume path with findResumableRun(workflowName, cwd)
- Wire resume/abandon subcommands in cli.ts
- Update tests: replace "not implemented" test with status/resume/abandon coverage

* feat: Web UI interrupted status badge and dashboard support

Implements US-006 from PRD.

Changes:
- api.generated.d.ts: add 'interrupted' to WorkflowRunStatus enum and DashboardRunsResponse.counts
- api.ts: add interrupted field to DashboardCounts interface
- WorkflowExecution.tsx: add 'interrupted' to TERMINAL_STATUSES; add amber color to StatusBadge
- WorkflowRunCard.tsx: add amber dot and badge for interrupted status
- StatusSummaryBar.tsx: add 'interrupted' to STATUS_CHIPS filter list
- DashboardPage.tsx: include interrupted in activeRuns filter and counts default

* refactor: remove dead timer-based workflow staleness code

Implements US-007 from PRD.

Changes:
- Remove findLastFailedRun() from db/workflows.ts (CLI path unified on findResumableRun in US-005)
- Remove failStaleWorkflowRuns() from db/workflows.ts (replaced by interruptOrphanedRuns in US-002)
- Remove IDatabase import from db/workflows.ts (no longer needed)
- Remove failStaleWorkflowRuns tests from db/workflows.test.ts

grep -r 'STALE' packages/ (workflow-timer variant), grep -r 'findLastFailedRun' and
grep -r 'failStaleWorkflowRuns' all return zero matches.

* fix: address review feedback — truncated IDs, resume semantics, type safety

- Use full UUIDs in resume/abandon command suggestions (was .slice(0, 8))
- Add completed_at to interruptOrphanedRuns for correct duration metrics
- Fix resume commands: mark interrupted→failed to unblock path guard,
  let next workflow invocation auto-resume via findResumableRun
- Collapse dual status/statuses fields into status?: T | T[]
- Extract TERMINAL/RESUMABLE/ACTIVE_WORKFLOW_STATUSES constants
- Add explicit else-if for interrupted status in executor path guard
- Add structured logging to CLI workflow commands
- Restore conversationId to cmd.workflow_status_failed log
- Add tests: listWorkflowRuns statuses filter, interrupted auto-resume,
  DB error handling for resume/abandon
- Update docs: commands-reference, cli-user-guide, authoring-workflows, CLAUDE.md

* refactor: simplify CLI commands and status filter logic

- Extract getRunOrThrow helper for shared run lookup pattern
- Use WorkflowRun[] instead of Awaited<ReturnType<...>>
- Remove single-item special case in listWorkflowRuns (IN works for all)
- Use ?? instead of || for null-coalescing consistency
- Remove unused ACTIVE_WORKFLOW_STATUSES constant
- Add inline comment on completed_at for interrupted runs

* fix: handle SQLite string dates in formatAge

SQLite returns started_at as a string, not a Date object.
formatAge now accepts Date | string and converts accordingly.
Found during E2E testing against real SQLite database.

* feat: UX improvements — real resume, dashboard actions, cleanup command

- CLI resume now actually re-executes the workflow (calls workflowRunCommand
  with --resume internally instead of just flipping DB status)
- Remove truncated IDs from executor guard messages (full ID in commands only)
- Add Resume/Abandon/Delete buttons to dashboard workflow run cards
- Add Delete button to history table rows
- Add API endpoints: POST resume, POST abandon, DELETE workflow run
- Add CLI workflow cleanup command (deletes terminal runs older than N days)
- Add deleteWorkflowRun and deleteOldWorkflowRuns DB functions

* refactor: simplify API handlers, dashboard actions, and log conventions

- Use RESUMABLE/TERMINAL_WORKFLOW_STATUSES constants in API handlers
  (was inline string checks diverging from CLI/command-handler)
- Extract makeRunAction helper in DashboardPage (4 identical handlers → 1)
- Fix log event names to use domain prefix convention (api.workflow_run_*)
- Use Ban icon for Abandon to distinguish from Cancel's XCircle
- Use instanceof Date guard in formatAge for clarity
- Add comment on delete handler's active-status guard

* refactor: simplify workflow lifecycle — remove interrupted, single resume path

Rework the primitives for a clean foundation:

Status model: 5 statuses (pending, running, completed, failed, cancelled).
Remove 'interrupted' entirely — server restart now marks orphaned runs
as 'failed' directly (with metadata.failure_reason = 'server_restart').

Resume model: one path. The executor's implicit findResumableRun
detects prior failed runs and skips completed nodes. The CLI --resume
flag reuses the prior run's worktree but lets the executor handle
node-skipping (no more preCreatedRun bypass). Chat /workflow resume
tells the user to re-invoke (auto-resume kicks in).

Path guard: only blocks 'running' status (was running + interrupted).
Guards always run regardless of preCreatedRun.

Cancellation: generalized from status === 'cancelled' to
status !== 'running' at all 3 check points (streaming, loop iterations,
DAG layers). Ready for future 'paused' status with zero changes.

- interruptOrphanedRuns → failOrphanedRuns
- Remove preCreatedRun bypass from CLI --resume path
- Generalize 3 cancellation check points in dag-executor
- Update all API endpoints, command handlers, UI components
- Update all tests and documentation

* fix: cancel/complete race, abandon semantics, UTC dates

- Fix cancel/complete race condition: dag-executor now checks DB status
  before calling completeWorkflowRun or failWorkflowRun, preventing a
  cancel during the final layer from being overwritten to completed
- Abandon uses cancelWorkflowRun instead of failWorkflowRun, so
  abandoned runs don't get auto-resumed by findResumableRun
- Fix formatAge UTC bug: SQLite dates without Z suffix now parsed as UTC

* fix: address PR review — SQL safety, transactions, error handling, docs, tests

- Validate olderThanDays before SQL interpolation in deleteOldWorkflowRuns
- Wrap multi-statement deletes in transactions (deleteOldWorkflowRuns, deleteWorkflowRun)
- Fix deleteWorkflowRun error double-wrap (don't re-wrap "not found" errors)
- Handle null getWorkflowRunStatus in DAG executor (treat as deleted, abort)
- Fix mock name mismatch: interruptOrphanedRuns → failOrphanedRuns in 3 test files
- Fix default mock getWorkflowRunStatus to return 'running' instead of null
- Add NaN guard to formatAge (returns 'unknown' on unparseable dates)
- Fix stale 'interrupted' references in route summary and delete comment
- Include working path in /workflow resume response
- Align deleteOldWorkflowRuns return type to { count } for consistency
- Document workflow cleanup command in CLAUDE.md, CLI user guide, commands reference
- Document new API endpoints (resume, abandon, delete) in CLAUDE.md
- Add tests for deleteOldWorkflowRuns, deleteWorkflowRun, workflowCleanupCommand
- Fix workflowAbandonCommand test to assert cancelWorkflowRun call

* refactor: simplify code per review — extract helper, cleaner date parsing, consistent guards

- Extract duplicated status-check blocks into skipIfStatusChanged helper in dag-executor
- Simplify formatAge to single-pass date parsing with Z suffix (ISO 8601)
- Use TERMINAL_WORKFLOW_STATUSES constant in delete route guard
- Rename cancelError → actionError in DashboardPage (covers 4 actions now)
- Fix merge conflict: add IDatabase import, getRunningWorkflows from dev
- Fix api.conversations.test.ts: add missing workflow mocks, fix Hono → OpenAPIHono

* fix: address review findings — double-rollback, missing guards, log context, tests

- Fix double-rollback in deleteWorkflowRun by removing inner rollback()
  call and letting the outer catch handle it
- Add terminal-status guard inside deleteWorkflowRun itself, not just in
  the route handler, to prevent deletion of running workflows
- Add rollback failure logging to the rollback() helper
- Add runId to error logs in resume/abandon/delete API route handlers
- Add workingPath to getActiveWorkflowRunByPath error log
- Add workflowRunId to dag-executor status-check warn logs
- Wrap workflowRunCommand in try/catch in workflowResumeCommand with
  structured logging and null guard for user_message
- Clean up stale 'interrupted' references in JSDoc
- Fix missing / prefix on workflow cleanup in commands-reference.md
- Add API route tests for POST /resume, POST /abandon, DELETE /:runId

* refactor: apply code simplifications from review

- Replace fragile startsWith string matching in deleteWorkflowRun catch
  with a typed WorkflowRunGuardError class
- Reorder listWorkflowRuns placeholder generation: capture startIdx
  before pushing values for clarity
- Replace curried makeRunAction factory in DashboardPage with a plain
  runAction helper function
- Move skipIfStatusChanged helper definition before its call sites in
  dag-executor to match reading order

* fix(workflows): idle timeout too aggressive on DAG nodes (coleam00#854) (coleam00#886)

* fix(workflows): idle timeout too aggressive — break after result, reset on all messages (coleam00#854)

The idle timeout (5 min default) caused two problems: (1) after a node's AI
finished (result message), the loop waited for the subprocess to exit, wasting
5 min on hangs; (2) tool messages didn't reset the timer, so long Bash calls
(tests, builds) triggered false timeouts on actively working nodes.

Changes:
- Break out of the for-await loop immediately after receiving the result message
  in both command/prompt and loop node paths — no more post-completion waste
- Remove shouldResetTimer predicate so all message types (including tool) reset
  the timer — timeout only fires on complete silence
- Increase STEP_IDLE_TIMEOUT_MS from 5 min to 30 min — with every message
  resetting the timer, this is a deadlock detector, not a work limiter

Fixes coleam00#854

* fix(workflows): update withIdleTimeout JSDoc to match new timer behavior

Remove the tool-exclusion example from the shouldResetTimer docs since
that pattern was just removed from all call sites. Clarify that most
callers should omit the parameter.

* fix(workflows): address review findings — log cleanup errors, add break tests, fix stale docs

- Log generator cleanup errors in withIdleTimeout instead of silently swallowing
- Add behavioral tests for break-after-result in both command/prompt and loop nodes
- Fix stale "5 minutes" default in docs/loop-nodes.md (now 30 minutes)
- Clarify shouldResetTimer test names and comments (utility API, not executor behavior)
- Extract effectiveIdleTimeout in loop node path (matches command/prompt pattern)
- Remove redundant iterResult alias in withIdleTimeout

* feat: add approval gate node type for human-in-the-loop workflows (coleam00#888)

* feat: add approval gate node type for human-in-the-loop workflows

Add `approval` as a fifth DAG node type that pauses workflow execution
until a human approves or rejects. Built on existing resume infrastructure.

- New `approval` node type in YAML workflows with `message` field
- `paused` workflow status (non-terminal, resumable)
- Approve/reject via REST API, CLI, chat commands, and Web UI
- Approval comment available as `$node.output` in downstream nodes
- Dashboard shows amber pulsing badge with approval message for paused runs
- Path guard blocks new workflows on paused worktrees

* fix: resolve 7 bugs in approval gate implementation

Critical:
- Parse metadata JSON on SQLite (returned as TEXT string, not object)
- Suppress spurious "Workflow stopped (paused)" message after approval gates

Medium:
- CLI exits 0 on pause (not error code 1)
- workflow status shows paused runs alongside running
- Docs clarify auto-resume is CLI-only; API/chat mark resumable

Low:
- Skip completed_at when transitioning to failed for approval resume
- Warn on AI fields (model, hooks, etc.) set on approval nodes
- Add rowCount check to updateWorkflowRun

Also: extract ApprovalContext type, add .min(1) to approval.message,
fix stale "Only failed runs" error messages, fix log domain prefix,
add recovery instructions to CLI approve error message.

* feat(workflows): add interactive PRD workflow with approval gates (coleam00#891)

Guided PRD creation through 3 approval gates:
1. Foundation questions (who/what/why/why now)
2. Deep dive questions (vision/users/JTBD/constraints)
3. Scope questions (MVP/hypothesis/exclusions)

AI researches market + codebase between gates. Each gate carries
the user's answers via $gate.output to downstream nodes.
Final node generates the PRD from accumulated context.

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#890)

* fix(cli): pass codebase_id to workflowRunCommand on resume/approve (coleam00#889)

When workflow resume or approve is called, the worktree path is passed as cwd,
but the codebase was registered under the main checkout path. This caused
findCodebaseByDefaultCwd to return null, triggering auto-registration which
tried to create a source symlink that already existed, crashing the command.

Changes:
- Add codebaseId option to WorkflowRunOptions
- Look up codebase by stored ID before falling back to auto-registration
- Pass run.codebase_id from workflowResumeCommand and workflowApproveCommand
- Add test verifying getCodebase is called with stored codebase_id on resume

Fixes coleam00#889

* fix(cli): address review findings for PR coleam00#890

- Add domain prefix to 5 log events in codebase-lookup block
  (cli.codebase_lookup_failed, cli.db_connection_hint,
  cli.codebase_id_lookup_failed, cli.codebase_auto_registered,
  cli.codebase_auto_registration_failed)
- Add inline comment clarifying intentional fallthrough in catch block
- Add test for workflowApproveCommand codebase_id forwarding
- Add test for error fallback path when getCodebase throws
- Update pre-existing test assertion to match renamed log event

* fix(workflows): improve interactive PRD — first-principles prompting + validation node (coleam00#893)

Two improvements to the interactive PRD workflow:

1. First-principles prompting: All research/technical/generate nodes now
   instruct the AI to start from what exists, verify claims by reading
   actual files, prefer extending over creating, and cite exact file:line.

2. Validation node: New final node reads the generated PRD and checks
   every technical claim against the codebase — file paths, endpoint
   names, DB schemas, event types, component names. Corrects inline
   and documents what was fixed.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Rasmus Widing <152263317+Wirasm@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants