
feat: hybrid workflow storage - DB + filesystem#29

Merged
LuisErlacher merged 2 commits into dev from archon/task-feat-hybrid-workflow-storage
Apr 13, 2026

@LuisErlacher
Owner

Summary

  • Problem: Workflows created via the Web UI were lost on restart because they were written to the filesystem of whichever working directory was active at the time; there was no stable, durable storage for user-created workflows.
  • Why it matters: Users building workflows via the Web UI need their work to persist across server restarts and worktree switches.
  • What changed: Added a remote_agent_workflow_definitions DB table; PUT/DELETE API routes now save to DB instead of filesystem; discovery merges both sources (bundled < global < repo-specific < DB).
  • What did NOT change: Filesystem workflows (bundled defaults, global ~/.archon/.archon/workflows/, repo-specific .archon/workflows/) continue to work exactly as before. The IWorkflowStore interface (for workflow runs) is untouched.

UX Journey

Before

User (Web UI)              Server                  Filesystem
─────────────              ──────                  ──────────
PUT /api/workflows/:name ▶ writes YAML to .archon/workflows/
                                                   ← file created (cwd-dependent)
GET /api/workflows       ▶ discovers YAML files
                         ◀ returns merged list
                           (lost on cwd change / restart)

After

User (Web UI)              Server                  DB                  Filesystem
─────────────              ──────                  ──                  ──────────
PUT /api/workflows/:name ▶ validates definition
                           upserts to DB ────────▶ remote_agent_workflow_definitions
                         ◀ 200 OK
GET /api/workflows       ▶ discovers YAML files ─────────────────────▶ bundled/global/repo
                           queries DB ──────────▶ user-created rows
                           merges (DB wins on name conflict)
                         ◀ returns merged list (persistent ✓)
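
The merge step in the After flow can be sketched roughly as follows. This is a minimal illustration, not the code in workflow-discovery.ts: the WorkflowEntry shape, the mergeBySourcePriority helper, and the workflow names are hypothetical. Only the source ordering (bundled < global < repo < DB) comes from this PR.

```typescript
// Hypothetical shapes for illustration; the real types live in the Archon packages.
type WorkflowSource = 'bundled' | 'global' | 'repo' | 'db';

interface WorkflowEntry {
  name: string;
  source: WorkflowSource;
  definition: string;
}

/**
 * Merge workflow lists passed in ascending priority order.
 * When two layers contain the same name, the later layer wins.
 */
function mergeBySourcePriority(...layers: WorkflowEntry[][]): WorkflowEntry[] {
  const byName = new Map<string, WorkflowEntry>();
  for (const layer of layers) {
    for (const entry of layer) {
      // Map.set overwrites the value but keeps the original insertion position.
      byName.set(entry.name, entry);
    }
  }
  return Array.from(byName.values());
}

// Example data is made up: 'deploy' exists both in the repo and in the DB.
const merged = mergeBySourcePriority(
  [{ name: 'assist', source: 'bundled', definition: 'bundled yaml' }],
  [], // no global workflows in this example
  [{ name: 'deploy', source: 'repo', definition: 'repo yaml' }],
  [{ name: 'deploy', source: 'db', definition: 'db yaml' }],
);

console.log(merged.map((w) => `${w.name}:${w.source}`).join(', '));
// prints "assist:bundled, deploy:db"
```

Passing the layers in ascending priority keeps the override rule in one place: whichever layer is listed last wins on a name conflict.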

Architecture Diagram

Before

workflow-discovery.ts
  └─ discoverWorkflowsWithConfig()
       ├─ bundled defaults
       ├─ global ~/.archon/.archon/workflows/
       └─ repo .archon/workflows/

api.ts PUT /api/workflows/:name
  └─ writes YAML to filesystem

api.ts DELETE /api/workflows/:name
  └─ deletes file from filesystem

After

workflow-discovery.ts
  └─ discoverWorkflowsWithConfig()  [~]
       ├─ bundled defaults
       ├─ global ~/.archon/.archon/workflows/
       ├─ repo .archon/workflows/
       └─ [+] getDbWorkflows() callback → DB rows (source: 'db')

api.ts PUT /api/workflows/:name  [~]
  └─ [+] upsert to remote_agent_workflow_definitions (DB)

api.ts DELETE /api/workflows/:name  [~]
  └─ [+] delete from remote_agent_workflow_definitions (DB)

api.ts GET /api/workflows/:name/export  [+]
  └─ returns YAML text for download

[+] packages/core/src/db/workflow-definitions.ts
  └─ createWorkflowDefinition / getWorkflowDefinition /
     listWorkflowDefinitions / updateWorkflowDefinition /
     deleteWorkflowDefinition

[+] migrations/022_workflow_definitions.sql (PostgreSQL)
[~] packages/core/src/db/adapters/sqlite.ts (SQLite init schema)

Connection inventory:

| From | To | Status | Notes |
|---|---|---|---|
| api.ts | workflow-definitions.ts | new | DB CRUD for user workflows |
| workflow-discovery.ts | getDbWorkflows callback | new | Injects DB rows into merge |
| workflow-discovery.ts | WorkflowSource | modified | Adds 'db' to union |
| api.ts | filesystem | removed | PUT/DELETE no longer write files |
| sqlite.ts | workflow_definitions table | new | SQLite init schema |

Label Snapshot

  • Risk: risk: low
  • Size: size: M
  • Scope: core, workflows, server, cli
  • Module: core:db, workflows:discovery, server:api

Change Metadata

  • Change type: feature
  • Primary scope: multi

Linked Issue

Validation Evidence (required)

bun run validate
| Check | Result |
|---|---|
| bun run type-check | ✅ 0 errors, all 9 packages |
| bun run lint | ✅ 0 errors, 0 warnings |
| bun run format:check | ✅ all files formatted |
| bun run test | ✅ 184 @archon/core + 350 @archon/workflows passed, 0 failed |
  • Evidence provided: Full bun run validate output from workflow run ce4945a5f0f97fe1e326aba0694f0cfc
  • Skipped: bun --filter @archon/web generate:types (requires running server; types regenerate automatically on next bun run dev)

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No — PUT/DELETE removed filesystem writes (reduced scope)

Compatibility / Migration

  • Backward compatible? Yes — existing filesystem workflows continue to work unchanged
  • Config/env changes? No
  • Database migration needed? Yes
    • SQLite: handled automatically via updated init schema in sqlite.ts (new table created on first start)
    • PostgreSQL: run migrations/022_workflow_definitions.sql before deploying

Human Verification (required)

  • Verified scenarios: Full bun run validate suite (type-check, lint, format, tests) passed clean in worktree environment
  • Edge cases checked: DB-source workflows override same-named filesystem workflows in merge; listWorkflowDefinitions accepts optional codebaseId filter
  • What was not verified: Manual end-to-end Web UI → PUT → GET round-trip (no running server in worktree); PostgreSQL migration path (only SQLite was tested)

Side Effects / Blast Radius (required)

  • Affected subsystems: Web UI workflow builder (now writes to DB), workflow list API (merges DB source), CLI workflow list (shows db source tag)
  • Potential unintended effects: Previously saved filesystem workflows (via old PUT) are NOT migrated to DB — they remain visible via filesystem discovery but won't appear in the DB-backed builder
  • Guardrails: Discovery merge is additive; a DB error in the getDbWorkflows callback surfaces as a discovery error (same handling as malformed YAML)

Rollback Plan (required)

  • Fast rollback: Revert commit a2cf61da and redeploy; filesystem workflow discovery is unchanged so all existing YAML workflows remain accessible
  • Feature flags: None — the change is always active once deployed
  • Observable failure symptoms: PUT /api/workflows returns 500 with DB error; GET /api/workflows returns partial list (DB rows missing); remote_agent_workflow_definitions table absent from DB

Risks and Mitigations

  • Risk: PostgreSQL users must manually run migrations/022_workflow_definitions.sql before upgrading
    • Mitigation: Migration file is included; API routes fail fast with a clear DB error if the table is missing (rather than silently skipping DB rows)
  • Risk: Previously saved filesystem workflows (written by old PUT) are not visible in the new DB-backed management UI
    • Mitigation: They remain discoverable via filesystem sources; users can re-save via the builder to migrate to DB storage

Workflows created via Web UI are now stored in the database instead of
the filesystem. Filesystem-based workflows (bundled defaults + repo YAML)
continue working as-is. Discovery merges all sources: bundled < global <
repo < DB (DB highest priority).

Changes:
- Add remote_agent_workflow_definitions table (SQLite + PostgreSQL)
- Add workflow-definitions.ts CRUD operations (upsert/get/list/delete)
- Extend WorkflowSource with 'db' value
- Extend discoverWorkflowsWithConfig with getDbWorkflows callback
- PUT /api/workflows/:name now saves to DB (source: 'db')
- DELETE /api/workflows/:name now deletes from DB
- Add POST /api/workflows/import (YAML -> parse -> validate -> DB)
- Add GET /api/workflows/:name/export (returns YAML text)
- GET /api/workflows/:name checks DB first (highest priority)
- CLI workflow list includes DB workflows via getDbWorkflows callback
- Update Zod schemas for import/export endpoints
- Update tests for DB-backed storage behavior

Fixes #5

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LuisErlacher
Owner Author

🔍 Comprehensive PR Review — #29

Reviewed by: 5 specialized agents (code-review, error-handling, test-coverage, comment-quality, docs-impact)
Date: 2026-04-12


Summary

The hybrid workflow storage architecture is well-designed: the DB CRUD layer correctly uses the IDatabase dialect abstraction, the discovery merge ordering is correct (DB overrides filesystem), SQLite/PostgreSQL DDL parity is exact, and the import/export endpoints are a clean API design addition. However, one critical error handling defect, missing unit tests for the new DB module, and a silent failure in the CLI require fixes before merge.

Verdict: REQUEST_CHANGES

| Severity | Count |
|---|---|
| 🔴 CRITICAL | 3 |
| 🟠 HIGH | 4 |
| 🟡 MEDIUM | 7 |
| 🟢 LOW | 9 |

🔴 Critical Issues

1. DELETE route missing try-catch — unhandled DB exception

📍 packages/server/src/routes/api.ts:2469

Every other mutating route in this file (PUT, import POST) wraps the DB call in try-catch with getLog().error(...) and apiError(c, 500, ...). The DELETE handler is the only one missing this. A DB connection failure, pool exhaustion, or SQLite lock propagates uncaught — no structured log entry, no actionable user message.

Fix:
try {
  const deleted = await workflowDefinitionsDb.deleteWorkflowDefinition(name);
  if (!deleted) return apiError(c, 404, `Workflow '${name}' not found in database`);
  return c.json({ deleted: true, name });
} catch (error) {
  const err = error instanceof Error ? error : new Error(String(error));
  getLog().error({ err, name }, 'workflow.definition_delete_failed');
  return apiError(c, 500, 'Failed to delete workflow');
}

2. DB CRUD layer has zero unit tests

📍 packages/core/src/db/workflow-definitions.ts:29-94

All 4 exported DB functions have no tests. Every other DB module in the codebase has corresponding tests using the createQueryResult + mockPostgresDialect pattern (see workflows.test.ts). The conditional WHERE clause in listWorkflowDefinitions (codebaseId filter) and the rowCount > 0 boolean return in deleteWorkflowDefinition are particularly important to verify.

Fix: Create packages/core/src/db/workflow-definitions.test.ts. Add it to the existing @archon/core test batch that includes workflow-events.test.ts.
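
To make the two behaviors called out above concrete, here is a rough sketch of a conditional WHERE clause and a rowCount-based boolean, exercised against a recording mock. Everything in it is an assumption made for illustration: the Db interface, the createRecordingDb helper, the `$1` placeholder style, and the column list are hypothetical and do not reproduce the project's IDatabase abstraction or its createQueryResult / mockPostgresDialect test helpers.

```typescript
// Hypothetical minimal DB interface; the real code goes through a dialect abstraction.
interface QueryResult<Row> {
  rows: Row[];
  rowCount: number;
}

interface Db {
  query<Row>(sql: string, params: unknown[]): Promise<QueryResult<Row>>;
}

// Assumed row shape; only the table name and codebase_id appear in this PR.
interface WorkflowDefinitionRow {
  name: string;
  codebase_id: string | null;
  definition: string;
}

const TABLE = 'remote_agent_workflow_definitions';

// Omitting codebaseId returns all records; passing it adds a WHERE filter.
async function listWorkflowDefinitions(db: Db, codebaseId?: string): Promise<WorkflowDefinitionRow[]> {
  const result =
    codebaseId === undefined
      ? await db.query<WorkflowDefinitionRow>(`SELECT * FROM ${TABLE} ORDER BY name`, [])
      : await db.query<WorkflowDefinitionRow>(
          `SELECT * FROM ${TABLE} WHERE codebase_id = $1 ORDER BY name`,
          [codebaseId],
        );
  return result.rows;
}

// Returns true only when at least one row was actually deleted.
async function deleteWorkflowDefinition(db: Db, name: string): Promise<boolean> {
  const result = await db.query<never>(`DELETE FROM ${TABLE} WHERE name = $1`, [name]);
  return result.rowCount > 0;
}

// Recording mock: captures every query and reports a fixed rowCount.
function createRecordingDb(rowCount: number): { db: Db; calls: { sql: string; params: unknown[] }[] } {
  const calls: { sql: string; params: unknown[] }[] = [];
  const db: Db = {
    async query<Row>(sql: string, params: unknown[]): Promise<QueryResult<Row>> {
      calls.push({ sql, params });
      return { rows: [], rowCount };
    },
  };
  return { db, calls };
}
```

A test can then assert on the captured SQL and parameters (filter present or absent) and on the boolean returned for rowCount 0 versus 1, without needing a live database.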


3. CLI getDbWorkflows silently drops parse-failed DB records

📍 packages/cli/src/commands/workflow.ts:128-131

if (parsed.error) continue;  // ← silent drop, no log

The server-side equivalent logs warn with name + error context (api.ts:1813-1820). The CLI version produces no warning — a corrupted DB record silently disappears from workflow list with no indication of why.

Fix:
import { createLogger } from '@archon/paths';
const log = createLogger('cli.workflow');

// inside getDbWorkflows:
if (parsed.error) {
  log.warn({ name: record.name, err: parsed.error.error }, 'workflow.db_record_parse_failed');
  continue;
}

🟠 High Issues

4. upsertWorkflowDefinition — no error logging + missing null-check on rows[0]

📍 packages/core/src/db/workflow-definitions.ts:29-58

The function logs only on success. If pool.query() throws, the raw exception propagates with no _failed log entry. Additionally, result.rows[0] is accessed without a null guard: a pathological upsert returning no rows would throw a confusing "Cannot read properties of undefined" error instead of a clear diagnostic.

Fix: Wrap in try-catch with error logging + re-throw. Add if (!row) throw new Error('Upsert returned no rows') guard.
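
A minimal sketch of that fix shape, assuming a pino-style logger signature (context object first, event name second) like the getLog() calls quoted elsewhere in this review. The upsertWithGuard helper, the injected runUpsert callback, the row shape, and the 'workflow.definition_upsert_failed' event name are hypothetical; the actual function calls the database directly.

```typescript
// Hypothetical logger shape mirroring the getLog().error(ctx, event) calls in this review.
interface Logger {
  error(context: Record<string, unknown>, event: string): void;
}

// Assumed row shape for illustration only.
interface WorkflowDefinitionRow {
  name: string;
  definition: string;
}

/**
 * Runs an upsert, guards against an empty result, and logs before re-throwing.
 * `runUpsert` stands in for the real query call so the sketch stays self-contained.
 */
async function upsertWithGuard(
  name: string,
  runUpsert: () => Promise<{ rows: WorkflowDefinitionRow[] }>,
  log: Logger,
): Promise<WorkflowDefinitionRow> {
  try {
    const result = await runUpsert();
    const row = result.rows[0];
    if (!row) {
      // Clear diagnostic instead of "Cannot read properties of undefined".
      throw new Error('Upsert returned no rows');
    }
    return row;
  } catch (error) {
    const err = error instanceof Error ? error : new Error(String(error));
    // Event name is an assumption modeled on 'workflow.definition_delete_failed'.
    log.error({ err, name }, 'workflow.definition_upsert_failed');
    throw err; // re-throw so callers still see the failure
  }
}
```

Because the guard throws inside the try block, the empty-result case is logged through the same path as a query failure, and callers see a single kind of error either way.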


5. POST /api/workflows/import endpoint has no tests

📍 packages/server/src/routes/api.ts:2334-2360

An entirely new user-facing endpoint with zero test coverage. Untested: valid YAML import (200), invalid YAML (400), empty body (400), DB failure (500), source: 'imported' tag on the stored record.

Fix: Add describe('POST /api/workflows/import', ...) block to api.workflows.test.ts.


6. GET /api/workflows/:name/export endpoint has no tests

📍 packages/server/src/routes/api.ts:2364-2399

The two-tier DB-first + filesystem-fallback logic is completely untested. A DB error silently degrades to a 404 if the workflow only exists in DB.

Fix: Add describe('GET /api/workflows/:name/export', ...) covering DB hit (200), DB null → filesystem fallback (200), not found (404).
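
The two-tier lookup and the degradation concern can be illustrated with a small resolver that takes both lookups as injected functions. This is a sketch under stated assumptions, not the route handler in api.ts: resolveExport, the Lookup type, and the ExportOutcome union are hypothetical, and returning 500 when the DB lookup failed and the filesystem has no match is one possible policy rather than the behavior this PR implements.

```typescript
// Hypothetical lookup signature: resolves to YAML text, or null when not found.
type Lookup = (name: string) => Promise<string | null>;

type ExportOutcome =
  | { status: 200; yaml: string; source: 'db' | 'filesystem' }
  | { status: 404 }
  | { status: 500; message: string };

/**
 * DB first, filesystem second. A DB failure does not block the filesystem
 * fallback, but it is remembered so a miss is not misreported as "not found".
 */
async function resolveExport(name: string, fromDb: Lookup, fromFilesystem: Lookup): Promise<ExportOutcome> {
  let dbFailed = false;
  try {
    const yaml = await fromDb(name);
    if (yaml !== null) return { status: 200, yaml, source: 'db' };
  } catch {
    dbFailed = true; // a real handler would also log the error here
  }

  const yaml = await fromFilesystem(name);
  if (yaml !== null) return { status: 200, yaml, source: 'filesystem' };

  // Assumed policy: a failed DB lookup plus a filesystem miss is a server error.
  return dbFailed ? { status: 500, message: 'Database lookup failed' } : { status: 404 };
}
```

With injected lookups, the three cases listed in the fix (DB hit, DB null with filesystem fallback, not found) each become a single call with stub functions.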


7. getWorkflowDefinition and listWorkflowDefinitions propagate raw exceptions without logging

📍 packages/core/src/db/workflow-definitions.ts:61-83

Read functions throw raw DB exceptions. While callers catch and log them, the DB module itself contributes no diagnostic context. Lower priority than Issues 1–6 but inconsistent with the codebase pattern.


🟡 Medium Issues (Needs Decision)


M1. DB lookup failures logged at warn instead of error 📍 api.ts:2347, 2289
Infrastructure failures (DB connection down) should be error level; the intentional fallback to filesystem can be info/warn.


M2. normalizeRecord is a no-op (YAGNI violation) 📍 workflow-definitions.ts:25-27
(Flagged by code-review, error-handling, and comment-quality.) Returns input unchanged. Unlike normalizeWorkflowRun in workflows.ts which actually parses JSONB, there is nothing to normalize here (definition is TEXT). Misleads future maintainers.

Recommendation: Remove function and inline result.rows[0] directly.


M3. discoverWorkflowsWithConfig DB errors not surfaced in result.errors 📍 workflow-discovery.ts:311-316
DB discovery failure is logged (warn) but not added to base.errors. Web UI and CLI receive no errors entry — users see no indication that DB workflows may be missing.
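
One way to surface the failure is to append an entry to the errors array while still returning the filesystem-only result. The sketch below uses the `{ filename: '<db>', errorType: 'read_error' }` entry shape reported in the fix for this issue; the LoadResult and Workflow interfaces, the extra `error` message field, and the mergeDbWorkflows helper are hypothetical simplifications of the real discovery code.

```typescript
// Simplified, hypothetical shapes; the real result type is WorkflowLoadResult.
interface Workflow {
  name: string;
  source: string;
}

interface LoadError {
  filename: string;
  errorType: string;
  error: string; // assumed field carrying the message
}

interface LoadResult {
  workflows: Workflow[];
  errors: LoadError[];
}

/**
 * Overlays DB workflows on the filesystem result. If the DB callback throws,
 * the filesystem result is returned unchanged plus a visible error entry.
 */
async function mergeDbWorkflows(
  base: LoadResult,
  getDbWorkflows: () => Promise<Workflow[]>,
): Promise<LoadResult> {
  try {
    const dbWorkflows = await getDbWorkflows();
    const byName = new Map<string, Workflow>();
    for (const w of base.workflows) byName.set(w.name, w);
    for (const w of dbWorkflows) byName.set(w.name, w); // DB wins on name conflict
    return { workflows: Array.from(byName.values()), errors: base.errors };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return {
      workflows: base.workflows,
      errors: [...base.errors, { filename: '<db>', errorType: 'read_error', error: message }],
    };
  }
}
```

Callers that already render result.errors for malformed YAML would then show the DB failure through the same channel, so a missing set of DB workflows is no longer invisible to the user.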


M4. discoverWorkflowsWithConfig DB-merge logic has no tests 📍 workflow-discovery.ts:308-328
DB-overrides-filesystem merge and error recovery (DB throws → return filesystem-only base) are untested.

Fix: Add workflow-discovery.db.test.ts in a separate bun test batch.


M5. GET /api/workflows/:name DB branch untested 📍 api.ts:2408-2424
source: 'db' happy path and corrupt-DB-record 500 path are not covered.


M6. Stale JSDoc on WorkflowSource 📍 packages/workflows/src/schemas/workflow.ts:95
Comment says "bundled default or project-defined" — 'db' variant added in this PR but not mentioned.

Fix: /** Workflow origin — bundled default, project-defined (filesystem), or database-stored. */


M7. Documentation files need updates 📍 api.md, database.md, CLAUDE.md

  • api.md: GET /:name response missing 'db' in source values; PUT description still references ?cwd=; missing POST /import and GET /:name/export endpoint rows
  • database.md: "8 tables" → "9 tables"; missing remote_agent_workflow_definitions table entry; missing migration 022 in all three lists
  • CLAUDE.md: GET/PUT/DELETE descriptions stale; missing import/export endpoints; "8 Tables" → "9 Tables"

🟢 Low Issues

| Issue | Location | Suggestion |
|---|---|---|
| Export route OpenAPI schema declares text/plain but sends text/yaml | api.ts:205-210 | Change route definition to 'text/yaml' to match actual Content-Type |
| Misleading route registration comment on export route | api.ts:2244 | Remove; path segment counts differ, so no Hono conflict is possible |
| DB merge priority undocumented | workflow-discovery.ts:303 | Add comment explaining why DB wins (API-managed overrides repo-managed YAML) |
| SQLite schema not validated in sqlite.test.ts | sqlite.ts:76-97 | Add column/constraint test for new table to catch DDL drift |
| PUT mock call assertion missing | api.workflows.test.ts | Assert mockUpsertWorkflowDefinition called with correct args |
| Stale test comment on DELETE 404 test | api.workflows.test.ts:430 | Comment says "real unlink" but DELETE is now DB-backed |
| cli.md discovery description incomplete | cli.md:91 | Add DB-stored workflows to the source list |
| cli-internals.md function signature stale | cli-internals.md:95,340 | Add options? third argument to discoverWorkflowsWithConfig |
| Missing JSDoc on listWorkflowDefinitions codebaseId behavior | workflow-definitions.ts | Document that omitting codebaseId returns ALL records |

✅ What's Good

  • DB CRUD layer is correct: Dialect abstraction, ON CONFLICT upsert with created_at preservation, rowCount > 0 boolean check in delete — all right.
  • SQLite/PostgreSQL DDL parity: Inline SQLite schema exactly mirrors 022_workflow_definitions.sql, including partial index on codebase_id WHERE codebase_id IS NOT NULL.
  • Discovery merge ordering is correct: DB highest priority via Map-based deduplication — simple and effective.
  • Import route error handling is exemplary: Parse failure (400) and DB failure (500) paths correctly handled with logging.
  • File-level JSDoc in workflow-definitions.ts proactively clarifies the distinction from workflow runs — exactly the question every reader will have.
  • Bundled workflow protection in DELETE: Correctly guards before hitting the DB.
  • Test refactor is an improvement: Old brittle tmpdir-based tests replaced with clean mock-based tests.
  • Deviation 1 (for-loop over .filter): The TypeScript readonly narrowing constraint is real; the imperative loop is the right call.
  • Import/export as separate concerns: Clean API design — raw YAML in/out distinct from JSON-centric PUT/GET.

📋 Suggested Follow-up Issues

| Issue Title | Priority |
|---|---|
| "Add error logging to DB read functions in workflow-definitions.ts" | P2 |
| "Surface DB discovery failures in WorkflowLoadResult.errors" | P2 |
| "Add test coverage for discoverWorkflowsWithConfig DB-merge logic" | P2 |
| "Update docs-web API reference for hybrid workflow storage" | P2 |

Reviewed by Archon comprehensive-pr-review workflow
Full artifacts: ~/.archon/workspaces/coleam00/Archon/artifacts/runs/ce4945a5f0f97fe1e326aba0694f0cfc/review/

- Add try-catch to DELETE /api/workflows/:name handler to match project error handling pattern
- Add error logging to upsertWorkflowDefinition (with rows[0] null guard), getWorkflowDefinition, and listWorkflowDefinitions
- Remove no-op normalizeRecord function (YAGNI violation)
- Add log.warn to CLI getDbWorkflows parse-fail branch (was silently dropping corrupt records)
- Upgrade DB lookup catch blocks from warn to error level for unexpected infrastructure failures
- Fix OpenAPI schema for export route: text/plain -> text/yaml to match actual Content-Type header
- Remove incorrect export route registration order comment (3-segment path cannot conflict with 2-segment)
- Update WorkflowSource JSDoc to mention 'db' variant
- Expand DB merge priority comment to explain why DB wins over filesystem
- Surface DB discovery failure in WorkflowLoadResult.errors array (not just logged)
- Add workflow-definitions.test.ts with full DB CRUD unit test coverage
- Add tests for POST /api/workflows/import, GET /api/workflows/:name/export, GET /:name DB branch, DELETE DB error path
- Add workflow-definitions.test.ts to @archon/core test batch
- Update CLAUDE.md: 8 Tables -> 9 Tables, add workflow_definitions entry, update API endpoint descriptions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@LuisErlacher
Owner Author

Fix Report: Addressing Code Review Findings

Commit: c6b3b2c6 pushed to archon/task-feat-hybrid-workflow-storage
Validation: ✅ Type check | ✅ Lint | ✅ Format | ✅ Tests (0 failures)

CRITICAL Fixes (3/3)

1. DELETE route missing try-catch (api.ts): Added try-catch with getLog().error({ err, name }, 'workflow.definition_delete_failed') and apiError(c, 500, ...).

2. DB CRUD layer unit tests (new workflow-definitions.test.ts): 13 tests covering upsert, get, list, delete. Added to @archon/core test batch.

3. CLI silent parse-fail drop (workflow.ts): Added getLog().warn({ name, err }, 'workflow.db_record_parse_failed') before continue.

HIGH Fixes (4/4)

4. upsertWorkflowDefinition error logging + null-check: try-catch with error log + re-throw; if (!row) null guard.

5. Import endpoint tests: 4 tests (valid → 200, invalid YAML → 400, empty → 400, DB failure → 500).

6. Export endpoint tests: 3 tests (DB hit → 200 YAML, not found → 404, DB null → filesystem → 200).

7. Read function error logging: try-catch with getLog().error(...) in getWorkflowDefinition and listWorkflowDefinitions.

MEDIUM Fixes (All addressed)

  • DB lookup log level (api.ts): warn → error for DB failures + separate info for fallback activation
  • normalizeRecord removed (workflow-definitions.ts): YAGNI violation removed; direct result.rows[0] / [...result.rows]
  • DB discovery failure in errors (workflow-discovery.ts): Catch block now adds { filename: '<db>', errorType: 'read_error' } to result.errors
  • WorkflowSource JSDoc updated: "bundled default, project-defined (filesystem), or database-stored"
  • GET /:name DB branch tests: source:db happy path + corrupt-DB-record → 500
  • DB merge priority comment: Explains why DB wins over filesystem

LOW Fixes

  • OpenAPI schema: text/plain → text/yaml for export route
  • Export route comment: Removed incorrect "MUST be registered before..." line
  • PUT mock call assertion test added
  • DELETE error path (500) test added

Docs Updated

CLAUDE.md: "8 Tables" → "9 Tables", workflow_definitions entry added, API endpoint descriptions updated for GET/PUT/DELETE, import/export endpoints added.

Deferred: docs-web api.md, database.md, cli.md, cli-internals.md — tracked as follow-up.

Total new tests: 26 (13 in new workflow-definitions.test.ts + 13 in api.workflows.test.ts)

LuisErlacher pushed a commit that referenced this pull request Apr 13, 2026
Combines workflow_definitions, users, project_members tables and indexes.
Updates CLAUDE.md to reflect 11 tables. Merges test scripts for all 3 PRs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@LuisErlacher LuisErlacher merged commit 2c413ce into dev Apr 13, 2026
3 checks passed
LuisErlacher pushed a commit that referenced this pull request Apr 13, 2026
Combines workflow_definitions, users, project_members, node_states,
test_results tables. Renumbers migrations to avoid collision:
022_workflow_definitions, 023_multi_user_auth, 024_node_states.
Updates CLAUDE.md to reflect 13 tables.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>