Skip to content
1,109 changes: 1,109 additions & 0 deletions docs/superpowers/plans/2026-04-14-analytics-hardening.md

Large diffs are not rendered by default.

195 changes: 195 additions & 0 deletions docs/superpowers/specs/2026-04-14-analytics-hardening-design.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
# Analytics Hardening — Design Spec

**Date**: 2026-04-14
**Scope**: Tier-1 debt from the Workflow Health Metrics (Improvement #6) review loop
**Branch**: `feat/analytics-hardening`
**Target**: PR to `dev`

---

## Problem

Three concrete issues surfaced across the three-round peer review on `feat/workflow-health-metrics`:

1. **Duplicated data fetching** — `CostSummaryCard` and `WorkflowHealthCard` each call `useQuery` with identical key, function, and stale time. Reviewer #3 flagged this as a latent bug: a third card requesting different `days` would silently collide on the shared query key.
2. **Test gap** — All three reviewers flagged that `workflow-analytics` lacks test coverage for empty results, clock-skew exclusion, the `MIN_RUNS_FOR_FAILURE_RANKING` threshold, and sort ordering.
3. **SQLite day-boundary filter bug** — `WHERE started_at >= $1` compares TEXT byte-wise in SQLite. Stored format uses a space separator (`2026-04-14 13:53:10`); callers typically pass ISO-T format (`2026-04-14T00:00:00.000Z`). Byte comparison treats `T` (0x54) as greater than space (0x20), so legitimate rows get dropped at day boundaries. Reviewer #1 flagged as worth a separate PR.

Items (1) and (3) are latent bugs. Item (2) closes out the review debt on the health metrics work.

## Non-goals

- Renaming `CostAnalyticsResponse` → `AnalyticsResponse` (Tier-3 debt #8)
- Trend line UI on the dashboard (Tier-2 #4)
- Storage format migration for `started_at` / `completed_at`
- Any new analytics queries or aggregations

## Design

### 1. `useCostAnalytics` hook

**Location**: `packages/web/src/hooks/useCostAnalytics.ts`

```ts
import { useQuery, type UseQueryResult } from '@tanstack/react-query';
import { getCostAnalytics, type CostAnalytics } from '@/lib/api';

const STALE_TIME_MS = 30_000;

export function useCostAnalytics(days: number): UseQueryResult<CostAnalytics> {
return useQuery({
queryKey: ['cost-analytics', { days }],
queryFn: () => getCostAnalytics(days),
staleTime: STALE_TIME_MS,
});
}
```

**Design choices**:
- Thin passthrough returning `UseQueryResult<CostAnalytics>` — callers can destructure any query field without the hook needing to evolve (`error`, `refetch`, `isFetching`).
- `days` parameter goes into the query key so independent cards with different windows do not share cache.
- `STALE_TIME_MS` captured as a single constant in the hook — one tuning point.

**Call-site changes**:
- `packages/web/src/components/dashboard/CostSummaryCard.tsx` — replace inline `useQuery` at lines 62–66 with `const { data, isLoading } = useCostAnalytics(30);`
- `packages/web/src/components/dashboard/WorkflowHealthCard.tsx` — replace inline `useQuery` at lines 56–60 with `const { data, isLoading } = useCostAnalytics(30);`

**Naming**: `useCostAnalytics` matches the existing API function name `getCostAnalytics`. The debt around `CostAnalytics` containing health fields is tracked separately as Tier-3 #8 and deferred.

### 2. SQLite day-boundary filter fix

**File**: `packages/core/src/db/workflow-analytics.ts`

**Add one helper**:

```ts
/**
* Dialect-aware `started_at >= param` filter.
*
* SQLite stores datetimes as TEXT with space separator
* (`2026-04-14 13:53:10`). When callers pass ISO-T format
* (`2026-04-14T00:00:00.000Z`), byte-wise comparison drops
* legitimate rows (T > space). `datetime()` normalizes both
* sides and returns NULL for unparseable input, which
* excludes the row safely.
*
* PostgreSQL's `timestamp` type handles implicit string
* casts correctly, so the wrap is only needed for SQLite.
*/
function startedAtSinceFilter(placeholder: number): string {
return getDatabaseType() === 'postgresql'
? `started_at >= $${placeholder}`
: `datetime(started_at) >= datetime($${placeholder})`;
}
```

**Change the three call sites** (lines 64, 92, 124):

```ts
// Before:
WHERE started_at >= $1

// After:
WHERE ${startedAtSinceFilter(1)}
```

**Scope**: Only the parametrized `started_at >= $1` filter. The `completed_at >= started_at` clock-skew check inside `getAvgDuration` is a column-to-column comparison where both sides share the stored format, so it already works correctly.

**No caller changes**. Input format assumptions disappear from the caller side; `datetime()` absorbs the variance.

### 3. Test coverage

#### 3a. DB-layer tests

**File**: `packages/core/src/db/workflow-analytics.test.ts` (new)

Pattern follows the existing `packages/core/src/db/workflows.test.ts` convention with one tweak — a mutable `mockDbType` variable so a single test file can cover both SQLite and PostgreSQL code paths:

```ts
import { mock, describe, test, expect, beforeEach } from 'bun:test';
import { createQueryResult, mockPostgresDialect } from '../test/mocks/database';

const mockQuery = mock(() => Promise.resolve(createQueryResult([])));
let mockDbType: 'sqlite' | 'postgresql' = 'postgresql';

mock.module('./connection', () => ({
pool: { query: mockQuery },
getDialect: () => mockPostgresDialect,
getDatabaseType: () => mockDbType,
}));

import { getCostByWorkflow, getDailyCosts, getAvgDuration } from './workflow-analytics';
```

Assertions are made against the **SQL string captured by `mockQuery.mock.calls[0][0]`** — no real database execution. KISS.

**Cases**:

| Case | Assertion |
|---|---|
| Empty result | All three functions return `[]` / `0` when mock returns zero rows |
| Day-boundary (SQLite) | `mockDbType = 'sqlite'` → captured SQL contains `datetime(started_at) >= datetime($1)` for all three functions |
| Day-boundary (Postgres) | `mockDbType = 'postgresql'` → captured SQL contains `started_at >= $1` without `datetime()` wrap |
| Clock-skew exclusion | `getAvgDuration` SQL contains `completed_at >= started_at` |
| `getCostByWorkflow` sort | SQL contains `ORDER BY cost_usd DESC` |
| `getDailyCosts` sort | SQL contains `ORDER BY date ASC` |
| Type coercion | `run_count` returned as `number` even when mock returns string `"5"` |

#### 3b. Route aggregator tests

**File**: `packages/server/src/routes/api.analytics.test.ts` (new)

Pattern follows the existing `packages/server/src/routes/api.*.test.ts` convention — mocks DB layer, exercises the route handler, asserts response shape.

**Cases**:

| Case | Assertion |
|---|---|
| `MIN_RUNS_FOR_FAILURE_RANKING` threshold (below) | Workflow with 2 runs (1 failed) → NOT in `topFailingWorkflows` |
| `MIN_RUNS_FOR_FAILURE_RANKING` threshold (at) | Workflow with 3 runs (1 failed) → IS in `topFailingWorkflows` |
| Zero-failure exclusion | Workflow with 5 runs, 0 failures → NOT in `topFailingWorkflows` |
| Sort by failure rate | Two qualifying workflows → higher `failureRate` listed first |
| `slice(0, 3)` cap | Four qualifying workflows → response contains exactly 3 |
| Response contract | Success case returns the full `CostAnalytics` shape (totals, byWorkflow, daily, successRate, avgDurationSeconds, topFailingWorkflows) |
| Query param validation | `?days=0` and `?days=-1` rejected by existing route schema |

#### 3c. Test batch update

`workflow-analytics.test.ts` uses `mock.module('./connection', ...)` which conflicts with `workflows.test.ts` in the same directory (same path, different implementation — violates the project mock pollution rule).

Update `packages/core/package.json` `scripts.test` to split `workflow-analytics.test.ts` into its own `bun test` invocation, inserted after the main DB batch:

```json
"test": "... && bun test src/db/workflow-analytics.test.ts && ..."
```

The route test requires an equivalent batch entry. The server package test script invokes each `api.*.test.ts` as its own `bun test` call, so append `&& bun test src/routes/api.analytics.test.ts` to the `packages/server/package.json` test script to match the existing convention.

## Validation

- `bun run validate` passes locally (type-check + lint + format:check + tests)
- CI green on PR before merge
- Manual spot-check: `bun run dev`, load dashboard, confirm both cards render against real data with no visual change
- No type regression — `useCostAnalytics` consumers type-check without casts

## Error handling

- `useCostAnalytics`: inherits TanStack Query's existing error semantics via `UseQueryResult`. No new error paths.
- `startedAtSinceFilter`: pure string builder; cannot fail. `datetime()` returning `NULL` for unparseable storage is the safe outcome (row excluded, not included).
- Tests: mock-driven; no real database or network I/O.

## Commit plan

Three atomic commits on `feat/analytics-hardening`:

1. `feat(web): extract useCostAnalytics hook` — hook file + two call-site swaps
2. `fix(core): SQLite day-boundary filter in workflow-analytics queries` — helper + three query call sites
3. `test(core,server): add workflow-analytics DB + route aggregator tests` — two test files + batch entries in both `packages/core/package.json` and `packages/server/package.json`

## Rollback

Each commit is independently revertable.

- Commit 1 regression → revert; cards fall back to inline `useQuery` (pre-PR state).
- Commit 2 regression → revert; Postgres users see no behavior change (Postgres was already correct). SQLite users return to the previous incorrect-but-stable state.
- Commit 3 regression → revert; no runtime impact, only test coverage removed.
2 changes: 1 addition & 1 deletion packages/core/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@
"./state/*": "./src/state/*.ts"
},
"scripts": {
"test": "bun test src/clients/codex-binary-guard.test.ts && bun test src/utils/codex-binary-resolver.test.ts && bun test src/utils/codex-binary-resolver-dev.test.ts && bun test src/clients/claude.test.ts src/clients/codex.test.ts src/clients/factory.test.ts && bun test src/handlers/command-handler.test.ts && bun test src/handlers/clone.test.ts && bun test src/db/adapters/postgres.test.ts && bun test src/db/adapters/sqlite.test.ts src/db/codebases.test.ts src/db/connection.test.ts src/db/conversations.test.ts src/db/env-vars.test.ts src/db/isolation-environments.test.ts src/db/messages.test.ts src/db/sessions.test.ts src/db/workflow-events.test.ts src/db/workflows.test.ts src/utils/defaults-copy.test.ts src/utils/worktree-sync.test.ts src/utils/conversation-lock.test.ts src/utils/credential-sanitizer.test.ts src/utils/port-allocation.test.ts src/utils/error.test.ts src/utils/error-formatter.test.ts src/utils/github-graphql.test.ts src/config/ src/state/ && bun test src/utils/path-validation.test.ts && bun test src/services/cleanup-service.test.ts && bun test src/services/title-generator.test.ts && bun test src/services/cron-parser.test.ts && bun test src/services/knowledge-writer.test.ts && bun test src/workflows/ && bun test src/operations/workflow-operations.test.ts && bun test src/operations/isolation-operations.test.ts && bun test src/orchestrator/orchestrator.test.ts && bun test src/orchestrator/orchestrator-agent.test.ts && bun test src/orchestrator/orchestrator-isolation.test.ts",
"test": "bun test src/clients/codex-binary-guard.test.ts && bun test src/utils/codex-binary-resolver.test.ts && bun test src/utils/codex-binary-resolver-dev.test.ts && bun test src/clients/claude.test.ts src/clients/codex.test.ts src/clients/factory.test.ts && bun test src/handlers/command-handler.test.ts && bun test src/handlers/clone.test.ts && bun test src/db/adapters/postgres.test.ts && bun test src/db/adapters/sqlite.test.ts src/db/codebases.test.ts src/db/connection.test.ts src/db/conversations.test.ts src/db/env-vars.test.ts src/db/isolation-environments.test.ts src/db/messages.test.ts src/db/sessions.test.ts src/db/workflow-events.test.ts src/db/workflows.test.ts src/utils/defaults-copy.test.ts src/utils/worktree-sync.test.ts src/utils/conversation-lock.test.ts src/utils/credential-sanitizer.test.ts src/utils/port-allocation.test.ts src/utils/error.test.ts src/utils/error-formatter.test.ts src/utils/github-graphql.test.ts src/config/ src/state/ && bun test src/utils/path-validation.test.ts && bun test src/db/workflow-analytics.test.ts && bun test src/services/cleanup-service.test.ts && bun test src/services/title-generator.test.ts && bun test src/services/cron-parser.test.ts && bun test src/services/knowledge-writer.test.ts && bun test src/workflows/ && bun test src/operations/workflow-operations.test.ts && bun test src/operations/isolation-operations.test.ts && bun test src/orchestrator/orchestrator.test.ts && bun test src/orchestrator/orchestrator-agent.test.ts && bun test src/orchestrator/orchestrator-isolation.test.ts",
"type-check": "bun x tsc --noEmit",
"build": "echo 'No build needed - Bun runs TypeScript directly'"
},
Expand Down
176 changes: 176 additions & 0 deletions packages/core/src/db/workflow-analytics.test.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,176 @@
import { mock, describe, test, expect, beforeEach } from 'bun:test';
import { createQueryResult, mockPostgresDialect } from '../test/mocks/database';

const mockQuery = mock(() => Promise.resolve(createQueryResult([])));
// Mutable so each `describe` block can flip the dialect by mutating this in a nested
// `beforeEach`. Safe because:
// - the `mock.module('./connection', ...)` factory below captures it by closure, so
// `getDatabaseType()` reads the current value on every call;
// - Bun runs outer `beforeEach` before inner `beforeEach`, so the outer reset to
// 'postgresql' always fires before an inner flip to 'sqlite';
// - tests within a file run serially, so there's no parallel-execution race.
let mockDbType: 'sqlite' | 'postgresql' = 'postgresql';

mock.module('./connection', () => ({
pool: { query: mockQuery },
getDialect: () => mockPostgresDialect,
getDatabaseType: () => mockDbType,
}));

import { getCostByWorkflow, getDailyCosts, getAvgDuration } from './workflow-analytics';

/** Extract the captured SQL + params from the Nth mockQuery call. Throws with a clear message if the mock wasn't called. */
function getCallArgs(n: number): { sql: string; params: readonly unknown[] } {
const call = mockQuery.mock.calls[n];
if (!call) {
throw new Error(
`mockQuery was not called at index ${n} — function under test may have failed before querying`
);
}
return {
sql: call[0] as string,
params: (call[1] ?? []) as readonly unknown[],
};
}

describe('workflow-analytics db', () => {
beforeEach(() => {
mockQuery.mockReset();
mockQuery.mockImplementation(() => Promise.resolve(createQueryResult([])));
mockDbType = 'postgresql';
});

describe('day-boundary filter (SQLite)', () => {
beforeEach(() => {
mockDbType = 'sqlite';
});

test('getCostByWorkflow wraps started_at with datetime()', async () => {
await getCostByWorkflow('2026-04-14T00:00:00Z');
const { sql, params } = getCallArgs(0);
expect(sql).toContain('datetime(started_at) >= datetime($1)');
expect(params).toEqual(['2026-04-14T00:00:00Z']);
});

test('getDailyCosts wraps started_at with datetime()', async () => {
await getDailyCosts('2026-04-14T00:00:00Z');
const { sql, params } = getCallArgs(0);
expect(sql).toContain('datetime(started_at) >= datetime($1)');
expect(params).toEqual(['2026-04-14T00:00:00Z']);
});

test('getAvgDuration wraps started_at with datetime()', async () => {
await getAvgDuration('2026-04-14T00:00:00Z');
const { sql, params } = getCallArgs(0);
expect(sql).toContain('datetime(started_at) >= datetime($1)');
expect(params).toEqual(['2026-04-14T00:00:00Z']);
});
});

describe('day-boundary filter (Postgres)', () => {
test('getCostByWorkflow uses plain >= comparison', async () => {
await getCostByWorkflow('2026-04-14T00:00:00Z');
const { sql, params } = getCallArgs(0);
expect(sql).toContain('started_at >= $1');
expect(sql).not.toContain('datetime(');
expect(params).toEqual(['2026-04-14T00:00:00Z']);
});

test('getDailyCosts uses plain >= comparison', async () => {
await getDailyCosts('2026-04-14T00:00:00Z');
const { sql, params } = getCallArgs(0);
expect(sql).toContain('started_at >= $1');
expect(sql).not.toContain('datetime(');
expect(params).toEqual(['2026-04-14T00:00:00Z']);
});

test('getAvgDuration uses plain >= comparison', async () => {
await getAvgDuration('2026-04-14T00:00:00Z');
const { sql, params } = getCallArgs(0);
expect(sql).toContain('started_at >= $1');
expect(sql).not.toContain('datetime(');
expect(params).toEqual(['2026-04-14T00:00:00Z']);
});
});

describe('empty result', () => {
test('getCostByWorkflow returns [] when no rows', async () => {
const result = await getCostByWorkflow('2026-04-14T00:00:00Z');
expect(result).toEqual([]);
});

test('getDailyCosts returns [] when no rows', async () => {
const result = await getDailyCosts('2026-04-14T00:00:00Z');
expect(result).toEqual([]);
});

test('getAvgDuration returns 0 when avg_seconds is null', async () => {
mockQuery.mockResolvedValueOnce(createQueryResult([{ avg_seconds: null }]));
const result = await getAvgDuration('2026-04-14T00:00:00Z');
expect(result).toBe(0);
});

test('getAvgDuration returns 0 when result has no rows', async () => {
const result = await getAvgDuration('2026-04-14T00:00:00Z');
expect(result).toBe(0);
});

test('getAvgDuration returns 0 when avg_seconds is not finite', async () => {
mockQuery.mockResolvedValueOnce(createQueryResult([{ avg_seconds: 'not-a-number' }]));
const result = await getAvgDuration('2026-04-14T00:00:00Z');
expect(result).toBe(0);
});
});

describe('getAvgDuration clock-skew exclusion', () => {
test('SQL excludes rows with missing or earlier completed_at', async () => {
await getAvgDuration('2026-04-14T00:00:00Z');
const { sql } = getCallArgs(0);
expect(sql).toContain('completed_at IS NOT NULL');
expect(sql).toContain('completed_at >= started_at');
});
});

describe('sort ordering', () => {
test('getCostByWorkflow sorts by cost_usd DESC', async () => {
await getCostByWorkflow('2026-04-14T00:00:00Z');
const { sql } = getCallArgs(0);
expect(sql).toMatch(/ORDER BY cost_usd DESC/i);
});

test('getDailyCosts sorts by date ASC', async () => {
await getDailyCosts('2026-04-14T00:00:00Z');
const { sql } = getCallArgs(0);
expect(sql).toMatch(/ORDER BY date ASC/i);
});
});

describe('type coercion', () => {
test('getCostByWorkflow coerces string count and cost to numbers', async () => {
mockQuery.mockResolvedValueOnce(
createQueryResult([
{ workflow_name: 'foo', status: 'completed', run_count: '5', cost_usd: '1.25' },
])
);
const result = await getCostByWorkflow('2026-04-14T00:00:00Z');
expect(result[0]).toEqual({
workflow_name: 'foo',
status: 'completed',
run_count: 5,
cost_usd: 1.25,
});
expect(typeof result[0].run_count).toBe('number');
expect(typeof result[0].cost_usd).toBe('number');
});

test('getDailyCosts coerces string count and cost to numbers', async () => {
mockQuery.mockResolvedValueOnce(
createQueryResult([{ date: '2026-04-14', run_count: '3', cost_usd: '0.75' }])
);
const result = await getDailyCosts('2026-04-14T00:00:00Z');
expect(result[0]).toEqual({ date: '2026-04-14', run_count: 3, cost_usd: 0.75 });
expect(typeof result[0].run_count).toBe('number');
expect(typeof result[0].cost_usd).toBe('number');
});
});
});
Loading
Loading