diff --git a/.archon/commands/maintainer-review-code-review.md b/.archon/commands/maintainer-review-code-review.md new file mode 100644 index 0000000000..eca2c2cfed --- /dev/null +++ b/.archon/commands/maintainer-review-code-review.md @@ -0,0 +1,125 @@ +--- +description: Review the PR for code quality, CLAUDE.md compliance, project conventions, and bugs (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Code Review + +You are a focused code reviewer for one GitHub PR. **Always run** for every PR that passes the gate. Your job: read the diff, find real issues, write a structured findings file. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +### Read the PR number + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +``` + +### Read the project's rules + +Read the repo's `CLAUDE.md` (project-level). It's the source of truth for engineering principles, type-safety rules, eslint policy, error-handling conventions, and forbidden patterns. + +### Read the gate decision + +```bash +cat $ARTIFACTS_DIR/gate-decision.md +``` + +The gate already classified direction/scope. Don't re-litigate that here. Focus on **code quality** within the scope the gate accepted. + +### Read the PR diff + +```bash +gh pr diff $PR_NUMBER +``` + +If the diff is too large to reason about cleanly, sample: read the diff against each changed file individually with `gh pr diff $PR_NUMBER -- `. + +--- + +## Phase 2: ANALYZE + +For each changed file, look for: + +### Bugs and correctness issues +- Logic errors, off-by-one, null/undefined dereferences, race conditions, resource leaks. +- Incorrect or missing error handling. Silent catches that swallow errors. +- API misuse (wrong types, wrong arguments, deprecated calls). +- Concurrency bugs in async code. + +### CLAUDE.md compliance +- TypeScript: explicit return types? No `any` without justification? +- Imports: typed imports for types? Namespace imports for submodules? +- Logging: structured Pino with `{domain}.{action}_{state}` event names? +- Error handling: errors surfaced, not swallowed? `classifyIsolationError` used where appropriate? +- Database: rowCount checks on UPDATEs? Errors logged with context? +- Workflow: schema rules followed? `output_format` for `when:` consumers? + +### Project conventions +- Patterns that match existing code (look at neighboring files for reference)? +- Naming, structure, and organization aligned with the rest of the package? +- Cross-package boundaries respected (no `import * from '@archon/core'`, etc.)? + +### Bug-likelihood signals +- New conditional branches without tests? +- Hardcoded values that should be configurable? +- TODO / FIXME / HACK / XXX comments left in? + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/code-review-findings.md` with this structure: + +```markdown +# Code Review — PR # + +## Summary +<1-2 sentences. State the overall verdict: ready-to-merge / minor-fixes-needed / blocking-issues.> + +## Findings + +### CRITICAL +- ****: + - **Why it matters**: + - **Suggested fix**: + +### HIGH +- (same format) + +### MEDIUM +- (same format) + +### LOW / NITPICK +- (same format — combine if many) + +## CLAUDE.md compliance + + +## Notes for synthesizer + +``` + +If you find nothing to flag, write the file with `## Findings\n\nNone — code looks clean.` and stop. Don't manufacture issues. + +--- + +## Phase 4: RETURN + +Return a single line summary as your response: + +``` +Code review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Verdict: . +``` + +Don't return the full findings — those live in the artifact. Synthesizer reads the file. + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/code-review-findings.md` written. +- [ ] Each finding has a file path, line number when applicable, and a concrete fix. +- [ ] No invented issues. If clean, say "None." +- [ ] Single-line summary returned. diff --git a/.archon/commands/maintainer-review-comment-quality.md b/.archon/commands/maintainer-review-comment-quality.md new file mode 100644 index 0000000000..17861fed64 --- /dev/null +++ b/.archon/commands/maintainer-review-comment-quality.md @@ -0,0 +1,95 @@ +--- +description: Review the PR's added/modified comments and docstrings for accuracy, value, and long-term maintainability (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Comment Quality + +You are a comment / docstring reviewer. Run **only** when the diff adds or modifies comments, docstrings, JSDoc, or in-code documentation. Your job: keep the code's comments truthful, valuable, and unlikely to rot. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Read the project's comment policy in `CLAUDE.md`: +- Default to writing **no comments**. +- Only add when the **WHY** is non-obvious (hidden constraint, subtle invariant, workaround). +- Don't explain WHAT (well-named identifiers do that). +- Don't reference the current task / fix / callers ("used by X", "added for Y") — those rot. +- Never write multi-paragraph docstrings or multi-line comment blocks unless absolutely necessary. + +--- + +## Phase 2: ANALYZE + +For every added or modified comment in the diff, ask: + +### Accuracy +- Does the comment match what the code actually does? +- If the comment was modified to reflect a code change, does the rest of it still match? + +### Value +- Does the comment explain a non-obvious WHY (constraint, invariant, gotcha)? +- Or does it restate WHAT the code does? (Restating WHAT = comment rot risk.) +- Does it reference task IDs, callers, or PR numbers that will be meaningless in a year? + +### Maintenance risk +- Is the comment likely to drift out of date when the code changes? +- Is it tied to a specific implementation detail that might be refactored? + +### Style +- One short line preferred. Multi-line blocks only when truly necessary. +- No trailing summaries that just describe the next line. + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/comment-quality-findings.md`: + +```markdown +# Comment Quality Review — PR # + +## Summary +<1-2 sentences. Comment quality: good / minor-issues / significant-rot-risk.> + +## Findings + +### HIGH — inaccurate comments (don't match the code) +- ****: + - **Suggested fix**: + +### MEDIUM — comment rot risk +- (same format — references that will rot, restated-what-not-why, multi-paragraph fluff) + +### LOW — style / consistency +- (same format) + +## Comments that are actually valuable + + +## Notes for synthesizer + +``` + +If comments are clean, write `## Findings\n\nComments are accurate and capture non-obvious WHY where present.` and stop. + +--- + +## Phase 4: RETURN + +``` +Comment-quality review complete. HIGH, MEDIUM, LOW findings. Quality: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/comment-quality-findings.md` written. +- [ ] Each HIGH cites the exact comment text and the code it disagrees with. +- [ ] Don't flag every short comment — many are intentionally brief. diff --git a/.archon/commands/maintainer-review-docs-impact.md b/.archon/commands/maintainer-review-docs-impact.md new file mode 100644 index 0000000000..4ed5b64085 --- /dev/null +++ b/.archon/commands/maintainer-review-docs-impact.md @@ -0,0 +1,118 @@ +--- +description: Review whether the PR's user-facing changes (APIs, CLI flags, env vars, behavior) are reflected in documentation (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Docs Impact + +You are a docs-impact reviewer. Run **only** when the diff adds, removes, or renames public APIs, CLI flags, environment variables, or other user-facing behavior. Your job: catch missing or stale documentation. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Find docs locations: + +```bash +ls packages/docs-web/src/content/docs/ 2>/dev/null +ls docs/ 2>/dev/null +ls README.md CONTRIBUTING.md CLAUDE.md 2>/dev/null +``` + +The project's docs site is at `packages/docs-web/` (Starlight). User-facing docs published to archon.diy. Repo-level docs include `CLAUDE.md`, `CONTRIBUTING.md`, and any `docs/` content. + +--- + +## Phase 2: ANALYZE + +For each user-facing change in the diff, identify the docs that should be updated: + +### What counts as user-facing +- New CLI command or flag (in `packages/cli/`). +- New environment variable. +- New / removed / renamed API route (in `packages/server/src/routes/`). +- New workflow node type, command file, or workflow YAML field. +- New configuration field in `.archon/config.yaml`. +- Change in default behavior that an existing user would notice. + +### What doesn't +- Internal refactors with no API change. +- Test-only changes. +- Bug fixes that restore documented behavior. + +### For each user-facing change + +- **New surface**: is there a docs page describing it? Is it linked from a landing page or the relevant section? +- **Changed surface**: are existing docs pages still accurate? Do they need updates? +- **Removed surface**: are existing references stale? `grep` the docs site for old name. +- **Migration**: does a breaking change need a migration note in CHANGELOG.md or docs? + +### Specific places to check +- `packages/docs-web/src/content/docs/getting-started/` — quickstart, install, concepts. +- `packages/docs-web/src/content/docs/guides/` — workflow authoring, hooks, MCP, scripts. +- `packages/docs-web/src/content/docs/reference/` — CLI, variables, configuration. +- `packages/docs-web/src/content/docs/adapters/` — Slack, Telegram, GitHub, Discord, Web. +- `packages/docs-web/src/content/docs/deployment/` — Docker, cloud. +- `CHANGELOG.md` — Keep-a-Changelog entry for user-visible changes. +- `CLAUDE.md` — only if the change affects how *agents* working in this repo should behave. + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/docs-impact-findings.md`: + +```markdown +# Docs Impact Review — PR # + +## Summary +<1-2 sentences. Docs status: in-sync / minor-gaps / significant-gaps.> + +## User-facing changes detected +- (file:line) +- + +## Findings + +### CRITICAL — missing docs for new public surface +- ****: + - **Where to add**: + - **What to write**: + +### HIGH — stale docs from changed/removed surface +- (same format) + +### MEDIUM — minor gaps (changelog entry, examples) +- (same format) + +### LOW — nice-to-have polish +- (same format) + +## Pages that look in-sync + + +## Notes for synthesizer + +``` + +If no user-facing changes, write `## Findings\n\nNo user-facing changes — no docs updates needed.` and stop. + +--- + +## Phase 4: RETURN + +``` +Docs-impact review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Status: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/docs-impact-findings.md` written. +- [ ] Each CRITICAL/HIGH names a specific doc file path and what's missing. +- [ ] Internal-only changes don't generate findings. diff --git a/.archon/commands/maintainer-review-error-handling.md b/.archon/commands/maintainer-review-error-handling.md new file mode 100644 index 0000000000..b45b82a0af --- /dev/null +++ b/.archon/commands/maintainer-review-error-handling.md @@ -0,0 +1,94 @@ +--- +description: Review the PR for error-handling correctness — surfaced errors, no silent swallows, consistent error patterns (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Error Handling + +You are an error-handling-focused reviewer. Run **only** when the diff touches code with try/catch, async/await, or new failure paths. Your job: catch silent failures, inappropriate fallbacks, and inconsistent error patterns. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Read the project's error-handling principles in `CLAUDE.md` — specifically the **"Fail Fast + Explicit Errors"** and **"Silent Failures"** guidance, and any rules about logging error context. + +--- + +## Phase 2: ANALYZE + +For every `try/catch`, `async/await`, error path, or fallback in the diff, ask: + +### Silent-failure risks +- Is an error caught and ignored without logging? +- Is a fallback returned that hides the actual problem from the caller? +- Is a `try` block too broad, catching errors that should propagate? +- Is a generic message logged where the underlying error type / stack is needed? + +### Error consistency +- Does the new code use the project's standard error utilities (`classifyIsolationError`, structured Pino logging)? +- Are error events named per the `{domain}.{action}_{state}` convention? +- Are errors thrown with enough context (id, operation, parameters)? + +### Promise / async correctness +- Unhandled promise rejections? Missing `await`? +- `Promise.all` vs `Promise.allSettled` — is the choice intentional? +- Cancellation / timeout handling correct? + +### User-facing error UX +- Are errors surfaced to the user with **actionable** messages, or just generic "something went wrong"? +- For platform adapters: does the error reach the chat / web UI? + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/error-handling-findings.md`: + +```markdown +# Error Handling Review — PR # + +## Summary +<1-2 sentences. Overall risk level: low / medium / high.> + +## Findings + +### CRITICAL — silent failures +- ****: + - **Why it matters**: + - **Suggested fix**: + +### HIGH — inconsistent error patterns +- (same format) + +### MEDIUM — context / actionability +- (same format) + +### LOW / NITPICK +- (same format) + +## Notes for synthesizer + +``` + +If no error-handling concerns, write `## Findings\n\nNone — error handling is consistent and surfaces failures appropriately.` and stop. + +--- + +## Phase 4: RETURN + +``` +Error-handling review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Risk: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/error-handling-findings.md` written. +- [ ] Every CRITICAL/HIGH finding cites a real catch / try / promise / fallback in the diff. +- [ ] No invented issues. If clean, say "None." diff --git a/.archon/commands/maintainer-review-gate.md b/.archon/commands/maintainer-review-gate.md new file mode 100644 index 0000000000..eb01cb4325 --- /dev/null +++ b/.archon/commands/maintainer-review-gate.md @@ -0,0 +1,251 @@ +--- +description: Gate a single PR on direction alignment, scope focus, and PR-template fill quality before any deep review +argument-hint: (no arguments — reads upstream node outputs and writes artifacts) +--- + +# Maintainer Review — Gate + +You are the **gatekeeper** for a single GitHub PR. Your job is to decide whether the PR is worth a comprehensive review or whether the maintainer should politely decline / request a split. You do **not** review code quality here — that happens downstream if you say "review." + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD INPUTS + +Three sources of upstream context, all already gathered. Each is provided inline below — no extra tool calls needed to fetch them. + +### PR data (gh pr view JSON) + +```json +$fetch-pr.output +``` + +### PR diff (truncated to 2500 lines) + +``` +$fetch-diff.output +``` + +### Maintainer context (direction.md, profile.md, prior state, recent briefs) + +```json +$read-context.output +``` + +Inside `read-context.output`: +- `direction` — the project's committed direction.md (what Archon IS / IS NOT, open questions) +- `profile` — the running maintainer's profile.md (role, scope, current focus) +- `prior_state` — last morning-standup state.json (carry_over may already mention this PR) +- `recent_briefs` — last 3 daily briefs (look here if this PR was previously flagged) + +--- + +## Phase 2: EVALUATE THREE GATES + +You're checking three gates. **All three** inform the verdict. + +### Gate A — Direction alignment + +Does the PR align with `direction.md`? + +- **aligned**: PR clearly fits one of the "What Archon IS" clauses, or extends an existing pattern. +- **conflict**: PR clearly violates a "What Archon is NOT" clause. Cite the specific clause (e.g. `direction.md §single-developer-tool`). +- **unclear**: PR raises a question `direction.md` doesn't answer (touches an "Open question" or a new concern). Note it for later direction-doc evolution. + +### Gate B — Scope focus + +Does the PR do **one thing**? + +- **focused**: PR has a single feature, single fix, or single coherent refactor. Size is fine — a 2000-line PR can be focused if it's all one feature. +- **multiple_concerns**: PR mixes 2+ unrelated changes (e.g. "fix the bug + add new feature + bump deps + reformat"). The right action is to ask the contributor to split it. +- **too_broad**: One ostensibly-coherent change but with sprawling collateral edits across unrelated subsystems. Fixable by tighter scope, but currently too much to review. + +To assess scope, look at: +- Diff structure: do the changed files cluster around a single concern, or sprawl? +- Title + body: does the contributor describe one change, or several "while I was here" changes? +- Commit history if visible in `gh pr view`: is the PR a single coherent story, or accreted fixes? + +### Gate C — Template quality + +Was `.github/PULL_REQUEST_TEMPLATE.md` filled in? + +- **good**: All template sections completed thoughtfully (Summary, Validation, Security, Rollback, etc.). +- **partial**: Template structure present but several sections empty or perfunctory ("N/A", "TBD", or single-word answers where prose is expected). +- **empty**: No template, or template skeleton with all sections blank. + +The PR body is in `pr_data.body`. The template lives at `.github/PULL_REQUEST_TEMPLATE.md` — read it if needed to compare structure. + +--- + +## Phase 3: DECIDE VERDICT + +Combine the three gates into a single verdict. + +| Direction | Scope | Template | → Verdict | +|-----------|-------|----------|-----------| +| aligned | focused | good or partial | **review** — proceed to deep review | +| aligned | focused | empty | **review** with note in synthesis to nudge template | +| aligned | multiple_concerns | * | **needs_split** — draft "split this up" comment | +| aligned | too_broad | * | **needs_split** — same | +| conflict | * | * | **decline** — draft polite-decline citing direction clause | +| unclear | * | * | **unclear** — surface to maintainer for manual call | + +When the gate is `unclear`, do NOT draft a decline comment. The maintainer needs to decide. + +When the verdict is `decline` or `needs_split`, draft the comment per Phase 4. + +--- + +## Phase 4: DRAFT THE DECLINE COMMENT (only if verdict in [decline, needs_split]) + +The drafted comment is the **bot's voice** — polite, specific, citing direction.md when relevant, and giving the contributor a clear path forward. + +### Tone rules + +- Open with thanks for the contribution. Always. +- Be **specific** about why — cite the direction.md clause, name the multiple concerns, list the empty template sections. Vague "this isn't a fit" is not acceptable. +- Offer a concrete path forward when one exists (split into PRs A + B + C; pick a different scope; fill in template sections X/Y/Z). +- Include a **3-day reply window**: state the date 3 days from today. If the contributor doesn't reply by then with reasoning to keep the PR open, it will be closed. Don't say "automatically" — the maintainer will close manually. +- No corporate-speak, no emoji, no AI-attribution. + +### Templates by category + +**For `decline` (direction conflict)**: + +```markdown +Thanks for putting this together, @! + +Unfortunately this isn't a direction we're taking with Archon. Specifically, this conflicts with `direction.md §`: . + +If you disagree with that direction call, reply here by **** and we'll discuss. Otherwise this PR will be closed after that date so the queue stays focused. + +For context, the project's stated scope lives at [`.archon/maintainer-standup/direction.md`](../blob/dev/.archon/maintainer-standup/direction.md). Open questions there are fair game for proposals — feel free to raise an issue if you'd like to push for a direction change. +``` + +**For `needs_split` (multiple concerns)**: + +```markdown +Thanks for the work here, @! + +This PR bundles several independent changes: . Each is potentially valuable but reviewing them together makes regressions hard to isolate and reverts hard to scope. + +Could you split this into focused PRs, one per concern? Suggested split: +1. +2. +3. + +If you'd rather discuss the split approach first, reply here by ****. Otherwise this PR will be closed in favor of the split versions after that date. +``` + +**For `needs_split` (too broad / sprawling)**: + +```markdown +Thanks for the contribution, @! + +The change touches a wide range of subsystems () which makes it hard to review as a single unit. Could you tighten the scope — focus on first and split the collateral edits into a follow-up PR? + +If you think the current scope is necessary, reply here by **** with reasoning. Otherwise this PR will be closed after that date so a tighter version can land. +``` + +Adapt the wording. Don't paste the templates verbatim if the situation is more nuanced — they're starting points. + +### Compute DATE-3-DAYS-OUT + +Today is the date in `read-context.output.prior_state.last_run_at` if available, otherwise today's actual date. Add 3 calendar days. Format as `YYYY-MM-DD` (e.g. `2026-04-30`). + +--- + +## Phase 5: WRITE ARTIFACTS + +You **must** write two files using the Write tool before returning your structured output: + +### `$ARTIFACTS_DIR/gate-decision.md` + +Full reasoning for the maintainer's review: + +```markdown +# Gate Decision — PR # + +## Verdict + + +## Direction alignment + + + +## Scope assessment + + + +## Template quality + + + +## Cited direction clauses +- direction.md § +- direction.md § + +## Reasoning +<2-3 sentence summary> + +## Drafted decline comment (if applicable) + + +``` + +### `$ARTIFACTS_DIR/decline-comment.md` + +Only the decline comment body (used directly by the `post-decline` bash node as `--body-file`): + +If verdict is `review` or `unclear`, write a single line: `(no decline comment — verdict was )`. + +If verdict is `decline` or `needs_split`, write the drafted comment in markdown — exactly as it should appear on the PR. + +--- + +## Phase 6: RETURN STRUCTURED OUTPUT + +**This is the final step. After the artifacts are written, your entire response must be ONE JSON object — nothing else.** + +Allowed output shapes (Pi's parser handles either): + +1. **Bare JSON** — preferred: + ``` + {"verdict":"review","direction_alignment":"aligned",...} + ``` + +2. **Fenced JSON** — also fine: + ```` + ```json + {"verdict":"review","direction_alignment":"aligned",...} + ``` + ```` + +**NOT ALLOWED:** +- Prose before the JSON ("Looking at this PR..." / "Here is my analysis..."). +- Prose after the JSON ("This concludes the gate decision."). +- Bullet-point summaries restating fields. +- Markdown headers like `**Gate A**`. +- Any text outside the single JSON object or its fences. + +If you find yourself wanting to explain — that explanation belongs in `$ARTIFACTS_DIR/gate-decision.md`, NOT in your response. + +### Required fields + +- `verdict`: one of `review` / `decline` / `needs_split` / `unclear` +- `direction_alignment`: `aligned` / `conflict` / `unclear` +- `scope_assessment`: `focused` / `multiple_concerns` / `too_broad` +- `template_quality`: `good` / `partial` / `empty` +- `decline_categories`: array of strings, e.g. `["direction"]` or `["scope", "template"]`. Empty array `[]` when verdict is `review` or `unclear`. +- `cited_direction_clauses`: array of strings, e.g. `["direction.md §single-developer-tool"]`. Empty `[]` if none. +- `reasoning`: 1-3 sentence summary (string). + +### CHECKPOINT — before returning + +- [ ] Direction.md was actually read (not assumed). +- [ ] Decline comment cites a specific direction clause OR specific scope concerns OR specific empty template sections — never vague. +- [ ] Decline comment has a concrete `YYYY-MM-DD` 3-day deadline. +- [ ] `$ARTIFACTS_DIR/gate-decision.md` written. +- [ ] `$ARTIFACTS_DIR/decline-comment.md` written (placeholder line if not declining). +- [ ] **Final response is ONE JSON object — no prose, no headers, no bullet summary. Bare JSON or fenced JSON only.** diff --git a/.archon/commands/maintainer-review-report.md b/.archon/commands/maintainer-review-report.md new file mode 100644 index 0000000000..6e892b4f16 --- /dev/null +++ b/.archon/commands/maintainer-review-report.md @@ -0,0 +1,86 @@ +--- +description: Produce the final summary across all branches of maintainer-review-pr (review / decline / unclear) for the workflow log +argument-hint: (no arguments — reads upstream artifacts) +--- + +# Maintainer Review — Final Report + +You are the final reporter. The workflow has finished one of three branches (review / decline / unclear). Your job: produce a one-screen summary that tells the maintainer what just happened and what's pending. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: DETECT WHICH BRANCH RAN + +Check what artifacts exist: + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number 2>/dev/null) +ls $ARTIFACTS_DIR/ +ls $ARTIFACTS_DIR/review/ 2>/dev/null +cat $ARTIFACTS_DIR/gate-decision.md 2>/dev/null | head -30 +``` + +Three possibilities: + +1. **Review branch ran**: `$ARTIFACTS_DIR/review/synthesis.md` exists. +2. **Decline branch ran**: `$ARTIFACTS_DIR/decline-comment.md` exists with non-placeholder content; the post-decline bash node already posted to GitHub. +3. **Unclear branch ran**: gate verdict was `unclear` and the maintainer was prompted to decide manually. + +--- + +## Phase 2: WRITE THE FINAL REPORT + +Write `$ARTIFACTS_DIR/final-report.md`: + +```markdown +# Maintainer Review — PR # — Final + +## Branch taken + + +## Gate decision + + +## Outcome + +### If review branch: +- Synthesized verdict: +- Findings: +- Aspects run: +- **Draft comment**: $ARTIFACTS_DIR/review/review-comment.md (copy-paste or edit before posting to PR) +- **Full synthesis**: $ARTIFACTS_DIR/review/synthesis.md + +### If decline branch: +- Decline categories: +- Cited direction clauses: +- Comment posted to PR: yes +- Reply window: +- Awaiting-author label added: yes/no + +### If unclear branch: +- Gate could not classify confidently. +- Maintainer prompted manually — outcome recorded in approval-gate response. + +## Next steps for the maintainer +<2-3 short bullets. e.g.: +- "Read $ARTIFACTS_DIR/review/review-comment.md and post to PR." +- "Wait for contributor reply by ; if no reply, close PR." +- "Update direction.md to address the open question this PR raised: ".> +``` + +--- + +## Phase 3: RETURN + +Return a single-line outcome: + +``` +PR # — branch=, verdict=, action=. +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/final-report.md` written. +- [ ] Correctly identifies which branch ran (don't pretend the review branch ran when it didn't). +- [ ] Lists concrete next steps for the maintainer. diff --git a/.archon/commands/maintainer-review-synthesize.md b/.archon/commands/maintainer-review-synthesize.md new file mode 100644 index 0000000000..bfdd3abb28 --- /dev/null +++ b/.archon/commands/maintainer-review-synthesize.md @@ -0,0 +1,156 @@ +--- +description: Synthesize findings from all review aspects into a single maintainer-ready review report (Pi-tuned) +argument-hint: (no arguments — reads review/*.md artifacts and writes synthesis) +--- + +# Maintainer Review — Synthesize + +You are the synthesizer. Read all available review-aspect findings, deduplicate overlap, prioritize, and produce a single maintainer-ready review summary plus a draft GitHub comment. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +### PR number +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +``` + +### Read every available review findings file +```bash +ls $ARTIFACTS_DIR/review/ +``` + +Then read each one: +- `code-review-findings.md` (always present if review branch ran) +- `error-handling-findings.md` (present if classifier said yes) +- `test-coverage-findings.md` (present if classifier said yes) +- `comment-quality-findings.md` (present if classifier said yes) +- `docs-impact-findings.md` (present if classifier said yes) + +Some files may be missing — that's expected. Don't error. + +### Read the gate decision (for context) +```bash +cat $ARTIFACTS_DIR/gate-decision.md +``` + +The gate may have noted things ("template was empty — nudge in synthesis"). Carry those notes forward. + +--- + +## Phase 2: AGGREGATE + DEDUPLICATE + +Issues often surface in multiple aspects (e.g. a missing test for an error path shows up in error-handling AND test-coverage). Don't list the same finding twice. Pick the most actionable wording and merge. + +Group findings by **severity** across all aspects, not by aspect: + +- **CRITICAL** (across aspects): merge / blocking / data-loss / silent-failure issues. +- **HIGH**: real bugs, missing test for a fix, missing docs for a new public surface, CLAUDE.md violation. +- **MEDIUM**: edge cases, comment rot risks, minor docs polish. +- **LOW / NITPICK**: style, naming, optional improvements. + +Within each tier, order by file path so the maintainer can scan top-to-bottom. + +--- + +## Phase 3: WRITE THE SYNTHESIS + +Write `$ARTIFACTS_DIR/review/synthesis.md`: + +```markdown +# Maintainer Review — PR # + +## Verdict + + +## Summary +<2-3 sentence overview. What the PR does, what's good, what's blocking.> + +## Findings + +### CRITICAL (N) +- ****: + - From: + - **Suggested fix**: + +### HIGH (N) +- (same format) + +### MEDIUM (N) +- (same format) + +### LOW / NITPICK (N) +- (consolidated) + +## CLAUDE.md compliance + + +## Gate-decision notes + + +## Aspects run +- code-review: +- error-handling: +- test-coverage: +- comment-quality: +- docs-impact: + +## Aspects skipped + +``` + +--- + +## Phase 4: WRITE THE DRAFT PR COMMENT + +Write `$ARTIFACTS_DIR/review/review-comment.md` — this is the markdown body that would be posted to the PR. The maintainer can copy-paste it or hand-edit before posting. + +Format: + +```markdown +## Review Summary + +**Verdict**: + +<2-3 sentence overview written for the PR author, not for the maintainer.> + +### Blocking issues +- (list CRITICAL findings, file:line, fix suggestion) + +### Suggested fixes +- (list HIGH findings) + +### Minor / nice-to-have +- (list MEDIUM + LOW combined) + +### Compliments + + +--- +*Reviewed via maintainer-review-pr workflow (Pi/Minimax). Aspects run: .* +``` + +Tone for the PR comment: +- Address the contributor directly ("you", "your change"). +- Be **specific** — file:line + concrete fix. +- No corporate-speak, no excessive praise, no AI-attribution-by-name (the footer line is enough). + +--- + +## Phase 5: RETURN + +Return a single-line summary: + +``` +Synthesized: . CRITICAL / HIGH / MEDIUM / LOW findings across aspects. Comment drafted at $ARTIFACTS_DIR/review/review-comment.md. +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/synthesis.md` written. +- [ ] `$ARTIFACTS_DIR/review/review-comment.md` written. +- [ ] Findings deduplicated across aspects. +- [ ] Severity ordering correct. +- [ ] Skipped aspects listed with reason. diff --git a/.archon/commands/maintainer-review-test-coverage.md b/.archon/commands/maintainer-review-test-coverage.md new file mode 100644 index 0000000000..5b91c4ef9b --- /dev/null +++ b/.archon/commands/maintainer-review-test-coverage.md @@ -0,0 +1,101 @@ +--- +description: Review the PR for test coverage — does new behavior have tests, are critical paths exercised, do existing tests still cover what they should (Pi-tuned) +argument-hint: (no arguments — reads PR data and writes findings artifact) +--- + +# Maintainer Review — Test Coverage + +You are a test-focused reviewer. Run **only** when the diff touches source code (not pure docs / config / tests). Your job: assess whether the new behavior is properly tested. + +**Workflow ID**: $WORKFLOW_ID + +--- + +## Phase 1: LOAD + +```bash +PR_NUMBER=$(cat $ARTIFACTS_DIR/.pr-number) +gh pr diff $PR_NUMBER +``` + +Read the project's testing conventions in `CLAUDE.md`: +- Mock isolation rules (Bun `mock.module` is process-global; spyOn is preferred for internal modules) +- Per-package test isolation (split bun test invocations to avoid mock pollution) +- `bun run test` (not `bun test` from repo root) + +--- + +## Phase 2: ANALYZE + +For each non-trivial code change, ask: + +### Behavioral coverage +- Is the **happy path** covered? +- Are **edge cases** covered? (Empty input, oversized input, malformed input, concurrent calls, etc.) +- Are **error paths** covered? (Throws when expected, returns null when expected.) +- Is the test asserting on the **right thing**? (Output value? Side effect? Both?) + +### Test quality +- Are tests deterministic? No timing, no real network, no real filesystem unless intentional? +- Mock pollution: does the file use `mock.module()` in a way that conflicts with other test files in the same package? +- Test isolation: does each test set up and tear down its own state? + +### Coverage gaps to flag +- New public function with no test → flag. +- New conditional branch with no test → flag. +- Bug fix without a regression test → flag (the test should fail before the fix). +- New error path with no test → flag. + +### Don't flag +- Trivial getters/setters with no logic. +- Internal helpers tested transitively through public API tests. +- Documentation-only or formatting-only changes. + +--- + +## Phase 3: WRITE FINDINGS + +Write `$ARTIFACTS_DIR/review/test-coverage-findings.md`: + +```markdown +# Test Coverage Review — PR # + +## Summary +<1-2 sentences. Coverage: adequate / minor-gaps / significant-gaps.> + +## Findings + +### CRITICAL — bug fix without regression test +- ****: + - **Suggested test**: + +### HIGH — new behavior without coverage +- (same format) + +### MEDIUM — edge cases / error paths missing +- (same format) + +### LOW — improvements +- (same format) + +## Mock isolation concerns + + +## Notes for synthesizer + +``` + +If coverage is adequate, write `## Findings\n\nAdequate coverage for the changed behavior.` and stop. + +--- + +## Phase 4: RETURN + +``` +Test-coverage review complete. CRITICAL, HIGH, MEDIUM, LOW findings. Coverage: . +``` + +### CHECKPOINT +- [ ] `$ARTIFACTS_DIR/review/test-coverage-findings.md` written. +- [ ] Each CRITICAL/HIGH cites a specific function / branch and proposes a concrete test. +- [ ] No invented gaps. If coverage is good, say so. diff --git a/.archon/workflows/maintainer/maintainer-review-pr.yaml b/.archon/workflows/maintainer/maintainer-review-pr.yaml new file mode 100644 index 0000000000..4cb14b645e --- /dev/null +++ b/.archon/workflows/maintainer/maintainer-review-pr.yaml @@ -0,0 +1,306 @@ +name: maintainer-review-pr +description: | + Use when: Maintainer wants to review a SINGLE PR with direction-and-scope + gating before any deep review. Skips deep review entirely when the PR is + off-direction, too broad, or has multiple concerns; instead drafts a polite- + decline comment for human approval. + Triggers: "maintainer review", "maintainer review pr ", "review and gate", + "should i review this PR", "gate this PR", "review pr as maintainer". + Does: Loads maintainer direction + profile + state -> gates the PR on + direction alignment, scope focus, and PR-template fill -> if review- + worthy, runs comprehensive review (5 parallel review aspects); if + decline-worthy, drafts a polite-decline comment that you approve before + it posts. + Provider: Pi (Minimax M2.7) — runs cheaper than Claude. Each review aspect + is its own Archon node, so Pi handles them as independent calls. + NOT for: Comprehensive review of a PR you've already decided to merge + (use archon-comprehensive-pr-review). Quick triage of all open PRs + (use maintainer-standup). + +provider: pi +model: minimax/MiniMax-M2.7 + +interactive: true # Required for the decline-approval gate + +worktree: + enabled: false # Live checkout — needs to read .archon/maintainer-standup/ + +nodes: + # ═══════════════════════════════════════════════════════════════ + # PHASE 1: EXTRACT PR NUMBER FROM ARGUMENTS + # ═══════════════════════════════════════════════════════════════ + + - id: extract-pr-number + prompt: | + Find the GitHub PR number for this request. + + Request: $ARGUMENTS + + Rules: + - If the message contains an explicit PR number (e.g., "#1428", "PR 1428", "1428"), extract that number. + - If the message contains a PR URL (https://github.com/.../pull/N), extract N. + - If you cannot determine a single PR number, output ERROR. + + CRITICAL: Output ONLY the bare number with no quotes, markdown, or explanation. + Example correct output: 1428 + allowed_tools: [] + idle_timeout: 30000 + + # ═══════════════════════════════════════════════════════════════ + # PHASE 2: GATHER PR DATA + MAINTAINER CONTEXT (parallel) + # ═══════════════════════════════════════════════════════════════ + + - id: fetch-pr + bash: | + PR_NUM=$(echo "$extract-pr-number.output" | tr -d "'\"\`\n " | grep -oE '[0-9]+' | head -1) + if [ -z "$PR_NUM" ]; then + echo "Failed to extract PR number from: $extract-pr-number.output" >&2 + exit 1 + fi + echo "$PR_NUM" > "$ARTIFACTS_DIR/.pr-number" + gh pr view "$PR_NUM" --json number,title,body,labels,comments,reviews,state,mergeable,mergeStateStatus,additions,deletions,changedFiles,files,author,createdAt,updatedAt,baseRefName,headRefName,reviewDecision,reviewRequests,isDraft + depends_on: [extract-pr-number] + timeout: 30000 + + - id: fetch-diff + bash: | + PR_NUM=$(cat "$ARTIFACTS_DIR/.pr-number") + # Cap at 2500 lines to keep prompt size bounded; gate cares about shape, not every line. + gh pr diff "$PR_NUM" 2>/dev/null | head -2500 + depends_on: [fetch-pr] + timeout: 30000 + + - id: read-context + # Reuses the maintainer-standup script — same direction.md / profile.md / + # state.json / recent briefs we want for gate decisions. + script: maintainer-standup-read-context + runtime: bun + timeout: 10000 + depends_on: [extract-pr-number] + + # ═══════════════════════════════════════════════════════════════ + # PHASE 3: GATE — direction + scope + template check + # ═══════════════════════════════════════════════════════════════ + + - id: gate + command: maintainer-review-gate + depends_on: [fetch-pr, fetch-diff, read-context] + context: fresh + output_format: + type: object + properties: + verdict: + type: string + enum: [review, decline, needs_split, unclear] + description: | + 'review' = passes gates, proceed to deep review. + 'decline' = wrong direction; draft polite-decline comment. + 'needs_split' = scope is multiple concerns; draft split-up request. + 'unclear' = gate cannot decide confidently; ask maintainer manually. + direction_alignment: + type: string + enum: [aligned, conflict, unclear] + scope_assessment: + type: string + enum: [focused, multiple_concerns, too_broad] + template_quality: + type: string + enum: [good, partial, empty] + decline_categories: + type: array + items: + type: string + description: e.g. ['direction', 'scope', 'template']. Empty when verdict == 'review'. + cited_direction_clauses: + type: array + items: + type: string + description: | + Specific direction.md clauses cited (e.g., 'direction.md §single-developer-tool'). + Empty when verdict == 'review'. + reasoning: + type: string + description: 1-3 sentences summarizing why this verdict. + required: + - verdict + - direction_alignment + - scope_assessment + - template_quality + - decline_categories + - cited_direction_clauses + - reasoning + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4a: REVIEW BRANCH (verdict == 'review') + # ═══════════════════════════════════════════════════════════════ + + - id: review-classify + prompt: | + Determine which review aspects to run for this PR. + + ## PR Metadata + $fetch-pr.output + + ## Diff (truncated) + $fetch-diff.output + + ## Rules + - **Code review**: ALWAYS run. Mandatory for every PR. + - **Error handling**: Run if diff touches code with try/catch, async/await, or new failure paths. + - **Test coverage**: Run if diff touches source code (not just tests, docs, or config). + - **Comment quality**: Run if diff adds/modifies comments, docstrings, JSDoc, or in-code documentation. + - **Docs impact**: Run if diff adds/removes/renames public APIs, CLI flags, env vars, or user-facing features. + + Provide reasoning for each decision. Output JSON only. + depends_on: [gate] + when: "$gate.output.verdict == 'review'" + allowed_tools: [] + context: fresh + idle_timeout: 60000 + output_format: + type: object + properties: + run_code_review: + type: string + enum: ['true', 'false'] + run_error_handling: + type: string + enum: ['true', 'false'] + run_test_coverage: + type: string + enum: ['true', 'false'] + run_comment_quality: + type: string + enum: ['true', 'false'] + run_docs_impact: + type: string + enum: ['true', 'false'] + reasoning: + type: string + required: + - run_code_review + - run_error_handling + - run_test_coverage + - run_comment_quality + - run_docs_impact + - reasoning + + - id: code-review + command: maintainer-review-code-review + depends_on: [review-classify] + when: "$review-classify.output.run_code_review == 'true'" + context: fresh + + - id: error-handling + command: maintainer-review-error-handling + depends_on: [review-classify] + when: "$review-classify.output.run_error_handling == 'true'" + context: fresh + + - id: test-coverage + command: maintainer-review-test-coverage + depends_on: [review-classify] + when: "$review-classify.output.run_test_coverage == 'true'" + context: fresh + + - id: comment-quality + command: maintainer-review-comment-quality + depends_on: [review-classify] + when: "$review-classify.output.run_comment_quality == 'true'" + context: fresh + + - id: docs-impact + command: maintainer-review-docs-impact + depends_on: [review-classify] + when: "$review-classify.output.run_docs_impact == 'true'" + context: fresh + + - id: synthesize-review + command: maintainer-review-synthesize + depends_on: [code-review, error-handling, test-coverage, comment-quality, docs-impact] + trigger_rule: one_success + context: fresh + + # Auto-post — once the gate said 'review', the deep review is feedback worth + # delivering. No approval required; the maintainer can always edit/delete on + # GitHub. (Approval gates are reserved for the higher-stakes decline branch + # where the comment closes the door on the contribution.) + - id: post-review + bash: | + PR_NUM=$(cat "$ARTIFACTS_DIR/.pr-number") + if [ ! -f "$ARTIFACTS_DIR/review/review-comment.md" ]; then + echo "ERROR: review-comment.md missing — synthesize did not write it" >&2 + exit 1 + fi + gh pr comment "$PR_NUM" --body-file "$ARTIFACTS_DIR/review/review-comment.md" + echo "Posted review comment to PR #$PR_NUM" + depends_on: [synthesize-review] + timeout: 30000 + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4b: DECLINE BRANCH (verdict in ['decline', 'needs_split']) + # ═══════════════════════════════════════════════════════════════ + + - id: approve-decline + approval: + message: | + Gate flagged this PR for polite-decline. Review the gate decision and the + drafted decline comment in the workflow output above (and in + $ARTIFACTS_DIR/gate-decision.md). + + Approve to post the drafted comment to the PR. + Reject with a reason to redraft (max 3 attempts). + capture_response: true + on_reject: + prompt: | + Reviewer feedback on the previous decline draft: + $REJECTION_REASON + + Re-read the gate decision at `$ARTIFACTS_DIR/gate-decision.md` and the + current drafted comment at `$ARTIFACTS_DIR/decline-comment.md`. Revise + the decline comment based on the feedback, then OVERWRITE + `$ARTIFACTS_DIR/decline-comment.md` with the new version. + + Output the revised decline comment as raw markdown — no JSON wrapper. + max_attempts: 3 + depends_on: [gate] + when: "$gate.output.verdict == 'decline' || $gate.output.verdict == 'needs_split'" + + - id: post-decline + bash: | + PR_NUM=$(cat "$ARTIFACTS_DIR/.pr-number") + if [ ! -f "$ARTIFACTS_DIR/decline-comment.md" ]; then + echo "ERROR: decline-comment.md missing — gate command did not write it" >&2 + exit 1 + fi + gh pr comment "$PR_NUM" --body-file "$ARTIFACTS_DIR/decline-comment.md" + # Optional: tag the PR so the morning brief can surface "awaiting author" + gh pr edit "$PR_NUM" --add-label awaiting-author 2>/dev/null || true + echo "Posted decline comment to PR #$PR_NUM" + depends_on: [approve-decline] + timeout: 30000 + + # ═══════════════════════════════════════════════════════════════ + # PHASE 4c: UNCLEAR BRANCH (verdict == 'unclear') + # ═══════════════════════════════════════════════════════════════ + + - id: approve-unclear + approval: + message: | + Gate could not classify this PR confidently. Read the raw gate output + and any artifacts in $ARTIFACTS_DIR/, then decide manually. + + Approve = workflow ends here (no comment posted, no review run). + Reject = same outcome but log your reasoning in the run record. + depends_on: [gate] + when: "$gate.output.verdict == 'unclear'" + + # ═══════════════════════════════════════════════════════════════ + # PHASE 5: FINAL REPORT (whichever branch ran) + # ═══════════════════════════════════════════════════════════════ + + - id: report + command: maintainer-review-report + depends_on: [post-review, post-decline, approve-unclear] + trigger_rule: one_success + context: fresh diff --git a/.archon/workflows/maintainer-standup.yaml b/.archon/workflows/maintainer/maintainer-standup.yaml similarity index 100% rename from .archon/workflows/maintainer-standup.yaml rename to .archon/workflows/maintainer/maintainer-standup.yaml diff --git a/.archon/workflows/repo-triage.yaml b/.archon/workflows/maintainer/repo-triage.yaml similarity index 100% rename from .archon/workflows/repo-triage.yaml rename to .archon/workflows/maintainer/repo-triage.yaml