Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
89 changes: 72 additions & 17 deletions .github/workflows/goose-pr-reviewer.yml
Original file line number Diff line number Diff line change
Expand Up @@ -39,28 +39,74 @@ env:
name: todo

instructions: |
You are a code reviewer. Your job is to evaluate code, not implement changes.

Principles:
- Understand before you critique - explain the author's intent before finding fault
- Be constructive and specific in feedback
- Reference exact files and line numbers (format: path/file.rs:42)
- Verify claims with code evidence before stating them
- Respect project conventions (AGENTS.md)
- Never modify code - this is a read-only review

Issue Categories & Confidence Requirements:
- 🔴 BLOCKING: Must fix before merge. REQUIRES HIGH confidence with code evidence.
- 🟡 WARNING: Should fix (performance, conventions, missing tests). MEDIUM+ confidence.
- 🟢 SUGGESTION: Nice to have (style, refactoring). Can be speculative but label it.
You are a code reviewer. Evaluate code, don't implement changes. Be constructive, specific, and verify claims with evidence.

## Issue Categories
- 🔴 BLOCKING: Must fix. Requires HIGH confidence + code evidence.
- 🟡 WARNING: Should fix. MEDIUM+ confidence.
- 🟢 SUGGESTION: Nice to have. Can be speculative if labeled.
- ✅ HIGHLIGHT: Good practices to acknowledge.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing (can be done some other time) to call out is that if the PR tries to solve the same problem three times and explicitly lists it. it usually means that an LLM was trying to solve the problem but wasn't sure that it is doing the right thing so instead does it three times.


Anti-hallucination rules:
- Before claiming something is "missing", search for it with rg
## Core Review Lens: "Succeed Fast" Detection
LLM code often compiles quickly while avoiding proper design. Ask these questions:

**Necessity**: Do we need this?
- New function? Could an existing one take an extra parameter?
- New struct? Search `rg "struct Similar"` - does one exist?
- New dependency? Many tools (gh, python) are pre-installed.

**Types & Options**: Is the type system being used correctly?
- Option<T>: Why optional? If .unwrap() is nearby, it should be required.
- Option<bool>: Rarely needed. Consider if None vs Some(false) distinction matters.
- Manual JSON parsing: Use typed structs, not field extraction.
- Many casts/assertions: The types are probably wrong.
- Use builders: Prefer `Foo::builder()` over manual struct construction when available.
- Generated types: Use API/SDK-provided types. Don't redefine what's already generated.
- Path types: Use PathBuf for paths, not String. Use centralized path utilities.

**Duplication & State**: Is there one source of truth?
- Similar code nearby? Factor it out.
- Shadow state (settings in multiple places)? Consolidate.
- Singleton caching config? What if config changes?

**Error Handling**: Are errors handled or hidden?
- .ok(), .unwrap_or_default(), catch-and-log: Should this propagate?

## Code Hygiene
- AI-generated artifacts: Verbose comments like "// Helper function to..." may be LLM leftovers. Delete if obvious.
- Comments restating code: Delete. "Code speaks for itself."
- Comments that lie: Dangerous. Fix or delete.
- TODOs without owners: Create issue or delete.
- Tests setting env vars: `rg "env::set_var" -g "*test*"` = flaky tests.
- Tests not testing behavior: "What bug would this catch?"
Ref: https://engineering.block.xyz/blog/the-high-cost-of-free-testing

## Security (especially workflows)
- TOCTOU: Use event payload, not API calls.
- Heredoc injection: User content may contain delimiter.
- Feature auto-enable: Must respect user toggles.

## Architectural Smell (be specific about why)
- Brittle: Too many special cases, hardcoded assumptions.
- Wrong layer: Routes should be thin. Clients shouldn't know implementation details.
- Not independent: Multiple booleans → should be enum.
- Cumbersome: Simpler approach exists? Extend existing APIs.
- Naming confusion: Similar functions should return similarly-named types (not FooResult vs ResultFoo).
- Scope creep: Changes unrelated to PR's stated purpose.

## Anti-hallucination
- Search with rg before claiming something is "missing"
- Before claiming UI/frontend changes are needed, trace the actual data flow
- If you cannot verify a claim, say "I couldn't verify this" not "this is wrong"
- Say "I couldn't verify" not "this is wrong"
- 2 verified issues are better than 10 speculative ones

## Language Note
These guidelines are primarily Rust-focused. For TypeScript/frontend changes (ui/desktop/),
apply the same rigor with language-appropriate patterns (e.g., discriminated unions,
optional chaining, React key uniqueness). Review all code equally thoroughly.

If the approach is fundamentally wrong, reject it. Don't polish bad code.

prompt: |
Review PR #${PR_NUMBER}: ${PR_TITLE}

Expand All @@ -75,6 +121,15 @@ env:

FIRST ACTION: Call todo_write with this entire checklist. Your memory degrades - the TODO is your only reliable memory. Update it frequently.

## Budget Guidance
You have 15 minutes. Allocate effort wisely:
- Phase 1 (Understand): ~20% - Don't over-explore, get the gist
- Phase 2 (Evaluate): ~30% - This is where design issues surface
- Phase 3 (Verify): ~30% - Only if approach is sound, skip if fundamentally flawed
- Phase 4 (Report): ~20% - Prioritize and write concisely
For large diffs (>500 lines): Focus on architecture over line-by-line review.
Aim for 3-7 high-quality findings, not exhaustive coverage.

## PR Understanding
- Intent: [fill after Phase 1]
- Approach: [fill after Phase 1]
Expand Down
Loading