Skip to content

Conversation

@tlongwell-block
Copy link
Collaborator

Analyzed 2,341 lines of review comments from 6 goose engineers across 86 PRs to identify what experienced reviewers actually flag.

Changes

  • Added "Succeed Fast" detection for LLM code anti-patterns (Option, widened types, error swallowing)
  • Added security checks (TOCTOU, heredoc injection)
  • Added code hygiene rules (AI-generated comments, lying comments)
  • Added budget guidance for time allocation

@tlongwell-block
Copy link
Collaborator Author

/goose this review is a bit meta, but I trust you can handle it impartially

@github-actions
Copy link
Contributor

Summary: This PR improves the goose PR reviewer workflow by adding domain-specific review heuristics derived from analyzing 2,341 review comments from 6 goose engineers. The changes are well-structured, align with AGENTS.md conventions, and add practical value. One guidance item is too absolute.

🟡 Warnings

  1. .github/workflows/goose-pr-reviewer.yml:46 - "Option: Never" is overly absolute

    The guidance states:

    - Option<bool>: Never. Booleans default to false.
    

    However, the goose codebase itself uses Option<bool> for legitimate 3-state configuration patterns where:

    • None = inherit/use default
    • Some(true) = explicitly enabled
    • Some(false) = explicitly disabled

    Evidence from crates/goose/src/recipe/recipe_extension_adapter.rs:22:

    #[serde(default)]
    bundled: Option<bool>,

    This is common in config deserialization where you need to distinguish "not specified" from "explicitly set to false".

    Suggestion: Soften to: Option<bool>: Rarely needed. Consider if None vs Some(false) distinction matters. Usually booleans should default to false.

🟢 Suggestions

  1. Removed UI/frontend anti-hallucination rule - The previous instructions included "Before claiming UI/frontend changes are needed, trace the actual data flow". This was useful guidance for reviews involving ui/desktop/ (TypeScript/Electron). Consider retaining it or adding a note that guidance is Rust-centric.

  2. Language-centric guidance - Items like PathBuf for paths, Option<T>, and typed structs are Rust-specific. For PRs touching ui/desktop/, the reviewer might be confused. Consider adding a brief note that guidance is primarily for Rust code.

✅ Highlights

  • Budget guidance is well-calibrated: The 15-minute budget with phase allocations (20%/30%/30%/20%) aligns perfectly with the workflow's timeout-minutes: 15 setting.

  • Code Hygiene section aligns with AGENTS.md: The "Comments restating code: Delete" guidance matches AGENTS.md's "Never add comments that restate what code does".

  • Security section is relevant and well-placed: The TOCTOU and heredoc injection warnings address real workflow security concerns. The workflow header already warns about TOCTOU attacks, and this reinforces it in the review guidance.

  • Test flakiness guidance is practical: The rg "env::set_var" -g "*test*" pattern is useful - the codebase has this pattern in crates/goose/tests/mcp_integration_test.rs.

  • External reference is valid: The link to https://engineering.block.xyz/blog/the-high-cost-of-free-testing works and adds context for test quality guidance.

  • "Succeed Fast" framing is insightful: Naming the anti-pattern helps reviewers recognize when LLM-generated code compiles but avoids proper design.


Review generated by goose

@tlongwell-block
Copy link
Collaborator Author

/goose this review is a bit meta, but I trust you can handle it impartially

@github-actions
Copy link
Contributor

Summary: This PR improves the goose PR reviewer workflow by adding structured guidance derived from empirical analysis of 2,341 review comments from 6 goose engineers across 86 PRs. The changes are well-aligned with existing project conventions and add valuable detection patterns for LLM-generated code issues. Recommend approval.

🟢 Suggestions

  1. Consider clarifying TOCTOU guidance (.github/workflows/goose-pr-reviewer.yml:85)

    The guidance says "TOCTOU: Use event payload, not API calls" which is correct as a general principle. However, the workflow itself uses gh pr diff (line 262) because the event payload doesn't include the full diff content. This is a necessary exception. Consider adding a brief note like "when the data is available in the payload" to prevent confusion when reviewers see this guidance while reviewing workflows that legitimately need API calls.

  2. Minor wording change (.github/workflows/goose-pr-reviewer.yml:101)

    Changed from "2 verified issues are better than 10 speculative ones" to "3-7 verified issues > 10 speculative ones". The new range is reasonable but the original phrasing ("better than") was clearer. The > symbol could be misread. Minor nit.

✅ Highlights

  • "Succeed Fast" Detection (lines 50-73): The concept of detecting LLM code that "compiles quickly while avoiding proper design" is novel and valuable. The specific patterns (Option, .unwrap() near Option, manual JSON parsing) directly target common AI-generated code smells.

  • AGENTS.md Alignment: The new guidance aligns well with existing project conventions:

    • Option<bool> guidance matches AGENTS.md's "Booleans should default to false, not be optional"
    • Comment hygiene rules match AGENTS.md's "Never add comments that restate what code does"
  • Budget Guidance (lines 124-131): Adding explicit time allocation helps reviewers prioritize and avoid rabbit holes. The "skip Phase 3 if fundamentally flawed" advice is pragmatic.

  • External Reference: Linking to engineering.block.xyz/blog/the-high-cost-of-free-testing adds credibility and allows reviewers to dive deeper if interested.

  • Language Note (lines 103-106): Acknowledging that TypeScript/frontend code needs equivalent rigor with appropriate patterns (discriminated unions, etc.) prevents the Rust-centric guidance from being misapplied.

  • Evidence-based approach: The PR description states this was derived from analysis of 2,341 real review comments, which gives confidence that the guidance reflects actual review practices rather than theoretical ideals.


Review generated by goose

@tlongwell-block tlongwell-block marked this pull request as ready for review December 23, 2025 19:51
- 🔴 BLOCKING: Must fix. Requires HIGH confidence + code evidence.
- 🟡 WARNING: Should fix. MEDIUM+ confidence.
- 🟢 SUGGESTION: Nice to have. Can be speculative if labeled.
- ✅ HIGHLIGHT: Good practices to acknowledge.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing (can be done some other time) to call out is that if the PR tries to solve the same problem three times and explicitly lists it. it usually means that an LLM was trying to solve the problem but wasn't sure that it is doing the right thing so instead does it three times.

@tlongwell-block tlongwell-block merged commit 5835c8f into main Jan 6, 2026
20 checks passed
@tlongwell-block tlongwell-block deleted the pr_rev_improv branch January 6, 2026 17:39
zanesq added a commit that referenced this pull request Jan 7, 2026
* 'main' of github.com:block/goose:
  Claude 3.7 is out. we had some harcoded stuff (#6197)
  Release 1.19.0
  chore: upgrade to node v24 as engine (#6361)
  chore(deps): bump rsa from 0.9.9 to 0.9.10 (#6358)
  Bump rust toolchain to 1.92 (current stable) (#6356)
  Hide advanced recipe options under expandable content (#6021)
  fix: use .config/agents (plural) for skills directory (#6357)
  fix: prevent KaTeX from treating underscores as subscripts in plain text (#6242)
  fix: make goose review PRs more like goose contributors do (#6240)
  fix : preserve provider engine type when editing custom providers (#6106)
  feat(providers): add retry for model fetching (#6347)
  allow goose issue solver to react to activation comments (#6239)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants