
Enhance pr-finalize skill with code review phase and safety rules #33861

Merged: PureWeen merged 12 commits into main from pr-finalize-multi-model on Feb 4, 2026

@PureWeen (Member) commented Feb 3, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

Enhances the pr-finalize skill with a two-phase workflow that includes code review, and adds critical safety rules preventing agents from approving or blocking PRs.

What This PR Does

  1. Adds Phase 2: Code Review - After verifying title/description, the skill now performs a code review for best practices
  2. Adds critical safety rule - Agents must NEVER use --approve or --request-changes flags
  3. Keeps skill focused - Users can invoke with different models themselves if they want multi-model coverage

Commits

1. 2ad480c77d - Add multi-model code review to pr-finalize skill

Original commit that added the code review phase to pr-finalize.

2. 71eb8c2216 - Add critical rule: agents must NEVER approve/request-changes on PRs

Why: AI agents should never make approval decisions on PRs. That's a human responsibility.

Changes:

  • Added prominent 🚨 CRITICAL warning section at top of skill
  • Added CRITICAL note to copilot-instructions.md
  • Only gh pr review --comment is allowed

3. 335be127cb - Simplify pr-finalize: remove multi-model orchestration, keep code review

Why: After review, decided that multi-model orchestration should be user-controlled, not built into the skill. Users can invoke pr-finalize with different models themselves if they want multi-model coverage.

Changes:

  • Removed 5-model orchestration logic (model table, parallel execution, consensus synthesis)
  • Renamed "Multi-Model Code Review" → "Code Review"
  • Fixed contradiction: removed --approve/--request-changes examples that conflicted with the NEVER rule
  • Kept the code review focus areas and output format
  • Reduced skill from 422 → 373 lines

The Two-Phase Workflow

Phase 1: Title & Description Review

  • Verify PR metadata matches actual implementation
  • Check for required NOTE block
  • Ensure title is searchable and informative

Phase 2: Code Review

  • Review code for best practices
  • Focus on: code quality, error handling, performance, platform concerns, breaking changes
  • Post findings as comment only (never approve/request-changes)

Key Safety Rule

## 🚨 CRITICAL: NEVER Approve or Request Changes

**AI agents must NEVER use `--approve` or `--request-changes` flags.**

| Action | Allowed? | Why |
|--------|----------|-----|
| `gh pr review --approve` | **NEVER** | Approval is a human decision |
| `gh pr review --request-changes` | **NEVER** | Blocking PRs is a human decision |
| `gh pr review --comment` | **OK** | Findings for human review |
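
The rule in the table above could also be enforced mechanically before an agent runs a command. The following is a minimal sketch, not part of the skill: `check_review_args` and `FORBIDDEN_FLAGS` are hypothetical names, and only the long flag spellings named in the table are covered (short aliases are not handled).

```python
from typing import List, Sequence

# Flags the skill forbids for AI agents, taken from the table above.
FORBIDDEN_FLAGS = {"--approve", "--request-changes"}


def check_review_args(args: Sequence[str]) -> List[str]:
    """Return the arguments unchanged if they are comment-only, else raise.

    Hypothetical helper: it is deliberately conservative and rejects a
    forbidden flag anywhere in the argument list.
    """
    for arg in args:
        # Also catch the `--flag=value` spelling.
        flag = arg.split("=", 1)[0]
        if flag in FORBIDDEN_FLAGS:
            raise ValueError(f"{flag} is reserved for human reviewers")
    if "--comment" not in args:
        raise ValueError("agents may only post reviews with --comment")
    return list(args)
```

A wrapper would call `check_review_args` on the full argument list and only then hand it to the CLI, so a forbidden flag fails before any review is submitted.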

Files Changed

| File | Changes |
|------|---------|
| .github/skills/pr-finalize/SKILL.md | Added Phase 2 code review, safety rules, simplified orchestration |
| .github/copilot-instructions.md | Updated skill description, added CRITICAL note |

Why Not Multi-Model Orchestration?

The original approach had the skill orchestrate 5 different AI models in parallel. This was removed because:

  1. User control - Users can invoke pr-finalize with a model parameter themselves
  2. Simplicity - Skill focuses on what to review, not how to orchestrate
  3. Flexibility - Users choose when multi-model is worth the cost
  4. Avoids contradictions - The original had consensus logic that conflicted with the "never approve" rule

Issues Fixed

N/A - Enhancement to agent workflow

- Added Phase 2: Multi-Model Code Review workflow
- Uses 5 models (claude-sonnet-4, claude-opus-4.5, gpt-5.2, gpt-5.2-codex, gemini-3-pro-preview)
- Synthesizes findings by consensus level (4-5 models = critical, 3 = high, 2 = minor)
- Includes output format for posting review comments
- Updated copilot-instructions.md to reflect new capability
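
The consensus levels in this original commit message (later removed in 335be127cb) amount to a small threshold mapping. A minimal sketch, assuming five models as listed above; the function names are hypothetical, and the handling of a finding reported by only one model is an assumption because the commit message does not specify it.

```python
from typing import Dict, List, Optional


def severity_for(agreeing_models: int) -> Optional[str]:
    """Map the number of agreeing models to a severity label."""
    if agreeing_models >= 4:
        return "critical"  # 4-5 models agree
    if agreeing_models == 3:
        return "high"
    if agreeing_models == 2:
        return "minor"
    # Assumption: a finding from a single model is not reported.
    return None


def group_by_severity(agreement: Dict[str, int]) -> Dict[str, List[str]]:
    """Group findings (finding text -> agreeing model count) by severity."""
    grouped: Dict[str, List[str]] = {"critical": [], "high": [], "minor": []}
    for finding, count in agreement.items():
        severity = severity_for(count)
        if severity is not None:
            grouped[severity].append(finding)
    return grouped
```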
@kubaflo added the area-ai-agents label (Copilot CLI agents, agent skills, AI-assisted development) Feb 3, 2026

@kubaflo (Contributor) left a comment and previously approved these changes Feb 3, 2026

PR Finalization Review

Phase 1: Title & Description ✅

Title: Accurate and descriptive
Description: Excellent - well-structured with NOTE block, file table, and clear "why" explanation

Phase 2: Multi-Model Code Review Consensus (5 models)

🔴 Critical (5/5 models agree) - Consider Adding

Missing Error Handling Guidance
No documentation for when models fail/timeout. Suggested addition:

### Error Handling
- If 1-2 models fail: Continue with remaining models
- If 3+ models fail: Skip Phase 2, note in output
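
The suggested fallback policy can be read as a single decision on the number of failed models. This is only a sketch of the reviewer's suggestion, which was never adopted because the orchestration was later removed; `phase2_action` and its return values are hypothetical, and the default of five models comes from the original proposal.

```python
def phase2_action(failed_models: int, total_models: int = 5) -> str:
    """Decide how Phase 2 proceeds given how many model calls failed."""
    if not 0 <= failed_models <= total_models:
        raise ValueError("failed_models must be between 0 and total_models")
    if failed_models >= 3:
        # 3+ failures: skip Phase 2 and note that in the output.
        return "skip"
    # 0-2 failures: continue with the remaining models.
    return "continue"
```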

🟡 High Priority (3/5 models agree) - Nice to Have

  1. Skip criteria for non-code PRs - Save resources on docs-only changes
  2. Define "agreement" - Clarify what constitutes model consensus (same file + issue category)
  3. Large diff handling - Guidance for context limit issues

✅ Positive Feedback (all models)

  • Well-structured two-phase workflow
  • Consensus classification effectively reduces false positives
  • Diverse model selection for broad coverage
  • Practical gh CLI examples

Verdict: APPROVED - The multi-model approach is valuable and well-documented. The suggestions above would enhance robustness but are not blocking for merge.

@kubaflo (Contributor) commented Feb 3, 2026

📋 PR Finalization Review

Title: ✅ Good

Current: Add multi-model code review to pr-finalize skill

Description: ✅ Excellent

Phase 1: Title & Description ✅

The PR description is well-structured with:


Phase 2: Multi-Model Code Review Consensus (5/5 models responded)

| Severity | Finding | Agreement |
|----------|---------|-----------|
| 🔴 Critical | Missing error handling/timeout guidance | 5/5 models |
| 🟡 High | Add skip criteria for docs-only PRs | 3/5 models |
| 🟡 High | Define what "agreement" means | 3/5 models |
| 🟡 High | Large diff/context limit handling | 3/5 models |
| 🟢 Minor | Model list may become stale | 2/5 models |
| 🟢 Minor | Add performance expectations (~3-5 min) | 2/5 models |

✅ Positive Feedback (all models agreed):

  • Well-structured two-phase workflow
  • Consensus classification reduces false positives
  • Diverse model selection for broad coverage
  • Practical gh CLI examples included

Verdict: Approved with suggestions. The multi-model approach is valuable and well-documented. The critical gap (error handling) is an enhancement opportunity, not a blocking issue.

- Add prominent warning section at top of pr-finalize skill
- Add CRITICAL note to copilot-instructions.md

AI agents should only post comments with findings. Approval and
rejection decisions are reserved for human reviewers.
Copilot AI review requested due to automatic review settings February 3, 2026 23:57

Copilot AI (Contributor) left a comment

Pull request overview

This PR attempts to enhance the pr-finalize skill by adding a multi-model code review phase that would consult 5 different AI models in parallel to achieve consensus-based code review findings. The goal is to improve code review quality by leveraging multiple AI perspectives before merging PRs.

Changes:

  • Added Phase 2 multi-model review workflow to the pr-finalize skill
  • Updated skill description to reflect new two-phase approach
  • Added critical warning section prohibiting AI agents from approving/blocking PRs

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| .github/skills/pr-finalize/SKILL.md | Added 125 lines describing multi-model review workflow including model selection, execution pattern, consensus synthesis, and output formatting |
| .github/copilot-instructions.md | Updated pr-finalize skill description to mention multi-model review capability and added critical warning about approval restrictions |

Comment on lines 399 to 416

Copilot AI, Feb 4, 2026

This section directly contradicts the critical warning at lines 19-32 which states "AI agents must NEVER use --approve or --request-changes flags."

Lines 401-403 instruct the agent to:

  • "Request changes" when critical issues are found
  • "Approve with suggestions" when only minor issues exist
  • "Approve" when no issues are found

Lines 408-412 then provide example commands using --request-changes and --approve flags, which are explicitly forbidden in the CRITICAL section above.

This creates a fundamental contradiction in the skill's instructions where the agent is told both to NEVER approve/request-changes AND to approve/request-changes based on findings.

PureWeen (Member, Author) replied:

Already addressed - This contradiction was fixed in commit 335be127cb. We removed the approve/request-changes examples and kept only the --comment example, aligning with the CRITICAL rule at lines 19-32.

Copilot AI, Feb 4, 2026

The description claims this skill "performs multi-model code review (5 models)" but this functionality references nonexistent AI models (gpt-5.2, claude-sonnet-4, claude-opus-4.5, gpt-5.2-codex, gemini-3-pro-preview) and a nonexistent task tool. The multi-model review workflow described in SKILL.md cannot actually be executed.

PureWeen (Member, Author) replied:

Already addressed - The multi-model orchestration was removed in commit 335be127cb. The skill no longer references specific models or orchestrates multiple calls. Users who want multi-model review can invoke pr-finalize with different model parameters themselves.

- **Trigger phrases**: "finalize PR #XXXXX", "check PR description for #XXXXX", "review commit message"
- **Used by**: Before merging any PR, when description may be stale
- **Note**: Does NOT require agent involvement or session markdown - works on any PR
- **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` - only post comments. Approval is a human decision.

Copilot AI, Feb 4, 2026

This warning contradicts the instructions in SKILL.md lines 399-416, which tell agents to use --approve and --request-changes based on review findings. The skill should be updated to align with this critical restriction: agents should only post comments using gh pr review XXXXX --repo dotnet/maui --comment --body "$reviewBody" and never approve or request changes.

Suggested change
- **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` - only post comments. Approval is a human decision.
- **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` — always submit feedback using `gh pr review XXXXX --repo dotnet/maui --comment --body "$reviewBody"`. Approval is a human decision.

PureWeen (Member, Author) replied:

Disagree with suggestion - The current text at line 245 is appropriately concise for an instructions file:

🚨 CRITICAL: NEVER use --approve or --request-changes - only post comments. Approval is a human decision.

The detailed gh pr review command example already exists in SKILL.md where it belongs. Adding the full command here would duplicate information. Instructions files should be reference summaries; skills contain the detailed procedures.

Comment on lines 319 to 330

Copilot AI, Feb 4, 2026

The PR description claims "In testing on PR #33687, this approach identified: Event handler memory leak (4/5 models agreed), Potential crash with disjoint date ranges (3/5 models agreed)." However, the multi-model review workflow cannot have been tested because it references nonexistent AI models (gpt-5.2, claude-sonnet-4, claude-opus-4.5, gpt-5.2-codex, gemini-3-pro-preview) and a nonexistent task tool. These testing claims appear to be fabricated or based on a different implementation than what is in the PR.

PureWeen (Member, Author) replied:

Already addressed - The PR description was updated in commit 335be127cb to reflect the actual changes. The multi-model testing claims were removed along with the multi-model orchestration code.

@PureWeen removed the do-not-merge label (Don't merge this PR) Feb 4, 2026
- Remove 5-model orchestration logic (users can invoke with different models themselves)
- Rename 'Multi-Model Code Review' to 'Code Review'
- Fix contradiction: remove approve/request-changes examples, keep only --comment
- Keep NEVER approve/request-changes rule
- Reduce skill from 422 to 373 lines

Phase 2 now focuses on what to review, not how to orchestrate multiple models.
@PureWeen changed the title from "Add multi-model code review to pr-finalize skill" to "Enhance pr-finalize skill with code review phase and safety rules" Feb 4, 2026
@kubaflo previously approved these changes Feb 4, 2026
PureWeen and others added 2 commits February 4, 2026 11:19
Introduce a Code Review section to PR finalize comments and update docs and script accordingly. Updated post-pr-finalize-comment.ps1 to accept CodeReviewStatus and CodeReviewFindings, auto-load code-review.md or extract findings from the summary file, build a collapsible Code Review section, and emit warnings when RecommendedTitle/RecommendedDescription are missing. Also extract RecommendedTitle earlier and mark TitleStatus as NeedsUpdate when a recommendation exists. Documentation (ai-summary-comment/SKILL.md) now documents usage, parameters, file layout, and required fields for finalize comments; pr-finalize/SKILL.md clarifies rules (never approve/request changes and never post comments directly).
@PureWeen force-pushed the pr-finalize-multi-model branch from eda0e1e to 670f4e4 February 4, 2026 17:19
PureWeen and others added 2 commits February 4, 2026 12:02
Prepend a newline when appending trimmed code review findings or the default message in .github/skills/ai-summary-comment/scripts/post-pr-finalize-comment.ps1 to ensure proper spacing and line breaks in the generated code review section.
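
The collapsible Code Review section and the newline fix described in these two commits can be illustrated as follows. This is a Python sketch of the idea only, not the PowerShell script itself: `build_code_review_section`, its parameters, and the default message text are hypothetical. It assumes the section uses HTML `<details>`/`<summary>`, where GitHub needs a blank line after the summary for the Markdown body to render.

```python
def build_code_review_section(status: str, findings: str = "") -> str:
    """Build a collapsible Code Review section for a PR comment."""
    # Fall back to a default message when no findings were supplied.
    body = findings.strip() or "No code review findings were provided."
    return (
        "<details>\n"
        f"<summary>Code Review: {status}</summary>\n"
        "\n"  # blank line so the Markdown body renders inside <details>
        f"{body}\n"
        "\n"
        "</details>"
    )
```

Without the blank line after `<summary>`, list items and headings in the findings tend to render as literal text rather than formatted Markdown, which matches the spacing problem the second commit describes.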
@kubaflo force-pushed the pr-finalize-multi-model branch from 54a4545 to 680305c February 4, 2026 19:54
@PureWeen force-pushed the pr-finalize-multi-model branch from 2d48e88 to d51da80 February 4, 2026 22:05
This was referenced Feb 13, 2026

Labels: area-ai-agents, copilot
