Enhance pr-finalize skill with code review phase and safety rules by PureWeen · Pull Request #33861 · dotnet/maui

PureWeen · 2026-02-03T15:35:22Z

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Summary

Enhances the pr-finalize skill with a two-phase workflow that includes code review, and adds critical safety rules preventing agents from approving or blocking PRs.

What This PR Does

- Adds Phase 2: Code Review - After verifying title/description, the skill now performs a code review for best practices
- Adds critical safety rule - Agents must NEVER use --approve or --request-changes flags
- Keeps skill focused - Users can invoke with different models themselves if they want multi-model coverage

Commits

1. `2ad480c77d` - Add multi-model code review to pr-finalize skill

Original commit that added the code review phase to pr-finalize.

2. `71eb8c2216` - Add critical rule: agents must NEVER approve/request-changes on PRs

Why: AI agents should never make approval decisions on PRs. That's a human responsibility.

Changes:

- Added prominent 🚨 CRITICAL warning section at top of skill
- Added CRITICAL note to copilot-instructions.md
- Only gh pr review --comment is allowed

3. `335be127cb` - Simplify pr-finalize: remove multi-model orchestration, keep code review

Why: After review, decided that multi-model orchestration should be user-controlled, not built into the skill. Users can invoke pr-finalize with different models themselves if they want multi-model coverage.

Changes:

- Removed 5-model orchestration logic (model table, parallel execution, consensus synthesis)
- Renamed "Multi-Model Code Review" → "Code Review"
- Fixed contradiction: removed --approve/--request-changes examples that conflicted with the NEVER rule
- Kept the code review focus areas and output format
- Reduced skill from 422 → 373 lines

The Two-Phase Workflow

Phase 1: Title & Description Review

- Verify PR metadata matches actual implementation
- Check for required NOTE block
- Ensure title is searchable and informative

Phase 2: Code Review

- Review code for best practices
- Focus on: code quality, error handling, performance, platform concerns, breaking changes
- Post findings as comment only (never approve/request-changes)

Key Safety Rule

## 🚨 CRITICAL: NEVER Approve or Request Changes

**AI agents must NEVER use `--approve` or `--request-changes` flags.**

| Action | Allowed? | Why |
|--------|----------|-----|
| `gh pr review --approve` | ❌ **NEVER** | Approval is a human decision |
| `gh pr review --request-changes` | ❌ **NEVER** | Blocking PRs is a human decision |
| `gh pr review --comment` | ✅ **OK** | Findings for human review |
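
The rule in the table above could also be checked mechanically before an agent runs any `gh pr review` command. The following is a minimal Python sketch, not something this PR adds: the helper name `check_review_args` is hypothetical, and it assumes the agent's command is available as a list of arguments. `-a` and `-r` are the `gh` short forms of `--approve` and `--request-changes`.

```python
# Hypothetical guard (not part of the skill): reject `gh pr review`
# invocations that would approve or block a PR.
FORBIDDEN_FLAGS = {"--approve", "-a", "--request-changes", "-r"}


def check_review_args(args: list[str]) -> None:
    """Raise ValueError if the argument list contains a forbidden review flag.

    `args` is everything after `gh pr review`, for example
    ["XXXXX", "--repo", "dotnet/maui", "--comment", "--body", "findings"].
    """
    for arg in args:
        # Flags may be written as "--flag" or "--flag=value"; compare the name only.
        name = arg.split("=", 1)[0]
        if name in FORBIDDEN_FLAGS:
            raise ValueError(f"{name} is reserved for human reviewers")
```

The check is deliberately conservative: it also rejects a review body that is exactly `-a` or `-r`, and it does not try to unpack combined short flags.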

Files Changed

| File | Changes |
|------|---------|
| `.github/skills/pr-finalize/SKILL.md` | Added Phase 2 code review, safety rules, simplified orchestration |
| `.github/copilot-instructions.md` | Updated skill description, added CRITICAL note |

Why Not Multi-Model Orchestration?

The original approach had the skill orchestrate 5 different AI models in parallel. This was removed because:

- User control - Users can invoke pr-finalize with model parameter themselves
- Simplicity - Skill focuses on what to review, not how to orchestrate
- Flexibility - Users choose when multi-model is worth the cost
- Avoids contradictions - Original had consensus logic that conflicted with "never approve" rule

Issues Fixed

N/A - Enhancement to agent workflow

- Added Phase 2: Multi-Model Code Review workflow
- Uses 5 models (claude-sonnet-4, claude-opus-4.5, gpt-5.2, gpt-5.2-codex, gemini-3-pro-preview)
- Synthesizes findings by consensus level (4-5 models = critical, 3 = high, 2 = minor)
- Includes output format for posting review comments
- Updated copilot-instructions.md to reflect new capability

kubaflo

PR Finalization Review

Phase 1: Title & Description ✅

Title: Accurate and descriptive
Description: Excellent - well-structured with NOTE block, file table, and clear "why" explanation

Phase 2: Multi-Model Code Review Consensus (5 models)

🔴 Critical (5/5 models agree) - Consider Adding

Missing Error Handling Guidance
No documentation for when models fail/timeout. Suggested addition:

### Error Handling
- If 1-2 models fail: Continue with remaining models
- If 3+ models fail: Skip Phase 2, note in output
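
The thresholds suggested above amount to a single decision on the number of failed model calls. A minimal Python sketch, assuming five models in total as in the original design; the function name and return values are hypothetical, since the skill itself is a Markdown instruction file and contains no such code:

```python
def phase2_action(failed_models: int, total_models: int = 5) -> str:
    """Decide how Phase 2 proceeds given how many model calls failed or timed out."""
    if not 0 <= failed_models <= total_models:
        raise ValueError("failed_models must be between 0 and total_models")
    if failed_models >= 3:
        # Too few responses for a meaningful consensus: skip Phase 2 and note it.
        return "skip"
    # 0-2 failures: continue with whichever models responded.
    return "continue"
```

The suggestion treats zero failures the same as one or two, so the sketch returns `"continue"` in all three cases.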

🟡 High Priority (3/5 models agree) - Nice to Have

- Skip criteria for non-code PRs - Save resources on docs-only changes
- Define "agreement" - Clarify what constitutes model consensus (same file + issue category)
- Large diff handling - Guidance for context limit issues

✅ Positive Feedback (all models)

- Well-structured two-phase workflow
- Consensus classification effectively reduces false positives
- Diverse model selection for broad coverage
- Practical gh CLI examples

Verdict: ✅ APPROVED - The multi-model approach is valuable and well-documented. The suggestions above would enhance robustness but are not blocking for merge.

kubaflo · 2026-02-03T23:29:21Z

📋 PR Finalization Review

Title: ✅ Good

Current: Add multi-model code review to pr-finalize skill

Description: ✅ Excellent

Phase 1: Title & Description ✅

The PR description is well-structured with:

✅ NOTE block for testing artifacts
✅ Clear sections with headers
✅ File-by-file breakdown with line counts (+125 lines)
✅ "Why Multi-Model?" section with concrete test results (PR #33687, "[Android] Fix DatePicker MinimumDate/MaximumDate not updating dynamically")
✅ Matches actual implementation

Phase 2: Multi-Model Code Review Consensus (5/5 models responded)

| Severity | Finding | Agreement |
|----------|---------|-----------|
| 🔴 Critical | Missing error handling/timeout guidance | 5/5 models |
| 🟡 High | Add skip criteria for docs-only PRs | 3/5 models |
| 🟡 High | Define what "agreement" means | 3/5 models |
| 🟡 High | Large diff/context limit handling | 3/5 models |
| 🟢 Minor | Model list may become stale | 2/5 models |
| 🟢 Minor | Add performance expectations (~3-5 min) | 2/5 models |
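
The severity column follows the consensus levels stated in the first commit's message (4-5 models = critical, 3 = high, 2 = minor). A minimal Python sketch of that mapping, for illustration only: the function name is hypothetical, and the handling of findings raised by fewer than two models is an assumption, because the commit message does not cover that case.

```python
from typing import Optional


def consensus_severity(models_agreeing: int) -> Optional[str]:
    """Map the number of agreeing models (out of 5) to a severity label."""
    if not 0 <= models_agreeing <= 5:
        raise ValueError("models_agreeing must be between 0 and 5")
    if models_agreeing >= 4:
        return "critical"
    if models_agreeing == 3:
        return "high"
    if models_agreeing == 2:
        return "minor"
    # Assumption: findings from a single model (or none) get no severity label.
    return None
```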

✅ Positive Feedback (all models agreed):

- Well-structured two-phase workflow
- Consensus classification reduces false positives
- Diverse model selection for broad coverage
- Practical gh CLI examples included

Verdict: Approved with suggestions. The multi-model approach is valuable and well-documented. The critical gap (error handling) is an enhancement opportunity, not a blocking issue.

- Add prominent warning section at top of pr-finalize skill
- Add CRITICAL note to copilot-instructions.md

AI agents should only post comments with findings. Approval and rejection decisions are reserved for human reviewers.

Copilot

Pull request overview

This PR attempts to enhance the pr-finalize skill by adding a multi-model code review phase that would consult 5 different AI models in parallel to achieve consensus-based code review findings. The goal is to improve code review quality by leveraging multiple AI perspectives before merging PRs.

Changes:

- Added Phase 2 multi-model review workflow to the pr-finalize skill
- Updated skill description to reflect new two-phase approach
- Added critical warning section prohibiting AI agents from approving/blocking PRs

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| `.github/skills/pr-finalize/SKILL.md` | Added 125 lines describing multi-model review workflow including model selection, execution pattern, consensus synthesis, and output formatting |
| `.github/copilot-instructions.md` | Updated pr-finalize skill description to mention multi-model review capability and added critical warning about approval restrictions |

Copilot · 2026-02-04T00:00:55Z

.github/skills/pr-finalize/SKILL.md

This section directly contradicts the critical warning at lines 19-32 which states "AI agents must NEVER use --approve or --request-changes flags."

Lines 401-403 instruct the agent to:

"Request changes" when critical issues are found

"Approve with suggestions" when only minor issues exist

"Approve" when no issues are found

Lines 408-412 then provide example commands using --request-changes and --approve flags, which are explicitly forbidden in the CRITICAL section above.

This creates a fundamental contradiction in the skill's instructions where the agent is told both to NEVER approve/request-changes AND to approve/request-changes based on findings.

✅ Already addressed - This contradiction was fixed in commit 335be127cb. We removed the approve/request-changes examples and kept only the --comment example, aligning with the CRITICAL rule at lines 19-32.

Copilot · 2026-02-04T00:00:56Z

.github/copilot-instructions.md

The description claims this skill "performs multi-model code review (5 models)" but this functionality references nonexistent AI models (gpt-5.2, claude-sonnet-4, claude-opus-4.5, gpt-5.2-codex, gemini-3-pro-preview) and a nonexistent task tool. The multi-model review workflow described in SKILL.md cannot actually be executed.

✅ Already addressed - The multi-model orchestration was removed in commit 335be127cb. The skill no longer references specific models or orchestrates multiple calls. Users who want multi-model review can invoke pr-finalize with different model parameters themselves.

Copilot · 2026-02-04T00:00:56Z

.github/copilot-instructions.md

   - **Trigger phrases**: "finalize PR #XXXXX", "check PR description for #XXXXX", "review commit message"
   - **Used by**: Before merging any PR, when description may be stale
   - **Note**: Does NOT require agent involvement or session markdown - works on any PR
+   - **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` - only post comments. Approval is a human decision.


This warning contradicts the instructions in SKILL.md lines 399-416, which tell agents to use --approve and --request-changes based on review findings. The skill should be updated to align with this critical restriction: agents should only post comments using gh pr review XXXXX --repo dotnet/maui --comment --body "$reviewBody" and never approve or request changes.

Suggested change

- **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` - only post comments. Approval is a human decision.

- **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` — always submit feedback using `gh pr review XXXXX --repo dotnet/maui --comment --body "$reviewBody"`. Approval is a human decision.

Disagree with suggestion - The current text at line 245 is appropriately concise for an instructions file:

🚨 CRITICAL: NEVER use --approve or --request-changes - only post comments. Approval is a human decision.

The detailed gh pr review command example already exists in SKILL.md where it belongs. Adding the full command here would duplicate information. Instructions files should be reference summaries; skills contain the detailed procedures.

Copilot · 2026-02-04T00:00:57Z

.github/skills/pr-finalize/SKILL.md

The PR description claims "In testing on PR #33687, this approach identified: Event handler memory leak (4/5 models agreed), Potential crash with disjoint date ranges (3/5 models agreed)." However, the multi-model review workflow cannot have been tested because it references nonexistent AI models (gpt-5.2, claude-sonnet-4, claude-opus-4.5, gpt-5.2-codex, gemini-3-pro-preview) and a nonexistent task tool. These testing claims appear to be fabricated or based on a different implementation than what is in the PR.

✅ Already addressed - The PR description was updated in commit 335be127cb to reflect the actual changes. The multi-model testing claims were removed along with the multi-model orchestration code.

- Remove 5-model orchestration logic (users can invoke with different models themselves)
- Rename 'Multi-Model Code Review' to 'Code Review'
- Fix contradiction: remove approve/request-changes examples, keep only --comment
- Keep NEVER approve/request-changes rule
- Reduce skill from 422 to 373 lines

Phase 2 now focuses on what to review, not how to orchestrate multiple models.

Introduce a Code Review section to PR finalize comments and update docs and script accordingly. Updated post-pr-finalize-comment.ps1 to accept CodeReviewStatus and CodeReviewFindings, auto-load code-review.md or extract findings from the summary file, build a collapsible Code Review section, and emit warnings when RecommendedTitle/RecommendedDescription are missing. Also extract RecommendedTitle earlier and mark TitleStatus as NeedsUpdate when a recommendation exists. Documentation (ai-summary-comment/SKILL.md) now documents usage, parameters, file layout, and required fields for finalize comments; pr-finalize/SKILL.md clarifies rules (never approve/request changes and never post comments directly).


kubaflo added the area-ai-agents Copilot CLI agents, agent skills, AI-assisted development label Feb 3, 2026

kubaflo previously approved these changes Feb 3, 2026


PureWeen added the do-not-merge Don't merge this PR label Feb 3, 2026

kubaflo mentioned this pull request Feb 3, 2026

Add critical rule: agents must NEVER approve/request-changes on PRs #33885

Closed

Copilot AI review requested due to automatic review settings February 3, 2026 23:57

kubaflo dismissed their stale review via 71eb8c2 February 3, 2026 23:57

Copilot started reviewing on behalf of kubaflo February 3, 2026 23:58

Copilot AI reviewed Feb 4, 2026


PureWeen removed the do-not-merge Don't merge this PR label Feb 4, 2026

PureWeen changed the title ~~Add multi-model code review to pr-finalize skill~~ Enhance pr-finalize skill with code review phase and safety rules Feb 4, 2026

PureWeen added 2 commits February 3, 2026 23:38

Make Review-PR.ps1 non-interactive by default

211e03b

Add PostSummaryComment and RunFinalize options from copilot-ci

e357fd8

kubaflo previously approved these changes Feb 4, 2026


PureWeen added the s/ai-reproduction-confirmed label Feb 4, 2026

Clarify Review-PR.ps1 workflow: 3 sequential phases

a5d082d

PureWeen dismissed kubaflo’s stale review via a5d082d February 4, 2026 15:57

PureWeen force-pushed the pr-finalize-multi-model branch from 7ea809d to 8949edc February 4, 2026 16:54

kubaflo force-pushed the pr-finalize-multi-model branch from 8949edc to eda0e1e February 4, 2026 17:11

PureWeen and others added 2 commits February 4, 2026 11:19

Simplify Phase 3 prompt - let skill handle commenting details

c1ff526

PureWeen force-pushed the pr-finalize-multi-model branch from eda0e1e to 670f4e4 February 4, 2026 17:19

PureWeen and others added 2 commits February 4, 2026 12:02

Fix undefined $ReviewDescription variable in post-pr-finalize-comment…

ac69a12

….ps1

Add newline before appended review findings

680305c

Prepend a newline when appending trimmed code review findings or the default message in .github/skills/ai-summary-comment/scripts/post-pr-finalize-comment.ps1 to ensure proper spacing and line breaks in the generated code review section.

kubaflo force-pushed the pr-finalize-multi-model branch from 54a4545 to 680305c February 4, 2026 19:54

Add explicit NEVER CHECKOUT warning to Review-PR.ps1 prompt

d3d541b

PureWeen force-pushed the pr-finalize-multi-model branch from 2d48e88 to d51da80 February 4, 2026 22:05


PureWeen merged 12 commits into main from pr-finalize-multi-model
