Enhance pr-finalize skill with code review phase and safety rules#33861
Enhance pr-finalize skill with code review phase and safety rules#33861
Conversation
- Added Phase 2: Multi-Model Code Review workflow - Uses 5 models (claude-sonnet-4, claude-opus-4.5, gpt-5.2, gpt-5.2-codex, gemini-3-pro-preview) - Synthesizes findings by consensus level (4-5 models = critical, 3 = high, 2 = minor) - Includes output format for posting review comments - Updated copilot-instructions.md to reflect new capability
kubaflo
left a comment
There was a problem hiding this comment.
PR Finalization Review
Phase 1: Title & Description ✅
Title: Accurate and descriptive
Description: Excellent - well-structured with NOTE block, file table, and clear "why" explanation
Phase 2: Multi-Model Code Review Consensus (5 models)
🔴 Critical (5/5 models agree) - Consider Adding
Missing Error Handling Guidance
No documentation for when models fail/timeout. Suggested addition:
### Error Handling
- If 1-2 models fail: Continue with remaining models
- If 3+ models fail: Skip Phase 2, note in output🟡 High Priority (3/5 models agree) - Nice to Have
- Skip criteria for non-code PRs - Save resources on docs-only changes
- Define "agreement" - Clarify what constitutes model consensus (same file + issue category)
- Large diff handling - Guidance for context limit issues
✅ Positive Feedback (all models)
- Well-structured two-phase workflow
- Consensus classification effectively reduces false positives
- Diverse model selection for broad coverage
- Practical gh CLI examples
Verdict: ✅ APPROVED - The multi-model approach is valuable and well-documented. The suggestions above would enhance robustness but are not blocking for merge.
📋 PR Finalization ReviewTitle: ✅ GoodCurrent: Description: ✅ ExcellentPhase 1: Title & Description ✅ The PR description is well-structured with:
Phase 2: Multi-Model Code Review Consensus (5/5 models responded)
✅ Positive Feedback (all models agreed):
Verdict: Approved with suggestions. The multi-model approach is valuable and well-documented. The critical gap (error handling) is an enhancement opportunity, not a blocking issue. |
- Add prominent warning section at top of pr-finalize skill - Add CRITICAL note to copilot-instructions.md AI agents should only post comments with findings. Approval and rejection decisions are reserved for human reviewers.
There was a problem hiding this comment.
Pull request overview
This PR attempts to enhance the pr-finalize skill by adding a multi-model code review phase that would consult 5 different AI models in parallel to achieve consensus-based code review findings. The goal is to improve code review quality by leveraging multiple AI perspectives before merging PRs.
Changes:
- Added Phase 2 multi-model review workflow to the pr-finalize skill
- Updated skill description to reflect new two-phase approach
- Added critical warning section prohibiting AI agents from approving/blocking PRs
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
.github/skills/pr-finalize/SKILL.md |
Added 125 lines describing multi-model review workflow including model selection, execution pattern, consensus synthesis, and output formatting |
.github/copilot-instructions.md |
Updated pr-finalize skill description to mention multi-model review capability and added critical warning about approval restrictions |
.github/skills/pr-finalize/SKILL.md
Outdated
There was a problem hiding this comment.
This section directly contradicts the critical warning at lines 19-32 which states "AI agents must NEVER use --approve or --request-changes flags."
Lines 401-403 instruct the agent to:
- "Request changes" when critical issues are found
- "Approve with suggestions" when only minor issues exist
- "Approve" when no issues are found
Lines 408-412 then provide example commands using --request-changes and --approve flags, which are explicitly forbidden in the CRITICAL section above.
This creates a fundamental contradiction in the skill's instructions where the agent is told both to NEVER approve/request-changes AND to approve/request-changes based on findings.
There was a problem hiding this comment.
✅ Already addressed - This contradiction was fixed in commit 335be127cb. We removed the approve/request-changes examples and kept only the --comment example, aligning with the CRITICAL rule at lines 19-32.
.github/copilot-instructions.md
Outdated
There was a problem hiding this comment.
The description claims this skill "performs multi-model code review (5 models)" but this functionality references nonexistent AI models (gpt-5.2, claude-sonnet-4, claude-opus-4.5, gpt-5.2-codex, gemini-3-pro-preview) and a nonexistent task tool. The multi-model review workflow described in SKILL.md cannot actually be executed.
There was a problem hiding this comment.
✅ Already addressed - The multi-model orchestration was removed in commit 335be127cb. The skill no longer references specific models or orchestrates multiple calls. Users who want multi-model review can invoke pr-finalize with different model parameters themselves.
| - **Trigger phrases**: "finalize PR #XXXXX", "check PR description for #XXXXX", "review commit message" | ||
| - **Used by**: Before merging any PR, when description may be stale | ||
| - **Note**: Does NOT require agent involvement or session markdown - works on any PR | ||
| - **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` - only post comments. Approval is a human decision. |
There was a problem hiding this comment.
This warning contradicts the instructions in SKILL.md lines 399-416, which tell agents to use --approve and --request-changes based on review findings. The skill should be updated to align with this critical restriction: agents should only post comments using gh pr review XXXXX --repo dotnet/maui --comment --body "$reviewBody" and never approve or request changes.
| - **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` - only post comments. Approval is a human decision. | |
| - **🚨 CRITICAL**: NEVER use `--approve` or `--request-changes` — always submit feedback using `gh pr review XXXXX --repo dotnet/maui --comment --body "$reviewBody"`. Approval is a human decision. |
There was a problem hiding this comment.
Disagree with suggestion - The current text at line 245 is appropriately concise for an instructions file:
🚨 CRITICAL: NEVER use --approve or --request-changes - only post comments. Approval is a human decision.
The detailed gh pr review command example already exists in SKILL.md where it belongs. Adding the full command here would duplicate information. Instructions files should be reference summaries; skills contain the detailed procedures.
.github/skills/pr-finalize/SKILL.md
Outdated
There was a problem hiding this comment.
The PR description claims "In testing on PR #33687, this approach identified: Event handler memory leak (4/5 models agreed), Potential crash with disjoint date ranges (3/5 models agreed)." However, the multi-model review workflow cannot have been tested because it references nonexistent AI models (gpt-5.2, claude-sonnet-4, claude-opus-4.5, gpt-5.2-codex, gemini-3-pro-preview) and a nonexistent task tool. These testing claims appear to be fabricated or based on a different implementation than what is in the PR.
There was a problem hiding this comment.
✅ Already addressed - The PR description was updated in commit 335be127cb to reflect the actual changes. The multi-model testing claims were removed along with the multi-model orchestration code.
- Remove 5-model orchestration logic (users can invoke with different models themselves) - Rename 'Multi-Model Code Review' to 'Code Review' - Fix contradiction: remove approve/request-changes examples, keep only --comment - Keep NEVER approve/request-changes rule - Reduce skill from 422 to 373 lines Phase 2 now focuses on what to review, not how to orchestrate multiple models.
7ea809d to
8949edc
Compare
8949edc to
eda0e1e
Compare
Introduce a Code Review section to PR finalize comments and update docs and script accordingly. Updated post-pr-finalize-comment.ps1 to accept CodeReviewStatus and CodeReviewFindings, auto-load code-review.md or extract findings from the summary file, build a collapsible Code Review section, and emit warnings when RecommendedTitle/RecommendedDescription are missing. Also extract RecommendedTitle earlier and mark TitleStatus as NeedsUpdate when a recommendation exists. Documentation (ai-summary-comment/SKILL.md) now documents usage, parameters, file layout, and required fields for finalize comments; pr-finalize/SKILL.md clarifies rules (never approve/request changes and never post comments directly).
eda0e1e to
670f4e4
Compare
Prepend a newline when appending trimmed code review findings or the default message in .github/skills/ai-summary-comment/scripts/post-pr-finalize-comment.ps1 to ensure proper spacing and line breaks in the generated code review section.
54a4545 to
680305c
Compare
2d48e88 to
d51da80
Compare
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Summary
Enhances the
pr-finalizeskill with a two-phase workflow that includes code review, and adds critical safety rules preventing agents from approving or blocking PRs.What This PR Does
--approveor--request-changesflagsCommits
1.
2ad480c77d- Add multi-model code review to pr-finalize skillOriginal commit that added the code review phase to pr-finalize.
2.
71eb8c2216- Add critical rule: agents must NEVER approve/request-changes on PRsWhy: AI agents should never make approval decisions on PRs. That's a human responsibility.
Changes:
🚨 CRITICALwarning section at top of skillcopilot-instructions.mdgh pr review --commentis allowed3.
335be127cb- Simplify pr-finalize: remove multi-model orchestration, keep code reviewWhy: After review, decided that multi-model orchestration should be user-controlled, not built into the skill. Users can invoke pr-finalize with different models themselves if they want multi-model coverage.
Changes:
--approve/--request-changesexamples that conflicted with the NEVER ruleThe Two-Phase Workflow
Phase 1: Title & Description Review
Phase 2: Code Review
Key Safety Rule
Files Changed
.github/skills/pr-finalize/SKILL.md.github/copilot-instructions.mdWhy Not Multi-Model Orchestration?
The original approach had the skill orchestrate 5 different AI models in parallel. This was removed because:
modelparameter themselvesIssues Fixed
N/A - Enhancement to agent workflow