From 80d7e412c2bc9acf64e2bc69359c6d3b854bfdc2 Mon Sep 17 00:00:00 2001 From: Shane Neuville Date: Sat, 31 Jan 2026 07:59:13 -0600 Subject: [PATCH 01/16] Update pr agent with multi-model try-fix workflow Phase 4 now uses exhaustive multi-model exploration: Round 1: Run try-fix with 5 models sequentially: - claude-sonnet-4.5 - claude-opus-4.5 - gpt-5.2 - gpt-5.2-codex - gemini-3-pro-preview Round 2+: Cross-pollination loop: - Share all results with all 5 models - Ask for NEW ideas based on learnings - Run try-fix for each new idea - Repeat until all models confirm 'no new ideas' Key constraints: - SEQUENTIAL ONLY (same files/device) - Exhaustion = all 5 models confirm no new ideas - Never skip models in Round 1 --- .github/agents/pr.md | 18 ++++--- .github/agents/pr/post-gate.md | 98 +++++++++++++++++++++++----------- 2 files changed, 78 insertions(+), 38 deletions(-) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 53e323d82c30..edc98ba472af 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -56,16 +56,20 @@ After Gate passes, read `.github/agents/pr/post-gate.md` for **Phases 4-5**. --- -### 🚨 CRITICAL: Phase 4 Always Uses `try-fix` Skill +### 🚨 CRITICAL: Phase 4 Uses Multi-Model try-fix -**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill to: -1. **Independently explore alternative solutions** - Generate fix ideas WITHOUT looking at the PR's solution -2. **Test alternatives empirically** - Actually implement and run tests, don't just theorize -3. **Compare with PR's fix** - PR's fix is already validated by Gate; try-fix explores if there's something better +**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill with **5 different AI models** to: +1. **Maximize fix diversity** - Each model brings different perspectives +2. **Cross-pollinate ideas** - Share results between models to spark new ideas +3. **Ensure exhaustive exploration** - Only stop when ALL models confirm "no new ideas" -The PR's fix is NOT tested by try-fix (Gate already did that). try-fix generates and tests YOUR independent ideas. +**The multi-model workflow:** +- **Round 1**: Run try-fix 5 times sequentially with: `claude-sonnet-4.5`, `claude-opus-4.5`, `gpt-5.2`, `gpt-5.2-codex`, `gemini-3-pro-preview` +- **Round 2+**: Share all results with all 5 models, run try-fix for any new ideas, repeat until exhaustion -This ensures independent analysis rather than rubber-stamping the PR. +**âš ī¸ SEQUENTIAL ONLY**: try-fix runs modify the same files and use the same device. Never run in parallel. + +See `post-gate.md` for detailed Phase 4 instructions. --- diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index 7b42ace936c7..3578aa333b28 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -48,61 +48,92 @@ The purpose of Phase 4 is NOT to re-test the PR's fix, but to: **Do NOT let the PR's fix influence your thinking.** Generate ideas as if you hadn't seen the PR. -### Step 1: Agent Orchestrates try-fix Loop +### Step 1: Multi-Model try-fix Exploration -Invoke the `try-fix` skill repeatedly. The skill handles one fix attempt per invocation. +Phase 4 uses a **multi-model approach** to maximize fix diversity. Each AI model brings different perspectives and may find solutions others miss. -**IMPORTANT:** Always pass the `state_file` parameter so try-fix can record its results: +**âš ī¸ SEQUENTIAL ONLY**: try-fix runs MUST execute one at a time. They modify the same files and use the same test device. Never run try-fix attempts in parallel. + +#### Round 1: Run try-fix with Each Model + +Run the `try-fix` skill **5 times sequentially**, once with each model: + +| Order | Model | Invocation | +|-------|-------|------------| +| 1 | `claude-sonnet-4.5` | `task` agent with `model: "claude-sonnet-4.5"` | +| 2 | `claude-opus-4.5` | `task` agent with `model: "claude-opus-4.5"` | +| 3 | `gpt-5.2` | `task` agent with `model: "gpt-5.2"` | +| 4 | `gpt-5.2-codex` | `task` agent with `model: "gpt-5.2-codex"` | +| 5 | `gemini-3-pro-preview` | `task` agent with `model: "gemini-3-pro-preview"` | + +**For each model**, invoke the try-fix skill: ``` -state_file: CustomAgentLogsTmp/PRState/pr-XXXXX.md +Invoke the try-fix skill for PR #XXXXX: +- Platform: [android/ios] +- TestFilter: "IssueXXXXX" +- state_file: CustomAgentLogsTmp/PRState/pr-XXXXX.md + +Generate ONE independent fix idea and test it empirically. +Do NOT look at the PR's fix - generate ideas independently. ``` -try-fix will automatically append rows to the Fix Candidates table and set the "Exhausted" field. You remain responsible for: -- Setting "Selected Fix" field with reasoning -- Updating phase status to ✅ COMPLETE +**Wait for each to complete before starting the next.** + +#### Round 2+: Cross-Pollination Loop + +After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ideas: ``` ┌─────────────────────────────────────────────────────────────┐ -│ Agent orchestration loop │ +│ Cross-Pollination Loop │ ├─────────────────────────────────────────────────────────────┤ │ │ -│ attempts = 0 │ -│ max_attempts = 5 │ -│ state_file = "CustomAgentLogsTmp/PRState/pr-XXXXX.md" │ +│ LOOP until no new ideas: │ │ │ -│ while (attempts < max_attempts): │ -│ result = invoke try-fix skill (with state_file) │ -│ attempts++ │ +│ 1. Compile summary of ALL try-fix attempts so far: │ +│ - Approach tried │ +│ - Pass/Fail result │ +│ - Key learnings (why it worked or failed) │ │ │ -│ if result.exhausted: │ -│ break # try-fix has no more ideas │ +│ 2. Share summary with ALL 5 models, ask each: │ +│ "Given these results, do you have any NEW fix ideas │ +│ that haven't been tried? If yes, describe briefly." │ │ │ -│ # result.passed indicates if this attempt worked │ -│ # try-fix already recorded to state file │ -│ # Continue loop to explore more alternatives │ +│ 3. For each model with a new idea: │ +│ → Run try-fix with that model (SEQUENTIAL) │ +│ → Wait for completion before next │ │ │ -│ # After loop: compare all try-fix results vs PR's fix │ -│ # Update "Exhausted" and "Selected Fix" fields │ +│ 4. If ANY new ideas were tested → repeat loop │ +│ If NO new ideas from ANY model → exit loop │ │ │ └─────────────────────────────────────────────────────────────┘ ``` -**Stop the loop when:** -- `try-fix` returns `exhausted=true` (no more ideas) -- 5 try-fix attempts have been made -- User requests to stop +**Exhaustion criteria**: The loop exits when ALL 5 models confirm they have no new ideas to try. + +#### try-fix Invocation Details + +Each `try-fix` invocation (via task agent): +- Reads state file to learn from prior attempts +- Reverts PR's fix to get broken baseline +- Proposes and implements ONE fix idea +- Runs tests to validate +- Records result with failure analysis +- Reverts changes (restores PR's fix) +- Updates state file with attempt results + +See `.github/skills/try-fix/SKILL.md` for full details. ### What try-fix Does (Each Invocation) -Each `try-fix` invocation: +Each `try-fix` invocation (run via task agent with specific model): 1. Reads state file to learn from prior failed attempts 2. Reverts PR's fix to get a broken baseline 3. Proposes ONE new independent fix idea 4. Implements and tests it 5. Records result (with failure analysis if it failed) -6. **Updates state file** (appends row to Fix Candidates table if state_file provided) +6. **Updates state file** (appends row to Fix Candidates table) 7. Reverts all changes (restores PR's fix) -8. Returns `{passed: bool, exhausted: bool}` See `.github/skills/try-fix/SKILL.md` for full details. @@ -143,6 +174,7 @@ Update the state file: - **try-fix found a simpler/better alternative** → Request changes with suggestion - **try-fix found same solution independently** → Strong validation, approve PR - **All try-fix attempts failed** → PR's fix is the only working solution, approve PR +- **Multiple passing alternatives** → Select simplest/most robust ### Step 4: Apply Selected Fix (if different from PR) @@ -165,10 +197,11 @@ Update the state file: 5. Change 📋 Report status to `â–ļī¸ IN PROGRESS` **Before marking ✅ COMPLETE, verify state file contains:** -- [ ] Root Cause Analysis filled in (if applicable) +- [ ] Round 1 completed: All 5 models ran try-fix +- [ ] Cross-pollination completed: All 5 models confirmed "no new ideas" - [ ] Fix Candidates table has numbered rows for each try-fix attempt - [ ] Each row has: approach, test result, files changed, notes -- [ ] "Exhausted" field set (Yes/No) +- [ ] "Exhausted" field set to Yes (all models confirmed no new ideas) - [ ] "Selected Fix" populated with reasoning - [ ] No âŗ PENDING markers remain in Fix section - [ ] State file committed @@ -287,8 +320,11 @@ Update all phase statuses to complete. - ❌ **Looking at PR's fix before generating ideas** - Generate fix ideas independently first - ❌ **Re-testing the PR's fix in try-fix** - Gate already validated it; try-fix tests YOUR ideas -- ❌ **Skipping the try-fix loop** - Always explore at least one independent alternative +- ❌ **Skipping models in Round 1** - All 5 models must run try-fix before cross-pollination +- ❌ **Running try-fix in parallel** - SEQUENTIAL ONLY - they modify same files and use same device +- ❌ **Stopping before cross-pollination** - Must share results and check for new ideas - ❌ **Not analyzing why fixes failed** - Record the flawed reasoning to help future attempts - ❌ **Selecting a failing fix** - Only select from passing candidates - ❌ **Forgetting to revert between attempts** - Each try-fix must start from broken baseline, end with PR restored +- ❌ **Declaring exhaustion prematurely** - All 5 models must confirm "no new ideas" - ❌ **Rushing the report** - Take time to write clear justification From 69cc6af4030c2b43cb7793dffebcdf0d44a8d021 Mon Sep 17 00:00:00 2001 From: Jakub Florkowski Date: Sat, 31 Jan 2026 15:59:02 +0100 Subject: [PATCH 02/16] Address Copilot review suggestions - Clarify model parameter is passed to task tool (not agent frontmatter) - Add complete try-fix invocation fields: problem, test_command, target_files - Fix 'Do NOT look at PR fix' to 'Review PR fix to ensure approach is DIFFERENT' - Add summary size limit guidance for cross-pollination (30k char limit) - Add MAX ROUNDS limit (3) to prevent infinite loops - Separate coordination loop stop condition from per-invocation Exhausted flag - Add root cause analysis checklist item in Phase 4 completion --- .github/agents/pr.md | 4 +++- .github/agents/pr/post-gate.md | 39 ++++++++++++++++++++++------------ 2 files changed, 29 insertions(+), 14 deletions(-) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index edc98ba472af..7b840807adb7 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -64,11 +64,13 @@ After Gate passes, read `.github/agents/pr/post-gate.md` for **Phases 4-5**. 3. **Ensure exhaustive exploration** - Only stop when ALL models confirm "no new ideas" **The multi-model workflow:** -- **Round 1**: Run try-fix 5 times sequentially with: `claude-sonnet-4.5`, `claude-opus-4.5`, `gpt-5.2`, `gpt-5.2-codex`, `gemini-3-pro-preview` +- **Round 1**: Run try-fix 5 times sequentially using the `task` agent with `model` parameter: `claude-sonnet-4.5`, `claude-opus-4.5`, `gpt-5.2`, `gpt-5.2-codex`, `gemini-3-pro-preview` - **Round 2+**: Share all results with all 5 models, run try-fix for any new ideas, repeat until exhaustion **âš ī¸ SEQUENTIAL ONLY**: try-fix runs modify the same files and use the same device. Never run in parallel. +**Note:** The `model` parameter is passed to the `task` tool, which supports model selection. This is separate from agent YAML frontmatter (which is VS Code-only). + See `post-gate.md` for detailed Phase 4 instructions. --- diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index 3578aa333b28..07099719665a 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -60,21 +60,26 @@ Run the `try-fix` skill **5 times sequentially**, once with each model: | Order | Model | Invocation | |-------|-------|------------| -| 1 | `claude-sonnet-4.5` | `task` agent with `model: "claude-sonnet-4.5"` | -| 2 | `claude-opus-4.5` | `task` agent with `model: "claude-opus-4.5"` | -| 3 | `gpt-5.2` | `task` agent with `model: "gpt-5.2"` | -| 4 | `gpt-5.2-codex` | `task` agent with `model: "gpt-5.2-codex"` | -| 5 | `gemini-3-pro-preview` | `task` agent with `model: "gemini-3-pro-preview"` | +| 1 | `claude-sonnet-4.5` | `task` tool with `model: "claude-sonnet-4.5"` parameter | +| 2 | `claude-opus-4.5` | `task` tool with `model: "claude-opus-4.5"` parameter | +| 3 | `gpt-5.2` | `task` tool with `model: "gpt-5.2"` parameter | +| 4 | `gpt-5.2-codex` | `task` tool with `model: "gpt-5.2-codex"` parameter | +| 5 | `gemini-3-pro-preview` | `task` tool with `model: "gemini-3-pro-preview"` parameter | + +**Note:** The `model` parameter is passed to the `task` tool, which supports model selection. This is separate from agent YAML frontmatter (which is VS Code-only). **For each model**, invoke the try-fix skill: ``` Invoke the try-fix skill for PR #XXXXX: -- Platform: [android/ios] -- TestFilter: "IssueXXXXX" +- problem: [Description of the bug from issue/PR - what's broken and expected behavior] +- platform: [android/ios] +- test_command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform [android/ios] -TestFilter "IssueXXXXX" +- target_files: + - src/[area]/[likely-affected-file-1].cs + - src/[area]/[likely-affected-file-2].cs - state_file: CustomAgentLogsTmp/PRState/pr-XXXXX.md -Generate ONE independent fix idea and test it empirically. -Do NOT look at the PR's fix - generate ideas independently. +Generate ONE independent fix idea. Review the PR's fix first to ensure your approach is DIFFERENT. ``` **Wait for each to complete before starting the next.** @@ -83,6 +88,11 @@ Do NOT look at the PR's fix - generate ideas independently. After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ideas: +**âš ī¸ Summary Size Limit**: To stay within Copilot CLI's 30,000 char prompt limit, keep attempt summaries bounded: +- Max 3-4 bullet points per attempt +- Focus on: approach name, result (✅/❌), one-line key learning +- Omit verbose stack traces or full code diffs + ``` ┌─────────────────────────────────────────────────────────────┐ │ Cross-Pollination Loop │ @@ -90,10 +100,10 @@ After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ide │ │ │ LOOP until no new ideas: │ │ │ -│ 1. Compile summary of ALL try-fix attempts so far: │ -│ - Approach tried │ +│ 1. Compile BOUNDED summary of ALL try-fix attempts: │ +│ - Attempt #, approach (1 line) │ │ - Pass/Fail result │ -│ - Key learnings (why it worked or failed) │ +│ - Key learning (1 line - why it worked or failed) │ │ │ │ 2. Share summary with ALL 5 models, ask each: │ │ "Given these results, do you have any NEW fix ideas │ @@ -106,10 +116,12 @@ After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ide │ 4. If ANY new ideas were tested → repeat loop │ │ If NO new ideas from ANY model → exit loop │ │ │ +│ MAX ROUNDS: 3 (to prevent infinite loops) │ +│ │ └─────────────────────────────────────────────────────────────┘ ``` -**Exhaustion criteria**: The loop exits when ALL 5 models confirm they have no new ideas to try. +**Coordination loop stop condition**: Exit when a full round completes and NO model proposes any new fix ideas. This is separate from the per-invocation `Exhausted` flag that each `try-fix` run sets in the state file. #### try-fix Invocation Details @@ -203,6 +215,7 @@ Update the state file: - [ ] Each row has: approach, test result, files changed, notes - [ ] "Exhausted" field set to Yes (all models confirmed no new ideas) - [ ] "Selected Fix" populated with reasoning +- [ ] Root cause analysis documented for the selected fix (to be surfaced in 📋 Report phase "### Root Cause" section) - [ ] No âŗ PENDING markers remain in Fix section - [ ] State file committed From fe55c3fd215680464b3bd35795698dc3d8b9c505 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 09:40:19 -0600 Subject: [PATCH 03/16] Add rules for template formatting and skill script usage - Rule: Follow Templates EXACTLY - don't add attributes or modify format - Rule: Use Skills' Scripts - Don't Bypass with manual commands - These rules prevent issues with downstream script regex patterns --- .github/agents/pr.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 7b840807adb7..ea51ad0fdee7 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -56,6 +56,24 @@ After Gate passes, read `.github/agents/pr/post-gate.md` for **Phases 4-5**. --- +### 🚨 CRITICAL: Follow Templates EXACTLY + +When creating state files, use the EXACT format from this document: +- **Do NOT add attributes** like `open` to `
` tags +- **Do NOT "improve"** the template format +- **Do NOT deviate** from documented structure +- Downstream scripts depend on exact formatting (regex patterns expect specific structure) + +### 🚨 CRITICAL: Use Skills' Scripts - Don't Bypass + +When a skill provides a PowerShell script: +- **Run the script** - don't interpret what it does and do it manually +- **Fix inputs if script fails** - don't bypass with manual `gh` commands +- **Use `-DryRun` to debug** - see what the script would produce before posting +- Scripts handle formatting, API calls, and section management correctly + +--- + ### 🚨 CRITICAL: Phase 4 Uses Multi-Model try-fix **Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill with **5 different AI models** to: From debbee608e2bb99567471eb75f136e41c5f04c26 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 10:57:06 -0600 Subject: [PATCH 04/16] Add 'Stop on Environment Blockers' rule to PR agent When environment setup is missing (Appium, devices, drivers, etc.), the agent must STOP and ask the user before continuing. This prevents marking phases as BLOCKED and continuing without actual verification. --- .github/agents/pr.md | 49 ++++++++++++++++++++++++++++++++++ .github/agents/pr/post-gate.md | 10 +++++++ 2 files changed, 59 insertions(+) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index ea51ad0fdee7..6774313cc5ea 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -72,6 +72,55 @@ When a skill provides a PowerShell script: - **Use `-DryRun` to debug** - see what the script would produce before posting - Scripts handle formatting, API calls, and section management correctly +### 🚨 CRITICAL: Stop on Environment Blockers + +If you encounter an environment or system setup blocker that prevents completing a phase: + +**STOP IMMEDIATELY. Do NOT continue to the next phase.** + +**Common blockers:** +- Missing Appium drivers (Windows, iOS, Android) +- WinAppDriver not installed +- Xcode/iOS simulators not available (on Windows) +- Android emulator not running or not configured +- Developer Mode not enabled +- Port conflicts (e.g., 4723 in use) +- Missing SDKs or tools + +**When blocked, you MUST:** +1. **Stop all work** - Do not proceed to the next phase +2. **Report the blocker** clearly: + - What step failed + - What tool/driver/device is missing + - What error message was shown +3. **Ask the user** how to proceed: + - Install the missing component? + - Switch to a different platform? + - Skip this phase with documented limitations? +4. **Wait for user response** - Do not assume or work around + +**Example blocker report:** +``` +⛔ BLOCKED: Cannot complete Gate phase + +**What failed:** Running verify-tests-fail-without-fix skill +**Missing:** Appium Windows driver not installed +**Error:** "Could not find a driver for automationName 'Windows'" + +**Options:** +1. Install Windows Appium driver: `appium driver install windows` +2. Switch to Android platform (if available) +3. Skip automated verification (note limitation in report) + +Which would you like me to do? +``` + +**Never:** +- ❌ Mark a phase as âš ī¸ BLOCKED and continue to the next phase +- ❌ Claim "verification passed" when tests couldn't actually run +- ❌ Skip device/emulator testing and proceed with code review only +- ❌ Assume the user wants you to install things without asking + --- ### 🚨 CRITICAL: Phase 4 Uses Multi-Model try-fix diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index 07099719665a..3d23fe3bfc4a 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -28,6 +28,16 @@ If Gate is not passed, go back to `.github/agents/pr.md` and complete phases 1-3 **Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" +### 🚨 CRITICAL: Stop on Environment Blockers (Applies to Phase 4) + +The same "Stop on Environment Blockers" rule from `pr.md` applies here. If try-fix cannot run due to: +- Missing Appium drivers +- Device/emulator not available +- WinAppDriver not installed +- Platform tools missing + +**STOP and ask the user** before continuing. Do NOT mark try-fix attempts as "BLOCKED" and continue. Either fix the environment issue or get explicit user permission to skip. + --- ## 🔧 FIX: Explore and Select Fix (Phase 4) From ad29f6a796908ee518065d9784aeabca6e240428 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 11:00:14 -0600 Subject: [PATCH 05/16] Add PR review plan template and Review-PR.ps1 script - PLAN-TEMPLATE.md: Reusable 5-phase review plan with all critical rules - Review-PR.ps1: Script to invoke Copilot CLI with the PR agent Usage: pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -Platform ios --- .github/agents/pr/PLAN-TEMPLATE.md | 182 +++++++++++++++++++++++++ .github/scripts/Review-PR.ps1 | 212 +++++++++++++++++++++++++++++ 2 files changed, 394 insertions(+) create mode 100644 .github/agents/pr/PLAN-TEMPLATE.md create mode 100644 .github/scripts/Review-PR.ps1 diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md new file mode 100644 index 000000000000..eaf3ddaad2df --- /dev/null +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -0,0 +1,182 @@ +# PR Review Plan Template + +## Overview + +This is a **reusable template** for reviewing PRs using the 5-phase PR Agent workflow. Copy this plan and fill in the PR-specific details. + +**Source documents:** +- `.github/agents/pr.md` - Phases 1-3 (Pre-Flight, Tests, Gate) +- `.github/agents/pr/post-gate.md` - Phases 4-5 (Fix, Report) + +--- + +## 🚨 Critical Rules + +### Rule 1: Stop on Environment Blockers +If ANY phase cannot complete due to missing tools/drivers/devices: +1. **STOP immediately** - Do NOT continue to next phase +2. **Report the blocker** - What failed, what's missing, error message +3. **Ask user** for resolution options +4. **Wait for response** - Do not assume or work around + +**Common blockers:** Appium drivers, WinAppDriver, Xcode, emulators, Developer Mode, port conflicts, SDKs + +### Rule 2: Gate via Task Agent Only +Gate verification MUST run as a `task` agent invocation, NOT inline commands. +- The script does TWO test runs (revert fix → test, restore fix → test) +- Running inline allows fabrication of results + +### Rule 3: Multi-Model try-fix (Phase 4) +Run try-fix with 5 different models SEQUENTIALLY: +1. `claude-sonnet-4.5` +2. `claude-opus-4.5` +3. `gpt-5.2` +4. `gpt-5.2-codex` +5. `gemini-3-pro-preview` + +Then cross-pollinate: Share ALL results with ALL models, ask for NEW ideas, repeat until exhaustion. + +### Rule 4: Follow Templates Exactly +- Do NOT add attributes like `open` to `
` tags +- Do NOT "improve" template formats +- Downstream scripts depend on exact regex patterns + +### Rule 5: Use Skills' Scripts +- Run the provided PowerShell scripts, don't bypass with manual commands +- If script fails, fix inputs rather than using manual `gh` commands + +--- + +## Work Plan + +### Phase 1: Pre-Flight - Context Gathering +- [ ] Create state file: `CustomAgentLogsTmp/PRState/pr-XXXXX.md` +- [ ] Gather PR metadata (title, body, labels, author) +- [ ] Fetch and read linked issue +- [ ] Fetch PR comments and review feedback +- [ ] Check for prior agent reviews (if found, import and resume) +- [ ] Document platforms affected +- [ ] Identify changed files and classify (fix vs test) +- [ ] Document PR's fix approach in Fix Candidates table +- [ ] Update state file: Pre-Flight → ✅ COMPLETE +- [ ] Commit state file + +**Boundaries:** No code analysis, no opinions on fix, no test running + +### Phase 2: Tests - Verify Test Existence +- [ ] Check if PR includes UI tests +- [ ] Verify tests follow naming conventions (`IssueXXXXX` pattern) +- [ ] If tests exist: Verify they compile +- [ ] If tests missing: Invoke `write-ui-tests` skill +- [ ] Document test files in state file +- [ ] Update state file: Tests → ✅ COMPLETE +- [ ] Commit state file + +### Phase 3: Gate - Verify Tests Catch Bug ⛔ +**🚨 BLOCKER PHASE - Cannot continue if Gate fails** + +- [ ] Invoke verification via **task agent** (NOT inline): + ``` + Invoke task agent with: "Run verify-tests-fail-without-fix skill + - Platform: [android/ios/windows] + - TestFilter: 'IssueXXXXX' + - RequireFullVerification: true + Report: Did tests FAIL without fix? PASS with fix? Final status?" + ``` +- [ ] ⛔ **If environment blocker**: STOP, report, ask user +- [ ] Verify output: + - ✅ Tests FAIL without fix (bug reproduced) + - ✅ Tests PASS with fix (fix works) +- [ ] If Gate fails: STOP, request test fixes from author +- [ ] Update state file: Gate → ✅ PASSED +- [ ] Commit state file + +### Phase 4: Fix - Multi-Model Exploration 🔧 +*(Only proceed if Gate ✅ PASSED)* + +**Round 1: Run try-fix with each model (SEQUENTIAL)** +- [ ] Attempt 1: `claude-sonnet-4.5` +- [ ] Attempt 2: `claude-opus-4.5` +- [ ] Attempt 3: `gpt-5.2` +- [ ] Attempt 4: `gpt-5.2-codex` +- [ ] Attempt 5: `gemini-3-pro-preview` +- [ ] ⛔ **If environment blocker on any attempt**: STOP, report, ask user +- [ ] Record each: approach, test result, files changed, failure analysis + +**Round 2+: Cross-Pollination Loop** +- [ ] Share bounded summary of ALL attempts with ALL 5 models +- [ ] Ask each: "Any NEW fix ideas not yet tried?" +- [ ] For each new idea: Run try-fix with that model +- [ ] Repeat until NO model proposes new ideas (max 3 rounds) + +**Completion:** +- [ ] Mark Exhausted: Yes (all models confirm no new ideas) +- [ ] Compare passing candidates with PR's fix +- [ ] Select best fix (test results → simplicity → robustness → style) +- [ ] Update state file: Fix → ✅ COMPLETE +- [ ] Commit state file + +### Phase 5: Report - Final Recommendation 📋 +*(Only proceed if Phases 1-4 complete)* + +- [ ] Run `pr-finalize` skill to verify title/description accuracy +- [ ] Generate comprehensive review: + - Root cause analysis + - All fix candidates with results + - Final recommendation (APPROVE / REQUEST CHANGES) + - Reasoning +- [ ] Post review via `ai-summary-comment` skill (NOT manual gh command) +- [ ] Update state file: Report → ✅ COMPLETE +- [ ] Commit final state file + +--- + +## Success Criteria + +✅ **APPROVE if:** +- Gate passed (tests catch bug) +- PR's fix works and is equal/better than alternatives +- Tests are comprehensive +- Code changes are minimal and correct + +âš ī¸ **REQUEST CHANGES if:** +- Gate failed (tests don't catch bug) +- Alternative fix is significantly better +- Tests are missing or inadequate +- Code has issues + +--- + +## Environment Blocker Report Template + +When blocked, use this format: + +``` +⛔ BLOCKED: Cannot complete [Phase] phase + +**What failed:** [Step description] +**Missing:** [Tool/driver/device] +**Error:** "[Error message]" + +**Options:** +1. Install [component]: `[command]` +2. Switch to [alternative platform] +3. Skip with documented limitation + +Which would you like me to do? +``` + +--- + +## Quick Reference + +| Phase | Key Action | Blocker Response | +|-------|------------|------------------| +| Pre-Flight | Create state file, gather context | N/A (no env needed) | +| Tests | Verify/create tests, compile check | N/A (no device needed) | +| Gate | Task agent → verify script | ⛔ STOP, report, ask | +| Fix | Multi-model try-fix (5+ attempts) | ⛔ STOP, report, ask | +| Report | Post via ai-summary-comment skill | ⛔ STOP, report, ask | + +**State file:** `CustomAgentLogsTmp/PRState/pr-XXXXX.md` +**Never:** Mark BLOCKED and continue, claim success without running tests, bypass scripts diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 new file mode 100644 index 000000000000..5824b4433789 --- /dev/null +++ b/.github/scripts/Review-PR.ps1 @@ -0,0 +1,212 @@ +<# +.SYNOPSIS + Runs the PR Agent workflow to review a GitHub Pull Request using Copilot CLI. + +.DESCRIPTION + This script invokes GitHub Copilot CLI with the PR agent to perform a comprehensive + 5-phase review of a pull request: + + Phase 1: Pre-Flight - Context gathering + Phase 2: Tests - Verify test existence + Phase 3: Gate - Verify tests catch the bug + Phase 4: Fix - Multi-model exploration of alternatives + Phase 5: Report - Final recommendation + + The script uses the plan template at .github/agents/pr/PLAN-TEMPLATE.md. + +.PARAMETER PRNumber + The GitHub PR number to review (e.g., 33687) + +.PARAMETER Platform + The platform to use for testing. Default is 'android'. + Valid values: android, ios, windows, maccatalyst + +.PARAMETER SkipCheckout + If specified, skips checking out the PR branch (useful if already on the branch) + +.PARAMETER DryRun + If specified, shows what would be done without actually invoking Copilot CLI + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 + Reviews PR #33687 using the default platform (android) + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 -Platform ios + Reviews PR #33687 using iOS for testing + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 -SkipCheckout + Reviews PR #33687 without checking out (assumes already on correct branch) + +.NOTES + Prerequisites: + - GitHub Copilot CLI installed and authenticated + - GitHub CLI (gh) installed and authenticated + - For testing: Appropriate platform tools (Appium, emulators, etc.) +#> + +[CmdletBinding()] +param( + [Parameter(Mandatory = $true, Position = 0)] + [int]$PRNumber, + + [Parameter(Mandatory = $false)] + [ValidateSet('android', 'ios', 'windows', 'maccatalyst')] + [string]$Platform = 'android', + + [Parameter(Mandatory = $false)] + [switch]$SkipCheckout, + + [Parameter(Mandatory = $false)] + [switch]$DryRun +) + +$ErrorActionPreference = 'Stop' + +# Get repository root +$RepoRoot = git rev-parse --show-toplevel 2>$null +if (-not $RepoRoot) { + Write-Error "Not in a git repository" + exit 1 +} + +Write-Host "" +Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Cyan +Write-Host "║ PR Review with Copilot CLI ║" -ForegroundColor Cyan +Write-Host "â• â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•Ŗ" -ForegroundColor Cyan +Write-Host "║ PR: #$PRNumber ║" -ForegroundColor Cyan +Write-Host "║ Platform: $Platform ║" -ForegroundColor Cyan +Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Cyan +Write-Host "" + +# Step 1: Verify prerequisites +Write-Host "📋 Checking prerequisites..." -ForegroundColor Yellow + +# Check GitHub CLI +$ghVersion = gh --version 2>$null | Select-Object -First 1 +if (-not $ghVersion) { + Write-Error "GitHub CLI (gh) is not installed. Install from: https://cli.github.com/" + exit 1 +} +Write-Host " ✅ GitHub CLI: $ghVersion" -ForegroundColor Green + +# Check Copilot CLI +$copilotVersion = ghcs --version 2>$null +if (-not $copilotVersion) { + # Try alternative command + $copilotVersion = gh copilot --version 2>$null + if (-not $copilotVersion) { + Write-Error "GitHub Copilot CLI is not installed. Install with: gh extension install github/gh-copilot" + exit 1 + } +} +Write-Host " ✅ Copilot CLI available" -ForegroundColor Green + +# Check PR exists +Write-Host " 🔍 Verifying PR #$PRNumber exists..." -ForegroundColor Gray +$prInfo = gh pr view $PRNumber --json title,state,url 2>$null | ConvertFrom-Json +if (-not $prInfo) { + Write-Error "PR #$PRNumber not found or not accessible" + exit 1 +} +Write-Host " ✅ PR: $($prInfo.title)" -ForegroundColor Green +Write-Host " ✅ State: $($prInfo.state)" -ForegroundColor Green + +# Step 2: Checkout PR (unless skipped) +if (-not $SkipCheckout) { + Write-Host "" + Write-Host "đŸ“Ĩ Checking out PR #$PRNumber..." -ForegroundColor Yellow + + if ($DryRun) { + Write-Host " [DRY RUN] Would run: gh pr checkout $PRNumber" -ForegroundColor Magenta + } else { + gh pr checkout $PRNumber + if ($LASTEXITCODE -ne 0) { + Write-Error "Failed to checkout PR #$PRNumber" + exit 1 + } + Write-Host " ✅ Checked out PR branch" -ForegroundColor Green + } +} else { + Write-Host "" + Write-Host "â­ī¸ Skipping checkout (using current branch)" -ForegroundColor Yellow + $currentBranch = git branch --show-current + Write-Host " Current branch: $currentBranch" -ForegroundColor Gray +} + +# Step 3: Ensure state directory exists +$stateDir = Join-Path $RepoRoot "CustomAgentLogsTmp/PRState" +if (-not (Test-Path $stateDir)) { + New-Item -ItemType Directory -Path $stateDir -Force | Out-Null + Write-Host " 📁 Created state directory: $stateDir" -ForegroundColor Gray +} + +# Step 4: Build the prompt for Copilot CLI +$planTemplatePath = ".github/agents/pr/PLAN-TEMPLATE.md" + +$prompt = @" +Review PR #$PRNumber using the PR agent workflow. + +**Instructions:** +1. Read the plan template at `$planTemplatePath` for the 5-phase workflow +2. Read `.github/agents/pr.md` for Phases 1-3 instructions +3. Follow ALL critical rules, especially: + - STOP on environment blockers and ask before continuing + - Use task agent for Gate verification + - Run multi-model try-fix in Phase 4 + +**Platform for testing:** $Platform + +**Start with Phase 1: Pre-Flight** +- Create state file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md +- Gather context from PR #$PRNumber +- Proceed through all 5 phases + +Begin the review now. +"@ + +Write-Host "" +Write-Host "🤖 Launching Copilot CLI with PR agent..." -ForegroundColor Yellow +Write-Host "" +Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray +Write-Host "" + +if ($DryRun) { + Write-Host "[DRY RUN] Would invoke Copilot CLI with prompt:" -ForegroundColor Magenta + Write-Host "" + Write-Host $prompt -ForegroundColor Gray + Write-Host "" + Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray + Write-Host "" + Write-Host "To run for real, remove the -DryRun flag" -ForegroundColor Yellow +} else { + # Invoke Copilot CLI + # Note: The exact command may vary based on Copilot CLI version + # Using ghcs (GitHub Copilot Shell) or gh copilot + + # Write prompt to temp file to avoid escaping issues + $promptFile = Join-Path $env:TEMP "copilot-pr-review-prompt.txt" + $prompt | Out-File -FilePath $promptFile -Encoding utf8 + + Write-Host "Prompt saved to: $promptFile" -ForegroundColor Gray + Write-Host "" + Write-Host "Run the following command to start the review:" -ForegroundColor Yellow + Write-Host "" + Write-Host " ghcs `"Review PR #$PRNumber using the pr agent. Platform: $Platform`"" -ForegroundColor Cyan + Write-Host "" + Write-Host "Or copy this full prompt:" -ForegroundColor Yellow + Write-Host "" + Write-Host $prompt -ForegroundColor White + Write-Host "" + Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray + + # Optionally, try to invoke directly (may not work in all environments) + # Uncomment below to auto-launch: + # ghcs $prompt +} + +Write-Host "" +Write-Host "📝 State file will be at: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray +Write-Host "📋 Plan template at: $planTemplatePath" -ForegroundColor Gray +Write-Host "" From 886ea2aa8e582127d0d482b415a737ee73bce1f1 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 13:00:32 -0600 Subject: [PATCH 06/16] Improve blocker handling and fix Review-PR.ps1 for Copilot CLI Changes: 1. Fix Review-PR.ps1 for interactive Copilot CLI - Copilot CLI is interactive (launched with 'copilot' command), not scriptable - Updated script to prepare environment and output context - Provides clear instructions for user to invoke Copilot CLI manually 2. Strengthen environment blocker handling with strict retry limits - Added explicit retry limits table: - Server errors (500, timeout): 0 retries - STOP immediately - Missing tools: 1 install attempt then STOP - Port conflicts: 1 kill attempt then STOP - WinAppDriver errors: 0 retries - STOP immediately - Added 'What I tried' section to blocker report template - New prohibitions: - Never keep troubleshooting after retry limit exceeded - Never spend more than 2-3 tool calls on same blocker - Never install things without asking between each - Updated PLAN-TEMPLATE.md with Rule 0 (highest priority) These changes ensure the agent stops promptly on environment issues rather than burning context on extended troubleshooting attempts. --- .github/agents/pr.md | 41 ++++++++--- .github/agents/pr/PLAN-TEMPLATE.md | 35 +++++++-- .github/scripts/Review-PR.ps1 | 110 +++++++++++------------------ 3 files changed, 101 insertions(+), 85 deletions(-) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 6774313cc5ea..3daf4002f594 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -80,46 +80,65 @@ If you encounter an environment or system setup blocker that prevents completing **Common blockers:** - Missing Appium drivers (Windows, iOS, Android) -- WinAppDriver not installed +- WinAppDriver not installed or returning errors - Xcode/iOS simulators not available (on Windows) - Android emulator not running or not configured - Developer Mode not enabled - Port conflicts (e.g., 4723 in use) - Missing SDKs or tools +- Server errors (500, timeout, "unknown error occurred") + +**Retry Limits - STRICT ENFORCEMENT:** + +| Blocker Type | Max Retries | Then Do | +|--------------|-------------|---------| +| Missing tool/driver | 1 install attempt | STOP and ask user | +| Server errors (500, timeout) | 0 | STOP immediately and report | +| Port conflicts | 1 (kill process) | STOP and ask user | +| Configuration issues | 1 fix attempt | STOP and ask user | **When blocked, you MUST:** 1. **Stop all work** - Do not proceed to the next phase -2. **Report the blocker** clearly: +2. **Do NOT keep troubleshooting** - After the retry limit, STOP +3. **Report the blocker** clearly: - What step failed - What tool/driver/device is missing - What error message was shown -3. **Ask the user** how to proceed: + - What you already tried (if any retries were used) +4. **Ask the user** how to proceed: - Install the missing component? - Switch to a different platform? - Skip this phase with documented limitations? -4. **Wait for user response** - Do not assume or work around +5. **Wait for user response** - Do not assume or work around **Example blocker report:** ``` ⛔ BLOCKED: Cannot complete Gate phase **What failed:** Running verify-tests-fail-without-fix skill -**Missing:** Appium Windows driver not installed -**Error:** "Could not find a driver for automationName 'Windows'" +**Blocker:** WinAppDriver returns 500 errors +**Error:** "WinAppDriver server fails to respond with proper status" + +**What I tried:** +- Ran Appium provisioning script +- Updated Windows driver to v5.1.8 -**Options:** -1. Install Windows Appium driver: `appium driver install windows` -2. Switch to Android platform (if available) -3. Skip automated verification (note limitation in report) +**I am STOPPING here. Options:** +1. User investigates WinAppDriver setup manually +2. Switch to Android/iOS platform +3. Accept Sandbox manual verification as sufficient +4. Skip automated verification (note limitation in report) Which would you like me to do? ``` **Never:** +- ❌ Keep trying different fixes after retry limit exceeded - ❌ Mark a phase as âš ī¸ BLOCKED and continue to the next phase - ❌ Claim "verification passed" when tests couldn't actually run - ❌ Skip device/emulator testing and proceed with code review only -- ❌ Assume the user wants you to install things without asking +- ❌ Install multiple tools/drivers without asking between each +- ❌ Spend more than 2-3 tool calls troubleshooting the same blocker --- diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md index eaf3ddaad2df..d63e26088806 100644 --- a/.github/agents/pr/PLAN-TEMPLATE.md +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -15,11 +15,38 @@ This is a **reusable template** for reviewing PRs using the 5-phase PR Agent wor ### Rule 1: Stop on Environment Blockers If ANY phase cannot complete due to missing tools/drivers/devices: 1. **STOP immediately** - Do NOT continue to next phase -2. **Report the blocker** - What failed, what's missing, error message -3. **Ask user** for resolution options -4. **Wait for response** - Do not assume or work around +2. **Do NOT keep troubleshooting** - Strict retry limits apply +3. **Report the blocker** - What failed, what's missing, error message +4. **Ask user** for resolution options +5. **Wait for response** - Do not assume or work around + +**Retry Limits (STRICT):** +| Blocker Type | Max Retries | Then Do | +|--------------|-------------|---------| +| Missing tool/driver | 1 install attempt | STOP and ask | +| Server errors (500, timeout) | 0 retries | STOP immediately | +| Port conflicts | 1 (kill process) | STOP and ask | +| Configuration issues | 1 fix attempt | STOP and ask | + +**Common blockers:** Appium drivers, WinAppDriver errors, Xcode, emulators, Developer Mode, port conflicts, SDKs + +**Blocker Report Template:** +``` +⛔ BLOCKED: Cannot complete [Phase Name] + +**What failed:** [Step/skill that failed] +**Blocker:** [Tool/driver/error type] +**Error:** "[Exact error message]" + +**What I tried:** [List retry attempts, max 1-2] -**Common blockers:** Appium drivers, WinAppDriver, Xcode, emulators, Developer Mode, port conflicts, SDKs +**I am STOPPING here. Options:** +1. [Option for user] +2. [Alternative platform] +3. [Skip with documented limitation] + +Which would you like me to do? +``` ### Rule 2: Gate via Task Agent Only Gate verification MUST run as a `task` agent invocation, NOT inline commands. diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 index 5824b4433789..0a7db7848148 100644 --- a/.github/scripts/Review-PR.ps1 +++ b/.github/scripts/Review-PR.ps1 @@ -1,10 +1,9 @@ <# .SYNOPSIS - Runs the PR Agent workflow to review a GitHub Pull Request using Copilot CLI. + Prepares the environment for a PR review using the PR Agent workflow. .DESCRIPTION - This script invokes GitHub Copilot CLI with the PR agent to perform a comprehensive - 5-phase review of a pull request: + This script prepares the environment for a comprehensive 5-phase PR review: Phase 1: Pre-Flight - Context gathering Phase 2: Tests - Verify test existence @@ -12,7 +11,13 @@ Phase 4: Fix - Multi-model exploration of alternatives Phase 5: Report - Final recommendation - The script uses the plan template at .github/agents/pr/PLAN-TEMPLATE.md. + Since Copilot CLI is interactive, this script: + - Validates prerequisites (gh CLI, PR exists) + - Optionally checks out the PR branch + - Creates the state directory + - Outputs structured context for the agent to use + + Run this script, then ask Copilot CLI to "review PR #XXXXX using the pr agent" .PARAMETER PRNumber The GitHub PR number to review (e.g., 33687) @@ -25,25 +30,23 @@ If specified, skips checking out the PR branch (useful if already on the branch) .PARAMETER DryRun - If specified, shows what would be done without actually invoking Copilot CLI + If specified, shows what would be done without making changes .EXAMPLE .\Review-PR.ps1 -PRNumber 33687 - Reviews PR #33687 using the default platform (android) - -.EXAMPLE - .\Review-PR.ps1 -PRNumber 33687 -Platform ios - Reviews PR #33687 using iOS for testing + Prepares to review PR #33687 using the default platform (android) .EXAMPLE - .\Review-PR.ps1 -PRNumber 33687 -SkipCheckout - Reviews PR #33687 without checking out (assumes already on correct branch) + .\Review-PR.ps1 -PRNumber 33687 -Platform ios -SkipCheckout + Prepares to review PR #33687 on iOS without checking out .NOTES Prerequisites: - - GitHub Copilot CLI installed and authenticated - GitHub CLI (gh) installed and authenticated - For testing: Appropriate platform tools (Appium, emulators, etc.) + + After running this script, tell Copilot CLI: + "Review PR #XXXXX using the pr agent. Platform: " #> [CmdletBinding()] @@ -91,17 +94,8 @@ if (-not $ghVersion) { } Write-Host " ✅ GitHub CLI: $ghVersion" -ForegroundColor Green -# Check Copilot CLI -$copilotVersion = ghcs --version 2>$null -if (-not $copilotVersion) { - # Try alternative command - $copilotVersion = gh copilot --version 2>$null - if (-not $copilotVersion) { - Write-Error "GitHub Copilot CLI is not installed. Install with: gh extension install github/gh-copilot" - exit 1 - } -} -Write-Host " ✅ Copilot CLI available" -ForegroundColor Green +# Check Copilot CLI - informational only +Write-Host " â„šī¸ Note: After this script, tell Copilot CLI to review the PR" -ForegroundColor Gray # Check PR exists Write-Host " 🔍 Verifying PR #$PRNumber exists..." -ForegroundColor Gray @@ -145,65 +139,41 @@ if (-not (Test-Path $stateDir)) { # Step 4: Build the prompt for Copilot CLI $planTemplatePath = ".github/agents/pr/PLAN-TEMPLATE.md" -$prompt = @" -Review PR #$PRNumber using the PR agent workflow. - -**Instructions:** -1. Read the plan template at `$planTemplatePath` for the 5-phase workflow -2. Read `.github/agents/pr.md` for Phases 1-3 instructions -3. Follow ALL critical rules, especially: - - STOP on environment blockers and ask before continuing - - Use task agent for Gate verification - - Run multi-model try-fix in Phase 4 - -**Platform for testing:** $Platform - -**Start with Phase 1: Pre-Flight** -- Create state file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md -- Gather context from PR #$PRNumber -- Proceed through all 5 phases - -Begin the review now. -"@ - -Write-Host "" -Write-Host "🤖 Launching Copilot CLI with PR agent..." -ForegroundColor Yellow Write-Host "" Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray Write-Host "" if ($DryRun) { - Write-Host "[DRY RUN] Would invoke Copilot CLI with prompt:" -ForegroundColor Magenta - Write-Host "" - Write-Host $prompt -ForegroundColor Gray - Write-Host "" - Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray + Write-Host "[DRY RUN] Would prepare environment with:" -ForegroundColor Magenta + Write-Host " PR: #$PRNumber" -ForegroundColor Gray + Write-Host " Platform: $Platform" -ForegroundColor Gray + Write-Host " State file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray Write-Host "" Write-Host "To run for real, remove the -DryRun flag" -ForegroundColor Yellow } else { - # Invoke Copilot CLI - # Note: The exact command may vary based on Copilot CLI version - # Using ghcs (GitHub Copilot Shell) or gh copilot - - # Write prompt to temp file to avoid escaping issues - $promptFile = Join-Path $env:TEMP "copilot-pr-review-prompt.txt" - $prompt | Out-File -FilePath $promptFile -Encoding utf8 - - Write-Host "Prompt saved to: $promptFile" -ForegroundColor Gray + # Output structured context that can be used by the agent + Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Green + Write-Host "║ ENVIRONMENT READY ║" -ForegroundColor Green + Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Green Write-Host "" - Write-Host "Run the following command to start the review:" -ForegroundColor Yellow + Write-Host "PR Review Context:" -ForegroundColor Cyan + Write-Host " PR_NUMBER: $PRNumber" -ForegroundColor White + Write-Host " PLATFORM: $Platform" -ForegroundColor White + Write-Host " STATE_FILE: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor White + Write-Host " PLAN_TEMPLATE: $planTemplatePath" -ForegroundColor White + Write-Host " CURRENT_BRANCH: $(git branch --show-current)" -ForegroundColor White + Write-Host " PR_TITLE: $($prInfo.title)" -ForegroundColor White Write-Host "" - Write-Host " ghcs `"Review PR #$PRNumber using the pr agent. Platform: $Platform`"" -ForegroundColor Cyan + Write-Host "─────────────────────────────────────────────────────────────" -ForegroundColor DarkGray Write-Host "" - Write-Host "Or copy this full prompt:" -ForegroundColor Yellow + Write-Host "Next step - Tell Copilot CLI:" -ForegroundColor Yellow Write-Host "" - Write-Host $prompt -ForegroundColor White + Write-Host " Review PR #$PRNumber using the pr agent. Platform: $Platform" -ForegroundColor Cyan + Write-Host "" + Write-Host "Or for the full workflow:" -ForegroundColor Yellow + Write-Host "" + Write-Host " @pr Review PR #$PRNumber following .github/agents/pr/PLAN-TEMPLATE.md" -ForegroundColor Cyan Write-Host "" - Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray - - # Optionally, try to invoke directly (may not work in all environments) - # Uncomment below to auto-launch: - # ghcs $prompt } Write-Host "" From d67da75e85d3a18358a4074f48a2ab5ee52c3d4f Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 14:12:13 -0600 Subject: [PATCH 07/16] Update Review-PR.ps1 to invoke Copilot CLI directly Now that Copilot CLI supports command-line arguments: - -i, --interactive : Start interactive with prompt - -p, --prompt : Non-interactive mode - --agent : Specify custom agent - --allow-all: Enable all permissions The script now: - Validates prerequisites (gh CLI AND copilot CLI) - Checks out PR branch (unless -SkipCheckout) - Creates state directory - Invokes Copilot CLI with the pr agent and a structured prompt New parameters: - -NoInteractive: Run in non-interactive mode (exits after completion) - Default is interactive mode (-i) which keeps session open Usage: # Interactive mode (default) pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 # Non-interactive mode pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -NoInteractive # Dry run to see what would be invoked pwsh .github/scripts/Review-PR.ps1 -PRNumber 33687 -DryRun --- .github/scripts/Review-PR.ps1 | 113 ++++++++++++++++++++++++++-------- 1 file changed, 88 insertions(+), 25 deletions(-) diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 index 0a7db7848148..e46313090123 100644 --- a/.github/scripts/Review-PR.ps1 +++ b/.github/scripts/Review-PR.ps1 @@ -1,9 +1,9 @@ <# .SYNOPSIS - Prepares the environment for a PR review using the PR Agent workflow. + Runs a PR review using Copilot CLI and the PR Agent workflow. .DESCRIPTION - This script prepares the environment for a comprehensive 5-phase PR review: + This script invokes Copilot CLI to perform a comprehensive 5-phase PR review: Phase 1: Pre-Flight - Context gathering Phase 2: Tests - Verify test existence @@ -11,13 +11,11 @@ Phase 4: Fix - Multi-model exploration of alternatives Phase 5: Report - Final recommendation - Since Copilot CLI is interactive, this script: + The script: - Validates prerequisites (gh CLI, PR exists) - Optionally checks out the PR branch - Creates the state directory - - Outputs structured context for the agent to use - - Run this script, then ask Copilot CLI to "review PR #XXXXX using the pr agent" + - Invokes Copilot CLI with the pr agent .PARAMETER PRNumber The GitHub PR number to review (e.g., 33687) @@ -29,24 +27,34 @@ .PARAMETER SkipCheckout If specified, skips checking out the PR branch (useful if already on the branch) +.PARAMETER Interactive + If specified, starts Copilot in interactive mode with the prompt (default). + Use -NoInteractive for non-interactive mode that exits after completion. + +.PARAMETER NoInteractive + If specified, runs in non-interactive mode (exits after completion). + Requires --allow-all for tool permissions. + .PARAMETER DryRun If specified, shows what would be done without making changes .EXAMPLE .\Review-PR.ps1 -PRNumber 33687 - Prepares to review PR #33687 using the default platform (android) + Reviews PR #33687 interactively using the default platform (android) .EXAMPLE .\Review-PR.ps1 -PRNumber 33687 -Platform ios -SkipCheckout - Prepares to review PR #33687 on iOS without checking out + Reviews PR #33687 on iOS without checking out, in interactive mode + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 -NoInteractive + Reviews PR #33687 in non-interactive mode (exits after completion) .NOTES Prerequisites: - GitHub CLI (gh) installed and authenticated + - Copilot CLI (copilot) installed - For testing: Appropriate platform tools (Appium, emulators, etc.) - - After running this script, tell Copilot CLI: - "Review PR #XXXXX using the pr agent. Platform: " #> [CmdletBinding()] @@ -61,6 +69,9 @@ param( [Parameter(Mandatory = $false)] [switch]$SkipCheckout, + [Parameter(Mandatory = $false)] + [switch]$NoInteractive, + [Parameter(Mandatory = $false)] [switch]$DryRun ) @@ -94,8 +105,13 @@ if (-not $ghVersion) { } Write-Host " ✅ GitHub CLI: $ghVersion" -ForegroundColor Green -# Check Copilot CLI - informational only -Write-Host " â„šī¸ Note: After this script, tell Copilot CLI to review the PR" -ForegroundColor Gray +# Check Copilot CLI +$copilotVersion = copilot --version 2>$null +if (-not $copilotVersion) { + Write-Error "Copilot CLI is not installed. Install with: npm install -g @github/copilot" + exit 1 +} +Write-Host " ✅ Copilot CLI: $copilotVersion" -ForegroundColor Green # Check PR exists Write-Host " 🔍 Verifying PR #$PRNumber exists..." -ForegroundColor Gray @@ -139,21 +155,46 @@ if (-not (Test-Path $stateDir)) { # Step 4: Build the prompt for Copilot CLI $planTemplatePath = ".github/agents/pr/PLAN-TEMPLATE.md" +$prompt = @" +Review PR #$PRNumber using the pr agent workflow. + +**Platform for testing:** $Platform + +**Instructions:** +1. Read the plan template at `$planTemplatePath` for the 5-phase workflow +2. Read `.github/agents/pr.md` for Phases 1-3 instructions +3. Follow ALL critical rules, especially: + - STOP on environment blockers and ask before continuing + - Use task agent for Gate verification + - Run multi-model try-fix in Phase 4 + +**Start with Phase 1: Pre-Flight** +- Create state file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md +- Gather context from PR #$PRNumber +- Proceed through all 5 phases + +Begin the review now. +"@ + Write-Host "" Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray Write-Host "" if ($DryRun) { - Write-Host "[DRY RUN] Would prepare environment with:" -ForegroundColor Magenta + Write-Host "[DRY RUN] Would invoke Copilot CLI with:" -ForegroundColor Magenta + Write-Host "" + Write-Host " Agent: pr" -ForegroundColor Gray + Write-Host " Mode: $(if ($NoInteractive) { 'Non-interactive (-p)' } else { 'Interactive (-i)' })" -ForegroundColor Gray Write-Host " PR: #$PRNumber" -ForegroundColor Gray Write-Host " Platform: $Platform" -ForegroundColor Gray - Write-Host " State file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray + Write-Host "" + Write-Host "Prompt:" -ForegroundColor Gray + Write-Host $prompt -ForegroundColor DarkGray Write-Host "" Write-Host "To run for real, remove the -DryRun flag" -ForegroundColor Yellow } else { - # Output structured context that can be used by the agent Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Green - Write-Host "║ ENVIRONMENT READY ║" -ForegroundColor Green + Write-Host "║ LAUNCHING COPILOT CLI ║" -ForegroundColor Green Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Green Write-Host "" Write-Host "PR Review Context:" -ForegroundColor Cyan @@ -163,20 +204,42 @@ if ($DryRun) { Write-Host " PLAN_TEMPLATE: $planTemplatePath" -ForegroundColor White Write-Host " CURRENT_BRANCH: $(git branch --show-current)" -ForegroundColor White Write-Host " PR_TITLE: $($prInfo.title)" -ForegroundColor White + Write-Host " MODE: $(if ($NoInteractive) { 'Non-interactive' } else { 'Interactive' })" -ForegroundColor White Write-Host "" Write-Host "─────────────────────────────────────────────────────────────" -ForegroundColor DarkGray Write-Host "" - Write-Host "Next step - Tell Copilot CLI:" -ForegroundColor Yellow - Write-Host "" - Write-Host " Review PR #$PRNumber using the pr agent. Platform: $Platform" -ForegroundColor Cyan - Write-Host "" - Write-Host "Or for the full workflow:" -ForegroundColor Yellow + + # Build the copilot command arguments + $copilotArgs = @( + "--agent", "pr" + ) + + if ($NoInteractive) { + # Non-interactive mode: -p with --allow-all + $copilotArgs += @("-p", $prompt, "--allow-all") + } else { + # Interactive mode: -i to start with prompt + $copilotArgs += @("-i", $prompt) + } + + Write-Host "🚀 Starting Copilot CLI..." -ForegroundColor Yellow Write-Host "" - Write-Host " @pr Review PR #$PRNumber following .github/agents/pr/PLAN-TEMPLATE.md" -ForegroundColor Cyan + + # Invoke Copilot CLI + & copilot @copilotArgs + + $exitCode = $LASTEXITCODE + Write-Host "" + Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray + if ($exitCode -eq 0) { + Write-Host "✅ Copilot CLI completed successfully" -ForegroundColor Green + } else { + Write-Host "âš ī¸ Copilot CLI exited with code: $exitCode" -ForegroundColor Yellow + } } Write-Host "" -Write-Host "📝 State file will be at: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray -Write-Host "📋 Plan template at: $planTemplatePath" -ForegroundColor Gray +Write-Host "📝 State file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray +Write-Host "📋 Plan template: $planTemplatePath" -ForegroundColor Gray Write-Host "" From ed74c574a592143d8d7c561a51691518330b8734 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 15:31:35 -0600 Subject: [PATCH 08/16] Add 'Do NOT Switch Branches' rule to pr agent Problem: During PR review, the pr agent was checking out branches even when invoked with -SkipCheckout, causing: - Loss of uncommitted changes (stashed) - Confusion about which code is being reviewed - Potential loss of work in progress Solution: Added explicit rule prohibiting branch switching: - Never run: git checkout, git switch, gh pr checkout, git stash - Work on current branch as-is - Use git diff or gh pr diff to see PR changes Also fixed variable expansion in Review-PR.ps1 prompt (double backticks). --- .github/agents/pr.md | 20 ++++++++++++++++++++ .github/agents/pr/PLAN-TEMPLATE.md | 14 ++++++++++---- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 3daf4002f594..c4840d25ccd2 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -64,6 +64,26 @@ When creating state files, use the EXACT format from this document: - **Do NOT deviate** from documented structure - Downstream scripts depend on exact formatting (regex patterns expect specific structure) +### 🚨 CRITICAL: Do NOT Switch Branches + +**Never checkout or switch branches during a PR review.** + +The user or script has already set up the correct branch. Switching branches: +- Loses uncommitted changes +- Creates confusion about which code is being reviewed +- Can stash/lose work in progress + +**What to do instead:** +- Work on the **current branch** as-is +- If you need to see the PR's changes, use `git diff` or `gh pr diff` +- If you need to see specific file versions, use `git show :` + +**Never run these commands:** +- ❌ `git checkout ` +- ❌ `git switch ` +- ❌ `gh pr checkout ` +- ❌ `git stash` (to switch branches) + ### 🚨 CRITICAL: Use Skills' Scripts - Don't Bypass When a skill provides a PowerShell script: diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md index d63e26088806..0796378020af 100644 --- a/.github/agents/pr/PLAN-TEMPLATE.md +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -48,12 +48,18 @@ If ANY phase cannot complete due to missing tools/drivers/devices: Which would you like me to do? ``` -### Rule 2: Gate via Task Agent Only +### Rule 2: Do NOT Switch Branches +**Never checkout or switch branches during a PR review.** +- Work on the **current branch** as-is +- Use `git diff` or `gh pr diff` to see changes +- Never run: `git checkout`, `git switch`, `gh pr checkout`, `git stash` + +### Rule 3: Gate via Task Agent Only Gate verification MUST run as a `task` agent invocation, NOT inline commands. - The script does TWO test runs (revert fix → test, restore fix → test) - Running inline allows fabrication of results -### Rule 3: Multi-Model try-fix (Phase 4) +### Rule 4: Multi-Model try-fix (Phase 4) Run try-fix with 5 different models SEQUENTIALLY: 1. `claude-sonnet-4.5` 2. `claude-opus-4.5` @@ -63,12 +69,12 @@ Run try-fix with 5 different models SEQUENTIALLY: Then cross-pollinate: Share ALL results with ALL models, ask for NEW ideas, repeat until exhaustion. -### Rule 4: Follow Templates Exactly +### Rule 5: Follow Templates Exactly - Do NOT add attributes like `open` to `
` tags - Do NOT "improve" template formats - Downstream scripts depend on exact regex patterns -### Rule 5: Use Skills' Scripts +### Rule 6: Use Skills' Scripts - Run the provided PowerShell scripts, don't bypass with manual commands - If script fails, fix inputs rather than using manual `gh` commands From 47a8c29a7a4b9d31e933fcf47329e217ba7837fa Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 18:20:53 -0600 Subject: [PATCH 09/16] Improve PR review script: platform selection, logging, branch protection - Make Platform parameter optional - agent determines appropriate platform - Add platform selection guidance to pr.md, post-gate.md, PLAN-TEMPLATE.md - Enable streaming output (--stream on) for real-time feedback - Add logging to CustomAgentLogsTmp/PRState/{PR}/copilot-logs/ - Save session markdown in non-interactive mode (--share) - Block branch-switching commands (--deny-tool for git checkout/switch/stash) - Key learning: Don't test on platforms not affected by the bug --- .github/agents/pr.md | 31 +++++++++++++++++- .github/agents/pr/PLAN-TEMPLATE.md | 6 +++- .github/agents/pr/post-gate.md | 4 +-- .github/scripts/Review-PR.ps1 | 51 ++++++++++++++++++++++++++---- 4 files changed, 81 insertions(+), 11 deletions(-) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index c4840d25ccd2..5f113c64f267 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -553,13 +553,42 @@ Tests were already verified to FAIL in Phase 2. Gate is a confirmation step: **If starting from a PR (fix exists):** Use full verification mode - tests should FAIL without fix, PASS with fix. +### Platform Selection for Gate + +**🚨 CRITICAL: Choose a platform that is BOTH affected by the bug AND available on the current host.** + +**Step 1: Identify affected platforms** from Pre-Flight: +- Check the "Platforms Affected" checkboxes in the state file +- Check issue labels (e.g., `platform/iOS`, `platform/Android`) +- Check which platform-specific files the PR modifies + +**Step 2: Match to available platforms on current host:** + +| Host OS | Available Platforms | +|---------|---------------------| +| Windows | Android, Windows | +| macOS | Android, iOS, MacCatalyst | + +**Step 3: Select the best match:** +1. Pick a platform that IS affected by the bug +2. That IS available on the current host +3. Prefer the platform most directly impacted by the PR's code changes + +**Example decisions:** +- Bug affects iOS/Windows/MacCatalyst, host is Windows → Test on **Windows** +- Bug affects iOS only, host is Windows → **STOP** - cannot test (ask user) +- Bug affects Android only → Test on **Android** (works on any host) +- Bug affects all platforms → Pick based on host (Windows on Windows, iOS on macOS) + +**âš ī¸ Do NOT test on a platform that isn't affected by the bug** - the test will pass regardless of whether the fix works. + **🚨 MUST invoke as a task agent** to prevent command substitution: ```markdown Invoke the `task` agent with agent_type: "task" and this prompt: "Invoke the verify-tests-fail-without-fix skill for this PR: -- Platform: android (or ios) +- Platform: [selected platform from Platform Selection above] - TestFilter: 'IssueXXXXX' - RequireFullVerification: true diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md index 0796378020af..f9278d470a30 100644 --- a/.github/agents/pr/PLAN-TEMPLATE.md +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -108,10 +108,14 @@ Then cross-pollinate: Share ALL results with ALL models, ask for NEW ideas, repe ### Phase 3: Gate - Verify Tests Catch Bug ⛔ **🚨 BLOCKER PHASE - Cannot continue if Gate fails** +- [ ] **Select platform:** Choose platform that is BOTH affected by the bug AND available on host + - Check "Platforms Affected" from Pre-Flight + - Match to host: Windows → (Android, Windows), macOS → (Android, iOS, MacCatalyst) + - If no affected platform is available → STOP and ask user - [ ] Invoke verification via **task agent** (NOT inline): ``` Invoke task agent with: "Run verify-tests-fail-without-fix skill - - Platform: [android/ios/windows] + - Platform: [selected platform] - TestFilter: 'IssueXXXXX' - RequireFullVerification: true Report: Did tests FAIL without fix? PASS with fix? Final status?" diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index 3d23fe3bfc4a..59901b4c2111 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -82,8 +82,8 @@ Run the `try-fix` skill **5 times sequentially**, once with each model: ``` Invoke the try-fix skill for PR #XXXXX: - problem: [Description of the bug from issue/PR - what's broken and expected behavior] -- platform: [android/ios] -- test_command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform [android/ios] -TestFilter "IssueXXXXX" +- platform: [Use platform selected in Gate phase - must be affected by the bug AND available on host] +- test_command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform [same platform] -TestFilter "IssueXXXXX" - target_files: - src/[area]/[likely-affected-file-1].cs - src/[area]/[likely-affected-file-2].cs diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 index e46313090123..2c15c166c390 100644 --- a/.github/scripts/Review-PR.ps1 +++ b/.github/scripts/Review-PR.ps1 @@ -64,7 +64,7 @@ param( [Parameter(Mandatory = $false)] [ValidateSet('android', 'ios', 'windows', 'maccatalyst')] - [string]$Platform = 'android', + [string]$Platform, # Optional - agent will determine appropriate platform if not specified [Parameter(Mandatory = $false)] [switch]$SkipCheckout, @@ -90,7 +90,11 @@ Write-Host "╔═════════════════════ Write-Host "║ PR Review with Copilot CLI ║" -ForegroundColor Cyan Write-Host "â• â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•Ŗ" -ForegroundColor Cyan Write-Host "║ PR: #$PRNumber ║" -ForegroundColor Cyan -Write-Host "║ Platform: $Platform ║" -ForegroundColor Cyan +if ($Platform) { + Write-Host "║ Platform: $Platform ║" -ForegroundColor Cyan +} else { + Write-Host "║ Platform: (agent will determine) ║" -ForegroundColor Cyan +} Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Cyan Write-Host "" @@ -155,10 +159,17 @@ if (-not (Test-Path $stateDir)) { # Step 4: Build the prompt for Copilot CLI $planTemplatePath = ".github/agents/pr/PLAN-TEMPLATE.md" +# Build platform instruction +$platformInstruction = if ($Platform) { + "**Platform for testing:** $Platform" +} else { + "**Platform for testing:** Determine the appropriate platform(s) based on the PR's affected code paths and the current host OS." +} + $prompt = @" Review PR #$PRNumber using the pr agent workflow. -**Platform for testing:** $Platform +$platformInstruction **Instructions:** 1. Read the plan template at `$planTemplatePath` for the 5-phase workflow @@ -186,7 +197,7 @@ if ($DryRun) { Write-Host " Agent: pr" -ForegroundColor Gray Write-Host " Mode: $(if ($NoInteractive) { 'Non-interactive (-p)' } else { 'Interactive (-i)' })" -ForegroundColor Gray Write-Host " PR: #$PRNumber" -ForegroundColor Gray - Write-Host " Platform: $Platform" -ForegroundColor Gray + Write-Host " Platform: $(if ($Platform) { $Platform } else { '(agent will determine)' })" -ForegroundColor Gray Write-Host "" Write-Host "Prompt:" -ForegroundColor Gray Write-Host $prompt -ForegroundColor DarkGray @@ -199,7 +210,7 @@ if ($DryRun) { Write-Host "" Write-Host "PR Review Context:" -ForegroundColor Cyan Write-Host " PR_NUMBER: $PRNumber" -ForegroundColor White - Write-Host " PLATFORM: $Platform" -ForegroundColor White + Write-Host " PLATFORM: $(if ($Platform) { $Platform } else { '(agent will determine)' })" -ForegroundColor White Write-Host " STATE_FILE: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor White Write-Host " PLAN_TEMPLATE: $planTemplatePath" -ForegroundColor White Write-Host " CURRENT_BRANCH: $(git branch --show-current)" -ForegroundColor White @@ -211,12 +222,32 @@ if ($DryRun) { # Build the copilot command arguments $copilotArgs = @( - "--agent", "pr" + "--agent", "pr", + "--stream", "on" # Enable streaming for real-time output + ) + + # Deny branch-switching commands - agent should work on current branch + $copilotArgs += @( + "--deny-tool", "shell(git checkout*)", + "--deny-tool", "shell(git switch*)", + "--deny-tool", "shell(git stash*)", + "--deny-tool", "shell(gh pr checkout*)" ) + # Create log directory for this PR + $prLogDir = Join-Path $RepoRoot "CustomAgentLogsTmp/PRState/$PRNumber/copilot-logs" + if (-not (Test-Path $prLogDir)) { + New-Item -ItemType Directory -Path $prLogDir -Force | Out-Null + } + + # Add logging options + $copilotArgs += @("--log-dir", $prLogDir, "--log-level", "info") + if ($NoInteractive) { # Non-interactive mode: -p with --allow-all - $copilotArgs += @("-p", $prompt, "--allow-all") + # Also save session to markdown for review + $sessionFile = Join-Path $prLogDir "session-$(Get-Date -Format 'yyyyMMdd-HHmmss').md" + $copilotArgs += @("-p", $prompt, "--allow-all", "--share", $sessionFile) } else { # Interactive mode: -i to start with prompt $copilotArgs += @("-i", $prompt) @@ -242,4 +273,10 @@ if ($DryRun) { Write-Host "" Write-Host "📝 State file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray Write-Host "📋 Plan template: $planTemplatePath" -ForegroundColor Gray +if (-not $DryRun) { + Write-Host "📁 Copilot logs: CustomAgentLogsTmp/PRState/$PRNumber/copilot-logs/" -ForegroundColor Gray + if ($NoInteractive) { + Write-Host "📄 Session markdown: $sessionFile" -ForegroundColor Gray + } +} Write-Host "" From e9f2afd8ec48d4e94900a948ba16e75669739082 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 19:20:30 -0600 Subject: [PATCH 10/16] Fix backtick escaping in Review-PR.ps1 prompt --- .github/scripts/Review-PR.ps1 | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 index 2c15c166c390..c5abf0082a90 100644 --- a/.github/scripts/Review-PR.ps1 +++ b/.github/scripts/Review-PR.ps1 @@ -172,8 +172,8 @@ Review PR #$PRNumber using the pr agent workflow. $platformInstruction **Instructions:** -1. Read the plan template at `$planTemplatePath` for the 5-phase workflow -2. Read `.github/agents/pr.md` for Phases 1-3 instructions +1. Read the plan template at ``$planTemplatePath`` for the 5-phase workflow +2. Read ``.github/agents/pr.md`` for Phases 1-3 instructions 3. Follow ALL critical rules, especially: - STOP on environment blockers and ask before continuing - Use task agent for Gate verification From f8380f49c8fd829d5958bace54f5eb71b1245b8c Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 19:26:24 -0600 Subject: [PATCH 11/16] Remove ineffective deny-tool, strengthen no-git-commands rule --- .github/agents/pr.md | 25 ++++++++++++------------- .github/agents/pr/PLAN-TEMPLATE.md | 11 ++++++----- .github/scripts/Review-PR.ps1 | 9 ++------- 3 files changed, 20 insertions(+), 25 deletions(-) diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 5f113c64f267..9f48e3bebfc7 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -64,25 +64,24 @@ When creating state files, use the EXACT format from this document: - **Do NOT deviate** from documented structure - Downstream scripts depend on exact formatting (regex patterns expect specific structure) -### 🚨 CRITICAL: Do NOT Switch Branches +### 🚨 CRITICAL: No Direct Git Commands -**Never checkout or switch branches during a PR review.** +**Never run git commands directly during a PR review.** -The user or script has already set up the correct branch. Switching branches: -- Loses uncommitted changes -- Creates confusion about which code is being reviewed -- Can stash/lose work in progress +The user or script has already set up the correct branch. All git operations are handled by the PowerShell scripts (verify-tests-fail.ps1, try-fix, etc.). **What to do instead:** -- Work on the **current branch** as-is -- If you need to see the PR's changes, use `git diff` or `gh pr diff` -- If you need to see specific file versions, use `git show :` +- Use `gh pr diff` or `gh pr view` to see PR info (read-only GitHub CLI) +- Use `gh pr diff --name-only` to list changed files +- Let scripts handle all file manipulation internally **Never run these commands:** -- ❌ `git checkout ` -- ❌ `git switch ` -- ❌ `gh pr checkout ` -- ❌ `git stash` (to switch branches) +- ❌ `git checkout` (any form) +- ❌ `git switch` +- ❌ `git stash` +- ❌ `git reset` +- ❌ `git revert` +- ❌ `gh pr checkout` ### 🚨 CRITICAL: Use Skills' Scripts - Don't Bypass diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md index f9278d470a30..49e4cfd724fe 100644 --- a/.github/agents/pr/PLAN-TEMPLATE.md +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -48,11 +48,12 @@ If ANY phase cannot complete due to missing tools/drivers/devices: Which would you like me to do? ``` -### Rule 2: Do NOT Switch Branches -**Never checkout or switch branches during a PR review.** -- Work on the **current branch** as-is -- Use `git diff` or `gh pr diff` to see changes -- Never run: `git checkout`, `git switch`, `gh pr checkout`, `git stash` +### Rule 2: No Direct Git Commands +**Never run git commands directly during a PR review.** +- All git operations are handled by the PowerShell scripts (verify-tests-fail.ps1, etc.) +- Use `gh pr diff` or `gh pr view` to see PR info (read-only) +- ❌ Never run: `git checkout`, `git switch`, `git stash`, `git reset`, `git revert` +- ✅ Scripts handle file manipulation internally ### Rule 3: Gate via Task Agent Only Gate verification MUST run as a `task` agent invocation, NOT inline commands. diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 index c5abf0082a90..5bf5c5c3c5e3 100644 --- a/.github/scripts/Review-PR.ps1 +++ b/.github/scripts/Review-PR.ps1 @@ -226,13 +226,8 @@ if ($DryRun) { "--stream", "on" # Enable streaming for real-time output ) - # Deny branch-switching commands - agent should work on current branch - $copilotArgs += @( - "--deny-tool", "shell(git checkout*)", - "--deny-tool", "shell(git switch*)", - "--deny-tool", "shell(git stash*)", - "--deny-tool", "shell(gh pr checkout*)" - ) + # NOTE: --deny-tool does NOT work with --allow-all (allow-all takes precedence) + # Branch switching prevention relies on agent instructions in pr.md only # Create log directory for this PR $prLogDir = Join-Path $RepoRoot "CustomAgentLogsTmp/PRState/$PRNumber/copilot-logs" From fe10f035207e60ce061236693dc5796700df9333 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 22:12:26 -0600 Subject: [PATCH 12/16] Strengthen cross-pollination requirements - prevent shortcuts - Add explicit warning that explore/glob is NOT valid for cross-pollination - Require Cross-Pollination table with responses from ALL 5 models - Add invocation template showing how to call each model - Add validation that cross-pollination table must exist before marking complete - Add new common mistakes for using shortcuts instead of model invocations --- .github/agents/pr/PLAN-TEMPLATE.md | 16 ++++--- .github/agents/pr/post-gate.md | 72 +++++++++++++++++++++++++----- 2 files changed, 72 insertions(+), 16 deletions(-) diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md index 49e4cfd724fe..e7c763b7aaa4 100644 --- a/.github/agents/pr/PLAN-TEMPLATE.md +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -141,14 +141,20 @@ Then cross-pollinate: Share ALL results with ALL models, ask for NEW ideas, repe - [ ] ⛔ **If environment blocker on any attempt**: STOP, report, ask user - [ ] Record each: approach, test result, files changed, failure analysis -**Round 2+: Cross-Pollination Loop** -- [ ] Share bounded summary of ALL attempts with ALL 5 models -- [ ] Ask each: "Any NEW fix ideas not yet tried?" +**Round 2+: Cross-Pollination Loop (MANDATORY - DO NOT SKIP)** +- [ ] **Invoke EACH model via task agent** (not explore/glob): + - [ ] claude-sonnet-4.5: "Any NEW fix ideas?" → Response: ___ + - [ ] claude-opus-4.5: "Any NEW fix ideas?" → Response: ___ + - [ ] gpt-5.2: "Any NEW fix ideas?" → Response: ___ + - [ ] gpt-5.2-codex: "Any NEW fix ideas?" → Response: ___ + - [ ] gemini-3-pro-preview: "Any NEW fix ideas?" → Response: ___ +- [ ] Record responses in Cross-Pollination table in state file - [ ] For each new idea: Run try-fix with that model -- [ ] Repeat until NO model proposes new ideas (max 3 rounds) +- [ ] Repeat until ALL 5 models say "NO NEW IDEAS" (max 3 rounds) **Completion:** -- [ ] Mark Exhausted: Yes (all models confirm no new ideas) +- [ ] Cross-Pollination table exists with all 5 model responses +- [ ] Mark Exhausted: Yes (all models confirmed no new ideas via invocation) - [ ] Compare passing candidates with PR's fix - [ ] Select best fix (test results → simplicity → robustness → style) - [ ] Update state file: Fix → ✅ COMPLETE diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index 59901b4c2111..bf25c38818dc 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -96,7 +96,14 @@ Generate ONE independent fix idea. Review the PR's fix first to ensure your appr #### Round 2+: Cross-Pollination Loop -After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ideas: +**🚨 MANDATORY - DO NOT SKIP THIS STEP** + +After Round 1 completes, you MUST invoke each of the 5 models to ask for new ideas. This is NOT optional. + +**❌ WRONG**: Using `explore` or `glob` to "check if approaches are exhausted" +**❌ WRONG**: Declaring exhaustion without invoking each model +**❌ WRONG**: Assuming "comprehensive coverage" means no new ideas exist +**✅ CORRECT**: Invoke EACH model via task agent and ask explicitly for new ideas **âš ī¸ Summary Size Limit**: To stay within Copilot CLI's 30,000 char prompt limit, keep attempt summaries bounded: - Max 3-4 bullet points per attempt @@ -105,9 +112,11 @@ After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ide ``` ┌─────────────────────────────────────────────────────────────┐ -│ Cross-Pollination Loop │ +│ Cross-Pollination Loop - MANDATORY │ ├─────────────────────────────────────────────────────────────┤ │ │ +│ 🚨 You MUST invoke each model - no shortcuts allowed │ +│ │ │ LOOP until no new ideas: │ │ │ │ 1. Compile BOUNDED summary of ALL try-fix attempts: │ @@ -115,23 +124,50 @@ After Round 1 completes, share ALL results with ALL 5 models and ask for NEW ide │ - Pass/Fail result │ │ - Key learning (1 line - why it worked or failed) │ │ │ -│ 2. Share summary with ALL 5 models, ask each: │ -│ "Given these results, do you have any NEW fix ideas │ -│ that haven't been tried? If yes, describe briefly." │ +│ 2. Invoke EACH of the 5 models via task agent: │ +│ prompt: "Review these fix attempts for PR #XXXXX: │ +│ [summary of all attempts] │ +│ Do you have any NEW fix ideas not tried? │ +│ Reply: 'NEW IDEA: [description]' or │ +│ 'NO NEW IDEAS'" │ +│ │ +│ 3. Record each model's response in state file: │ +│ | Model | Response | │ +│ | claude-sonnet-4.5 | NO NEW IDEAS | │ +│ | claude-opus-4.5 | NEW IDEA: ... | │ +│ | ... | ... | │ │ │ -│ 3. For each model with a new idea: │ +│ 4. For each model with a new idea: │ │ → Run try-fix with that model (SEQUENTIAL) │ │ → Wait for completion before next │ │ │ -│ 4. If ANY new ideas were tested → repeat loop │ -│ If NO new ideas from ANY model → exit loop │ +│ 5. If ANY new ideas were tested → repeat loop │ +│ If ALL 5 models said "NO NEW IDEAS" → exit loop │ │ │ │ MAX ROUNDS: 3 (to prevent infinite loops) │ │ │ └─────────────────────────────────────────────────────────────┘ ``` -**Coordination loop stop condition**: Exit when a full round completes and NO model proposes any new fix ideas. This is separate from the per-invocation `Exhausted` flag that each `try-fix` run sets in the state file. +**Cross-Pollination Invocation Template:** +``` +Invoke task agent with model parameter: + agent_type: "task" + model: "[model-name]" + prompt: "Review PR #XXXXX fix attempts: + - Attempt 1 (claude-sonnet-4.5): [approach] - ✅/❌ + - Attempt 2 (claude-opus-4.5): [approach] - ✅/❌ + - ... + + Do you have any NEW fix ideas that haven't been tried? + Reply with EXACTLY one of: + - 'NEW IDEA: [brief description]' + - 'NO NEW IDEAS' + + Only propose ideas that are fundamentally different from above." +``` + +**Coordination loop stop condition**: Exit when ALL 5 models explicitly respond "NO NEW IDEAS" in the same round. This requires actual task agent invocations - not code exploration or assumptions. #### try-fix Invocation Details @@ -220,7 +256,16 @@ Update the state file: **Before marking ✅ COMPLETE, verify state file contains:** - [ ] Round 1 completed: All 5 models ran try-fix -- [ ] Cross-pollination completed: All 5 models confirmed "no new ideas" +- [ ] **Cross-pollination table exists** with responses from ALL 5 models: + ``` + | Model | Round 2 Response | + |-------|------------------| + | claude-sonnet-4.5 | NO NEW IDEAS | + | claude-opus-4.5 | NO NEW IDEAS | + | gpt-5.2 | NO NEW IDEAS | + | gpt-5.2-codex | NO NEW IDEAS | + | gemini-3-pro-preview | NO NEW IDEAS | + ``` - [ ] Fix Candidates table has numbered rows for each try-fix attempt - [ ] Each row has: approach, test result, files changed, notes - [ ] "Exhausted" field set to Yes (all models confirmed no new ideas) @@ -229,6 +274,8 @@ Update the state file: - [ ] No âŗ PENDING markers remain in Fix section - [ ] State file committed +**🚨 If cross-pollination table is missing, you skipped Round 2. Go back and invoke each model.** + --- ## 📋 REPORT: Final Report (Phase 5) @@ -346,8 +393,11 @@ Update all phase statuses to complete. - ❌ **Skipping models in Round 1** - All 5 models must run try-fix before cross-pollination - ❌ **Running try-fix in parallel** - SEQUENTIAL ONLY - they modify same files and use same device - ❌ **Stopping before cross-pollination** - Must share results and check for new ideas +- ❌ **Using explore/glob instead of invoking models** - Cross-pollination requires ACTUAL task agent invocations with each model, not code searches +- ❌ **Assuming "comprehensive coverage" = exhausted** - Only exhausted when all 5 models explicitly say "NO NEW IDEAS" +- ❌ **Not recording cross-pollination responses** - State file must have table showing each model's Round 2 response - ❌ **Not analyzing why fixes failed** - Record the flawed reasoning to help future attempts - ❌ **Selecting a failing fix** - Only select from passing candidates - ❌ **Forgetting to revert between attempts** - Each try-fix must start from broken baseline, end with PR restored -- ❌ **Declaring exhaustion prematurely** - All 5 models must confirm "no new ideas" +- ❌ **Declaring exhaustion prematurely** - All 5 models must confirm "no new ideas" via actual invocation - ❌ **Rushing the report** - Take time to write clear justification From 329dd224976cbc46b5bb96f02591e06721d7d388 Mon Sep 17 00:00:00 2001 From: Shane Date: Sat, 31 Jan 2026 22:15:04 -0600 Subject: [PATCH 13/16] Remove per-attempt exhausted flag from try-fix Cross-pollination exhaustion is determined by pr agent after invoking ALL 5 models. try-fix should NOT set any global exhausted field - it only reports its own attempt result. --- .github/skills/try-fix/SKILL.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/.github/skills/try-fix/SKILL.md b/.github/skills/try-fix/SKILL.md index 0fd1188c7fa2..0e89f53ba9a7 100644 --- a/.github/skills/try-fix/SKILL.md +++ b/.github/skills/try-fix/SKILL.md @@ -343,11 +343,11 @@ Provide structured output to the invoker: [The actual changes made] ``` -**Exhausted:** Yes/No -**Reasoning:** [Why you believe there are/aren't more viable approaches] +**This Attempt's Status:** Done/NeedsRetry +**Reasoning:** [Why this specific approach succeeded or failed] ``` -**Determining Exhaustion:** Set `exhausted=true` when you've tried the same fundamental approach multiple times, all hints have been explored, failure analysis reveals the problem is outside target files, or no new ideas remain. Set `exhausted=false` when this is the first attempt, failure analysis suggests a different approach, hints remain unexplored, or the approach was close but needs refinement. +**Determining Status:** Set `Done` when you've completed testing this approach (whether it passed or failed). Set `NeedsRetry` only if you hit a transient error (network timeout, flaky test) and want to retry the same approach. ### Step 10: Update State File (if provided) @@ -361,15 +361,16 @@ If `state_file` input was provided and file exists: |---|--------|----------|-------------|---------------|-------| | N | try-fix #N | [approach] | ✅ PASS / ❌ FAIL | [files] | [analysis] | -4. **Set exhausted status** based on your determination above -5. **Commit state file:** +4. **Commit state file:** ```bash -git add "$STATE_FILE" && git commit -m "try-fix: attempt #N (exhausted=$EXHAUSTED)" +git add "$STATE_FILE" && git commit -m "try-fix: attempt #N" ``` **If no state file provided:** Skip this step (results returned to invoker only). -**Ownership rule:** try-fix updates its own row AND the exhausted field. Never modify: +**âš ī¸ IMPORTANT: Do NOT set any "Exhausted" field.** Cross-pollination exhaustion is determined by the pr agent after invoking ALL 5 models and confirming none have new ideas. try-fix only reports its own attempt result. + +**Ownership rule:** try-fix updates its own row ONLY. Never modify: - Phase status fields - "Selected Fix" field - Other try-fix rows From 3290db97bddf5135df351178e014f227c9cd2ae6 Mon Sep 17 00:00:00 2001 From: Shane Date: Sun, 1 Feb 2026 07:13:54 -0600 Subject: [PATCH 14/16] Add PR merge and protected branch validation to Review-PR.ps1 - Merge PR into current branch instead of checkout - Fail if on protected branches (main, release/*, net*.0) - Handle fork PRs with separate fetch logic - Replace SkipCheckout with SkipMerge parameter --- .github/scripts/Review-PR.ps1 | 91 +++++++++++++++++++++++++++++------ 1 file changed, 75 insertions(+), 16 deletions(-) diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 index 5bf5c5c3c5e3..250a6eaf20c5 100644 --- a/.github/scripts/Review-PR.ps1 +++ b/.github/scripts/Review-PR.ps1 @@ -13,7 +13,8 @@ The script: - Validates prerequisites (gh CLI, PR exists) - - Optionally checks out the PR branch + - Validates current branch is not protected (main, release/*, net*.0) + - Merges the PR into the current branch (for isolated testing) - Creates the state directory - Invokes Copilot CLI with the pr agent @@ -24,8 +25,8 @@ The platform to use for testing. Default is 'android'. Valid values: android, ios, windows, maccatalyst -.PARAMETER SkipCheckout - If specified, skips checking out the PR branch (useful if already on the branch) +.PARAMETER SkipMerge + If specified, skips merging the PR into the current branch (useful if already merged) .PARAMETER Interactive If specified, starts Copilot in interactive mode with the prompt (default). @@ -43,8 +44,8 @@ Reviews PR #33687 interactively using the default platform (android) .EXAMPLE - .\Review-PR.ps1 -PRNumber 33687 -Platform ios -SkipCheckout - Reviews PR #33687 on iOS without checking out, in interactive mode + .\Review-PR.ps1 -PRNumber 33687 -Platform ios -SkipMerge + Reviews PR #33687 on iOS without merging (assumes already merged), in interactive mode .EXAMPLE .\Review-PR.ps1 -PRNumber 33687 -NoInteractive @@ -67,7 +68,7 @@ param( [string]$Platform, # Optional - agent will determine appropriate platform if not specified [Parameter(Mandatory = $false)] - [switch]$SkipCheckout, + [switch]$SkipMerge, [Parameter(Mandatory = $false)] [switch]$NoInteractive, @@ -127,26 +128,84 @@ if (-not $prInfo) { Write-Host " ✅ PR: $($prInfo.title)" -ForegroundColor Green Write-Host " ✅ State: $($prInfo.state)" -ForegroundColor Green -# Step 2: Checkout PR (unless skipped) -if (-not $SkipCheckout) { +# Step 2: Validate current branch and merge PR +Write-Host "" +$currentBranch = git branch --show-current +Write-Host "📍 Current branch: $currentBranch" -ForegroundColor Yellow + +# Check if on a protected branch (main, release/*, net*.0) +$protectedBranches = @('main', 'master') +$isProtected = $protectedBranches -contains $currentBranch -or + $currentBranch -match '^release/' -or + $currentBranch -match '^net\d+\.0$' + +if ($isProtected) { Write-Host "" - Write-Host "đŸ“Ĩ Checking out PR #$PRNumber..." -ForegroundColor Yellow + Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Red + Write-Host "║ ERROR: Cannot run on protected branch! ║" -ForegroundColor Red + Write-Host "â• â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•â•Ŗ" -ForegroundColor Red + Write-Host "║ Current branch: $currentBranch" -ForegroundColor Red + Write-Host "║ ║" -ForegroundColor Red + Write-Host "║ This script merges the PR into the current branch. ║" -ForegroundColor Red + Write-Host "║ Protected branches: main, release/*, net*.0 ║" -ForegroundColor Red + Write-Host "║ ║" -ForegroundColor Red + Write-Host "║ Please checkout a working branch first: ║" -ForegroundColor Red + Write-Host "║ git checkout -b pr-review-$PRNumber ║" -ForegroundColor Red + Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Red + Write-Host "" + exit 1 +} + +Write-Host " ✅ Branch '$currentBranch' is not protected" -ForegroundColor Green + +# Merge the PR into the current branch (unless skipped) +if (-not $SkipMerge) { + Write-Host "" + Write-Host "🔀 Merging PR #$PRNumber into current branch..." -ForegroundColor Yellow if ($DryRun) { - Write-Host " [DRY RUN] Would run: gh pr checkout $PRNumber" -ForegroundColor Magenta + Write-Host " [DRY RUN] Would fetch and merge PR #$PRNumber" -ForegroundColor Magenta } else { - gh pr checkout $PRNumber + # Fetch the PR ref and merge it + Write-Host " đŸ“Ĩ Fetching PR #$PRNumber..." -ForegroundColor Gray + git fetch origin pull/$PRNumber/head:temp-pr-$PRNumber 2>$null + if ($LASTEXITCODE -ne 0) { + # Try fetching from the PR's head repository (for fork PRs) + $prDetails = gh pr view $PRNumber --json headRepositoryOwner,headRefName 2>$null | ConvertFrom-Json + if ($prDetails) { + $forkOwner = $prDetails.headRepositoryOwner.login + $headRef = $prDetails.headRefName + Write-Host " đŸ“Ĩ PR is from fork: $forkOwner, fetching..." -ForegroundColor Gray + git fetch "https://github.com/$forkOwner/maui.git" "${headRef}:temp-pr-$PRNumber" + if ($LASTEXITCODE -ne 0) { + Write-Error "Failed to fetch PR #$PRNumber from fork" + exit 1 + } + } else { + Write-Error "Failed to fetch PR #$PRNumber" + exit 1 + } + } + + Write-Host " 🔀 Merging into '$currentBranch'..." -ForegroundColor Gray + git merge "temp-pr-$PRNumber" --no-edit if ($LASTEXITCODE -ne 0) { - Write-Error "Failed to checkout PR #$PRNumber" + Write-Host "" + Write-Host "âš ī¸ Merge conflict detected!" -ForegroundColor Red + Write-Host " Please resolve conflicts manually and re-run the script with -SkipMerge" -ForegroundColor Yellow + git merge --abort 2>$null + git branch -D "temp-pr-$PRNumber" 2>$null exit 1 } - Write-Host " ✅ Checked out PR branch" -ForegroundColor Green + + # Clean up temp branch + git branch -D "temp-pr-$PRNumber" 2>$null + + Write-Host " ✅ PR #$PRNumber merged into '$currentBranch'" -ForegroundColor Green } } else { Write-Host "" - Write-Host "â­ī¸ Skipping checkout (using current branch)" -ForegroundColor Yellow - $currentBranch = git branch --show-current - Write-Host " Current branch: $currentBranch" -ForegroundColor Gray + Write-Host "â­ī¸ Skipping merge (assuming PR is already merged)" -ForegroundColor Yellow } # Step 3: Ensure state directory exists From 632bfb7155a40c717a2717c6f4b47958a6f5fe2d Mon Sep 17 00:00:00 2001 From: Shane Neuville Date: Mon, 2 Feb 2026 14:36:16 -0600 Subject: [PATCH 15/16] Extract shared rules, simplify git policy, reduce duplication MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Create SHARED-RULES.md with extracted shared content (blocker handling, git rules, model list, platform selection) - Simplify git rule: agent never runs git commands, always on correct branch - Change 'commit state file' to 'save state file' (state files are gitignored) - Phase 4: 'Apply changes' instead of 'Commit changes' - user handles git - Compress cross-pollination section (51→20 lines) - Convert PLAN-TEMPLATE.md to pure checklist (226→112 lines) - Add instruction for user to handle commit/push when alternative fix applied Line count reduction: - pr.md: 662→535 (-19%) - post-gate.md: 403→302 (-25%) - PLAN-TEMPLATE.md: 226→112 (-50%) - Total: 1291→1116 (-14%) Validated by 5 AI models (Claude Sonnet 4.5, Claude Opus 4.5, GPT-5.2, GPT-5.2-Codex, Gemini 3 Pro) - all confirmed no conflicting guidance. --- .github/agents/pr.md | 162 +++----------------- .github/agents/pr/PLAN-TEMPLATE.md | 234 ++++++++--------------------- .github/agents/pr/SHARED-RULES.md | 167 ++++++++++++++++++++ .github/agents/pr/post-gate.md | 180 +++++----------------- 4 files changed, 285 insertions(+), 458 deletions(-) create mode 100644 .github/agents/pr/SHARED-RULES.md diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 9f48e3bebfc7..ecd6bc1d62e0 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -41,142 +41,23 @@ After Gate passes, read `.github/agents/pr/post-gate.md` for **Phases 4-5**. --- -## Phase Completion Protocol (CRITICAL) +## 🚨 Critical Rules -**Before changing ANY phase status to ✅ COMPLETE:** +**Read `.github/agents/pr/SHARED-RULES.md` for complete details on:** +- Phase Completion Protocol (fill ALL pending fields before marking complete) +- Follow Templates EXACTLY (no `open` attributes, no "improvements") +- No Direct Git Commands (use `gh pr diff/view`, let scripts handle files) +- Use Skills' Scripts (don't bypass with manual commands) +- Stop on Environment Blockers (strict retry limits, report and ask user) +- Multi-Model Configuration (5 models for Phase 4) +- Platform Selection (must be affected AND available on host) -1. **Read the state file section** for the phase you're completing -2. **Find ALL âŗ PENDING and [PENDING] fields** in that section -3. **Fill in every field** with actual content -4. **Verify no pending markers remain** in your section -5. **Commit the state file** with complete content -6. **Then change status** to ✅ COMPLETE +**Key points:** +- ❌ Never run `git checkout`, `git switch`, `git stash`, `git reset` - agent is always on correct branch +- ❌ Never continue after environment blocker - STOP and ask user +- ❌ Never mark phase ✅ with [PENDING] fields remaining -**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" - ---- - -### 🚨 CRITICAL: Follow Templates EXACTLY - -When creating state files, use the EXACT format from this document: -- **Do NOT add attributes** like `open` to `
` tags -- **Do NOT "improve"** the template format -- **Do NOT deviate** from documented structure -- Downstream scripts depend on exact formatting (regex patterns expect specific structure) - -### 🚨 CRITICAL: No Direct Git Commands - -**Never run git commands directly during a PR review.** - -The user or script has already set up the correct branch. All git operations are handled by the PowerShell scripts (verify-tests-fail.ps1, try-fix, etc.). - -**What to do instead:** -- Use `gh pr diff` or `gh pr view` to see PR info (read-only GitHub CLI) -- Use `gh pr diff --name-only` to list changed files -- Let scripts handle all file manipulation internally - -**Never run these commands:** -- ❌ `git checkout` (any form) -- ❌ `git switch` -- ❌ `git stash` -- ❌ `git reset` -- ❌ `git revert` -- ❌ `gh pr checkout` - -### 🚨 CRITICAL: Use Skills' Scripts - Don't Bypass - -When a skill provides a PowerShell script: -- **Run the script** - don't interpret what it does and do it manually -- **Fix inputs if script fails** - don't bypass with manual `gh` commands -- **Use `-DryRun` to debug** - see what the script would produce before posting -- Scripts handle formatting, API calls, and section management correctly - -### 🚨 CRITICAL: Stop on Environment Blockers - -If you encounter an environment or system setup blocker that prevents completing a phase: - -**STOP IMMEDIATELY. Do NOT continue to the next phase.** - -**Common blockers:** -- Missing Appium drivers (Windows, iOS, Android) -- WinAppDriver not installed or returning errors -- Xcode/iOS simulators not available (on Windows) -- Android emulator not running or not configured -- Developer Mode not enabled -- Port conflicts (e.g., 4723 in use) -- Missing SDKs or tools -- Server errors (500, timeout, "unknown error occurred") - -**Retry Limits - STRICT ENFORCEMENT:** - -| Blocker Type | Max Retries | Then Do | -|--------------|-------------|---------| -| Missing tool/driver | 1 install attempt | STOP and ask user | -| Server errors (500, timeout) | 0 | STOP immediately and report | -| Port conflicts | 1 (kill process) | STOP and ask user | -| Configuration issues | 1 fix attempt | STOP and ask user | - -**When blocked, you MUST:** -1. **Stop all work** - Do not proceed to the next phase -2. **Do NOT keep troubleshooting** - After the retry limit, STOP -3. **Report the blocker** clearly: - - What step failed - - What tool/driver/device is missing - - What error message was shown - - What you already tried (if any retries were used) -4. **Ask the user** how to proceed: - - Install the missing component? - - Switch to a different platform? - - Skip this phase with documented limitations? -5. **Wait for user response** - Do not assume or work around - -**Example blocker report:** -``` -⛔ BLOCKED: Cannot complete Gate phase - -**What failed:** Running verify-tests-fail-without-fix skill -**Blocker:** WinAppDriver returns 500 errors -**Error:** "WinAppDriver server fails to respond with proper status" - -**What I tried:** -- Ran Appium provisioning script -- Updated Windows driver to v5.1.8 - -**I am STOPPING here. Options:** -1. User investigates WinAppDriver setup manually -2. Switch to Android/iOS platform -3. Accept Sandbox manual verification as sufficient -4. Skip automated verification (note limitation in report) - -Which would you like me to do? -``` - -**Never:** -- ❌ Keep trying different fixes after retry limit exceeded -- ❌ Mark a phase as âš ī¸ BLOCKED and continue to the next phase -- ❌ Claim "verification passed" when tests couldn't actually run -- ❌ Skip device/emulator testing and proceed with code review only -- ❌ Install multiple tools/drivers without asking between each -- ❌ Spend more than 2-3 tool calls troubleshooting the same blocker - ---- - -### 🚨 CRITICAL: Phase 4 Uses Multi-Model try-fix - -**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill with **5 different AI models** to: -1. **Maximize fix diversity** - Each model brings different perspectives -2. **Cross-pollinate ideas** - Share results between models to spark new ideas -3. **Ensure exhaustive exploration** - Only stop when ALL models confirm "no new ideas" - -**The multi-model workflow:** -- **Round 1**: Run try-fix 5 times sequentially using the `task` agent with `model` parameter: `claude-sonnet-4.5`, `claude-opus-4.5`, `gpt-5.2`, `gpt-5.2-codex`, `gemini-3-pro-preview` -- **Round 2+**: Share all results with all 5 models, run try-fix for any new ideas, repeat until exhaustion - -**âš ī¸ SEQUENTIAL ONLY**: try-fix runs modify the same files and use the same device. Never run in parallel. - -**Note:** The `model` parameter is passed to the `task` tool, which supports model selection. This is separate from agent YAML frontmatter (which is VS Code-only). - -See `post-gate.md` for detailed Phase 4 instructions. +Phase 4 uses a 5-model exploration workflow. See `post-gate.md` for detailed instructions after Gate passes. --- @@ -344,7 +225,7 @@ This file: - Serves as your TODO list for all phases - Tracks progress if interrupted - Must exist before you start gathering context -- **Always include when committing changes** (to `CustomAgentLogsTmp/PRState/`) +- **Always include when saving changes** (to `CustomAgentLogsTmp/PRState/`) - **Phases 4-5 sections are added AFTER Gate passes** (see `pr/post-gate.md`) **Then gather context and update the file as you go.** @@ -353,11 +234,7 @@ This file: **If starting from a PR:** ```bash -# Checkout the PR -git fetch origin pull/XXXXX/head:pr-XXXXX -git checkout pr-XXXXX - -# Fetch PR metadata +# Fetch PR metadata (agent is already on correct branch) gh pr view XXXXX --json title,body,url,author,labels,files # Find and read linked issue @@ -367,7 +244,6 @@ gh issue view ISSUE_NUMBER --json title,body,comments **If starting from an Issue (no PR exists):** ```bash -# Stay on current branch - do NOT checkout anything # Fetch issue details directly gh issue view XXXXX --json title,body,comments,labels ``` @@ -462,7 +338,7 @@ The test result will be updated to `✅ PASS (Gate)` after Gate passes. - [ ] Files Changed table populated (if PR exists) - [ ] PR Discussion Summary documented (if PR exists) - [ ] All [PENDING] placeholders replaced -- [ ] State file committed +- [ ] State file saved --- @@ -529,7 +405,7 @@ The script auto-detects mode based on git diff. If only test files changed, it v - [ ] Test file paths documented - [ ] "Tests verified to FAIL" note added - [ ] Test category identified -- [ ] State file committed +- [ ] State file saved --- @@ -626,7 +502,7 @@ See `.github/skills/verify-tests-fail-without-fix/SKILL.md` for full skill docum - [ ] Result shows PASSED ✅ or FAILED ❌ - [ ] Test behavior documented - [ ] Platform tested noted -- [ ] State file committed +- [ ] State file saved --- diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md index e7c763b7aaa4..a1d1e1912193 100644 --- a/.github/agents/pr/PLAN-TEMPLATE.md +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -1,214 +1,99 @@ # PR Review Plan Template -## Overview - -This is a **reusable template** for reviewing PRs using the 5-phase PR Agent workflow. Copy this plan and fill in the PR-specific details. +**Reusable checklist** for the 5-phase PR Agent workflow. **Source documents:** - `.github/agents/pr.md` - Phases 1-3 (Pre-Flight, Tests, Gate) - `.github/agents/pr/post-gate.md` - Phases 4-5 (Fix, Report) +- `.github/agents/pr/SHARED-RULES.md` - Critical rules (blockers, git, templates) --- -## 🚨 Critical Rules - -### Rule 1: Stop on Environment Blockers -If ANY phase cannot complete due to missing tools/drivers/devices: -1. **STOP immediately** - Do NOT continue to next phase -2. **Do NOT keep troubleshooting** - Strict retry limits apply -3. **Report the blocker** - What failed, what's missing, error message -4. **Ask user** for resolution options -5. **Wait for response** - Do not assume or work around - -**Retry Limits (STRICT):** -| Blocker Type | Max Retries | Then Do | -|--------------|-------------|---------| -| Missing tool/driver | 1 install attempt | STOP and ask | -| Server errors (500, timeout) | 0 retries | STOP immediately | -| Port conflicts | 1 (kill process) | STOP and ask | -| Configuration issues | 1 fix attempt | STOP and ask | - -**Common blockers:** Appium drivers, WinAppDriver errors, Xcode, emulators, Developer Mode, port conflicts, SDKs - -**Blocker Report Template:** -``` -⛔ BLOCKED: Cannot complete [Phase Name] - -**What failed:** [Step/skill that failed] -**Blocker:** [Tool/driver/error type] -**Error:** "[Exact error message]" - -**What I tried:** [List retry attempts, max 1-2] - -**I am STOPPING here. Options:** -1. [Option for user] -2. [Alternative platform] -3. [Skip with documented limitation] - -Which would you like me to do? -``` - -### Rule 2: No Direct Git Commands -**Never run git commands directly during a PR review.** -- All git operations are handled by the PowerShell scripts (verify-tests-fail.ps1, etc.) -- Use `gh pr diff` or `gh pr view` to see PR info (read-only) -- ❌ Never run: `git checkout`, `git switch`, `git stash`, `git reset`, `git revert` -- ✅ Scripts handle file manipulation internally - -### Rule 3: Gate via Task Agent Only -Gate verification MUST run as a `task` agent invocation, NOT inline commands. -- The script does TWO test runs (revert fix → test, restore fix → test) -- Running inline allows fabrication of results - -### Rule 4: Multi-Model try-fix (Phase 4) -Run try-fix with 5 different models SEQUENTIALLY: -1. `claude-sonnet-4.5` -2. `claude-opus-4.5` -3. `gpt-5.2` -4. `gpt-5.2-codex` -5. `gemini-3-pro-preview` - -Then cross-pollinate: Share ALL results with ALL models, ask for NEW ideas, repeat until exhaustion. - -### Rule 5: Follow Templates Exactly -- Do NOT add attributes like `open` to `
` tags -- Do NOT "improve" template formats -- Downstream scripts depend on exact regex patterns - -### Rule 6: Use Skills' Scripts -- Run the provided PowerShell scripts, don't bypass with manual commands -- If script fails, fix inputs rather than using manual `gh` commands +## 🚨 Critical Rules (Summary) + +See `SHARED-RULES.md` for complete details. Key points: +- **Environment Blockers**: STOP immediately, report, ask user (strict retry limits) +- **No Git Commands**: Never checkout/switch branches - agent is always on correct branch +- **Gate via Task Agent**: Never run inline (prevents fabrication) +- **Multi-Model try-fix**: 5 models, SEQUENTIAL only +- **Follow Templates**: No `open` attributes, no "improvements" --- ## Work Plan -### Phase 1: Pre-Flight - Context Gathering +### Phase 1: Pre-Flight - [ ] Create state file: `CustomAgentLogsTmp/PRState/pr-XXXXX.md` - [ ] Gather PR metadata (title, body, labels, author) - [ ] Fetch and read linked issue - [ ] Fetch PR comments and review feedback -- [ ] Check for prior agent reviews (if found, import and resume) +- [ ] Check for prior agent reviews (import and resume if found) - [ ] Document platforms affected -- [ ] Identify changed files and classify (fix vs test) +- [ ] Classify changed files (fix vs test) - [ ] Document PR's fix approach in Fix Candidates table - [ ] Update state file: Pre-Flight → ✅ COMPLETE -- [ ] Commit state file +- [ ] Save state file -**Boundaries:** No code analysis, no opinions on fix, no test running +**Boundaries:** No code analysis, no fix opinions, no test running -### Phase 2: Tests - Verify Test Existence +### Phase 2: Tests - [ ] Check if PR includes UI tests -- [ ] Verify tests follow naming conventions (`IssueXXXXX` pattern) +- [ ] Verify tests follow `IssueXXXXX` naming convention - [ ] If tests exist: Verify they compile - [ ] If tests missing: Invoke `write-ui-tests` skill - [ ] Document test files in state file - [ ] Update state file: Tests → ✅ COMPLETE -- [ ] Commit state file +- [ ] Save state file -### Phase 3: Gate - Verify Tests Catch Bug ⛔ -**🚨 BLOCKER PHASE - Cannot continue if Gate fails** +### Phase 3: Gate ⛔ +**🚨 Cannot continue if Gate fails** -- [ ] **Select platform:** Choose platform that is BOTH affected by the bug AND available on host - - Check "Platforms Affected" from Pre-Flight - - Match to host: Windows → (Android, Windows), macOS → (Android, iOS, MacCatalyst) - - If no affected platform is available → STOP and ask user -- [ ] Invoke verification via **task agent** (NOT inline): +- [ ] Select platform (must be affected AND available on host) +- [ ] Invoke via **task agent** (NOT inline): ``` - Invoke task agent with: "Run verify-tests-fail-without-fix skill - - Platform: [selected platform] - - TestFilter: 'IssueXXXXX' - - RequireFullVerification: true - Report: Did tests FAIL without fix? PASS with fix? Final status?" + "Run verify-tests-fail-without-fix skill + Platform: [X], TestFilter: 'IssueXXXXX', RequireFullVerification: true" ``` -- [ ] ⛔ **If environment blocker**: STOP, report, ask user -- [ ] Verify output: - - ✅ Tests FAIL without fix (bug reproduced) - - ✅ Tests PASS with fix (fix works) -- [ ] If Gate fails: STOP, request test fixes from author +- [ ] ⛔ If environment blocker: STOP, report, ask user +- [ ] Verify: Tests FAIL without fix, PASS with fix +- [ ] If Gate fails: STOP, request test fixes - [ ] Update state file: Gate → ✅ PASSED -- [ ] Commit state file +- [ ] Save state file -### Phase 4: Fix - Multi-Model Exploration 🔧 -*(Only proceed if Gate ✅ PASSED)* +### Phase 4: Fix 🔧 +*(Only if Gate ✅ PASSED)* **Round 1: Run try-fix with each model (SEQUENTIAL)** -- [ ] Attempt 1: `claude-sonnet-4.5` -- [ ] Attempt 2: `claude-opus-4.5` -- [ ] Attempt 3: `gpt-5.2` -- [ ] Attempt 4: `gpt-5.2-codex` -- [ ] Attempt 5: `gemini-3-pro-preview` -- [ ] ⛔ **If environment blocker on any attempt**: STOP, report, ask user -- [ ] Record each: approach, test result, files changed, failure analysis - -**Round 2+: Cross-Pollination Loop (MANDATORY - DO NOT SKIP)** -- [ ] **Invoke EACH model via task agent** (not explore/glob): - - [ ] claude-sonnet-4.5: "Any NEW fix ideas?" → Response: ___ - - [ ] claude-opus-4.5: "Any NEW fix ideas?" → Response: ___ - - [ ] gpt-5.2: "Any NEW fix ideas?" → Response: ___ - - [ ] gpt-5.2-codex: "Any NEW fix ideas?" → Response: ___ - - [ ] gemini-3-pro-preview: "Any NEW fix ideas?" → Response: ___ -- [ ] Record responses in Cross-Pollination table in state file -- [ ] For each new idea: Run try-fix with that model -- [ ] Repeat until ALL 5 models say "NO NEW IDEAS" (max 3 rounds) +- [ ] claude-sonnet-4.5 +- [ ] claude-opus-4.5 +- [ ] gpt-5.2 +- [ ] gpt-5.2-codex +- [ ] gemini-3-pro-preview +- [ ] ⛔ If blocker: STOP, report, ask user +- [ ] Record: approach, result, files, failure analysis + +**Round 2+: Cross-Pollination (MANDATORY)** +- [ ] Invoke EACH model: "Any NEW fix ideas?" +- [ ] Record responses in Cross-Pollination table +- [ ] Run try-fix for new ideas (SEQUENTIAL) +- [ ] Repeat until ALL 5 say "NO NEW IDEAS" (max 3 rounds) **Completion:** -- [ ] Cross-Pollination table exists with all 5 model responses -- [ ] Mark Exhausted: Yes (all models confirmed no new ideas via invocation) +- [ ] Cross-Pollination table has all 5 responses +- [ ] Mark Exhausted: Yes - [ ] Compare passing candidates with PR's fix -- [ ] Select best fix (test results → simplicity → robustness → style) +- [ ] Select best fix (results → simplicity → robustness) - [ ] Update state file: Fix → ✅ COMPLETE -- [ ] Commit state file - -### Phase 5: Report - Final Recommendation 📋 -*(Only proceed if Phases 1-4 complete)* - -- [ ] Run `pr-finalize` skill to verify title/description accuracy -- [ ] Generate comprehensive review: - - Root cause analysis - - All fix candidates with results - - Final recommendation (APPROVE / REQUEST CHANGES) - - Reasoning -- [ ] Post review via `ai-summary-comment` skill (NOT manual gh command) -- [ ] Update state file: Report → ✅ COMPLETE -- [ ] Commit final state file - ---- - -## Success Criteria - -✅ **APPROVE if:** -- Gate passed (tests catch bug) -- PR's fix works and is equal/better than alternatives -- Tests are comprehensive -- Code changes are minimal and correct - -âš ī¸ **REQUEST CHANGES if:** -- Gate failed (tests don't catch bug) -- Alternative fix is significantly better -- Tests are missing or inadequate -- Code has issues +- [ ] Save state file ---- - -## Environment Blocker Report Template - -When blocked, use this format: - -``` -⛔ BLOCKED: Cannot complete [Phase] phase +### Phase 5: Report 📋 +*(Only if Phases 1-4 complete)* -**What failed:** [Step description] -**Missing:** [Tool/driver/device] -**Error:** "[Error message]" - -**Options:** -1. Install [component]: `[command]` -2. Switch to [alternative platform] -3. Skip with documented limitation - -Which would you like me to do? -``` +- [ ] Run `pr-finalize` skill +- [ ] Generate review: root cause, candidates, recommendation +- [ ] Post via `ai-summary-comment` skill +- [ ] Update state file: Report → ✅ COMPLETE +- [ ] Save final state file --- @@ -216,11 +101,12 @@ Which would you like me to do? | Phase | Key Action | Blocker Response | |-------|------------|------------------| -| Pre-Flight | Create state file, gather context | N/A (no env needed) | -| Tests | Verify/create tests, compile check | N/A (no device needed) | +| Pre-Flight | Create state file | N/A | +| Tests | Verify/create tests | N/A | | Gate | Task agent → verify script | ⛔ STOP, report, ask | -| Fix | Multi-model try-fix (5+ attempts) | ⛔ STOP, report, ask | -| Report | Post via ai-summary-comment skill | ⛔ STOP, report, ask | +| Fix | Multi-model try-fix | ⛔ STOP, report, ask | +| Report | Post via skill | ⛔ STOP, report, ask | **State file:** `CustomAgentLogsTmp/PRState/pr-XXXXX.md` -**Never:** Mark BLOCKED and continue, claim success without running tests, bypass scripts + +**Never:** Mark BLOCKED and continue, claim success without tests, bypass scripts diff --git a/.github/agents/pr/SHARED-RULES.md b/.github/agents/pr/SHARED-RULES.md new file mode 100644 index 000000000000..8dc2e0fe18ef --- /dev/null +++ b/.github/agents/pr/SHARED-RULES.md @@ -0,0 +1,167 @@ +# PR Agent: Shared Rules + +This file contains critical rules that apply across all PR agent phases. Referenced by `pr.md`, `post-gate.md`, and `PLAN-TEMPLATE.md`. + +--- + +## Phase Completion Protocol + +**Before changing ANY phase status to ✅ COMPLETE:** + +1. **Read the state file section** for the phase you're completing +2. **Find ALL âŗ PENDING and [PENDING] fields** in that section +3. **Fill in every field** with actual content +4. **Verify no pending markers remain** in your section +5. **Save the state file** (it's in gitignored `CustomAgentLogsTmp/`) +6. **Then change status** to ✅ COMPLETE + +**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" + +--- + +## Follow Templates EXACTLY + +When creating state files, use the EXACT format from the documentation: +- **Do NOT add attributes** like `open` to `
` tags +- **Do NOT "improve"** the template format +- **Do NOT deviate** from documented structure +- Downstream scripts depend on exact formatting (regex patterns expect specific structure) + +--- + +## No Direct Git Commands + +**Never run git commands that change branch or file state.** + +The agent is always invoked from the correct branch. All file state management is handled by PowerShell scripts (`verify-tests-fail.ps1`, `try-fix`, etc.). + +**What to do instead:** +- Use `gh pr diff` or `gh pr view` to see PR info (read-only GitHub CLI) +- Use `gh pr diff --name-only` to list changed files +- Let scripts handle all file manipulation internally + +**Never run these commands:** +- ❌ `git checkout` (any form) +- ❌ `git switch` +- ❌ `git stash` +- ❌ `git reset` +- ❌ `git revert` +- ❌ `gh pr checkout` +- ❌ `git fetch` (for branch switching purposes) + +--- + +## Use Skills' Scripts - Don't Bypass + +When a skill provides a PowerShell script: +- **Run the script** - don't interpret what it does and do it manually +- **Fix inputs if script fails** - don't bypass with manual `gh` commands +- **Use `-DryRun` to debug** - see what the script would produce before posting +- Scripts handle formatting, API calls, and section management correctly + +--- + +## Stop on Environment Blockers + +If you encounter an environment or system setup blocker that prevents completing a phase: + +**STOP IMMEDIATELY. Do NOT continue to the next phase.** + +### Common Blockers + +- Missing Appium drivers (Windows, iOS, Android) +- WinAppDriver not installed or returning errors +- Xcode/iOS simulators not available (on Windows) +- Android emulator not running or not configured +- Developer Mode not enabled +- Port conflicts (e.g., 4723 in use) +- Missing SDKs or tools +- Server errors (500, timeout, "unknown error occurred") + +### Retry Limits (STRICT ENFORCEMENT) + +| Blocker Type | Max Retries | Then Do | +|--------------|-------------|---------| +| Missing tool/driver | 1 install attempt | STOP and ask user | +| Server errors (500, timeout) | 0 | STOP immediately and report | +| Port conflicts | 1 (kill process) | STOP and ask user | +| Configuration issues | 1 fix attempt | STOP and ask user | + +### When Blocked + +1. **Stop all work** - Do not proceed to the next phase +2. **Do NOT keep troubleshooting** - After the retry limit, STOP +3. **Report the blocker** clearly (use template below) +4. **Ask the user** how to proceed +5. **Wait for user response** - Do not assume or work around + +### Blocker Report Template + +``` +⛔ BLOCKED: Cannot complete [Phase Name] + +**What failed:** [Step/skill that failed] +**Blocker:** [Tool/driver/error type] +**Error:** "[Exact error message]" + +**What I tried:** [List retry attempts, max 1-2] + +**I am STOPPING here. Options:** +1. [Option for user - e.g., investigate setup manually] +2. [Alternative platform] +3. [Skip with documented limitation] + +Which would you like me to do? +``` + +### Never Do + +- ❌ Keep trying different fixes after retry limit exceeded +- ❌ Mark a phase as âš ī¸ BLOCKED and continue to the next phase +- ❌ Claim "verification passed" when tests couldn't actually run +- ❌ Skip device/emulator testing and proceed with code review only +- ❌ Install multiple tools/drivers without asking between each +- ❌ Spend more than 2-3 tool calls troubleshooting the same blocker + +--- + +## Multi-Model Configuration + +Phase 4 uses these 5 AI models for try-fix exploration (run SEQUENTIALLY): + +| Order | Model | +|-------|-------| +| 1 | `claude-sonnet-4.5` | +| 2 | `claude-opus-4.5` | +| 3 | `gpt-5.2` | +| 4 | `gpt-5.2-codex` | +| 5 | `gemini-3-pro-preview` | + +**Note:** The `model` parameter is passed to the `task` tool, which supports model selection. This is separate from agent YAML frontmatter (which is VS Code-only). + +**âš ī¸ SEQUENTIAL ONLY**: try-fix runs modify the same files and use the same device. Never run in parallel. + +--- + +## Platform Selection + +**Choose a platform that is BOTH affected by the bug AND available on the current host.** + +### Step 1: Identify affected platforms from Pre-Flight +- Check the "Platforms Affected" checkboxes in the state file +- Check issue labels (e.g., `platform/iOS`, `platform/Android`) +- Check which platform-specific files the PR modifies + +### Step 2: Match to available platforms + +| Host OS | Available Platforms | +|---------|---------------------| +| Windows | Android, Windows | +| macOS | Android, iOS, MacCatalyst | + +### Step 3: Select the best match +1. Pick a platform that IS affected by the bug +2. That IS available on the current host +3. Prefer the platform most directly impacted by the PR's code changes + +**âš ī¸ Do NOT test on a platform that isn't affected by the bug** - the test will pass regardless of whether the fix works. diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index bf25c38818dc..8878f1b6928e 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -15,28 +15,14 @@ If Gate is not passed, go back to `.github/agents/pr.md` and complete phases 1-3 --- -## Phase Completion Protocol (CRITICAL) +## 🚨 Critical Rules -**Before changing ANY phase status to ✅ COMPLETE:** +**All rules from `.github/agents/pr/SHARED-RULES.md` apply here**, including: +- Phase Completion Protocol (fill ALL pending fields before marking complete) +- Stop on Environment Blockers (STOP and ask user, don't continue) +- Multi-Model Configuration (5 models, SEQUENTIAL only) -1. **Read the state file section** for the phase you're completing -2. **Find ALL âŗ PENDING and [PENDING] fields** in that section -3. **Fill in every field** with actual content -4. **Verify no pending markers remain** in your section -5. **Commit the state file** with complete content -6. **Then change status** to ✅ COMPLETE - -**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" - -### 🚨 CRITICAL: Stop on Environment Blockers (Applies to Phase 4) - -The same "Stop on Environment Blockers" rule from `pr.md` applies here. If try-fix cannot run due to: -- Missing Appium drivers -- Device/emulator not available -- WinAppDriver not installed -- Platform tools missing - -**STOP and ask the user** before continuing. Do NOT mark try-fix attempts as "BLOCKED" and continue. Either fix the environment issue or get explicit user permission to skip. +If try-fix cannot run due to environment issues, **STOP and ask the user**. Do NOT mark attempts as "BLOCKED" and continue. --- @@ -66,17 +52,7 @@ Phase 4 uses a **multi-model approach** to maximize fix diversity. Each AI model #### Round 1: Run try-fix with Each Model -Run the `try-fix` skill **5 times sequentially**, once with each model: - -| Order | Model | Invocation | -|-------|-------|------------| -| 1 | `claude-sonnet-4.5` | `task` tool with `model: "claude-sonnet-4.5"` parameter | -| 2 | `claude-opus-4.5` | `task` tool with `model: "claude-opus-4.5"` parameter | -| 3 | `gpt-5.2` | `task` tool with `model: "gpt-5.2"` parameter | -| 4 | `gpt-5.2-codex` | `task` tool with `model: "gpt-5.2-codex"` parameter | -| 5 | `gemini-3-pro-preview` | `task` tool with `model: "gemini-3-pro-preview"` parameter | - -**Note:** The `model` parameter is passed to the `task` tool, which supports model selection. This is separate from agent YAML frontmatter (which is VS Code-only). +Run the `try-fix` skill **5 times sequentially**, once with each model (see `SHARED-RULES.md` for model list). **For each model**, invoke the try-fix skill: ``` @@ -94,95 +70,34 @@ Generate ONE independent fix idea. Review the PR's fix first to ensure your appr **Wait for each to complete before starting the next.** -#### Round 2+: Cross-Pollination Loop +#### Round 2+: Cross-Pollination Loop (MANDATORY) -**🚨 MANDATORY - DO NOT SKIP THIS STEP** +After Round 1, invoke EACH of the 5 models to ask for new ideas. **No shortcuts allowed.** -After Round 1 completes, you MUST invoke each of the 5 models to ask for new ideas. This is NOT optional. +**❌ WRONG**: Using `explore`/`glob`, declaring exhaustion without invoking each model +**✅ CORRECT**: Invoke EACH model via task agent and ask explicitly -**❌ WRONG**: Using `explore` or `glob` to "check if approaches are exhausted" -**❌ WRONG**: Declaring exhaustion without invoking each model -**❌ WRONG**: Assuming "comprehensive coverage" means no new ideas exist -**✅ CORRECT**: Invoke EACH model via task agent and ask explicitly for new ideas +**Steps (repeat until all 5 say "NO NEW IDEAS", max 3 rounds):** -**âš ī¸ Summary Size Limit**: To stay within Copilot CLI's 30,000 char prompt limit, keep attempt summaries bounded: -- Max 3-4 bullet points per attempt -- Focus on: approach name, result (✅/❌), one-line key learning -- Omit verbose stack traces or full code diffs - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Cross-Pollination Loop - MANDATORY │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ 🚨 You MUST invoke each model - no shortcuts allowed │ -│ │ -│ LOOP until no new ideas: │ -│ │ -│ 1. Compile BOUNDED summary of ALL try-fix attempts: │ -│ - Attempt #, approach (1 line) │ -│ - Pass/Fail result │ -│ - Key learning (1 line - why it worked or failed) │ -│ │ -│ 2. Invoke EACH of the 5 models via task agent: │ -│ prompt: "Review these fix attempts for PR #XXXXX: │ -│ [summary of all attempts] │ -│ Do you have any NEW fix ideas not tried? │ -│ Reply: 'NEW IDEA: [description]' or │ -│ 'NO NEW IDEAS'" │ -│ │ -│ 3. Record each model's response in state file: │ -│ | Model | Response | │ -│ | claude-sonnet-4.5 | NO NEW IDEAS | │ -│ | claude-opus-4.5 | NEW IDEA: ... | │ -│ | ... | ... | │ -│ │ -│ 4. For each model with a new idea: │ -│ → Run try-fix with that model (SEQUENTIAL) │ -│ → Wait for completion before next │ -│ │ -│ 5. If ANY new ideas were tested → repeat loop │ -│ If ALL 5 models said "NO NEW IDEAS" → exit loop │ -│ │ -│ MAX ROUNDS: 3 (to prevent infinite loops) │ -│ │ -└─────────────────────────────────────────────────────────────┘ -``` - -**Cross-Pollination Invocation Template:** -``` -Invoke task agent with model parameter: - agent_type: "task" - model: "[model-name]" - prompt: "Review PR #XXXXX fix attempts: - - Attempt 1 (claude-sonnet-4.5): [approach] - ✅/❌ - - Attempt 2 (claude-opus-4.5): [approach] - ✅/❌ - - ... - - Do you have any NEW fix ideas that haven't been tried? - Reply with EXACTLY one of: - - 'NEW IDEA: [brief description]' - - 'NO NEW IDEAS' - - Only propose ideas that are fundamentally different from above." -``` +1. **Compile bounded summary** (max 3-4 bullets per attempt): + - Attempt #, approach (1 line), result (✅/❌), key learning (1 line) -**Coordination loop stop condition**: Exit when ALL 5 models explicitly respond "NO NEW IDEAS" in the same round. This requires actual task agent invocations - not code exploration or assumptions. +2. **Invoke each model via task agent:** + ``` + agent_type: "task", model: "[model-name]" + prompt: "Review PR #XXXXX fix attempts: + - Attempt 1: [approach] - ✅/❌ + - Attempt 2: [approach] - ✅/❌ + Do you have any NEW fix ideas? Reply: 'NEW IDEA: [desc]' or 'NO NEW IDEAS'" + ``` -#### try-fix Invocation Details +3. **Record each model's response** in state file Cross-Pollination table -Each `try-fix` invocation (via task agent): -- Reads state file to learn from prior attempts -- Reverts PR's fix to get broken baseline -- Proposes and implements ONE fix idea -- Runs tests to validate -- Records result with failure analysis -- Reverts changes (restores PR's fix) -- Updates state file with attempt results +4. **For each new idea**: Run try-fix with that model (SEQUENTIAL, wait for completion) -See `.github/skills/try-fix/SKILL.md` for full details. +5. **Exit when**: ALL 5 models say "NO NEW IDEAS" in the same round -### What try-fix Does (Each Invocation) +#### try-fix Behavior Each `try-fix` invocation (run via task agent with specific model): 1. Reads state file to learn from prior failed attempts @@ -241,7 +156,7 @@ Update the state file: **If a try-fix alternative was selected:** - Re-implement the fix (you documented the approach in the table) -- Commit the changes +- Apply the changes to files (do not commit - user handles git) ### Complete 🔧 Fix @@ -272,7 +187,7 @@ Update the state file: - [ ] "Selected Fix" populated with reasoning - [ ] Root cause analysis documented for the selected fix (to be surfaced in 📋 Report phase "### Root Cause" section) - [ ] No âŗ PENDING markers remain in Fix section -- [ ] State file committed +- [ ] State file saved **🚨 If cross-pollination table is missing, you skipped Round 2. Go back and invoke each model.** @@ -298,41 +213,23 @@ If reviewing an existing PR, check if title/description need updates and include ### If Starting from Issue (No PR) - Create PR -1. **Ensure selected fix is applied and committed**: - ```bash - git add -A - git commit -m "Fix #XXXXX: [Description of fix]" - ``` - -2. **Create a feature branch** (if not already on one): - ```bash - git checkout -b fix/issue-XXXXX - ``` - -3. **⛔ STOP: Ask user for confirmation before creating PR**: +1. **⛔ STOP: Ask user to commit and create PR**: - Present a summary to the user and wait for explicit approval: - > "I'm ready to create a PR for issue #XXXXX. Here's what will be included: - > - **Branch**: fix/issue-XXXXX + Present a summary to the user and wait for them to handle git operations: + > "I've implemented the fix for issue #XXXXX. Here's what needs to be committed: > - **Selected fix**: Candidate #N - [approach] > - **Files changed**: [list files] > - **Tests added**: [list test files] > - **Other candidates considered**: [brief summary] > - > Would you like me to push and create the PR?" - - **Do NOT proceed until user confirms.** - -4. **Push and create PR** (after user confirmation): - - ```bash - git push -u origin fix/issue-XXXXX - gh pr create --title "[Platform] Brief description of behavior fix" --body "" - ``` + > Please commit these changes and create a PR when ready. + > Suggested PR title: `[Platform] Brief description of behavior fix` + > + > Use the pr-finalize skill output for the PR body." - Use the `pr-finalize` skill output as the `--body` argument. + **Do NOT run git commands. User handles commit/push/PR creation.** -5. **Update state file** with PR link +2. **Update state file** with PR link once user provides it ### If Starting from PR - Write Review @@ -345,6 +242,7 @@ Determine your recommendation based on the Fix phase: **If an alternative fix was selected:** - Recommend: `âš ī¸ REQUEST CHANGES` - Justification: Suggest the better approach from try-fix Candidate #N +- **Tell user:** "I've applied the alternative fix locally. Please review the changes and commit/push to update the PR." **If PR's fix failed tests:** - Recommend: `âš ī¸ REQUEST CHANGES` @@ -382,7 +280,7 @@ Update all phase statuses to complete. - [ ] Summary of findings - [ ] Key technical insights documented - [ ] Overall status changed to final recommendation -- [ ] State file committed +- [ ] State file saved --- From f9a5e8db3b6666a4f9341290288f3e13c753c603 Mon Sep 17 00:00:00 2001 From: Shane Neuville Date: Mon, 2 Feb 2026 15:18:54 -0600 Subject: [PATCH 16/16] Remove agents.instructions.md - no longer relevant --- .github/instructions/agents.instructions.md | 154 -------------------- 1 file changed, 154 deletions(-) delete mode 100644 .github/instructions/agents.instructions.md diff --git a/.github/instructions/agents.instructions.md b/.github/instructions/agents.instructions.md deleted file mode 100644 index d3d487bb2e79..000000000000 --- a/.github/instructions/agents.instructions.md +++ /dev/null @@ -1,154 +0,0 @@ ---- -applyTo: ".github/agents/**" ---- - -# Custom Agent Guidelines for Copilot CLI - -Agents in this repo target **Copilot CLI** as the primary interface. - -## Copilot CLI vs VS Code - -| Property | CLI | VS Code | Use It? | -|----------|-----|---------|---------| -| `name` | ✅ | ✅ | Yes | -| `description` | ✅ | ✅ | **Required** | -| `tools` | ✅ | ✅ | Optional | -| `infer` | ✅ | ✅ | Optional | -| `handoffs` | ❌ | ✅ | **No** - VS Code only | -| `model` | ❌ | ✅ | **No** - VS Code only | -| `argument-hint` | ❌ | ✅ | **No** - VS Code only | - ---- - -## Constraints - -| Constraint | Limit | -|------------|-------| -| Prompt body | **30,000 characters** max | -| Name | 64 chars, lowercase, letters/numbers/hyphens only | -| Description | **1,024 characters** max, **required** | -| Body length | < 300 lines ideal, < 500 max | - -### Name Format - -- ✅ `pr`, `write-tests-agent`, `sandbox-agent` -- ❌ `PR-Reviewer` (uppercase), `pr_reviewer` (underscores), `--name` (leading/consecutive hyphens) - ---- - -## Anti-Patterns (Do NOT Do) - -| Anti-Pattern | Why It's Bad | -|--------------|--------------| -| **Too long/verbose** | Wastes context tokens, slower responses | -| **Vague description** | Won't be discovered via `/agent` | -| **No "when to use" section** | Users won't know when to invoke | -| **Duplicating copilot-instructions.md** | Already loaded automatically | -| **Explaining what skills do** | Reference skill, don't duplicate docs | -| **Large inline code samples** | Move to separate files | -| **ASCII art diagrams** | Consume tokens - use sparingly | -| **VS Code features** | `handoffs`, `model`, `argument-hint` don't work in CLI | -| **GUI references** | No "click button" - CLI is terminal-based | - ---- - -## Best Practices - -### Description = Discovery - -The `/agent` command and auto-inference use description keywords: - -```yaml -# ✅ Good -description: Reviews PRs with independent analysis, validates tests catch bugs, proposes alternative fixes - -# ❌ Bad -description: Helps with code review stuff -``` - -### One Agent = One Role - -- ✅ `pr` - Reviews and works on PRs -- ❌ `everything-agent` - Too broad - -### Commands Over Concepts - -```markdown -# ✅ Good -git fetch origin pull/XXXXX/head:pr-XXXXX && git checkout pr-XXXXX - -# ❌ Bad -First you should fetch the PR and check it out locally -``` - -### Reference Skills, Don't Duplicate - -```markdown -# ✅ Good -Run: `pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 -Platform android` - -# ❌ Bad -The skill does: 1. Detects fix files... 2. Detects test classes... [30 more lines] -``` - ---- - -## Tool Aliases - -| Alias | Purpose | -|-------|---------| -| `execute` / `shell` | Run shell commands | -| `read` | Read file contents | -| `edit` / `write` | Modify files | -| `search` / `grep` | Search files/content | -| `agent` | Invoke other agents | - -```yaml -tools: ["read", "search"] # Read-only agent -tools: ["read", "search", "edit", "execute"] # Full dev agent -``` - ---- - -## Minimal Structure - -```yaml ---- -name: my-agent -description: Does X when user asks Y. Keywords: review, test, fix. ---- - -# Agent Title - -Brief philosophy. - -## When to Use -- ✅ "trigger phrase" - -## When NOT to Use -- ❌ Other task → Use `other-agent` - -## Workflow -1. Step one -2. Step two - -## Quick Reference -| Task | Command | -|------|---------| -| Do X | `command` | - -## Common Mistakes -- ❌ **Mistake** - Why it's wrong -``` - ---- - -## Checklist - -- [ ] YAML frontmatter with `name` and `description` -- [ ] `description` has trigger keywords -- [ ] Body under 500 lines -- [ ] No `handoffs`, `model`, `argument-hint` -- [ ] No GUI/button references -- [ ] Skills referenced, not duplicated -- [ ] "When to Use" / "When NOT to Use" included