diff --git a/.github/agents/pr.md b/.github/agents/pr.md index 53e323d82c30..ecd6bc1d62e0 100644 --- a/.github/agents/pr.md +++ b/.github/agents/pr.md @@ -41,31 +41,23 @@ After Gate passes, read `.github/agents/pr/post-gate.md` for **Phases 4-5**. --- -## Phase Completion Protocol (CRITICAL) +## 🚨 Critical Rules -**Before changing ANY phase status to ✅ COMPLETE:** +**Read `.github/agents/pr/SHARED-RULES.md` for complete details on:** +- Phase Completion Protocol (fill ALL pending fields before marking complete) +- Follow Templates EXACTLY (no `open` attributes, no "improvements") +- No Direct Git Commands (use `gh pr diff/view`, let scripts handle files) +- Use Skills' Scripts (don't bypass with manual commands) +- Stop on Environment Blockers (strict retry limits, report and ask user) +- Multi-Model Configuration (5 models for Phase 4) +- Platform Selection (must be affected AND available on host) -1. **Read the state file section** for the phase you're completing -2. **Find ALL ⏳ PENDING and [PENDING] fields** in that section -3. **Fill in every field** with actual content -4. **Verify no pending markers remain** in your section -5. **Commit the state file** with complete content -6. **Then change status** to ✅ COMPLETE +**Key points:** +- ❌ Never run `git checkout`, `git switch`, `git stash`, `git reset` - agent is always on correct branch +- ❌ Never continue after environment blocker - STOP and ask user +- ❌ Never mark phase ✅ with [PENDING] fields remaining -**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" - ---- - -### 🚨 CRITICAL: Phase 4 Always Uses `try-fix` Skill - -**Even when a PR already has a fix**, Phase 4 requires running the `try-fix` skill to: -1. **Independently explore alternative solutions** - Generate fix ideas WITHOUT looking at the PR's solution -2. **Test alternatives empirically** - Actually implement and run tests, don't just theorize -3. **Compare with PR's fix** - PR's fix is already validated by Gate; try-fix explores if there's something better - -The PR's fix is NOT tested by try-fix (Gate already did that). try-fix generates and tests YOUR independent ideas. - -This ensures independent analysis rather than rubber-stamping the PR. +Phase 4 uses a 5-model exploration workflow. See `post-gate.md` for detailed instructions after Gate passes. --- @@ -233,7 +225,7 @@ This file: - Serves as your TODO list for all phases - Tracks progress if interrupted - Must exist before you start gathering context -- **Always include when committing changes** (to `CustomAgentLogsTmp/PRState/`) +- **Always include when saving changes** (to `CustomAgentLogsTmp/PRState/`) - **Phases 4-5 sections are added AFTER Gate passes** (see `pr/post-gate.md`) **Then gather context and update the file as you go.** @@ -242,11 +234,7 @@ This file: **If starting from a PR:** ```bash -# Checkout the PR -git fetch origin pull/XXXXX/head:pr-XXXXX -git checkout pr-XXXXX - -# Fetch PR metadata +# Fetch PR metadata (agent is already on correct branch) gh pr view XXXXX --json title,body,url,author,labels,files # Find and read linked issue @@ -256,7 +244,6 @@ gh issue view ISSUE_NUMBER --json title,body,comments **If starting from an Issue (no PR exists):** ```bash -# Stay on current branch - do NOT checkout anything # Fetch issue details directly gh issue view XXXXX --json title,body,comments,labels ``` @@ -351,7 +338,7 @@ The test result will be updated to `✅ PASS (Gate)` after Gate passes. - [ ] Files Changed table populated (if PR exists) - [ ] PR Discussion Summary documented (if PR exists) - [ ] All [PENDING] placeholders replaced -- [ ] State file committed +- [ ] State file saved --- @@ -418,7 +405,7 @@ The script auto-detects mode based on git diff. If only test files changed, it v - [ ] Test file paths documented - [ ] "Tests verified to FAIL" note added - [ ] Test category identified -- [ ] State file committed +- [ ] State file saved --- @@ -441,13 +428,42 @@ Tests were already verified to FAIL in Phase 2. Gate is a confirmation step: **If starting from a PR (fix exists):** Use full verification mode - tests should FAIL without fix, PASS with fix. +### Platform Selection for Gate + +**🚨 CRITICAL: Choose a platform that is BOTH affected by the bug AND available on the current host.** + +**Step 1: Identify affected platforms** from Pre-Flight: +- Check the "Platforms Affected" checkboxes in the state file +- Check issue labels (e.g., `platform/iOS`, `platform/Android`) +- Check which platform-specific files the PR modifies + +**Step 2: Match to available platforms on current host:** + +| Host OS | Available Platforms | +|---------|---------------------| +| Windows | Android, Windows | +| macOS | Android, iOS, MacCatalyst | + +**Step 3: Select the best match:** +1. Pick a platform that IS affected by the bug +2. That IS available on the current host +3. Prefer the platform most directly impacted by the PR's code changes + +**Example decisions:** +- Bug affects iOS/Windows/MacCatalyst, host is Windows → Test on **Windows** +- Bug affects iOS only, host is Windows → **STOP** - cannot test (ask user) +- Bug affects Android only → Test on **Android** (works on any host) +- Bug affects all platforms → Pick based on host (Windows on Windows, iOS on macOS) + +**⚠️ Do NOT test on a platform that isn't affected by the bug** - the test will pass regardless of whether the fix works. + **🚨 MUST invoke as a task agent** to prevent command substitution: ```markdown Invoke the `task` agent with agent_type: "task" and this prompt: "Invoke the verify-tests-fail-without-fix skill for this PR: -- Platform: android (or ios) +- Platform: [selected platform from Platform Selection above] - TestFilter: 'IssueXXXXX' - RequireFullVerification: true @@ -486,7 +502,7 @@ See `.github/skills/verify-tests-fail-without-fix/SKILL.md` for full skill docum - [ ] Result shows PASSED ✅ or FAILED ❌ - [ ] Test behavior documented - [ ] Platform tested noted -- [ ] State file committed +- [ ] State file saved --- diff --git a/.github/agents/pr/PLAN-TEMPLATE.md b/.github/agents/pr/PLAN-TEMPLATE.md new file mode 100644 index 000000000000..a1d1e1912193 --- /dev/null +++ b/.github/agents/pr/PLAN-TEMPLATE.md @@ -0,0 +1,112 @@ +# PR Review Plan Template + +**Reusable checklist** for the 5-phase PR Agent workflow. + +**Source documents:** +- `.github/agents/pr.md` - Phases 1-3 (Pre-Flight, Tests, Gate) +- `.github/agents/pr/post-gate.md` - Phases 4-5 (Fix, Report) +- `.github/agents/pr/SHARED-RULES.md` - Critical rules (blockers, git, templates) + +--- + +## 🚨 Critical Rules (Summary) + +See `SHARED-RULES.md` for complete details. Key points: +- **Environment Blockers**: STOP immediately, report, ask user (strict retry limits) +- **No Git Commands**: Never checkout/switch branches - agent is always on correct branch +- **Gate via Task Agent**: Never run inline (prevents fabrication) +- **Multi-Model try-fix**: 5 models, SEQUENTIAL only +- **Follow Templates**: No `open` attributes, no "improvements" + +--- + +## Work Plan + +### Phase 1: Pre-Flight +- [ ] Create state file: `CustomAgentLogsTmp/PRState/pr-XXXXX.md` +- [ ] Gather PR metadata (title, body, labels, author) +- [ ] Fetch and read linked issue +- [ ] Fetch PR comments and review feedback +- [ ] Check for prior agent reviews (import and resume if found) +- [ ] Document platforms affected +- [ ] Classify changed files (fix vs test) +- [ ] Document PR's fix approach in Fix Candidates table +- [ ] Update state file: Pre-Flight → ✅ COMPLETE +- [ ] Save state file + +**Boundaries:** No code analysis, no fix opinions, no test running + +### Phase 2: Tests +- [ ] Check if PR includes UI tests +- [ ] Verify tests follow `IssueXXXXX` naming convention +- [ ] If tests exist: Verify they compile +- [ ] If tests missing: Invoke `write-ui-tests` skill +- [ ] Document test files in state file +- [ ] Update state file: Tests → ✅ COMPLETE +- [ ] Save state file + +### Phase 3: Gate ⛔ +**🚨 Cannot continue if Gate fails** + +- [ ] Select platform (must be affected AND available on host) +- [ ] Invoke via **task agent** (NOT inline): + ``` + "Run verify-tests-fail-without-fix skill + Platform: [X], TestFilter: 'IssueXXXXX', RequireFullVerification: true" + ``` +- [ ] ⛔ If environment blocker: STOP, report, ask user +- [ ] Verify: Tests FAIL without fix, PASS with fix +- [ ] If Gate fails: STOP, request test fixes +- [ ] Update state file: Gate → ✅ PASSED +- [ ] Save state file + +### Phase 4: Fix 🔧 +*(Only if Gate ✅ PASSED)* + +**Round 1: Run try-fix with each model (SEQUENTIAL)** +- [ ] claude-sonnet-4.5 +- [ ] claude-opus-4.5 +- [ ] gpt-5.2 +- [ ] gpt-5.2-codex +- [ ] gemini-3-pro-preview +- [ ] ⛔ If blocker: STOP, report, ask user +- [ ] Record: approach, result, files, failure analysis + +**Round 2+: Cross-Pollination (MANDATORY)** +- [ ] Invoke EACH model: "Any NEW fix ideas?" +- [ ] Record responses in Cross-Pollination table +- [ ] Run try-fix for new ideas (SEQUENTIAL) +- [ ] Repeat until ALL 5 say "NO NEW IDEAS" (max 3 rounds) + +**Completion:** +- [ ] Cross-Pollination table has all 5 responses +- [ ] Mark Exhausted: Yes +- [ ] Compare passing candidates with PR's fix +- [ ] Select best fix (results → simplicity → robustness) +- [ ] Update state file: Fix → ✅ COMPLETE +- [ ] Save state file + +### Phase 5: Report 📋 +*(Only if Phases 1-4 complete)* + +- [ ] Run `pr-finalize` skill +- [ ] Generate review: root cause, candidates, recommendation +- [ ] Post via `ai-summary-comment` skill +- [ ] Update state file: Report → ✅ COMPLETE +- [ ] Save final state file + +--- + +## Quick Reference + +| Phase | Key Action | Blocker Response | +|-------|------------|------------------| +| Pre-Flight | Create state file | N/A | +| Tests | Verify/create tests | N/A | +| Gate | Task agent → verify script | ⛔ STOP, report, ask | +| Fix | Multi-model try-fix | ⛔ STOP, report, ask | +| Report | Post via skill | ⛔ STOP, report, ask | + +**State file:** `CustomAgentLogsTmp/PRState/pr-XXXXX.md` + +**Never:** Mark BLOCKED and continue, claim success without tests, bypass scripts diff --git a/.github/agents/pr/SHARED-RULES.md b/.github/agents/pr/SHARED-RULES.md new file mode 100644 index 000000000000..8dc2e0fe18ef --- /dev/null +++ b/.github/agents/pr/SHARED-RULES.md @@ -0,0 +1,167 @@ +# PR Agent: Shared Rules + +This file contains critical rules that apply across all PR agent phases. Referenced by `pr.md`, `post-gate.md`, and `PLAN-TEMPLATE.md`. + +--- + +## Phase Completion Protocol + +**Before changing ANY phase status to ✅ COMPLETE:** + +1. **Read the state file section** for the phase you're completing +2. **Find ALL ⏳ PENDING and [PENDING] fields** in that section +3. **Fill in every field** with actual content +4. **Verify no pending markers remain** in your section +5. **Save the state file** (it's in gitignored `CustomAgentLogsTmp/`) +6. **Then change status** to ✅ COMPLETE + +**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" + +--- + +## Follow Templates EXACTLY + +When creating state files, use the EXACT format from the documentation: +- **Do NOT add attributes** like `open` to `
` tags +- **Do NOT "improve"** the template format +- **Do NOT deviate** from documented structure +- Downstream scripts depend on exact formatting (regex patterns expect specific structure) + +--- + +## No Direct Git Commands + +**Never run git commands that change branch or file state.** + +The agent is always invoked from the correct branch. All file state management is handled by PowerShell scripts (`verify-tests-fail.ps1`, `try-fix`, etc.). + +**What to do instead:** +- Use `gh pr diff` or `gh pr view` to see PR info (read-only GitHub CLI) +- Use `gh pr diff --name-only` to list changed files +- Let scripts handle all file manipulation internally + +**Never run these commands:** +- ❌ `git checkout` (any form) +- ❌ `git switch` +- ❌ `git stash` +- ❌ `git reset` +- ❌ `git revert` +- ❌ `gh pr checkout` +- ❌ `git fetch` (for branch switching purposes) + +--- + +## Use Skills' Scripts - Don't Bypass + +When a skill provides a PowerShell script: +- **Run the script** - don't interpret what it does and do it manually +- **Fix inputs if script fails** - don't bypass with manual `gh` commands +- **Use `-DryRun` to debug** - see what the script would produce before posting +- Scripts handle formatting, API calls, and section management correctly + +--- + +## Stop on Environment Blockers + +If you encounter an environment or system setup blocker that prevents completing a phase: + +**STOP IMMEDIATELY. Do NOT continue to the next phase.** + +### Common Blockers + +- Missing Appium drivers (Windows, iOS, Android) +- WinAppDriver not installed or returning errors +- Xcode/iOS simulators not available (on Windows) +- Android emulator not running or not configured +- Developer Mode not enabled +- Port conflicts (e.g., 4723 in use) +- Missing SDKs or tools +- Server errors (500, timeout, "unknown error occurred") + +### Retry Limits (STRICT ENFORCEMENT) + +| Blocker Type | Max Retries | Then Do | +|--------------|-------------|---------| +| Missing tool/driver | 1 install attempt | STOP and ask user | +| Server errors (500, timeout) | 0 | STOP immediately and report | +| Port conflicts | 1 (kill process) | STOP and ask user | +| Configuration issues | 1 fix attempt | STOP and ask user | + +### When Blocked + +1. **Stop all work** - Do not proceed to the next phase +2. **Do NOT keep troubleshooting** - After the retry limit, STOP +3. **Report the blocker** clearly (use template below) +4. **Ask the user** how to proceed +5. **Wait for user response** - Do not assume or work around + +### Blocker Report Template + +``` +⛔ BLOCKED: Cannot complete [Phase Name] + +**What failed:** [Step/skill that failed] +**Blocker:** [Tool/driver/error type] +**Error:** "[Exact error message]" + +**What I tried:** [List retry attempts, max 1-2] + +**I am STOPPING here. Options:** +1. [Option for user - e.g., investigate setup manually] +2. [Alternative platform] +3. [Skip with documented limitation] + +Which would you like me to do? +``` + +### Never Do + +- ❌ Keep trying different fixes after retry limit exceeded +- ❌ Mark a phase as ⚠️ BLOCKED and continue to the next phase +- ❌ Claim "verification passed" when tests couldn't actually run +- ❌ Skip device/emulator testing and proceed with code review only +- ❌ Install multiple tools/drivers without asking between each +- ❌ Spend more than 2-3 tool calls troubleshooting the same blocker + +--- + +## Multi-Model Configuration + +Phase 4 uses these 5 AI models for try-fix exploration (run SEQUENTIALLY): + +| Order | Model | +|-------|-------| +| 1 | `claude-sonnet-4.5` | +| 2 | `claude-opus-4.5` | +| 3 | `gpt-5.2` | +| 4 | `gpt-5.2-codex` | +| 5 | `gemini-3-pro-preview` | + +**Note:** The `model` parameter is passed to the `task` tool, which supports model selection. This is separate from agent YAML frontmatter (which is VS Code-only). + +**⚠️ SEQUENTIAL ONLY**: try-fix runs modify the same files and use the same device. Never run in parallel. + +--- + +## Platform Selection + +**Choose a platform that is BOTH affected by the bug AND available on the current host.** + +### Step 1: Identify affected platforms from Pre-Flight +- Check the "Platforms Affected" checkboxes in the state file +- Check issue labels (e.g., `platform/iOS`, `platform/Android`) +- Check which platform-specific files the PR modifies + +### Step 2: Match to available platforms + +| Host OS | Available Platforms | +|---------|---------------------| +| Windows | Android, Windows | +| macOS | Android, iOS, MacCatalyst | + +### Step 3: Select the best match +1. Pick a platform that IS affected by the bug +2. That IS available on the current host +3. Prefer the platform most directly impacted by the PR's code changes + +**⚠️ Do NOT test on a platform that isn't affected by the bug** - the test will pass regardless of whether the fix works. diff --git a/.github/agents/pr/post-gate.md b/.github/agents/pr/post-gate.md index 7b42ace936c7..8878f1b6928e 100644 --- a/.github/agents/pr/post-gate.md +++ b/.github/agents/pr/post-gate.md @@ -15,18 +15,14 @@ If Gate is not passed, go back to `.github/agents/pr.md` and complete phases 1-3 --- -## Phase Completion Protocol (CRITICAL) +## 🚨 Critical Rules -**Before changing ANY phase status to ✅ COMPLETE:** +**All rules from `.github/agents/pr/SHARED-RULES.md` apply here**, including: +- Phase Completion Protocol (fill ALL pending fields before marking complete) +- Stop on Environment Blockers (STOP and ask user, don't continue) +- Multi-Model Configuration (5 models, SEQUENTIAL only) -1. **Read the state file section** for the phase you're completing -2. **Find ALL ⏳ PENDING and [PENDING] fields** in that section -3. **Fill in every field** with actual content -4. **Verify no pending markers remain** in your section -5. **Commit the state file** with complete content -6. **Then change status** to ✅ COMPLETE - -**Rule:** Status ✅ means "documentation complete", not "I finished thinking about it" +If try-fix cannot run due to environment issues, **STOP and ask the user**. Do NOT mark attempts as "BLOCKED" and continue. --- @@ -48,61 +44,69 @@ The purpose of Phase 4 is NOT to re-test the PR's fix, but to: **Do NOT let the PR's fix influence your thinking.** Generate ideas as if you hadn't seen the PR. -### Step 1: Agent Orchestrates try-fix Loop +### Step 1: Multi-Model try-fix Exploration -Invoke the `try-fix` skill repeatedly. The skill handles one fix attempt per invocation. +Phase 4 uses a **multi-model approach** to maximize fix diversity. Each AI model brings different perspectives and may find solutions others miss. -**IMPORTANT:** Always pass the `state_file` parameter so try-fix can record its results: -``` -state_file: CustomAgentLogsTmp/PRState/pr-XXXXX.md -``` +**⚠️ SEQUENTIAL ONLY**: try-fix runs MUST execute one at a time. They modify the same files and use the same test device. Never run try-fix attempts in parallel. + +#### Round 1: Run try-fix with Each Model -try-fix will automatically append rows to the Fix Candidates table and set the "Exhausted" field. You remain responsible for: -- Setting "Selected Fix" field with reasoning -- Updating phase status to ✅ COMPLETE +Run the `try-fix` skill **5 times sequentially**, once with each model (see `SHARED-RULES.md` for model list). +**For each model**, invoke the try-fix skill: ``` -┌─────────────────────────────────────────────────────────────┐ -│ Agent orchestration loop │ -├─────────────────────────────────────────────────────────────┤ -│ │ -│ attempts = 0 │ -│ max_attempts = 5 │ -│ state_file = "CustomAgentLogsTmp/PRState/pr-XXXXX.md" │ -│ │ -│ while (attempts < max_attempts): │ -│ result = invoke try-fix skill (with state_file) │ -│ attempts++ │ -│ │ -│ if result.exhausted: │ -│ break # try-fix has no more ideas │ -│ │ -│ # result.passed indicates if this attempt worked │ -│ # try-fix already recorded to state file │ -│ # Continue loop to explore more alternatives │ -│ │ -│ # After loop: compare all try-fix results vs PR's fix │ -│ # Update "Exhausted" and "Selected Fix" fields │ -│ │ -└─────────────────────────────────────────────────────────────┘ +Invoke the try-fix skill for PR #XXXXX: +- problem: [Description of the bug from issue/PR - what's broken and expected behavior] +- platform: [Use platform selected in Gate phase - must be affected by the bug AND available on host] +- test_command: pwsh .github/scripts/BuildAndRunHostApp.ps1 -Platform [same platform] -TestFilter "IssueXXXXX" +- target_files: + - src/[area]/[likely-affected-file-1].cs + - src/[area]/[likely-affected-file-2].cs +- state_file: CustomAgentLogsTmp/PRState/pr-XXXXX.md + +Generate ONE independent fix idea. Review the PR's fix first to ensure your approach is DIFFERENT. ``` -**Stop the loop when:** -- `try-fix` returns `exhausted=true` (no more ideas) -- 5 try-fix attempts have been made -- User requests to stop +**Wait for each to complete before starting the next.** + +#### Round 2+: Cross-Pollination Loop (MANDATORY) + +After Round 1, invoke EACH of the 5 models to ask for new ideas. **No shortcuts allowed.** + +**❌ WRONG**: Using `explore`/`glob`, declaring exhaustion without invoking each model +**✅ CORRECT**: Invoke EACH model via task agent and ask explicitly + +**Steps (repeat until all 5 say "NO NEW IDEAS", max 3 rounds):** + +1. **Compile bounded summary** (max 3-4 bullets per attempt): + - Attempt #, approach (1 line), result (✅/❌), key learning (1 line) + +2. **Invoke each model via task agent:** + ``` + agent_type: "task", model: "[model-name]" + prompt: "Review PR #XXXXX fix attempts: + - Attempt 1: [approach] - ✅/❌ + - Attempt 2: [approach] - ✅/❌ + Do you have any NEW fix ideas? Reply: 'NEW IDEA: [desc]' or 'NO NEW IDEAS'" + ``` + +3. **Record each model's response** in state file Cross-Pollination table -### What try-fix Does (Each Invocation) +4. **For each new idea**: Run try-fix with that model (SEQUENTIAL, wait for completion) -Each `try-fix` invocation: +5. **Exit when**: ALL 5 models say "NO NEW IDEAS" in the same round + +#### try-fix Behavior + +Each `try-fix` invocation (run via task agent with specific model): 1. Reads state file to learn from prior failed attempts 2. Reverts PR's fix to get a broken baseline 3. Proposes ONE new independent fix idea 4. Implements and tests it 5. Records result (with failure analysis if it failed) -6. **Updates state file** (appends row to Fix Candidates table if state_file provided) +6. **Updates state file** (appends row to Fix Candidates table) 7. Reverts all changes (restores PR's fix) -8. Returns `{passed: bool, exhausted: bool}` See `.github/skills/try-fix/SKILL.md` for full details. @@ -143,6 +147,7 @@ Update the state file: - **try-fix found a simpler/better alternative** → Request changes with suggestion - **try-fix found same solution independently** → Strong validation, approve PR - **All try-fix attempts failed** → PR's fix is the only working solution, approve PR +- **Multiple passing alternatives** → Select simplest/most robust ### Step 4: Apply Selected Fix (if different from PR) @@ -151,7 +156,7 @@ Update the state file: **If a try-fix alternative was selected:** - Re-implement the fix (you documented the approach in the table) -- Commit the changes +- Apply the changes to files (do not commit - user handles git) ### Complete 🔧 Fix @@ -165,13 +170,26 @@ Update the state file: 5. Change 📋 Report status to `▶️ IN PROGRESS` **Before marking ✅ COMPLETE, verify state file contains:** -- [ ] Root Cause Analysis filled in (if applicable) +- [ ] Round 1 completed: All 5 models ran try-fix +- [ ] **Cross-pollination table exists** with responses from ALL 5 models: + ``` + | Model | Round 2 Response | + |-------|------------------| + | claude-sonnet-4.5 | NO NEW IDEAS | + | claude-opus-4.5 | NO NEW IDEAS | + | gpt-5.2 | NO NEW IDEAS | + | gpt-5.2-codex | NO NEW IDEAS | + | gemini-3-pro-preview | NO NEW IDEAS | + ``` - [ ] Fix Candidates table has numbered rows for each try-fix attempt - [ ] Each row has: approach, test result, files changed, notes -- [ ] "Exhausted" field set (Yes/No) +- [ ] "Exhausted" field set to Yes (all models confirmed no new ideas) - [ ] "Selected Fix" populated with reasoning +- [ ] Root cause analysis documented for the selected fix (to be surfaced in 📋 Report phase "### Root Cause" section) - [ ] No ⏳ PENDING markers remain in Fix section -- [ ] State file committed +- [ ] State file saved + +**🚨 If cross-pollination table is missing, you skipped Round 2. Go back and invoke each model.** --- @@ -195,41 +213,23 @@ If reviewing an existing PR, check if title/description need updates and include ### If Starting from Issue (No PR) - Create PR -1. **Ensure selected fix is applied and committed**: - ```bash - git add -A - git commit -m "Fix #XXXXX: [Description of fix]" - ``` - -2. **Create a feature branch** (if not already on one): - ```bash - git checkout -b fix/issue-XXXXX - ``` - -3. **⛔ STOP: Ask user for confirmation before creating PR**: +1. **⛔ STOP: Ask user to commit and create PR**: - Present a summary to the user and wait for explicit approval: - > "I'm ready to create a PR for issue #XXXXX. Here's what will be included: - > - **Branch**: fix/issue-XXXXX + Present a summary to the user and wait for them to handle git operations: + > "I've implemented the fix for issue #XXXXX. Here's what needs to be committed: > - **Selected fix**: Candidate #N - [approach] > - **Files changed**: [list files] > - **Tests added**: [list test files] > - **Other candidates considered**: [brief summary] > - > Would you like me to push and create the PR?" - - **Do NOT proceed until user confirms.** - -4. **Push and create PR** (after user confirmation): - - ```bash - git push -u origin fix/issue-XXXXX - gh pr create --title "[Platform] Brief description of behavior fix" --body "" - ``` + > Please commit these changes and create a PR when ready. + > Suggested PR title: `[Platform] Brief description of behavior fix` + > + > Use the pr-finalize skill output for the PR body." - Use the `pr-finalize` skill output as the `--body` argument. + **Do NOT run git commands. User handles commit/push/PR creation.** -5. **Update state file** with PR link +2. **Update state file** with PR link once user provides it ### If Starting from PR - Write Review @@ -242,6 +242,7 @@ Determine your recommendation based on the Fix phase: **If an alternative fix was selected:** - Recommend: `⚠️ REQUEST CHANGES` - Justification: Suggest the better approach from try-fix Candidate #N +- **Tell user:** "I've applied the alternative fix locally. Please review the changes and commit/push to update the PR." **If PR's fix failed tests:** - Recommend: `⚠️ REQUEST CHANGES` @@ -279,7 +280,7 @@ Update all phase statuses to complete. - [ ] Summary of findings - [ ] Key technical insights documented - [ ] Overall status changed to final recommendation -- [ ] State file committed +- [ ] State file saved --- @@ -287,8 +288,14 @@ Update all phase statuses to complete. - ❌ **Looking at PR's fix before generating ideas** - Generate fix ideas independently first - ❌ **Re-testing the PR's fix in try-fix** - Gate already validated it; try-fix tests YOUR ideas -- ❌ **Skipping the try-fix loop** - Always explore at least one independent alternative +- ❌ **Skipping models in Round 1** - All 5 models must run try-fix before cross-pollination +- ❌ **Running try-fix in parallel** - SEQUENTIAL ONLY - they modify same files and use same device +- ❌ **Stopping before cross-pollination** - Must share results and check for new ideas +- ❌ **Using explore/glob instead of invoking models** - Cross-pollination requires ACTUAL task agent invocations with each model, not code searches +- ❌ **Assuming "comprehensive coverage" = exhausted** - Only exhausted when all 5 models explicitly say "NO NEW IDEAS" +- ❌ **Not recording cross-pollination responses** - State file must have table showing each model's Round 2 response - ❌ **Not analyzing why fixes failed** - Record the flawed reasoning to help future attempts - ❌ **Selecting a failing fix** - Only select from passing candidates - ❌ **Forgetting to revert between attempts** - Each try-fix must start from broken baseline, end with PR restored +- ❌ **Declaring exhaustion prematurely** - All 5 models must confirm "no new ideas" via actual invocation - ❌ **Rushing the report** - Take time to write clear justification diff --git a/.github/instructions/agents.instructions.md b/.github/instructions/agents.instructions.md deleted file mode 100644 index d3d487bb2e79..000000000000 --- a/.github/instructions/agents.instructions.md +++ /dev/null @@ -1,154 +0,0 @@ ---- -applyTo: ".github/agents/**" ---- - -# Custom Agent Guidelines for Copilot CLI - -Agents in this repo target **Copilot CLI** as the primary interface. - -## Copilot CLI vs VS Code - -| Property | CLI | VS Code | Use It? | -|----------|-----|---------|---------| -| `name` | ✅ | ✅ | Yes | -| `description` | ✅ | ✅ | **Required** | -| `tools` | ✅ | ✅ | Optional | -| `infer` | ✅ | ✅ | Optional | -| `handoffs` | ❌ | ✅ | **No** - VS Code only | -| `model` | ❌ | ✅ | **No** - VS Code only | -| `argument-hint` | ❌ | ✅ | **No** - VS Code only | - ---- - -## Constraints - -| Constraint | Limit | -|------------|-------| -| Prompt body | **30,000 characters** max | -| Name | 64 chars, lowercase, letters/numbers/hyphens only | -| Description | **1,024 characters** max, **required** | -| Body length | < 300 lines ideal, < 500 max | - -### Name Format - -- ✅ `pr`, `write-tests-agent`, `sandbox-agent` -- ❌ `PR-Reviewer` (uppercase), `pr_reviewer` (underscores), `--name` (leading/consecutive hyphens) - ---- - -## Anti-Patterns (Do NOT Do) - -| Anti-Pattern | Why It's Bad | -|--------------|--------------| -| **Too long/verbose** | Wastes context tokens, slower responses | -| **Vague description** | Won't be discovered via `/agent` | -| **No "when to use" section** | Users won't know when to invoke | -| **Duplicating copilot-instructions.md** | Already loaded automatically | -| **Explaining what skills do** | Reference skill, don't duplicate docs | -| **Large inline code samples** | Move to separate files | -| **ASCII art diagrams** | Consume tokens - use sparingly | -| **VS Code features** | `handoffs`, `model`, `argument-hint` don't work in CLI | -| **GUI references** | No "click button" - CLI is terminal-based | - ---- - -## Best Practices - -### Description = Discovery - -The `/agent` command and auto-inference use description keywords: - -```yaml -# ✅ Good -description: Reviews PRs with independent analysis, validates tests catch bugs, proposes alternative fixes - -# ❌ Bad -description: Helps with code review stuff -``` - -### One Agent = One Role - -- ✅ `pr` - Reviews and works on PRs -- ❌ `everything-agent` - Too broad - -### Commands Over Concepts - -```markdown -# ✅ Good -git fetch origin pull/XXXXX/head:pr-XXXXX && git checkout pr-XXXXX - -# ❌ Bad -First you should fetch the PR and check it out locally -``` - -### Reference Skills, Don't Duplicate - -```markdown -# ✅ Good -Run: `pwsh .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1 -Platform android` - -# ❌ Bad -The skill does: 1. Detects fix files... 2. Detects test classes... [30 more lines] -``` - ---- - -## Tool Aliases - -| Alias | Purpose | -|-------|---------| -| `execute` / `shell` | Run shell commands | -| `read` | Read file contents | -| `edit` / `write` | Modify files | -| `search` / `grep` | Search files/content | -| `agent` | Invoke other agents | - -```yaml -tools: ["read", "search"] # Read-only agent -tools: ["read", "search", "edit", "execute"] # Full dev agent -``` - ---- - -## Minimal Structure - -```yaml ---- -name: my-agent -description: Does X when user asks Y. Keywords: review, test, fix. ---- - -# Agent Title - -Brief philosophy. - -## When to Use -- ✅ "trigger phrase" - -## When NOT to Use -- ❌ Other task → Use `other-agent` - -## Workflow -1. Step one -2. Step two - -## Quick Reference -| Task | Command | -|------|---------| -| Do X | `command` | - -## Common Mistakes -- ❌ **Mistake** - Why it's wrong -``` - ---- - -## Checklist - -- [ ] YAML frontmatter with `name` and `description` -- [ ] `description` has trigger keywords -- [ ] Body under 500 lines -- [ ] No `handoffs`, `model`, `argument-hint` -- [ ] No GUI/button references -- [ ] Skills referenced, not duplicated -- [ ] "When to Use" / "When NOT to Use" included diff --git a/.github/scripts/Review-PR.ps1 b/.github/scripts/Review-PR.ps1 new file mode 100644 index 000000000000..250a6eaf20c5 --- /dev/null +++ b/.github/scripts/Review-PR.ps1 @@ -0,0 +1,336 @@ +<# +.SYNOPSIS + Runs a PR review using Copilot CLI and the PR Agent workflow. + +.DESCRIPTION + This script invokes Copilot CLI to perform a comprehensive 5-phase PR review: + + Phase 1: Pre-Flight - Context gathering + Phase 2: Tests - Verify test existence + Phase 3: Gate - Verify tests catch the bug + Phase 4: Fix - Multi-model exploration of alternatives + Phase 5: Report - Final recommendation + + The script: + - Validates prerequisites (gh CLI, PR exists) + - Validates current branch is not protected (main, release/*, net*.0) + - Merges the PR into the current branch (for isolated testing) + - Creates the state directory + - Invokes Copilot CLI with the pr agent + +.PARAMETER PRNumber + The GitHub PR number to review (e.g., 33687) + +.PARAMETER Platform + The platform to use for testing. Default is 'android'. + Valid values: android, ios, windows, maccatalyst + +.PARAMETER SkipMerge + If specified, skips merging the PR into the current branch (useful if already merged) + +.PARAMETER Interactive + If specified, starts Copilot in interactive mode with the prompt (default). + Use -NoInteractive for non-interactive mode that exits after completion. + +.PARAMETER NoInteractive + If specified, runs in non-interactive mode (exits after completion). + Requires --allow-all for tool permissions. + +.PARAMETER DryRun + If specified, shows what would be done without making changes + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 + Reviews PR #33687 interactively using the default platform (android) + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 -Platform ios -SkipMerge + Reviews PR #33687 on iOS without merging (assumes already merged), in interactive mode + +.EXAMPLE + .\Review-PR.ps1 -PRNumber 33687 -NoInteractive + Reviews PR #33687 in non-interactive mode (exits after completion) + +.NOTES + Prerequisites: + - GitHub CLI (gh) installed and authenticated + - Copilot CLI (copilot) installed + - For testing: Appropriate platform tools (Appium, emulators, etc.) +#> + +[CmdletBinding()] +param( + [Parameter(Mandatory = $true, Position = 0)] + [int]$PRNumber, + + [Parameter(Mandatory = $false)] + [ValidateSet('android', 'ios', 'windows', 'maccatalyst')] + [string]$Platform, # Optional - agent will determine appropriate platform if not specified + + [Parameter(Mandatory = $false)] + [switch]$SkipMerge, + + [Parameter(Mandatory = $false)] + [switch]$NoInteractive, + + [Parameter(Mandatory = $false)] + [switch]$DryRun +) + +$ErrorActionPreference = 'Stop' + +# Get repository root +$RepoRoot = git rev-parse --show-toplevel 2>$null +if (-not $RepoRoot) { + Write-Error "Not in a git repository" + exit 1 +} + +Write-Host "" +Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Cyan +Write-Host "║ PR Review with Copilot CLI ║" -ForegroundColor Cyan +Write-Host "╠═══════════════════════════════════════════════════════════╣" -ForegroundColor Cyan +Write-Host "║ PR: #$PRNumber ║" -ForegroundColor Cyan +if ($Platform) { + Write-Host "║ Platform: $Platform ║" -ForegroundColor Cyan +} else { + Write-Host "║ Platform: (agent will determine) ║" -ForegroundColor Cyan +} +Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Cyan +Write-Host "" + +# Step 1: Verify prerequisites +Write-Host "📋 Checking prerequisites..." -ForegroundColor Yellow + +# Check GitHub CLI +$ghVersion = gh --version 2>$null | Select-Object -First 1 +if (-not $ghVersion) { + Write-Error "GitHub CLI (gh) is not installed. Install from: https://cli.github.com/" + exit 1 +} +Write-Host " ✅ GitHub CLI: $ghVersion" -ForegroundColor Green + +# Check Copilot CLI +$copilotVersion = copilot --version 2>$null +if (-not $copilotVersion) { + Write-Error "Copilot CLI is not installed. Install with: npm install -g @github/copilot" + exit 1 +} +Write-Host " ✅ Copilot CLI: $copilotVersion" -ForegroundColor Green + +# Check PR exists +Write-Host " 🔍 Verifying PR #$PRNumber exists..." -ForegroundColor Gray +$prInfo = gh pr view $PRNumber --json title,state,url 2>$null | ConvertFrom-Json +if (-not $prInfo) { + Write-Error "PR #$PRNumber not found or not accessible" + exit 1 +} +Write-Host " ✅ PR: $($prInfo.title)" -ForegroundColor Green +Write-Host " ✅ State: $($prInfo.state)" -ForegroundColor Green + +# Step 2: Validate current branch and merge PR +Write-Host "" +$currentBranch = git branch --show-current +Write-Host "📍 Current branch: $currentBranch" -ForegroundColor Yellow + +# Check if on a protected branch (main, release/*, net*.0) +$protectedBranches = @('main', 'master') +$isProtected = $protectedBranches -contains $currentBranch -or + $currentBranch -match '^release/' -or + $currentBranch -match '^net\d+\.0$' + +if ($isProtected) { + Write-Host "" + Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Red + Write-Host "║ ERROR: Cannot run on protected branch! ║" -ForegroundColor Red + Write-Host "╠═══════════════════════════════════════════════════════════╣" -ForegroundColor Red + Write-Host "║ Current branch: $currentBranch" -ForegroundColor Red + Write-Host "║ ║" -ForegroundColor Red + Write-Host "║ This script merges the PR into the current branch. ║" -ForegroundColor Red + Write-Host "║ Protected branches: main, release/*, net*.0 ║" -ForegroundColor Red + Write-Host "║ ║" -ForegroundColor Red + Write-Host "║ Please checkout a working branch first: ║" -ForegroundColor Red + Write-Host "║ git checkout -b pr-review-$PRNumber ║" -ForegroundColor Red + Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Red + Write-Host "" + exit 1 +} + +Write-Host " ✅ Branch '$currentBranch' is not protected" -ForegroundColor Green + +# Merge the PR into the current branch (unless skipped) +if (-not $SkipMerge) { + Write-Host "" + Write-Host "🔀 Merging PR #$PRNumber into current branch..." -ForegroundColor Yellow + + if ($DryRun) { + Write-Host " [DRY RUN] Would fetch and merge PR #$PRNumber" -ForegroundColor Magenta + } else { + # Fetch the PR ref and merge it + Write-Host " 📥 Fetching PR #$PRNumber..." -ForegroundColor Gray + git fetch origin pull/$PRNumber/head:temp-pr-$PRNumber 2>$null + if ($LASTEXITCODE -ne 0) { + # Try fetching from the PR's head repository (for fork PRs) + $prDetails = gh pr view $PRNumber --json headRepositoryOwner,headRefName 2>$null | ConvertFrom-Json + if ($prDetails) { + $forkOwner = $prDetails.headRepositoryOwner.login + $headRef = $prDetails.headRefName + Write-Host " 📥 PR is from fork: $forkOwner, fetching..." -ForegroundColor Gray + git fetch "https://github.com/$forkOwner/maui.git" "${headRef}:temp-pr-$PRNumber" + if ($LASTEXITCODE -ne 0) { + Write-Error "Failed to fetch PR #$PRNumber from fork" + exit 1 + } + } else { + Write-Error "Failed to fetch PR #$PRNumber" + exit 1 + } + } + + Write-Host " 🔀 Merging into '$currentBranch'..." -ForegroundColor Gray + git merge "temp-pr-$PRNumber" --no-edit + if ($LASTEXITCODE -ne 0) { + Write-Host "" + Write-Host "⚠️ Merge conflict detected!" -ForegroundColor Red + Write-Host " Please resolve conflicts manually and re-run the script with -SkipMerge" -ForegroundColor Yellow + git merge --abort 2>$null + git branch -D "temp-pr-$PRNumber" 2>$null + exit 1 + } + + # Clean up temp branch + git branch -D "temp-pr-$PRNumber" 2>$null + + Write-Host " ✅ PR #$PRNumber merged into '$currentBranch'" -ForegroundColor Green + } +} else { + Write-Host "" + Write-Host "⏭️ Skipping merge (assuming PR is already merged)" -ForegroundColor Yellow +} + +# Step 3: Ensure state directory exists +$stateDir = Join-Path $RepoRoot "CustomAgentLogsTmp/PRState" +if (-not (Test-Path $stateDir)) { + New-Item -ItemType Directory -Path $stateDir -Force | Out-Null + Write-Host " 📁 Created state directory: $stateDir" -ForegroundColor Gray +} + +# Step 4: Build the prompt for Copilot CLI +$planTemplatePath = ".github/agents/pr/PLAN-TEMPLATE.md" + +# Build platform instruction +$platformInstruction = if ($Platform) { + "**Platform for testing:** $Platform" +} else { + "**Platform for testing:** Determine the appropriate platform(s) based on the PR's affected code paths and the current host OS." +} + +$prompt = @" +Review PR #$PRNumber using the pr agent workflow. + +$platformInstruction + +**Instructions:** +1. Read the plan template at ``$planTemplatePath`` for the 5-phase workflow +2. Read ``.github/agents/pr.md`` for Phases 1-3 instructions +3. Follow ALL critical rules, especially: + - STOP on environment blockers and ask before continuing + - Use task agent for Gate verification + - Run multi-model try-fix in Phase 4 + +**Start with Phase 1: Pre-Flight** +- Create state file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md +- Gather context from PR #$PRNumber +- Proceed through all 5 phases + +Begin the review now. +"@ + +Write-Host "" +Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray +Write-Host "" + +if ($DryRun) { + Write-Host "[DRY RUN] Would invoke Copilot CLI with:" -ForegroundColor Magenta + Write-Host "" + Write-Host " Agent: pr" -ForegroundColor Gray + Write-Host " Mode: $(if ($NoInteractive) { 'Non-interactive (-p)' } else { 'Interactive (-i)' })" -ForegroundColor Gray + Write-Host " PR: #$PRNumber" -ForegroundColor Gray + Write-Host " Platform: $(if ($Platform) { $Platform } else { '(agent will determine)' })" -ForegroundColor Gray + Write-Host "" + Write-Host "Prompt:" -ForegroundColor Gray + Write-Host $prompt -ForegroundColor DarkGray + Write-Host "" + Write-Host "To run for real, remove the -DryRun flag" -ForegroundColor Yellow +} else { + Write-Host "╔═══════════════════════════════════════════════════════════╗" -ForegroundColor Green + Write-Host "║ LAUNCHING COPILOT CLI ║" -ForegroundColor Green + Write-Host "╚═══════════════════════════════════════════════════════════╝" -ForegroundColor Green + Write-Host "" + Write-Host "PR Review Context:" -ForegroundColor Cyan + Write-Host " PR_NUMBER: $PRNumber" -ForegroundColor White + Write-Host " PLATFORM: $(if ($Platform) { $Platform } else { '(agent will determine)' })" -ForegroundColor White + Write-Host " STATE_FILE: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor White + Write-Host " PLAN_TEMPLATE: $planTemplatePath" -ForegroundColor White + Write-Host " CURRENT_BRANCH: $(git branch --show-current)" -ForegroundColor White + Write-Host " PR_TITLE: $($prInfo.title)" -ForegroundColor White + Write-Host " MODE: $(if ($NoInteractive) { 'Non-interactive' } else { 'Interactive' })" -ForegroundColor White + Write-Host "" + Write-Host "─────────────────────────────────────────────────────────────" -ForegroundColor DarkGray + Write-Host "" + + # Build the copilot command arguments + $copilotArgs = @( + "--agent", "pr", + "--stream", "on" # Enable streaming for real-time output + ) + + # NOTE: --deny-tool does NOT work with --allow-all (allow-all takes precedence) + # Branch switching prevention relies on agent instructions in pr.md only + + # Create log directory for this PR + $prLogDir = Join-Path $RepoRoot "CustomAgentLogsTmp/PRState/$PRNumber/copilot-logs" + if (-not (Test-Path $prLogDir)) { + New-Item -ItemType Directory -Path $prLogDir -Force | Out-Null + } + + # Add logging options + $copilotArgs += @("--log-dir", $prLogDir, "--log-level", "info") + + if ($NoInteractive) { + # Non-interactive mode: -p with --allow-all + # Also save session to markdown for review + $sessionFile = Join-Path $prLogDir "session-$(Get-Date -Format 'yyyyMMdd-HHmmss').md" + $copilotArgs += @("-p", $prompt, "--allow-all", "--share", $sessionFile) + } else { + # Interactive mode: -i to start with prompt + $copilotArgs += @("-i", $prompt) + } + + Write-Host "🚀 Starting Copilot CLI..." -ForegroundColor Yellow + Write-Host "" + + # Invoke Copilot CLI + & copilot @copilotArgs + + $exitCode = $LASTEXITCODE + + Write-Host "" + Write-Host "═══════════════════════════════════════════════════════════" -ForegroundColor DarkGray + if ($exitCode -eq 0) { + Write-Host "✅ Copilot CLI completed successfully" -ForegroundColor Green + } else { + Write-Host "⚠️ Copilot CLI exited with code: $exitCode" -ForegroundColor Yellow + } +} + +Write-Host "" +Write-Host "📝 State file: CustomAgentLogsTmp/PRState/pr-$PRNumber.md" -ForegroundColor Gray +Write-Host "📋 Plan template: $planTemplatePath" -ForegroundColor Gray +if (-not $DryRun) { + Write-Host "📁 Copilot logs: CustomAgentLogsTmp/PRState/$PRNumber/copilot-logs/" -ForegroundColor Gray + if ($NoInteractive) { + Write-Host "📄 Session markdown: $sessionFile" -ForegroundColor Gray + } +} +Write-Host "" diff --git a/.github/skills/try-fix/SKILL.md b/.github/skills/try-fix/SKILL.md index 0fd1188c7fa2..0e89f53ba9a7 100644 --- a/.github/skills/try-fix/SKILL.md +++ b/.github/skills/try-fix/SKILL.md @@ -343,11 +343,11 @@ Provide structured output to the invoker: [The actual changes made] ``` -**Exhausted:** Yes/No -**Reasoning:** [Why you believe there are/aren't more viable approaches] +**This Attempt's Status:** Done/NeedsRetry +**Reasoning:** [Why this specific approach succeeded or failed] ``` -**Determining Exhaustion:** Set `exhausted=true` when you've tried the same fundamental approach multiple times, all hints have been explored, failure analysis reveals the problem is outside target files, or no new ideas remain. Set `exhausted=false` when this is the first attempt, failure analysis suggests a different approach, hints remain unexplored, or the approach was close but needs refinement. +**Determining Status:** Set `Done` when you've completed testing this approach (whether it passed or failed). Set `NeedsRetry` only if you hit a transient error (network timeout, flaky test) and want to retry the same approach. ### Step 10: Update State File (if provided) @@ -361,15 +361,16 @@ If `state_file` input was provided and file exists: |---|--------|----------|-------------|---------------|-------| | N | try-fix #N | [approach] | ✅ PASS / ❌ FAIL | [files] | [analysis] | -4. **Set exhausted status** based on your determination above -5. **Commit state file:** +4. **Commit state file:** ```bash -git add "$STATE_FILE" && git commit -m "try-fix: attempt #N (exhausted=$EXHAUSTED)" +git add "$STATE_FILE" && git commit -m "try-fix: attempt #N" ``` **If no state file provided:** Skip this step (results returned to invoker only). -**Ownership rule:** try-fix updates its own row AND the exhausted field. Never modify: +**⚠️ IMPORTANT: Do NOT set any "Exhausted" field.** Cross-pollination exhaustion is determined by the pr agent after invoking ALL 5 models and confirming none have new ideas. try-fix only reports its own attempt result. + +**Ownership rule:** try-fix updates its own row ONLY. Never modify: - Phase status fields - "Selected Fix" field - Other try-fix rows