Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement #33939
…teractions Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
Can you generate suggestions based on 1) memories
…cent PR suggestion analysis Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
Added all three in commit
Convert this to a custom agent. Then do a run of all the scrapings and show me the results. Check in the improvements.
…ions from analysis

New agent: .github/agents/scrape-and-improve.md

Applied findings: Common Fix Patterns section in copilot-instructions.md
- NavigationPage handler disconnection (from PR #32289 - 6 attempts)
- CollectionView EmptyView Android (from PR #33134 - 8 attempts)
- Shell navigation iOS tests (from PR #33380 - 4 attempts)
- Device test isolation (from PR #33406 - 3 attempts)

Scrape results: 5 sessions, 21 fix attempts, 47.6% success rate, 7 recommendations

Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
Done in commit
1. Custom Agent Created:
2. Scraping Results (run against current repo):
   Slow-discovery PRs: #32289 (6 attempts), #33134 (8 attempts), #33380 (4 attempts), #33406 (3 attempts).
3. Improvements Checked In: Added "Common Fix Patterns (From Agent Analysis)" section to
🤖 AI Summary
📊 Expand Full Review
🔍 Pre-Flight — Context & Validation
📝 Review Session — Add scrape-and-improve custom agent and apply improvement recommendations from analysis ·
| File | Type | Description |
|---|---|---|
| `.github/actions/scrape-and-improve/action.yml` | New | Composite GitHub Action wrapping both scripts |
| `.github/agents/scrape-and-improve.md` | New | Custom agent definition (5-phase: Collect→Analyze→Present→Apply→Verify) |
| `.github/copilot-instructions.md` | Modified | Adds "Common Fix Patterns" section + registers new skill/agent |
| `.github/skills/scrape-and-improve/SKILL.md` | New | Skill documentation |
| `.github/skills/scrape-and-improve/scripts/Analyze-And-Recommend.ps1` | New | Pattern analysis script |
| `.github/skills/scrape-and-improve/scripts/Collect-AgentData.ps1` | New | Data collection script |
| `.github/workflows/scrape-and-improve.yml` | New | Weekly scheduled workflow + manual dispatch |
Total: 6 new files, 1 modified file (+1736 lines, -1 lines)
PR Discussion Summary
| Comment | Author | Content |
|---|---|---|
| Initial request | PureWeen | "Add a skill able to scrape memories, Copilot sessions, CCA sessions and Copilot comments to generate instruction updates. Also add an action to trigger this." |
| Iteration 1 | PureWeen | "Generate suggestions based on memories, CCA GitHub Copilot, and scrape most recent 20 PRs" |
| Response 1 | Copilot | Added -MemoryContext parameter, CCA session scanning, and -RecentPRCount 20 parameter |
| Iteration 2 | PureWeen | "Convert this to a custom agent. Run all scrapings and show me results. Check in improvements." |
| Response 2 | Copilot | Created custom agent, ran scraping (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations), applied "Common Fix Patterns" section to copilot-instructions.md |
No disagreements or edge case discussions in PR comments.
Fix Candidates (for Infrastructure PR)
| # | Source | Approach | Test Result | Files Changed | Notes |
|---|---|---|---|---|---|
| PR | PR #33939 | Add scrape-and-improve infrastructure (skill, agent, action, workflow, fix patterns in copilot-instructions.md) | ⏳ PENDING (Gate) | 7 files (+1736) | Original PR |
Key Observations
- No functional MAUI test code: no `TestCases.HostApp` or `TestCases.Shared.Tests` files
- No related GitHub issue: the PR originated from an in-PR conversation request
- Applied improvements included: `copilot-instructions.md` now has a "Common Fix Patterns" section with concrete guidance on NavigationPage, CollectionView, Shell, and device test isolation patterns
- Gate assessment: There are NO UI/device tests to verify; this is a tooling PR
🚦 Gate — Test Verification
📝 Review Session — Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e
Gate: PR #33939
Result:
Platform: N/A
Mode: N/A
Gate Assessment
This PR is a pure tooling/infrastructure PR. It contains:
- PowerShell scripts (`.github/skills/scrape-and-improve/scripts/`)
- GitHub Actions composite action (`.github/actions/scrape-and-improve/action.yml`)
- GitHub Actions workflow (`.github/workflows/scrape-and-improve.yml`)
- Agent/skill documentation (`.github/agents/`, `.github/skills/`)
- Documentation updates to `copilot-instructions.md`
No MAUI source code changed. No UI tests, device tests, or unit tests exist for this PR, nor are they applicable (PowerShell/YAML tooling has no testable MAUI behavior).
Gate Decision
Gate verification is NOT APPLICABLE for this PR type. The PR does not modify any MAUI framework code, controls, handlers, or platform code that would require test verification.
Proceeding to Report phase with code review of the tooling/infrastructure changes.
🔧 Fix — Analysis & Comparison
📝 Review Session — Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e
Fix (Try-Fix): PR #33939
Status: SKIPPED
Reason: This is a tooling/infrastructure PR. It adds PowerShell scripts, GitHub Actions, agent documentation, and skill definitions — no MAUI application code, no testable behavior via UI tests.
The try-fix phase is not applicable because:
- There is no bug to fix — this PR adds new features
- There are no MAUI tests to verify a fix against
- The PR's changes cannot be validated via `BuildAndRunHostApp.ps1`
Fix Candidates
| # | Source | Approach | Test Result | Files Changed | Notes |
|---|---|---|---|---|---|
| PR | PR #33939 | Add scrape-and-improve infrastructure (skill, agent, action, workflow, fix patterns) | N/A (no tests) | 7 files (+1736) | Infrastructure/tooling PR |
Exhausted: N/A (skipped - infra PR)
Selected Fix: PR's approach — the only applicable approach for an infrastructure PR
📋 Report — Final Recommendation
📝 Review Session — Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e
Report: PR #33939
⚠️ Final Recommendation: COMMENT (Request Review)
Summary
PR #33939 adds a new scrape-and-improve skill, custom agent, GitHub composite action, and scheduled workflow for automatically analyzing agent interaction data and generating instruction improvement recommendations. It also applies initial findings directly to copilot-instructions.md as a "Common Fix Patterns" section.
This is an infrastructure/tooling PR with no MAUI application code changes. Gate verification was not applicable. The try-fix phase was skipped (no testable fix to verify).
Root Cause (of the need for this PR)
Agent PR sessions have historically required multiple fix attempts for certain problem types (e.g., NavigationPage handler disconnection: 6 attempts, CollectionView EmptyView: 8 attempts). This PR adds automation to identify such patterns and codify learnings as instruction file updates.
Fix Quality Assessment
The PR is well-structured and achieves its stated goal. The tooling correctly:
- Collects data from 5 sources (agent sessions, Copilot comments, CCA logs, memories, recent PR reviews)
- Analyzes patterns (slow discovery, quick success, memory frequency, suggestion rejection rates)
- Generates prioritized recommendations
- Applies initial findings to `copilot-instructions.md`
Issues Found
🔴 Bug: Suggestion acceptance/rejection rate calculation is incorrect
File: .github/skills/scrape-and-improve/scripts/Collect-AgentData.ps1 lines 483-495
The acceptance/rejection pattern matching (lgtm, looks good, disagree, won't fix, etc.) applies to ALL review comments, not just replies to Copilot-authored suggestions. This causes inflated or impossible statistics (e.g., if Copilot posts 1 suggestion and 2 humans reply "lgtm", the script reports 2 accepted out of 1 Copilot suggestion = 200% acceptance rate).
The copilotSuggestions counter is correctly incremented only for Copilot-authored comments (line 475), but suggestionsAccepted/suggestionsRejected count human responses in any comment thread.
A better approach: use in_reply_to_id from the GitHub API to track which human comments are replies to Copilot suggestions, or at minimum only count acceptance/rejection in comments that are replies to Copilot-authored comments.
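The reply-scoped counting suggested above could look roughly like the following. This is a minimal sketch in Python rather than the script's PowerShell, assuming review comments shaped like GitHub REST API objects (`id`, `in_reply_to_id`, `user.login`, `body`). The Copilot login name and the phrase lists are placeholders, not values taken from `Collect-AgentData.ps1`.

```python
import re

# Hypothetical values: the real login and phrase lists live in Collect-AgentData.ps1.
COPILOT_LOGINS = {"Copilot"}
ACCEPT = re.compile(r"\b(lgtm|looks good)\b", re.IGNORECASE)
REJECT = re.compile(r"\b(disagree|won't fix)\b", re.IGNORECASE)


def suggestion_stats(comments):
    """Count accept/reject verdicts only on replies to Copilot-authored comments."""
    copilot_ids = {c["id"] for c in comments if c["user"]["login"] in COPILOT_LOGINS}
    verdicts = {}  # Copilot comment id -> "accepted" or "rejected"
    for c in comments:
        parent = c.get("in_reply_to_id")
        if parent not in copilot_ids or c["user"]["login"] in COPILOT_LOGINS:
            continue  # not a human reply to a Copilot suggestion
        if REJECT.search(c["body"]):
            verdicts[parent] = "rejected"  # a rejection overrides an earlier acceptance
        elif ACCEPT.search(c["body"]) and parent not in verdicts:
            verdicts[parent] = "accepted"
    total = len(copilot_ids)
    accepted = sum(v == "accepted" for v in verdicts.values())
    rejected = sum(v == "rejected" for v in verdicts.values())
    rate = round(100 * accepted / total, 1) if total else 0.0
    return {"suggestions": total, "accepted": accepted,
            "rejected": rejected, "acceptance_rate": rate}
```

Because each suggestion gets at most one verdict, the rate cannot exceed 100%. In the scenario described above (one Copilot suggestion, two human "lgtm" replies), this reports one accepted suggestion rather than two.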
🟡 Minor: Repository parameter is passed by the action but not documented as an input
File: .github/actions/scrape-and-improve/action.yml
The Collect-AgentData.ps1 script has a $Repository parameter (line 48 of the script, default "dotnet/maui"). The action sets it via $params["Repository"] = "${{ github.repository }}", which is correct. However, the action's inputs do not document this, so someone using the action on a fork would need to know that the repository is auto-detected from github.repository.
🟡 Minor: Workflow doesn't pass memory-context input
File: .github/workflows/scrape-and-improve.yml
The action supports a memory-context input for analyzing structured memory blocks, but the workflow doesn't expose this as a workflow_dispatch input. Memory context provides the richest data for the analysis. Consider adding it as an optional input.
PR Description Accuracy
The PR description accurately describes the implementation. The "Applied Improvements from Initial Run" section provides concrete metrics (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations) and the Common Fix Patterns in copilot-instructions.md are well-documented with evidence from specific PRs.
Positive Aspects
- ✅ Workflow uses `github.repository_owner == 'dotnet'` guard to prevent running on forks
- ✅ Permissions are minimal (read-only: contents, issues, pull-requests)
- ✅ `ErrorActionPreference = "Continue"` prevents script halting on non-critical errors
- ✅ All data collection sections have `gh CLI not available` fallbacks
- ✅ Recommendations are evidence-based with specific PR citations
- ✅ Artifacts are uploaded with 30-day retention
- ✅ The "Common Fix Patterns" content in `copilot-instructions.md` is accurate and based on real PR data
- ✅ Weekly schedule is appropriate for this type of background analysis
📋 Expand PR Finalization Review
Title: ✅ Good
Current: Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement
Description: ✅ Excellent
The description content is strong but needs one update. See details below.
Missing Elements:
- Prepend the required NOTE block to the top of the description.
Phase 2: Code Review
See code-review.md for detailed findings.
Summary:
- 🟡 3 suggestions (non-blocking, moderate improvements)
- ✅ Implementation is generally solid and well-structured
Recommendation
- Add NOTE block at top of description (required for all PRs)
- Title is fine as-is — no change needed
- Code review findings are suggestions only (no critical issues)
The PR is a tooling/infrastructure addition (skill + agent + workflow) with no impact on MAUI runtime code. The copilot-instructions.md changes are additive documentation of patterns from the initial run.
✨ Suggested PR Description
> [!NOTE]
> Are you waiting for the changes in this PR to be merged?
> It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Adds automated tooling to scrape agent interaction data (PR sessions, Copilot comments, CCA sessions, memories, recent PR reviews) and generate instruction file improvement recommendations — both as a reusable skill and as an autonomous custom agent that applies improvements directly.
New Skill: scrape-and-improve
- `Collect-AgentData.ps1`: Gathers data from five sources:
  - `.github/agent-pr-session/*.md` (fix candidates, root causes, phase statuses)
  - PR comments via `gh` CLI (AI Summary markers, try-fix attempts, test verification)
  - `CustomAgentLogsTmp/PRState/` (CCA session state with convention/build-command pattern detection)
  - Repository memories via `-MemoryContext` parameter (structured agent memory blocks: subject, fact, citations)
  - Most recent N PRs via `-RecentPRCount` (default: 20): review comments, Copilot suggestion acceptance/rejection rates, review hotspot areas by file path
- `Analyze-And-Recommend.ps1`: Pattern detection across collected data: slow discovery (multiple attempts), quick successes, repeated failure approaches, common root causes, recurring memory subjects, suggestion rejection rates, and review comment hotspots. Outputs prioritized recommendations with evidence.
New Custom Agent: scrape-and-improve
- `.github/agents/scrape-and-improve.md`: Autonomous 5-phase agent (Collect → Analyze → Present → Apply → Verify) that runs all scrapings and applies High/Medium priority instruction improvements directly to instruction files, skills, and `copilot-instructions.md`.
- Registered as agent #5 in `copilot-instructions.md`.
GitHub Action & Workflow
- Composite action (`.github/actions/scrape-and-improve/`) wraps both scripts, uploads artifacts
- Workflow (`.github/workflows/scrape-and-improve.yml`) runs weekly on schedule or via `workflow_dispatch` with optional PR number, label, date filters, and recent PR count
Applied Improvements from Initial Run
Ran all scrapings against the repo (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations). Applied findings as a new "Common Fix Patterns (From Agent Analysis)" section in copilot-instructions.md documenting what works and what fails for:
- NavigationPage handler disconnection (PR #32289, 6 attempts): fix is in Legacy `RemovePage` path
- CollectionView EmptyView Android (PR #33134, 8 attempts): normalize `int.MaxValue` back to `double.PositiveInfinity`
- Shell navigation iOS tests (PR #33380, 4 attempts): simplify `DidPopItem` to always sync stacks
- Device test isolation (PR #33406, 3 attempts): check for leaked state from previous tests
Code Review: ✅ Passed
Code Review: PR #33939
PR: Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement
Scope: Tooling/infrastructure only — no MAUI runtime code changes
Code Review Findings
🟡 Suggestions
1. Workflow missing memory-context input wiring
- File: `.github/workflows/scrape-and-improve.yml`
- Observation: The composite action (`action.yml`) accepts a `memory-context` input, but the workflow (`scrape-and-improve.yml`) does not expose it as a `workflow_dispatch` input and does not pass it to the action. This means scheduled/dispatch runs can never pass memory context.
- Recommendation: Either add `memory_context` as an optional `workflow_dispatch` input and wire it through, or document that this input is only usable when calling the action directly (not via the workflow).
2. PowerShell $ErrorActionPreference = "Continue" in analysis script
- File: `.github/skills/scrape-and-improve/scripts/Analyze-And-Recommend.ps1` (line ~30)
- Observation: `$ErrorActionPreference = "Continue"` lets the script keep running after an error: the error is reported but does not stop execution. For a script that parses JSON and writes reports, such a failure could produce an empty/corrupt output file that downstream steps (like the artifact upload) would capture without any CI signal.
- Recommendation: Use `$ErrorActionPreference = "Stop"` and wrap JSON parsing in a `try`/`catch` with a clear error message, or at minimum apply `-ErrorAction SilentlyContinue` only to specific commands that are expected to fail (e.g., missing optional files).
3. Inline ${{ inputs.memory-context }} in shell script is injection-prone
- File: `.github/actions/scrape-and-improve/action.yml` (line ~64)
- Observation: The composite action passes the `memory-context` input directly into a shell `if` condition via `"${{ inputs.memory-context }}"`. If the memory context contains PowerShell special characters or newlines, this could break the script or produce unexpected behavior. The action runs with `shell: pwsh` so PowerShell injection is the concern.
- Recommendation: Pass multi-line or complex inputs via an environment variable rather than inline template substitution:

      env:
        MEMORY_CONTEXT: ${{ inputs.memory-context }}
      run: |
        if ($env:MEMORY_CONTEXT) { $params["MemoryContext"] = $env:MEMORY_CONTEXT }
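To make the injection concern concrete, here is a small Python simulation of the two approaches. GitHub Actions expands `${{ … }}` expressions into the script text before the shell runs it, which is what `render_inline` mimics. The script lines and function names are hypothetical stand-ins, since the real `action.yml` step is not reproduced in this review.

```python
# Hypothetical script lines; the actual step in action.yml may differ.
INLINE_TEMPLATE = 'if ("${{ inputs.memory-context }}") { $params["MemoryContext"] = "set" }'
ENV_SCRIPT = 'if ($env:MEMORY_CONTEXT) { $params["MemoryContext"] = $env:MEMORY_CONTEXT }'


def render_inline(value):
    """Mimic template expansion: the input is pasted into the script text itself."""
    return INLINE_TEMPLATE.replace("${{ inputs.memory-context }}", value)


def render_env(value):
    """Env-var approach: the script text stays constant and the input travels as data."""
    return ENV_SCRIPT, {"MEMORY_CONTEXT": value}


if __name__ == "__main__":
    hostile = 'x"); Write-Host "injected'
    print(render_inline(hostile))   # the quote in the input ends the string early
    print(render_env(hostile)[0])   # script text is unchanged by the input
```

With inline substitution, a double quote in the input terminates the string literal and the remainder is parsed as PowerShell. With the environment variable, the same input never becomes part of the script source.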
✅ Looks Good
- Script structure: Both PowerShell scripts are well-organized with clear sections, meaningful variable names, and helpful `Write-Host` progress output.
- Analysis thresholds: Constants like `$HIGH_REJECTION_THRESHOLD = 30`, `$HOTSPOT_COMMENT_THRESHOLD = 5`, and `$MEMORY_FREQUENCY_THRESHOLD = 2` are clearly named and documented at the top of `Analyze-And-Recommend.ps1`.
- Agent documentation: The `scrape-and-improve.md` agent file is thorough: 5-phase workflow, error handling table, and clear distinction between skill vs. agent modes.
- SKILL.md structure: Follows the established pattern of other skills in the repo (inputs, outputs, workflow steps, error handling, integration section).
- `copilot-instructions.md` additions: The "Common Fix Patterns" section is well-targeted and provides concrete, actionable guidance for future agents with specific PR references and ✅/❌ formatting consistent with the rest of the file.
- Workflow safety: `if: github.repository_owner == 'dotnet'` guard prevents the scheduled workflow from running on forks.
- Permissions: Workflow uses minimal permissions (`contents: read`, `issues: read`, `pull-requests: read`); no write permissions are needed since the action only uploads artifacts.
- Artifact retention: 30-day retention on the analysis artifacts is appropriate for a weekly workflow.