Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement #33939

Draft
Copilot wants to merge 4 commits into main from copilot/add-memory-scraping-skill

Contributor

Copilot AI commented Feb 7, 2026

Adds automated tooling to scrape agent interaction data (PR sessions, Copilot comments, CCA sessions, memories, recent PR reviews) and generate instruction file improvement recommendations — both as a reusable skill and as an autonomous custom agent that applies improvements directly.

New Skill: scrape-and-improve

  • Collect-AgentData.ps1 — Gathers data from five sources:
    • .github/agent-pr-session/*.md (fix candidates, root causes, phase statuses)
    • PR comments via gh CLI (AI Summary markers, try-fix attempts, test verification)
    • CustomAgentLogsTmp/PRState/ (CCA session state with convention/build-command pattern detection)
    • Repository memories via -MemoryContext parameter (structured agent memory blocks: subject, fact, citations)
    • Most recent N PRs via -RecentPRCount (default: 20) — review comments, Copilot suggestion acceptance/rejection rates, review hotspot areas by file path
  • Analyze-And-Recommend.ps1 — Pattern detection across collected data: slow discovery (multiple attempts), quick successes, repeated failure approaches, common root causes, recurring memory subjects, suggestion rejection rates, and review comment hotspots. Outputs prioritized recommendations with evidence.

New Custom Agent: scrape-and-improve

  • .github/agents/scrape-and-improve.md — Autonomous 5-phase agent (Collect → Analyze → Present → Apply → Verify) that runs all scrapings and applies High/Medium priority instruction improvements directly to instruction files, skills, and copilot-instructions.md.
  • Registered as agent #5 in copilot-instructions.md.

GitHub Action & Workflow

  • Composite action (.github/actions/scrape-and-improve/) wraps both scripts, uploads artifacts
  • Workflow (.github/workflows/scrape-and-improve.yml) runs weekly on schedule or via workflow_dispatch with optional PR number, label, date filters, and recent PR count

Applied Improvements from Initial Run

Ran all scrapings against the repo (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations). Applied findings as a new "Common Fix Patterns (From Agent Analysis)" section in copilot-instructions.md documenting what works and what fails for NavigationPage handler disconnection, CollectionView EmptyView on Android, Shell navigation iOS tests, and device test isolation.

Original prompt

Add a skill that's able to scrape memories, Copilot sessions, CCA sessions, and Copilot comments, and then generate instruction updates for success

Also add an action that can trigger this



…teractions

Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add memory scraping skill for instruction updates" to "Add scrape-and-improve skill and workflow for agent instruction refinement" on Feb 7, 2026
Copilot AI requested a review from PureWeen February 7, 2026 19:49
Member

PureWeen commented Feb 7, 2026

@copilot

Can you generate suggestions based on

1) memories
2) CCA GitHub Copilot
3) scrape most recent 20 PRs and analyze responses to Copilot suggestions

…cent PR suggestion analysis

Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 7, 2026

@copilot

Can you generate suggestions based on
...

Added all three in commit 5d18f89:

  1. Memories - New -MemoryContext parameter accepts structured memory text (subject/fact/citations format). Also scans agent session files and CCA logs for convention, build-command, and store-event patterns. Analysis generates recommendations when memory subjects recur frequently (should be formalized in instruction files).

  2. CCA GitHub Copilot sessions - Enhanced CCA session scanning with pattern detection across CustomAgentLogsTmp/PRState/ files. Detects convention patterns, build commands, and store events embedded in session logs.

  3. Recent 20 PRs - New -RecentPRCount 20 parameter (default). Fetches review comments via gh api, detects Copilot-authored suggestions (specific bot accounts), tracks acceptance/rejection rates (with priority-based dedup to avoid double-counting), identifies review hotspot areas by file path, and generates recommendations for high-rejection rates and high-comment areas.

Member

PureWeen commented Feb 7, 2026

@copilot

Convert this to a custom agent

And then do a run of all the scrapings and show me the results

Check in the improvements

…ions from analysis

New agent: .github/agents/scrape-and-improve.md
Applied findings: Common Fix Patterns section in copilot-instructions.md
- NavigationPage handler disconnection (from PR #32289 - 6 attempts)
- CollectionView EmptyView Android (from PR #33134 - 8 attempts)
- Shell navigation iOS tests (from PR #33380 - 4 attempts)
- Device test isolation (from PR #33406 - 3 attempts)

Scrape results: 5 sessions, 21 fix attempts, 47.6% success rate, 7 recommendations

Co-authored-by: PureWeen <5375137+PureWeen@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 7, 2026

@copilot

Convert this to a custom agent
...

Done in commit de8d59e. Three things completed:

1. Custom Agent Created: .github/agents/scrape-and-improve.md with a 5-phase autonomous workflow: Collect → Analyze → Present → Apply → Verify. Registered as agent #5 in copilot-instructions.md.

2. Scraping Results (run against current repo):

| Metric | Value |
| --- | --- |
| Agent Session Files | 5 |
| Memories Collected | 21 (16 from context + 5 from session scans) |
| Fix Attempts Analyzed | 21 |
| Success Rate | 47.6% |
| Avg Attempts (slow PRs) | 5.2 |
| Recommendations Generated | 7 (2 High, 4 Medium, 1 Low) |

Slow-discovery PRs: #32289 (6 attempts), #33134 (8 attempts), #33380 (4 attempts), #33406 (3 attempts).

3. Improvements Checked In — Added "Common Fix Patterns (From Agent Analysis)" section to copilot-instructions.md with actionable guidance for the 4 slow-discovery areas:

  • NavigationPage handler disconnection — fix is in Legacy RemovePage path, not modern handler update
  • CollectionView EmptyView Android — normalize int.MaxValue back to double.PositiveInfinity where consumed
  • Shell navigation iOS tests — simplify DidPopItem to always sync stacks
  • Device test isolation — check for leaked state from previous tests
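The summary figures in the scraping results above can be cross-checked with a few lines of arithmetic. This is only a sketch in Python, not part of the PR's PowerShell scripts. The attempt counts come from the slow-discovery list; the count of 10 successful attempts is an assumption inferred from the reported 47.6% rate, since the comment gives only the percentage.

```python
# Attempt counts per slow-discovery PR, as listed in the comment above.
slow_pr_attempts = {32289: 6, 33134: 8, 33380: 4, 33406: 3}

total_slow_attempts = sum(slow_pr_attempts.values())
avg_slow_attempts = total_slow_attempts / len(slow_pr_attempts)

# Assumption: 10 successful attempts out of 21. The comment reports only the rate.
assumed_successes = 10
fix_attempts = 21
success_rate = assumed_successes / fix_attempts * 100

print(total_slow_attempts)      # 21
print(avg_slow_attempts)        # 5.25
print(f"{success_rate:.1f}%")   # 47.6%
```

Under these assumptions the four slow-discovery PRs account for all 21 analyzed attempts, and the exact average is 5.25, which the table shows as 5.2.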

Copilot AI changed the title from "Add scrape-and-improve skill and workflow for agent instruction refinement" to "Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement" on Feb 7, 2026
Member

rmarinho commented Feb 18, 2026

🤖 AI Summary

📊 Expand Full Review
🔍 Pre-Flight — Context & Validation
📝 Review Session: Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e

Pre-Flight: PR #33939

PR: #33939 - Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement
Author: Copilot (bot)
Status: Draft
Platforms Affected: N/A (tooling/infrastructure — GitHub Actions, PowerShell scripts, agent definitions)

Summary

This PR adds automated tooling to scrape agent interaction data and generate instruction file improvement recommendations. It is a purely infrastructure/tooling PR with no MAUI application code changes.

Files Changed

| File | Type | Description |
| --- | --- | --- |
| .github/actions/scrape-and-improve/action.yml | New | Composite GitHub Action wrapping both scripts |
| .github/agents/scrape-and-improve.md | New | Custom agent definition (5-phase: Collect→Analyze→Present→Apply→Verify) |
| .github/copilot-instructions.md | Modified | Adds "Common Fix Patterns" section + registers new skill/agent |
| .github/skills/scrape-and-improve/SKILL.md | New | Skill documentation |
| .github/skills/scrape-and-improve/scripts/Analyze-And-Recommend.ps1 | New | Pattern analysis script |
| .github/skills/scrape-and-improve/scripts/Collect-AgentData.ps1 | New | Data collection script |
| .github/workflows/scrape-and-improve.yml | New | Weekly scheduled workflow + manual dispatch |

Total: 6 new files, 1 modified file (+1736 lines, -1 lines)

PR Discussion Summary

| Comment | Author | Content |
| --- | --- | --- |
| Initial request | PureWeen | "Add a skill able to scrape memories, Copilot sessions, CCA sessions and Copilot comments to generate instruction updates. Also add an action to trigger this." |
| Iteration 1 | PureWeen | "Generate suggestions based on memories, CCA GitHub Copilot, and scrape most recent 20 PRs" |
| Response 1 | Copilot | Added -MemoryContext parameter, CCA session scanning, and -RecentPRCount 20 parameter |
| Iteration 2 | PureWeen | "Convert this to a custom agent. Run all scrapings and show me results. Check in improvements." |
| Response 2 | Copilot | Created custom agent, ran scraping (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations), applied "Common Fix Patterns" section to copilot-instructions.md |

No disagreements or edge case discussions in PR comments.

Fix Candidates (for Infrastructure PR)

| # | Source | Approach | Test Result | Files Changed | Notes |
| --- | --- | --- | --- | --- | --- |
| PR | PR #33939 | Add scrape-and-improve infrastructure (skill, agent, action, workflow, fix patterns in copilot-instructions.md) | ⏳ PENDING (Gate) | 7 files (+1736) | Original PR |

Key Observations

  • No functional MAUI test code — no TestCases.HostApp or TestCases.Shared.Tests files
  • No related GitHub issue — the PR originated from an in-PR conversation request
  • Applied improvements included: copilot-instructions.md now has a "Common Fix Patterns" section with concrete guidance on NavigationPage, CollectionView, Shell, and device test isolation patterns
  • Gate assessment: There are NO UI/device tests to verify; this is a tooling PR

🚦 Gate — Test Verification
📝 Review Session: Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e

Gate: PR #33939

Result: ⚠️ NOT APPLICABLE
Platform: N/A
Mode: N/A

Gate Assessment

This PR is a pure tooling/infrastructure PR. It contains:

  • PowerShell scripts (.github/skills/scrape-and-improve/scripts/)
  • GitHub Actions composite action (.github/actions/scrape-and-improve/action.yml)
  • GitHub Actions workflow (.github/workflows/scrape-and-improve.yml)
  • Agent/skill documentation (.github/agents/, .github/skills/)
  • Documentation updates to copilot-instructions.md

No MAUI source code changed. No UI tests, device tests, or unit tests exist for this PR, nor are they applicable (PowerShell/YAML tooling has no testable MAUI behavior).

Gate Decision

Gate verification is NOT APPLICABLE for this PR type. The PR does not modify any MAUI framework code, controls, handlers, or platform code that would require test verification.

Proceeding to Report phase with code review of the tooling/infrastructure changes.


🔧 Fix — Analysis & Comparison
📝 Review Session: Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e

Fix (Try-Fix): PR #33939

Status: SKIPPED

Reason: This is a tooling/infrastructure PR. It adds PowerShell scripts, GitHub Actions, agent documentation, and skill definitions — no MAUI application code, no testable behavior via UI tests.

The try-fix phase is not applicable because:

  1. There is no bug to fix — this PR adds new features
  2. There are no MAUI tests to verify a fix against
  3. The PR's changes cannot be validated via BuildAndRunHostApp.ps1

Fix Candidates

| # | Source | Approach | Test Result | Files Changed | Notes |
| --- | --- | --- | --- | --- | --- |
| PR | PR #33939 | Add scrape-and-improve infrastructure (skill, agent, action, workflow, fix patterns) | N/A (no tests) | 7 files (+1736) | Infrastructure/tooling PR |

Exhausted: N/A (skipped - infra PR)
Selected Fix: PR's approach — the only applicable approach for an infrastructure PR


📋 Report — Final Recommendation
📝 Review Session: Add scrape-and-improve custom agent and apply improvement recommendations from analysis · de8d59e

Report: PR #33939

⚠️ Final Recommendation: COMMENT (Request Review)

Summary

PR #33939 adds a new scrape-and-improve skill, custom agent, GitHub composite action, and scheduled workflow for automatically analyzing agent interaction data and generating instruction improvement recommendations. It also applies initial findings directly to copilot-instructions.md as a "Common Fix Patterns" section.

This is an infrastructure/tooling PR with no MAUI application code changes. Gate verification was not applicable. The try-fix phase was skipped (no testable fix to verify).


Root Cause (of the need for this PR)

Agent PR sessions have historically required multiple fix attempts for certain problem types (e.g., NavigationPage handler disconnection: 6 attempts, CollectionView EmptyView: 8 attempts). This PR adds automation to identify such patterns and codify learnings as instruction file updates.


Fix Quality Assessment

The PR is well-structured and achieves its stated goal. The tooling correctly:

  • Collects data from 5 sources (agent sessions, Copilot comments, CCA logs, memories, recent PR reviews)
  • Analyzes patterns (slow discovery, quick success, memory frequency, suggestion rejection rates)
  • Generates prioritized recommendations
  • Applies initial findings to copilot-instructions.md

Issues Found

🔴 Bug: Suggestion acceptance/rejection rate calculation is incorrect

File: .github/skills/scrape-and-improve/scripts/Collect-AgentData.ps1 lines 483-495

The acceptance/rejection pattern matching (lgtm, looks good, disagree, won't fix, etc.) applies to ALL review comments, not just replies to Copilot-authored suggestions. This causes inflated or impossible statistics (e.g., if Copilot posts 1 suggestion and 2 humans reply "lgtm", the script reports 2 accepted out of 1 Copilot suggestion = 200% acceptance rate).

The copilotSuggestions counter is correctly incremented only for Copilot-authored comments (line 475), but suggestionsAccepted/suggestionsRejected count human responses in any comment thread.

A better approach: use in_reply_to_id from the GitHub API to track which human comments are replies to Copilot suggestions, or at minimum only count acceptance/rejection in comments that are replies to Copilot-authored comments.
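A minimal sketch of the reply-based counting suggested above, written in Python rather than the script's PowerShell. It assumes review comments shaped like GitHub REST API pull request review comment objects (id, in_reply_to_id, user.login, body). The bot login set, the keyword patterns, and the function name are hypothetical and are not taken from Collect-AgentData.ps1.

```python
import re

# Hypothetical values for illustration; the real script's bot accounts and
# keyword patterns may differ.
COPILOT_LOGINS = {"Copilot", "copilot-pull-request-reviewer[bot]"}
ACCEPT = re.compile(r"\b(lgtm|looks good)\b", re.IGNORECASE)
REJECT = re.compile(r"\b(disagree|won't fix)\b", re.IGNORECASE)


def tally_copilot_suggestions(comments):
    """Count accept/reject signals only in human replies to Copilot comments."""
    copilot_ids = {
        c["id"] for c in comments if c["user"]["login"] in COPILOT_LOGINS
    }
    accepted, rejected = set(), set()
    for c in comments:
        parent = c.get("in_reply_to_id")
        if parent not in copilot_ids or c["user"]["login"] in COPILOT_LOGINS:
            continue  # not a human reply to a Copilot-authored comment
        if REJECT.search(c["body"]):
            rejected.add(parent)
        elif ACCEPT.search(c["body"]):
            accepted.add(parent)
    accepted -= rejected  # any rejection in the thread wins over an acceptance
    return {
        "suggestions": len(copilot_ids),
        "accepted": len(accepted),
        "rejected": len(rejected),
    }
```

Because acceptances and rejections are stored as sets of parent comment ids, each Copilot comment is counted at most once, so neither count can exceed the number of Copilot comments. In the scenario described above (one Copilot suggestion, two human "lgtm" replies), this yields 1 accepted out of 1 rather than 2 out of 1.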

🟡 Minor: Repository parameter is passed by the action but not documented in its inputs

File: .github/actions/scrape-and-improve/action.yml

The Collect-AgentData.ps1 script has a $Repository parameter (line 48 of the script, default "dotnet/maui"). In the action, it's passed via $params["Repository"] = "${{ github.repository }}" — this is good. But there is no explicit documentation of this parameter in the action's inputs, so someone using the action on a fork would need to know the repository is auto-detected from github.repository.

🟡 Minor: Workflow doesn't pass memory-context input

File: .github/workflows/scrape-and-improve.yml

The action supports a memory-context input for analyzing structured memory blocks, but the workflow doesn't expose this as a workflow_dispatch input. Memory context provides the richest data for the analysis. Consider adding it as an optional input.


PR Description Accuracy

The PR description accurately describes the implementation. The "Applied Improvements from Initial Run" section provides concrete metrics (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations) and the Common Fix Patterns in copilot-instructions.md are well-documented with evidence from specific PRs.


Positive Aspects

  • ✅ Workflow uses github.repository_owner == 'dotnet' guard to prevent running on forks
  • ✅ Permissions are minimal (read-only: contents, issues, pull-requests)
  • ✅ $ErrorActionPreference = "Continue" prevents script halting on non-critical errors
  • ✅ All data collection sections have gh CLI not available fallbacks
  • ✅ Recommendations are evidence-based with specific PR citations
  • ✅ Artifacts are uploaded with 30-day retention
  • ✅ The "Common Fix Patterns" content in copilot-instructions.md is accurate and based on real PR data
  • ✅ Weekly schedule is appropriate for this type of background analysis

📋 Expand PR Finalization Review
Title: ✅ Good

Current: Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement

Description: ✅ Excellent

Description needs updates. See details below.
Missing Elements:

  • Prepend the required NOTE block to the top of the description.


Phase 2: Code Review

See code-review.md for detailed findings.

Summary:

  • 🟡 3 suggestions (non-blocking, moderate improvements)
  • ✅ Implementation is generally solid and well-structured

Recommendation

  1. Add NOTE block at top of description (required for all PRs)
  2. Title is fine as-is — no change needed
  3. Code review findings are suggestions only (no critical issues)

The PR is a tooling/infrastructure addition (skill + agent + workflow) with no impact on MAUI runtime code. The copilot-instructions.md changes are additive documentation of patterns from the initial run.

✨ Suggested PR Description

[!NOTE]
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Adds automated tooling to scrape agent interaction data (PR sessions, Copilot comments, CCA sessions, memories, recent PR reviews) and generate instruction file improvement recommendations — both as a reusable skill and as an autonomous custom agent that applies improvements directly.

New Skill: scrape-and-improve

  • Collect-AgentData.ps1 — Gathers data from five sources:
    • .github/agent-pr-session/*.md (fix candidates, root causes, phase statuses)
    • PR comments via gh CLI (AI Summary markers, try-fix attempts, test verification)
    • CustomAgentLogsTmp/PRState/ (CCA session state with convention/build-command pattern detection)
    • Repository memories via -MemoryContext parameter (structured agent memory blocks: subject, fact, citations)
    • Most recent N PRs via -RecentPRCount (default: 20) — review comments, Copilot suggestion acceptance/rejection rates, review hotspot areas by file path
  • Analyze-And-Recommend.ps1 — Pattern detection across collected data: slow discovery (multiple attempts), quick successes, repeated failure approaches, common root causes, recurring memory subjects, suggestion rejection rates, and review comment hotspots. Outputs prioritized recommendations with evidence.

New Custom Agent: scrape-and-improve

  • .github/agents/scrape-and-improve.md — Autonomous 5-phase agent (Collect → Analyze → Present → Apply → Verify) that runs all scrapings and applies High/Medium priority instruction improvements directly to instruction files, skills, and copilot-instructions.md.
  • Registered as agent #5 in copilot-instructions.md.

GitHub Action & Workflow

  • Composite action (.github/actions/scrape-and-improve/) wraps both scripts, uploads artifacts
  • Workflow (.github/workflows/scrape-and-improve.yml) runs weekly on schedule or via workflow_dispatch with optional PR number, label, date filters, and recent PR count

Applied Improvements from Initial Run

Ran all scrapings against the repo (5 session files, 21 memories, 21 fix attempts, 47.6% success rate, 7 recommendations). Applied findings as a new "Common Fix Patterns (From Agent Analysis)" section in copilot-instructions.md documenting what works and what fails for NavigationPage handler disconnection, CollectionView EmptyView on Android, Shell navigation iOS tests, and device test isolation.

Code Review: ✅ Passed

Code Review: PR #33939

PR: Add scrape-and-improve skill, custom agent, and workflow for agent instruction refinement
Scope: Tooling/infrastructure only — no MAUI runtime code changes


Code Review Findings

🟡 Suggestions

1. Workflow missing memory-context input wiring

  • File: .github/workflows/scrape-and-improve.yml
  • Observation: The composite action (action.yml) accepts a memory-context input, but the workflow (scrape-and-improve.yml) does not expose it as a workflow_dispatch input and does not pass it to the action. This means scheduled/dispatch runs can never pass memory context.
  • Recommendation: Either add memory_context as an optional workflow_dispatch input and wire it through, or document that this input is only usable when calling the action directly (not via the workflow).

2. PowerShell $ErrorActionPreference = "Continue" in analysis script

  • File: .github/skills/scrape-and-improve/scripts/Analyze-And-Recommend.ps1 (line ~30)
  • Observation: $ErrorActionPreference = "Continue" lets the script keep running after an error is reported. For a script that parses JSON and writes reports, a failure partway through could produce an empty/corrupt output file that downstream steps (like the artifact upload) would capture without failing the CI run.
  • Recommendation: Use $ErrorActionPreference = "Stop" and wrap JSON parsing in a try/catch with a clear error message, or at minimum apply -ErrorAction SilentlyContinue only to specific commands that are expected to fail (e.g., missing optional files).

3. Inline ${{ inputs.memory-context }} in shell script is injection-prone

  • File: .github/actions/scrape-and-improve/action.yml (line ~64)
  • Observation: The composite action passes memory-context input directly into a shell if condition via "${{ inputs.memory-context }}". If the memory context contains PowerShell special characters or newlines, this could break the script or produce unexpected behavior. The action runs with shell: pwsh so PowerShell injection is the concern.
  • Recommendation: Pass multi-line or complex inputs via an environment variable rather than inline template substitution:
    env:
      MEMORY_CONTEXT: ${{ inputs.memory-context }}
    run: |
      if ($env:MEMORY_CONTEXT) {
        $params["MemoryContext"] = $env:MEMORY_CONTEXT
      }

✅ Looks Good

  • Script structure — Both PowerShell scripts are well-organized with clear sections, meaningful variable names, and helpful Write-Host progress output.
  • Analysis thresholds — Constants like $HIGH_REJECTION_THRESHOLD = 30, $HOTSPOT_COMMENT_THRESHOLD = 5, and $MEMORY_FREQUENCY_THRESHOLD = 2 are clearly named and documented at the top of Analyze-And-Recommend.ps1.
  • Agent documentation — The scrape-and-improve.md agent file is thorough: 5-phase workflow, error handling table, and clear distinction between skill vs. agent modes.
  • SKILL.md structure — Follows the established pattern of other skills in the repo (inputs, outputs, workflow steps, error handling, integration section).
  • copilot-instructions.md additions — The "Common Fix Patterns" section is well-targeted and provides concrete, actionable guidance for future agents with specific PR references and ✅/❌ formatting consistent with the rest of the file.
  • Workflow safety: the if: github.repository_owner == 'dotnet' guard prevents the scheduled workflow from running on forks.
  • Permissions — Workflow uses minimal permissions (contents: read, issues: read, pull-requests: read) — no write permissions needed since the action only uploads artifacts.
  • Artifact retention — 30-day retention on the analysis artifacts is appropriate for a weekly workflow.
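The thresholds named under "Analysis thresholds" above could drive recommendations roughly as follows. This is a Python sketch, not the logic of Analyze-And-Recommend.ps1: the three threshold values are quoted from the review, but the function name, the input shapes, the use of >= comparisons, and the reading of 30 as a percentage are all assumptions.

```python
from collections import Counter

# Values quoted in the review. How the script compares against them (>= vs >)
# and whether 30 is a percentage are assumptions made for this sketch.
HIGH_REJECTION_THRESHOLD = 30
HOTSPOT_COMMENT_THRESHOLD = 5
MEMORY_FREQUENCY_THRESHOLD = 2


def recommend(review_paths, memory_subjects, suggestions, rejected):
    """Return recommendation strings for each threshold that is met."""
    recs = []
    if suggestions:  # avoid dividing by zero when no suggestions were found
        rate = rejected / suggestions * 100
        if rate >= HIGH_REJECTION_THRESHOLD:
            recs.append(f"High rejection rate: {rate:.0f}%")
    for path, count in Counter(review_paths).most_common():
        if count >= HOTSPOT_COMMENT_THRESHOLD:
            recs.append(f"Review hotspot: {path} ({count} comments)")
    for subject, count in Counter(memory_subjects).most_common():
        if count >= MEMORY_FREQUENCY_THRESHOLD:
            recs.append(f"Recurring memory subject: {subject} ({count} occurrences)")
    return recs
```

Here review_paths is a hypothetical list with one file path per review comment, and memory_subjects is a hypothetical list with one subject per collected memory.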

rmarinho added labels on Feb 18, 2026: s/agent-review-incomplete (AI agent could not complete all phases: blocker, timeout, error), s/agent-fix-lose (AI could not beat the PR fix; PR is the best among all candidates), s/agent-reviewed (PR was reviewed by AI agent workflow, full 4-phase review)
kubaflo removed labels on Feb 18, 2026: s/agent-reviewed (PR was reviewed by AI agent workflow, full 4-phase review), s/agent-review-incomplete (AI agent could not complete all phases: blocker, timeout, error), s/agent-fix-lose (AI could not beat the PR fix; PR is the best among all candidates)

4 participants