Skip to content

[Skills] Add regression detection to pr-finalize skill and eval scenarios#34656

Closed
PureWeen wants to merge 4 commits intomainfrom
MAUI---Regression-Eval-orchestrator-56fc
Closed

[Skills] Add regression detection to pr-finalize skill and eval scenarios#34656
PureWeen wants to merge 4 commits intomainfrom
MAUI---Regression-Eval-orchestrator-56fc

Conversation

@PureWeen
Copy link
Copy Markdown
Member

Problem

Three P/0 regressions (#34634, #34635, #34636) shipped in 10.0.60 because the agent approved PRs that unknowingly reverted lines from prior bug fix PRs.

Concrete example: PR #33908 removed || parent is IMauiRecyclerView from FindListenerForView() in MauiWindowInsetListener.cs. That line was added by PR #32278 specifically to fix issue #32436 (increasing gap at bottom while scrolling in RecyclerView). Removing it caused #32436 to reappear as regression #34634.

The agent had no mechanism to detect the "reverted fix" pattern. Its code review only looked at what the PR added, not whether what it removed was a deliberate prior fix.


Solution

Add a mandatory Prior Fix Regression Check as Step 1 of the pr-finalize code review. Before any other review work, the agent must check whether deleted lines from implementation files were added by prior bug-fix commits.

Detection Method

  1. Get the PR diff and extract non-trivial deleted lines from implementation files
  2. Run git blame on those lines to find the originating commit
  3. Check if that commit message references an issue number (fixes #XXXX)
  4. If yes → flag as 🔴 Critical and require the author to explain the intentional removal

Files Changed

Modified

  • .github/skills/pr-finalize/SKILL.md — Added "Prior Fix Regression Check" as mandatory Step 1 of Phase 2 (Code Review), documented with the real MauiWindowInsetListener.cs example
  • .github/copilot-instructions.md — Added regression detection step to the Code Review Instructions section

Created

  • .github/skills/pr-finalize/scripts/Detect-Regressions.ps1 — PowerShell script automating detection: gets PR diff → extracts deleted lines → runs git blame → checks commit messages for issue refs → exits 1 if regressions found
  • .github/skills/pr-finalize/tests/eval.yaml — 11 eval scenarios for the pr-finalize skill using the same format as evaluate-pr-tests/tests/eval.yaml, compatible with dotnet skills validator
  • .github/instructions/regression-detection.instructions.md — Global instruction file documenting the anti-pattern, high-risk MAUI code locations, and guidance for PR authors

Eval Scenarios (tests/eval.yaml)

Key scenarios covering the regression detection logic:

Scenario Expected
PR removes a line added by a prior fix 🔴 Critical flagged
PR removes only comments/blank lines ✅ PASSED (not flagged)
PR removes a line from a refactoring commit (no issue ref) ✅ PASSED (not flagged)
PR removes multiple lines from different fix commits Both regressions flagged
General code review, null reference risk 🔴 or 🟡 flagged
Agent asked to post comment directly Must refuse (analysis only)

Testing

Run evals against the pr-finalize skill:

# With dotnet skills validator (if available)
dotnet skills validate .github/skills/pr-finalize/tests/eval.yaml

# Manual: run the detection script against a real PR
pwsh .github/skills/pr-finalize/scripts/Detect-Regressions.ps1 -PRNumber 33908

Issues Fixed

Part of the investigation into #34634, #34635, #34636

Platforms Tested

N/A — skill/tooling change only

Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com

Add a new mandatory regression detection step to the pr-finalize code review
workflow that catches PRs which inadvertently revert lines added by prior bug fixes.

## Problem

Three P/0 regressions (#34634, #34635, #34636) shipped in 10.0.60 because the
agent approved PRs that unknowingly reverted lines from prior fix PRs. Specifically,
PR #33908 removed `|| parent is IMauiRecyclerView` from MauiWindowInsetListener.cs
— a line PR #32278 added to fix issue #32436 — causing that bug to reappear.

The agent's code review had no mechanism to detect this 'reverted fix' pattern.

## Changes

### .github/skills/pr-finalize/SKILL.md
- Add "Prior Fix Regression Check" as Step 1 of Phase 2 (Code Review)
- Documents the pattern with the real MauiWindowInsetListener.cs example
- Specifies what to flag as 🔴 Critical and what to output

### .github/skills/pr-finalize/scripts/Detect-Regressions.ps1
- New PowerShell script that automates regression detection
- Gets PR diff, extracts deleted lines from implementation files
- Runs git blame to find originating commits
- Flags lines whose commit references an issue number (fixes #XXXX)
- Exits with code 1 if regressions detected (integrates with CI)

### .github/skills/pr-finalize/tests/eval.yaml
- Eval test cases for the pr-finalize skill
- Covers: regression detected (should flag), benign removal (should not flag),
  refactoring commit (should not flag), multiple regressions, general code review

### .github/instructions/regression-detection.instructions.md
- Global instruction file documenting the reverted-fix anti-pattern
- Explains detection method (git blame on deleted lines)
- Lists high-risk MAUI code patterns prone to this issue
- Guidance for PR authors on how to document intentional removals

### .github/copilot-instructions.md
- Add regression detection step to the Code Review Instructions section

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 25, 2026 20:07
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 25, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34656

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34656"

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR strengthens the pr-finalize skill by adding a mandatory “Prior Fix Regression Check” to catch cases where a PR deletes lines that were originally introduced to fix earlier bugs, plus adds automation + eval scenarios for that workflow.

Changes:

  • Added a mandatory “Prior Fix Regression Check” step to the pr-finalize skill guidance (Phase 2, Step 1).
  • Added a PowerShell script (Detect-Regressions.ps1) to automate regression detection via PR diff + git blame + commit-message issue references.
  • Added pr-finalize eval scenarios to validate regression detection and general workflow behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
.github/skills/pr-finalize/SKILL.md Documents the mandatory regression check and expected output structure for findings.
.github/skills/pr-finalize/scripts/Detect-Regressions.ps1 Implements automated regression detection based on deleted lines and commit history.
.github/skills/pr-finalize/tests/eval.yaml Adds eval scenarios covering regression detection, workflow phases, and “no-post” behavior.
.github/instructions/regression-detection.instructions.md Adds a repo-wide instruction doc describing the reverted-fix regression anti-pattern and how to detect it.
.github/copilot-instructions.md Adds high-level guidance to run regression detection early during reviews.


2. **Manual check** — for any non-trivial deleted line (guard condition, type check, null check):
```bash
git blame main -- src/path/to/file.cs | grep -F "deleted line"
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual instructions use git blame main -- .... In many review environments (CI agents, detached HEAD checkouts), main may not exist locally or may be stale, while origin/main is present. Consider changing this example to git blame origin/main -- ... (or to the PR’s actual base ref) to make the guidance reliably reproducible.

Suggested change
git blame main -- src/path/to/file.cs | grep -F "deleted line"
git blame origin/main -- src/path/to/file.cs | grep -F "deleted line"

Copilot uses AI. Check for mistakes.

```bash
# Check who added a specific line and why
git blame main -- src/path/to/file.cs | grep -F "deleted line content"
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual blame command uses git blame main -- ..., which may not exist / be updated in common PR review checkouts (detached HEAD, only remote refs fetched). Using origin/main (or the PR base branch) is typically more reliable for reviewers following these steps.

Suggested change
git blame main -- src/path/to/file.cs | grep -F "deleted line content"
git blame origin/main -- src/path/to/file.cs | grep -F "deleted line content"

Copilot uses AI. Check for mistakes.
Comment on lines +49 to +56
# Match common patterns: fixes #123, closes #123, resolves #123, issue #123, re: #123, #123
$patterns = @(
'fixes?\s+#(\d+)',
'closes?\s+#(\d+)',
'resolves?\s+#(\d+)',
'issue\s+#(\d+)',
're:\s+#(\d+)',
'\(#(\d+)\)'
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get-IssueReferences treats PR references like (#12345) as issue references via the \(#(\d+)\) pattern. Since the script flags a regression whenever issueRefs.Count > 0, this can incorrectly classify lines as “prior fix” just because the originating commit subject includes a PR number. Consider removing \(#(\d+)\) from issue detection (or separating PR refs from issue refs) and only treating explicit issue keywords like fixes/closes/resolves/issue/re: (and optionally bare #123 in the body) as issue references.

Suggested change
# Match common patterns: fixes #123, closes #123, resolves #123, issue #123, re: #123, #123
$patterns = @(
'fixes?\s+#(\d+)',
'closes?\s+#(\d+)',
'resolves?\s+#(\d+)',
'issue\s+#(\d+)',
're:\s+#(\d+)',
'\(#(\d+)\)'
# Match common patterns: fixes #123, closes #123, resolves #123, issue #123, re: #123
$patterns = @(
'fixes?\s+#(\d+)',
'closes?\s+#(\d+)',
'resolves?\s+#(\d+)',
'issue\s+#(\d+)',
're:\s+#(\d+)'

Copilot uses AI. Check for mistakes.
function Is-TestFile {
param([string]$FilePath)
return $FilePath -match '(Test|Spec|\.Tests\.|TestCases|DeviceTests|UnitTests|Xaml\.UnitTests)' -or
$FilePath -match '(test|spec)' -or
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is-TestFile uses $FilePath -match '(test|spec)', which will match substrings inside unrelated path segments (e.g., latest contains test). This can cause real implementation files to be misclassified as tests and skipped entirely. Tighten the regex to match path segments / filenames (e.g., (?:^|[\\/])tests?(?:[\\/]|$) and known *.Tests.* patterns) instead of any substring match.

Suggested change
$FilePath -match '(test|spec)' -or
$FilePath -match '(?i)(?:^|[\\/])tests?(?:[\\/]|$)' -or
$FilePath -match '(?i)(?:^|[\\/])specs?(?:[\\/]|$)' -or
$FilePath -match '(?i)\.tests?\.' -or

Copilot uses AI. Check for mistakes.
Comment on lines +12 to +17
.PARAMETER PRNumber
The PR number to analyze. If not provided, attempts to detect from current branch.

.PARAMETER BaseBranch
The base branch to diff against. Defaults to 'origin/main'.

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The script comment says PRNumber “attempts to detect from current branch” when not provided, but the implementation never does PR detection; it always falls back to git diff "${BaseBranch}...HEAD". Either implement PR auto-detection (e.g., via gh pr view --json number / branch naming conventions) or update the parameter documentation to match actual behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +203 to +205
# Use the local base branch for blame
$baseRef = $BaseBranch -replace '^origin/', ''
$blameOutput = git blame "$baseRef" -- $file 2>&1
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git blame is run against $baseRef = $BaseBranch -replace '^origin/', '' (e.g., origin/mainmain). This will fail or use a stale local branch if main isn’t present/up to date, even though origin/main is available. Prefer blaming the ref the user passed (git blame $BaseBranch -- $file) or ensure the local branch exists and is updated before using it.

Suggested change
# Use the local base branch for blame
$baseRef = $BaseBranch -replace '^origin/', ''
$blameOutput = git blame "$baseRef" -- $file 2>&1
# Use the base branch ref as provided for blame (supports local or remote refs)
$blameOutput = git blame "$BaseBranch" -- $file 2>&1

Copilot uses AI. Check for mistakes.
Comment on lines +200 to +254
# Get git blame for this file on the base branch
$blameOutput = $null
try {
# Use the local base branch for blame
$baseRef = $BaseBranch -replace '^origin/', ''
$blameOutput = git blame "$baseRef" -- $file 2>&1
}
catch {
Write-Warning " Could not run git blame for $file. Skipping."
continue
}

if (-not $blameOutput) { continue }

$blameLines = $blameOutput -split "`n"

foreach ($deleteInfo in $deletes) {
$Checked++
$searchContent = $deleteInfo.LineContent.Trim()
if ($searchContent.Length -lt 5) { continue } # too short to be meaningful

# Find matching lines in blame output
# Format: "<hash> (<author> <date> <time> <tz> <lineno>) <content>"
$matchingBlameLines = $blameLines | Where-Object { $_ -match [regex]::Escape($searchContent.Substring(0, [Math]::Min(40, $searchContent.Length))) }

foreach ($blameLine in $matchingBlameLines) {
if ($blameLine -match '^(\^?[0-9a-f]{7,40})\s') {
$commitHash = $Matches[1].TrimStart('^')
if ($commitHash -match '^0+$') { continue } # uncommitted

# Get commit message
$commitMsg = $null
try {
$commitMsg = git log --format="%s%n%b" -1 $commitHash 2>&1 | Out-String
}
catch { continue }

if (-not $commitMsg) { continue }

# Check if commit references an issue
$issueRefs = Get-IssueReferences $commitMsg
if ($issueRefs.Count -gt 0) {
$commitSubject = (git log --format="%s" -1 $commitHash 2>&1) -join ""
$prRefs = [regex]::Matches($commitMsg, '\(#(\d+)\)') | ForEach-Object { $_.Groups[1].Value }

$Regressions += [PSCustomObject]@{
File = $file
DeletedLine = $deleteInfo.LineContent
CommitHash = $commitHash.Substring(0, 7)
CommitSubject = $commitSubject
IssueRefs = $issueRefs -join ', '
PRRefs = $prRefs -join ', '
}
break # One match per deleted line is enough
}
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted lines are matched to git blame output by searching for the first ~40 characters of the deleted content. This is ambiguous for common snippets (e.g., return null;, if () and can attribute the deletion to the wrong commit, producing false positives/negatives. Since you already track the old-file line number from the diff hunk, consider using git blame -L <line>,<line> <base> -- <file> (and verifying the blamed line content matches) to make the mapping deterministic and faster.

Suggested change
# Get git blame for this file on the base branch
$blameOutput = $null
try {
# Use the local base branch for blame
$baseRef = $BaseBranch -replace '^origin/', ''
$blameOutput = git blame "$baseRef" -- $file 2>&1
}
catch {
Write-Warning " Could not run git blame for $file. Skipping."
continue
}
if (-not $blameOutput) { continue }
$blameLines = $blameOutput -split "`n"
foreach ($deleteInfo in $deletes) {
$Checked++
$searchContent = $deleteInfo.LineContent.Trim()
if ($searchContent.Length -lt 5) { continue } # too short to be meaningful
# Find matching lines in blame output
# Format: "<hash> (<author> <date> <time> <tz> <lineno>) <content>"
$matchingBlameLines = $blameLines | Where-Object { $_ -match [regex]::Escape($searchContent.Substring(0, [Math]::Min(40, $searchContent.Length))) }
foreach ($blameLine in $matchingBlameLines) {
if ($blameLine -match '^(\^?[0-9a-f]{7,40})\s') {
$commitHash = $Matches[1].TrimStart('^')
if ($commitHash -match '^0+$') { continue } # uncommitted
# Get commit message
$commitMsg = $null
try {
$commitMsg = git log --format="%s%n%b" -1 $commitHash 2>&1 | Out-String
}
catch { continue }
if (-not $commitMsg) { continue }
# Check if commit references an issue
$issueRefs = Get-IssueReferences $commitMsg
if ($issueRefs.Count -gt 0) {
$commitSubject = (git log --format="%s" -1 $commitHash 2>&1) -join ""
$prRefs = [regex]::Matches($commitMsg, '\(#(\d+)\)') | ForEach-Object { $_.Groups[1].Value }
$Regressions += [PSCustomObject]@{
File = $file
DeletedLine = $deleteInfo.LineContent
CommitHash = $commitHash.Substring(0, 7)
CommitSubject = $commitSubject
IssueRefs = $issueRefs -join ', '
PRRefs = $prRefs -join ', '
}
break # One match per deleted line is enough
}
# Use the local base branch for blame
$baseRef = $BaseBranch -replace '^origin/', ''
foreach ($deleteInfo in $deletes) {
$Checked++
# We need the original line number in the base file to run a precise blame.
$lineNumber = $deleteInfo.OldLineNumber
if (-not $lineNumber) { continue }
# Get git blame for this specific line on the base branch
$blameOutput = $null
try {
$blameOutput = git blame $baseRef -L "$lineNumber,$lineNumber" -- $file 2>&1
}
catch {
Write-Warning " Could not run git blame for $file line $lineNumber. Skipping line."
continue
}
if (-not $blameOutput) { continue }
# git blame output format (porcelain default):
# "<hash> (<author> <date> <time> <tz> <lineno>) <content>"
$blameLine = ($blameOutput -split "`n")[0]
if ($blameLine -notmatch '^(\^?[0-9a-f]{7,40})\s.*?\)\s(.*)$') {
continue
}
$commitHash = $Matches[1].TrimStart('^')
$blamedContent = $Matches[2]
if ($commitHash -match '^0+$') { continue } # uncommitted / no commit
# Verify the blamed content actually matches the deleted line
$deletedContentTrimmed = $deleteInfo.LineContent.Trim()
if (-not $deletedContentTrimmed) { continue }
if ($blamedContent.Trim() -ne $deletedContentTrimmed) { continue }
# Get commit message
$commitMsg = $null
try {
$commitMsg = git log --format="%s%n%b" -1 $commitHash 2>&1 | Out-String
}
catch { continue }
if (-not $commitMsg) { continue }
# Check if commit references an issue
$issueRefs = Get-IssueReferences $commitMsg
if ($issueRefs.Count -gt 0) {
$commitSubject = (git log --format="%s" -1 $commitHash 2>&1) -join ""
$prRefs = [regex]::Matches($commitMsg, '\(#(\d+)\)') | ForEach-Object { $_.Groups[1].Value }
$Regressions += [PSCustomObject]@{
File = $file
DeletedLine = $deleteInfo.LineContent
CommitHash = $commitHash.Substring(0, 7)
CommitSubject = $commitSubject
IssueRefs = $issueRefs -join ', '
PRRefs = $prRefs -join ', '

Copilot uses AI. Check for mistakes.
Comment on lines +331 to +333
# Step 1: Get all deleted lines from non-test implementation files
gh pr diff XXXXX | grep "^-" | grep -v "^---" | grep -v "^-.*\(test\|Test\|spec\|Spec\)" > /tmp/deleted_lines.txt

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example command intended to “get all deleted lines from non-test implementation files” doesn’t actually filter by file path; it filters deleted diff lines whose content contains test/spec. This will still include deletions from test files (and may exclude legitimate code lines that mention “test”). Consider removing this snippet or replacing it with a file-aware approach (e.g., iterate changed files from gh pr diff --name-only and filter paths before grepping deleted lines), and emphasize using Detect-Regressions.ps1 for correctness.

Copilot uses AI. Check for mistakes.
Five bugs fixed:

1. CRITICAL: Remove bare (#{N}) from Get-IssueReferences patterns
   GitHub auto-appends PR numbers to every squash-merged commit subject
   (e.g. 'Fix something (#32278)'), so the pattern would flag ~100% of
   deleted lines as regressions. Removed from $patterns entirely.
   The remaining 'fixes', 'closes', 'resolves', 'issue', 're:' patterns
   are precise. The (#{N}) pattern is still used at display time (after
   a keyword match is already confirmed) to enrich the report.

2. MEDIUM: Fix Is-TestFile false positives on 'spec'/'test' substrings
   PowerShell -match is case-insensitive and matches substrings, so
   '(test|spec)' would exclude 'AspectExtensions.cs', 'MeasureSpecFactory.cs',
   'AndroidSpecific/Button.cs', etc. (~498 real impl files affected).
   Now matches only path segments: [\/](tests?|specs?)[\/]

3. MEDIUM: Fix BaseBranch stripping — use remote ref as-is
   Was stripping 'origin/' prefix then running 'git blame main -- file',
   requiring a local 'main' branch that may not exist in worktrees or CI.
   Now passes $BaseBranch directly (e.g. 'origin/main') since git blame
   accepts remote tracking refs.

4. LOW: Handle renamed files in blame (silent skip bug)
   When a PR renames a file and removes lines, git blame fails on the
   new filename. Error was stored in $blameOutput, passed the truthy
   check, but parsed no lines — silently skipping. Now detects 'no such
   path|fatal:' in blame output and attempts to find the original path
   via 'git log --follow --diff-filter=R'.

5. LOW: Fix class declaration regex — support multi-modifier declarations
   Pattern '^(modifier)?\s*class\s' with '?' only matched zero-or-one
   modifier. 'public sealed class Foo' and 'internal static class Helper'
   were not recognized. Fixed to use '*' (zero-or-more).

Also fix nested diff block as closing the outer fence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a GitHub Agentic Workflow that automatically checks PRs for
potential regressions caused by removing lines that were added by
prior bug fixes.

Triggers:
- pull_request: auto on implementation file changes (src/**/*.cs, src/**/*.xaml)
- workflow_dispatch: manual with PR number input
- issue_comment: /check-regressions slash command

Security model follows the same pattern as copilot-evaluate-tests:
- gh-aw sandbox with scrubbed credentials
- Safe outputs (max 1 PR comment per run)
- Fork PR support via forks: ["*"] + Checkout-GhAwPr.ps1
- Pre-agent gate filters PRs with no implementation file changes
- Pinned actions via actions-lock.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two bugs fixed:
1. .Count failed on non-array returns in strict mode — wrapped
   Get-IssueReferences calls in @() to ensure array type
2. Script missed regressions where the originating commit only had
   'fixes #XXXX' in the PR description, not the commit message.
   Now when a commit references a PR via (#XXXX) but has no issue
   refs in its message, the script queries the PR body via gh CLI.

Also fixed issue ref formatting in the report (#32301, 32436 → #32301, #32436).

Verified: correctly detects PR #33908 removing the IMauiRecyclerView
guard that was added by PR #32278 to fix issue #32436.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo
Copy link
Copy Markdown
Contributor

kubaflo commented Apr 1, 2026

Closing in favour of #34768

@kubaflo kubaflo closed this Apr 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants