[Skills] Add regression detection to pr-finalize skill and eval scenarios#34656
[Skills] Add regression detection to pr-finalize skill and eval scenarios#34656
Conversation
Add a new mandatory regression detection step to the pr-finalize code review workflow that catches PRs which inadvertently revert lines added by prior bug fixes. ## Problem Three P/0 regressions (#34634, #34635, #34636) shipped in 10.0.60 because the agent approved PRs that unknowingly reverted lines from prior fix PRs. Specifically, PR #33908 removed `|| parent is IMauiRecyclerView` from MauiWindowInsetListener.cs — a line PR #32278 added to fix issue #32436 — causing that bug to reappear. The agent's code review had no mechanism to detect this 'reverted fix' pattern. ## Changes ### .github/skills/pr-finalize/SKILL.md - Add "Prior Fix Regression Check" as Step 1 of Phase 2 (Code Review) - Documents the pattern with the real MauiWindowInsetListener.cs example - Specifies what to flag as 🔴 Critical and what to output ### .github/skills/pr-finalize/scripts/Detect-Regressions.ps1 - New PowerShell script that automates regression detection - Gets PR diff, extracts deleted lines from implementation files - Runs git blame to find originating commits - Flags lines whose commit references an issue number (fixes #XXXX) - Exits with code 1 if regressions detected (integrates with CI) ### .github/skills/pr-finalize/tests/eval.yaml - Eval test cases for the pr-finalize skill - Covers: regression detected (should flag), benign removal (should not flag), refactoring commit (should not flag), multiple regressions, general code review ### .github/instructions/regression-detection.instructions.md - Global instruction file documenting the reverted-fix anti-pattern - Explains detection method (git blame on deleted lines) - Lists high-risk MAUI code patterns prone to this issue - Guidance for PR authors on how to document intentional removals ### .github/copilot-instructions.md - Add regression detection step to the Code Review Instructions section Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 34656Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 34656" |
There was a problem hiding this comment.
Pull request overview
This PR strengthens the pr-finalize skill by adding a mandatory “Prior Fix Regression Check” to catch cases where a PR deletes lines that were originally introduced to fix earlier bugs, plus adds automation + eval scenarios for that workflow.
Changes:
- Added a mandatory “Prior Fix Regression Check” step to the
pr-finalizeskill guidance (Phase 2, Step 1). - Added a PowerShell script (
Detect-Regressions.ps1) to automate regression detection via PR diff + git blame + commit-message issue references. - Added
pr-finalizeeval scenarios to validate regression detection and general workflow behavior.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
.github/skills/pr-finalize/SKILL.md |
Documents the mandatory regression check and expected output structure for findings. |
.github/skills/pr-finalize/scripts/Detect-Regressions.ps1 |
Implements automated regression detection based on deleted lines and commit history. |
.github/skills/pr-finalize/tests/eval.yaml |
Adds eval scenarios covering regression detection, workflow phases, and “no-post” behavior. |
.github/instructions/regression-detection.instructions.md |
Adds a repo-wide instruction doc describing the reverted-fix regression anti-pattern and how to detect it. |
.github/copilot-instructions.md |
Adds high-level guidance to run regression detection early during reviews. |
|
|
||
| 2. **Manual check** — for any non-trivial deleted line (guard condition, type check, null check): | ||
| ```bash | ||
| git blame main -- src/path/to/file.cs | grep -F "deleted line" |
There was a problem hiding this comment.
The manual instructions use git blame main -- .... In many review environments (CI agents, detached HEAD checkouts), main may not exist locally or may be stale, while origin/main is present. Consider changing this example to git blame origin/main -- ... (or to the PR’s actual base ref) to make the guidance reliably reproducible.
| git blame main -- src/path/to/file.cs | grep -F "deleted line" | |
| git blame origin/main -- src/path/to/file.cs | grep -F "deleted line" |
|
|
||
| ```bash | ||
| # Check who added a specific line and why | ||
| git blame main -- src/path/to/file.cs | grep -F "deleted line content" |
There was a problem hiding this comment.
The manual blame command uses git blame main -- ..., which may not exist / be updated in common PR review checkouts (detached HEAD, only remote refs fetched). Using origin/main (or the PR base branch) is typically more reliable for reviewers following these steps.
| git blame main -- src/path/to/file.cs | grep -F "deleted line content" | |
| git blame origin/main -- src/path/to/file.cs | grep -F "deleted line content" |
| # Match common patterns: fixes #123, closes #123, resolves #123, issue #123, re: #123, #123 | ||
| $patterns = @( | ||
| 'fixes?\s+#(\d+)', | ||
| 'closes?\s+#(\d+)', | ||
| 'resolves?\s+#(\d+)', | ||
| 'issue\s+#(\d+)', | ||
| 're:\s+#(\d+)', | ||
| '\(#(\d+)\)' |
There was a problem hiding this comment.
Get-IssueReferences treats PR references like (#12345) as issue references via the \(#(\d+)\) pattern. Since the script flags a regression whenever issueRefs.Count > 0, this can incorrectly classify lines as “prior fix” just because the originating commit subject includes a PR number. Consider removing \(#(\d+)\) from issue detection (or separating PR refs from issue refs) and only treating explicit issue keywords like fixes/closes/resolves/issue/re: (and optionally bare #123 in the body) as issue references.
| # Match common patterns: fixes #123, closes #123, resolves #123, issue #123, re: #123, #123 | |
| $patterns = @( | |
| 'fixes?\s+#(\d+)', | |
| 'closes?\s+#(\d+)', | |
| 'resolves?\s+#(\d+)', | |
| 'issue\s+#(\d+)', | |
| 're:\s+#(\d+)', | |
| '\(#(\d+)\)' | |
| # Match common patterns: fixes #123, closes #123, resolves #123, issue #123, re: #123 | |
| $patterns = @( | |
| 'fixes?\s+#(\d+)', | |
| 'closes?\s+#(\d+)', | |
| 'resolves?\s+#(\d+)', | |
| 'issue\s+#(\d+)', | |
| 're:\s+#(\d+)' |
| function Is-TestFile { | ||
| param([string]$FilePath) | ||
| return $FilePath -match '(Test|Spec|\.Tests\.|TestCases|DeviceTests|UnitTests|Xaml\.UnitTests)' -or | ||
| $FilePath -match '(test|spec)' -or |
There was a problem hiding this comment.
Is-TestFile uses $FilePath -match '(test|spec)', which will match substrings inside unrelated path segments (e.g., latest contains test). This can cause real implementation files to be misclassified as tests and skipped entirely. Tighten the regex to match path segments / filenames (e.g., (?:^|[\\/])tests?(?:[\\/]|$) and known *.Tests.* patterns) instead of any substring match.
| $FilePath -match '(test|spec)' -or | |
| $FilePath -match '(?i)(?:^|[\\/])tests?(?:[\\/]|$)' -or | |
| $FilePath -match '(?i)(?:^|[\\/])specs?(?:[\\/]|$)' -or | |
| $FilePath -match '(?i)\.tests?\.' -or |
| .PARAMETER PRNumber | ||
| The PR number to analyze. If not provided, attempts to detect from current branch. | ||
|
|
||
| .PARAMETER BaseBranch | ||
| The base branch to diff against. Defaults to 'origin/main'. | ||
|
|
There was a problem hiding this comment.
The script comment says PRNumber “attempts to detect from current branch” when not provided, but the implementation never does PR detection; it always falls back to git diff "${BaseBranch}...HEAD". Either implement PR auto-detection (e.g., via gh pr view --json number / branch naming conventions) or update the parameter documentation to match actual behavior.
| # Use the local base branch for blame | ||
| $baseRef = $BaseBranch -replace '^origin/', '' | ||
| $blameOutput = git blame "$baseRef" -- $file 2>&1 |
There was a problem hiding this comment.
git blame is run against $baseRef = $BaseBranch -replace '^origin/', '' (e.g., origin/main → main). This will fail or use a stale local branch if main isn’t present/up to date, even though origin/main is available. Prefer blaming the ref the user passed (git blame $BaseBranch -- $file) or ensure the local branch exists and is updated before using it.
| # Use the local base branch for blame | |
| $baseRef = $BaseBranch -replace '^origin/', '' | |
| $blameOutput = git blame "$baseRef" -- $file 2>&1 | |
| # Use the base branch ref as provided for blame (supports local or remote refs) | |
| $blameOutput = git blame "$BaseBranch" -- $file 2>&1 |
| # Get git blame for this file on the base branch | ||
| $blameOutput = $null | ||
| try { | ||
| # Use the local base branch for blame | ||
| $baseRef = $BaseBranch -replace '^origin/', '' | ||
| $blameOutput = git blame "$baseRef" -- $file 2>&1 | ||
| } | ||
| catch { | ||
| Write-Warning " Could not run git blame for $file. Skipping." | ||
| continue | ||
| } | ||
|
|
||
| if (-not $blameOutput) { continue } | ||
|
|
||
| $blameLines = $blameOutput -split "`n" | ||
|
|
||
| foreach ($deleteInfo in $deletes) { | ||
| $Checked++ | ||
| $searchContent = $deleteInfo.LineContent.Trim() | ||
| if ($searchContent.Length -lt 5) { continue } # too short to be meaningful | ||
|
|
||
| # Find matching lines in blame output | ||
| # Format: "<hash> (<author> <date> <time> <tz> <lineno>) <content>" | ||
| $matchingBlameLines = $blameLines | Where-Object { $_ -match [regex]::Escape($searchContent.Substring(0, [Math]::Min(40, $searchContent.Length))) } | ||
|
|
||
| foreach ($blameLine in $matchingBlameLines) { | ||
| if ($blameLine -match '^(\^?[0-9a-f]{7,40})\s') { | ||
| $commitHash = $Matches[1].TrimStart('^') | ||
| if ($commitHash -match '^0+$') { continue } # uncommitted | ||
|
|
||
| # Get commit message | ||
| $commitMsg = $null | ||
| try { | ||
| $commitMsg = git log --format="%s%n%b" -1 $commitHash 2>&1 | Out-String | ||
| } | ||
| catch { continue } | ||
|
|
||
| if (-not $commitMsg) { continue } | ||
|
|
||
| # Check if commit references an issue | ||
| $issueRefs = Get-IssueReferences $commitMsg | ||
| if ($issueRefs.Count -gt 0) { | ||
| $commitSubject = (git log --format="%s" -1 $commitHash 2>&1) -join "" | ||
| $prRefs = [regex]::Matches($commitMsg, '\(#(\d+)\)') | ForEach-Object { $_.Groups[1].Value } | ||
|
|
||
| $Regressions += [PSCustomObject]@{ | ||
| File = $file | ||
| DeletedLine = $deleteInfo.LineContent | ||
| CommitHash = $commitHash.Substring(0, 7) | ||
| CommitSubject = $commitSubject | ||
| IssueRefs = $issueRefs -join ', ' | ||
| PRRefs = $prRefs -join ', ' | ||
| } | ||
| break # One match per deleted line is enough | ||
| } |
There was a problem hiding this comment.
Deleted lines are matched to git blame output by searching for the first ~40 characters of the deleted content. This is ambiguous for common snippets (e.g., return null;, if () and can attribute the deletion to the wrong commit, producing false positives/negatives. Since you already track the old-file line number from the diff hunk, consider using git blame -L <line>,<line> <base> -- <file> (and verifying the blamed line content matches) to make the mapping deterministic and faster.
| # Get git blame for this file on the base branch | |
| $blameOutput = $null | |
| try { | |
| # Use the local base branch for blame | |
| $baseRef = $BaseBranch -replace '^origin/', '' | |
| $blameOutput = git blame "$baseRef" -- $file 2>&1 | |
| } | |
| catch { | |
| Write-Warning " Could not run git blame for $file. Skipping." | |
| continue | |
| } | |
| if (-not $blameOutput) { continue } | |
| $blameLines = $blameOutput -split "`n" | |
| foreach ($deleteInfo in $deletes) { | |
| $Checked++ | |
| $searchContent = $deleteInfo.LineContent.Trim() | |
| if ($searchContent.Length -lt 5) { continue } # too short to be meaningful | |
| # Find matching lines in blame output | |
| # Format: "<hash> (<author> <date> <time> <tz> <lineno>) <content>" | |
| $matchingBlameLines = $blameLines | Where-Object { $_ -match [regex]::Escape($searchContent.Substring(0, [Math]::Min(40, $searchContent.Length))) } | |
| foreach ($blameLine in $matchingBlameLines) { | |
| if ($blameLine -match '^(\^?[0-9a-f]{7,40})\s') { | |
| $commitHash = $Matches[1].TrimStart('^') | |
| if ($commitHash -match '^0+$') { continue } # uncommitted | |
| # Get commit message | |
| $commitMsg = $null | |
| try { | |
| $commitMsg = git log --format="%s%n%b" -1 $commitHash 2>&1 | Out-String | |
| } | |
| catch { continue } | |
| if (-not $commitMsg) { continue } | |
| # Check if commit references an issue | |
| $issueRefs = Get-IssueReferences $commitMsg | |
| if ($issueRefs.Count -gt 0) { | |
| $commitSubject = (git log --format="%s" -1 $commitHash 2>&1) -join "" | |
| $prRefs = [regex]::Matches($commitMsg, '\(#(\d+)\)') | ForEach-Object { $_.Groups[1].Value } | |
| $Regressions += [PSCustomObject]@{ | |
| File = $file | |
| DeletedLine = $deleteInfo.LineContent | |
| CommitHash = $commitHash.Substring(0, 7) | |
| CommitSubject = $commitSubject | |
| IssueRefs = $issueRefs -join ', ' | |
| PRRefs = $prRefs -join ', ' | |
| } | |
| break # One match per deleted line is enough | |
| } | |
| # Use the local base branch for blame | |
| $baseRef = $BaseBranch -replace '^origin/', '' | |
| foreach ($deleteInfo in $deletes) { | |
| $Checked++ | |
| # We need the original line number in the base file to run a precise blame. | |
| $lineNumber = $deleteInfo.OldLineNumber | |
| if (-not $lineNumber) { continue } | |
| # Get git blame for this specific line on the base branch | |
| $blameOutput = $null | |
| try { | |
| $blameOutput = git blame $baseRef -L "$lineNumber,$lineNumber" -- $file 2>&1 | |
| } | |
| catch { | |
| Write-Warning " Could not run git blame for $file line $lineNumber. Skipping line." | |
| continue | |
| } | |
| if (-not $blameOutput) { continue } | |
| # git blame output format (porcelain default): | |
| # "<hash> (<author> <date> <time> <tz> <lineno>) <content>" | |
| $blameLine = ($blameOutput -split "`n")[0] | |
| if ($blameLine -notmatch '^(\^?[0-9a-f]{7,40})\s.*?\)\s(.*)$') { | |
| continue | |
| } | |
| $commitHash = $Matches[1].TrimStart('^') | |
| $blamedContent = $Matches[2] | |
| if ($commitHash -match '^0+$') { continue } # uncommitted / no commit | |
| # Verify the blamed content actually matches the deleted line | |
| $deletedContentTrimmed = $deleteInfo.LineContent.Trim() | |
| if (-not $deletedContentTrimmed) { continue } | |
| if ($blamedContent.Trim() -ne $deletedContentTrimmed) { continue } | |
| # Get commit message | |
| $commitMsg = $null | |
| try { | |
| $commitMsg = git log --format="%s%n%b" -1 $commitHash 2>&1 | Out-String | |
| } | |
| catch { continue } | |
| if (-not $commitMsg) { continue } | |
| # Check if commit references an issue | |
| $issueRefs = Get-IssueReferences $commitMsg | |
| if ($issueRefs.Count -gt 0) { | |
| $commitSubject = (git log --format="%s" -1 $commitHash 2>&1) -join "" | |
| $prRefs = [regex]::Matches($commitMsg, '\(#(\d+)\)') | ForEach-Object { $_.Groups[1].Value } | |
| $Regressions += [PSCustomObject]@{ | |
| File = $file | |
| DeletedLine = $deleteInfo.LineContent | |
| CommitHash = $commitHash.Substring(0, 7) | |
| CommitSubject = $commitSubject | |
| IssueRefs = $issueRefs -join ', ' | |
| PRRefs = $prRefs -join ', ' |
| # Step 1: Get all deleted lines from non-test implementation files | ||
| gh pr diff XXXXX | grep "^-" | grep -v "^---" | grep -v "^-.*\(test\|Test\|spec\|Spec\)" > /tmp/deleted_lines.txt | ||
|
|
There was a problem hiding this comment.
The example command intended to “get all deleted lines from non-test implementation files” doesn’t actually filter by file path; it filters deleted diff lines whose content contains test/spec. This will still include deletions from test files (and may exclude legitimate code lines that mention “test”). Consider removing this snippet or replacing it with a file-aware approach (e.g., iterate changed files from gh pr diff --name-only and filter paths before grepping deleted lines), and emphasize using Detect-Regressions.ps1 for correctness.
Five bugs fixed:
1. CRITICAL: Remove bare (#{N}) from Get-IssueReferences patterns
GitHub auto-appends PR numbers to every squash-merged commit subject
(e.g. 'Fix something (#32278)'), so the pattern would flag ~100% of
deleted lines as regressions. Removed from $patterns entirely.
The remaining 'fixes', 'closes', 'resolves', 'issue', 're:' patterns
are precise. The (#{N}) pattern is still used at display time (after
a keyword match is already confirmed) to enrich the report.
2. MEDIUM: Fix Is-TestFile false positives on 'spec'/'test' substrings
PowerShell -match is case-insensitive and matches substrings, so
'(test|spec)' would exclude 'AspectExtensions.cs', 'MeasureSpecFactory.cs',
'AndroidSpecific/Button.cs', etc. (~498 real impl files affected).
Now matches only path segments: [\/](tests?|specs?)[\/]
3. MEDIUM: Fix BaseBranch stripping — use remote ref as-is
Was stripping 'origin/' prefix then running 'git blame main -- file',
requiring a local 'main' branch that may not exist in worktrees or CI.
Now passes $BaseBranch directly (e.g. 'origin/main') since git blame
accepts remote tracking refs.
4. LOW: Handle renamed files in blame (silent skip bug)
When a PR renames a file and removes lines, git blame fails on the
new filename. Error was stored in $blameOutput, passed the truthy
check, but parsed no lines — silently skipping. Now detects 'no such
path|fatal:' in blame output and attempts to find the original path
via 'git log --follow --diff-filter=R'.
5. LOW: Fix class declaration regex — support multi-modifier declarations
Pattern '^(modifier)?\s*class\s' with '?' only matched zero-or-one
modifier. 'public sealed class Foo' and 'internal static class Helper'
were not recognized. Fixed to use '*' (zero-or-more).
Also fix nested diff block as closing the outer fence.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds a GitHub Agentic Workflow that automatically checks PRs for potential regressions caused by removing lines that were added by prior bug fixes. Triggers: - pull_request: auto on implementation file changes (src/**/*.cs, src/**/*.xaml) - workflow_dispatch: manual with PR number input - issue_comment: /check-regressions slash command Security model follows the same pattern as copilot-evaluate-tests: - gh-aw sandbox with scrubbed credentials - Safe outputs (max 1 PR comment per run) - Fork PR support via forks: ["*"] + Checkout-GhAwPr.ps1 - Pre-agent gate filters PRs with no implementation file changes - Pinned actions via actions-lock.json Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Two bugs fixed: 1. .Count failed on non-array returns in strict mode — wrapped Get-IssueReferences calls in @() to ensure array type 2. Script missed regressions where the originating commit only had 'fixes #XXXX' in the PR description, not the commit message. Now when a commit references a PR via (#XXXX) but has no issue refs in its message, the script queries the PR body via gh CLI. Also fixed issue ref formatting in the report (#32301, 32436 → #32301, #32436). Verified: correctly detects PR #33908 removing the IMauiRecyclerView guard that was added by PR #32278 to fix issue #32436. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Closing in favour of #34768 |
Problem
Three P/0 regressions (#34634, #34635, #34636) shipped in 10.0.60 because the agent approved PRs that unknowingly reverted lines from prior bug fix PRs.
Concrete example: PR #33908 removed
|| parent is IMauiRecyclerViewfromFindListenerForView()inMauiWindowInsetListener.cs. That line was added by PR #32278 specifically to fix issue #32436 (increasing gap at bottom while scrolling in RecyclerView). Removing it caused #32436 to reappear as regression #34634.The agent had no mechanism to detect the "reverted fix" pattern. Its code review only looked at what the PR added, not whether what it removed was a deliberate prior fix.
Solution
Add a mandatory Prior Fix Regression Check as Step 1 of the
pr-finalizecode review. Before any other review work, the agent must check whether deleted lines from implementation files were added by prior bug-fix commits.Detection Method
git blameon those lines to find the originating commitfixes #XXXX)Files Changed
Modified
.github/skills/pr-finalize/SKILL.md— Added "Prior Fix Regression Check" as mandatory Step 1 of Phase 2 (Code Review), documented with the real MauiWindowInsetListener.cs example.github/copilot-instructions.md— Added regression detection step to the Code Review Instructions sectionCreated
.github/skills/pr-finalize/scripts/Detect-Regressions.ps1— PowerShell script automating detection: gets PR diff → extracts deleted lines → runsgit blame→ checks commit messages for issue refs → exits 1 if regressions found.github/skills/pr-finalize/tests/eval.yaml— 11 eval scenarios for thepr-finalizeskill using the same format asevaluate-pr-tests/tests/eval.yaml, compatible withdotnet skills validator.github/instructions/regression-detection.instructions.md— Global instruction file documenting the anti-pattern, high-risk MAUI code locations, and guidance for PR authorsEval Scenarios (tests/eval.yaml)
Key scenarios covering the regression detection logic:
Testing
Run evals against the
pr-finalizeskill:Issues Fixed
Part of the investigation into #34634, #34635, #34636
Platforms Tested
N/A — skill/tooling change only
Co-authored-by: Copilot 223556219+Copilot@users.noreply.github.com