Integrate UI test category detection into PR review and fix gate reliability #35133

Merged
PureWeen merged 6 commits into main from feature/detect-uitest-categories on Apr 30, 2026

@kubaflo
Contributor

@kubaflo kubaflo commented Apr 24, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

Depends on #35136 (pipeline category detection — should merge first)

What this does

Two things:

1. UI test category detection in PR review

During the PR review workflow, Step 0.5 detects which UI test categories the PR impacts and writes the result to the AI summary comment. This gives reviewers visibility into which UI tests are relevant.

Detection reuses the 3-tier script from #35136 (test attributes → source paths → AI reasoning).
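To make the Tier 2 (source paths) idea concrete, here is a minimal sketch that maps changed file paths to categories through a prefix table. The map entries and the `Get-CategoriesFromPaths` name are hypothetical and are not taken from the #35136 script. An empty result is left for the caller to interpret, for example as "fall back to Tier 3 or to the full matrix".

```powershell
# Hypothetical Tier 2 sketch: map changed source paths to UI test categories
# through a prefix table. These entries are illustrative, not the real map.
$pathToCategoryMap = [ordered]@{
    'src/Controls/src/Core/Shell/'     = 'Shell'
    'src/Controls/src/Core/Items/'     = 'CollectionView'
    'src/Controls/src/Core/SwipeView/' = 'SwipeView'
    'src/Core/src/Handlers/WebView/'   = 'WebView'
}

function Get-CategoriesFromPaths {
    param([string[]]$ChangedFiles)

    # Case-insensitive set so the same category is only reported once.
    $found = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
    foreach ($file in $ChangedFiles) {
        # Normalize Windows separators so prefixes match on every agent OS.
        $normalized = $file -replace '\\', '/'
        foreach ($prefix in $pathToCategoryMap.Keys) {
            if ($normalized.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)) {
                [void]$found.Add($pathToCategoryMap[$prefix])
            }
        }
    }
    # Sorted output; nothing is emitted when no prefix matched.
    $found | Sort-Object
}
```

Callers should wrap the result in `@(...)` so that zero or one match still behaves like an array.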

AI summary shows a new 🧪 UI Tests section with detected categories before the gate section.

2. Gate reliability fixes

Multiple fixes to make the gate (verify-tests-fail.ps1) more deterministic:

| Fix | Problem it solves |
|---|---|
| Absolute path resolution | Gate scripts not found on Linux CI agents (Resolve-Path, GetFullPath) |
| File existence check | Instant cryptic failure when the verify script is missing; now logs a clear error |
| 3x retry on ENV ERROR | Emulator timeouts, ADB failures, app crashes: transient issues that pass on retry |
| Strip bad report blocks | The old verify script produces Passed: False with empty counts; these blocks are now stripped instead of shown |
| Gate log in fallback | When the report is missing, shows the last 20 lines of gate output instead of just ❌ FAILED / Platform: IOS |

Files

| File | Changes |
|---|---|
| .github/scripts/Review-PR.ps1 | Step 0.5 category detection + all 5 gate fixes |
| .github/scripts/post-ai-summary-comment.ps1 | Add uitests phase to render detected categories |
| .github/pr-review/pr-preflight.md | Step 7: AI identifies impacted UI test categories |

Validation — PR reviewer builds (Apr 26)

10 builds against real PRs — all succeeded ✅. Category detection shown in AI summary comment.

| PR | Categories Detected | Build | AI Summary |
|---|---|---|---|
| #35037 (WebView theme) | ViewBaseTests,WebView | 13940071 | comment |
| #35031 (Shell memory leak) | Shell | 13940072 | comment |
| #35020 (XAML Hot Reload) | (none; XAML only) | 13940073 | ✅ Shows "No UI test categories" |
| #35008 (Shell SearchHandler) | Shell | 13940074 | |
| #34997 (RadioButton gradient) | RadioButton,ViewBaseTests | 13940075 | |
| #34980 (DatePicker rotation) | ViewBaseTests | 13940076 | |
| #34974 (Picker CharacterSpacing) | ViewBaseTests | 13940077 | |
| #34923 (SwipeView threshold) | SwipeView,ViewBaseTests | 13940078 | |
| #34907 (CollectionView ScrollTo) | CollectionView | 13940079 | |
| #34845 (RefreshView binding) | RefreshView,ViewBaseTests | 13940080 | |

Copilot AI review requested due to automatic review settings April 24, 2026 18:36
@github-actions
Contributor

github-actions Bot commented Apr 24, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35133

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35133"

Contributor

Copilot AI left a comment

Pull request overview

This PR adds automatic UI test category detection so PR UI test jobs can skip running when the PR doesn’t touch their category group, reducing overall UI test time and CI load.

Changes:

  • Introduces a PowerShell-based category detection script (test diff scanning + source-path heuristics + optional AI hints).
  • Adds a “Discover” stage and per-job early gating to skip irrelevant UI test jobs in maui-pr-uitests.
  • Adds orchestration and reporting scripts to trigger the UI test pipeline and post a consolidated PR comment, plus integrates category detection into the PR review flow.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| eng/scripts/detect-ui-test-categories.ps1 | Detects relevant UI test categories from PR changes and emits variables for downstream filtering. |
| eng/pipelines/common/ui-tests.yml | Adds the discovery stage and wires detected category outputs into UI test stages/jobs. |
| eng/pipelines/common/ui-tests-steps.yml | Adds an early per-job filter step and conditions many steps on whether the job should run. |
| eng/pipelines/ci-uitests.yml | Adds prNumber / categories parameters to support manual targeted runs. |
| eng/pipelines/ci-copilot.yml | Exposes AZURE_DEVOPS_EXT_PAT for cross-org AzDO queuing from the Copilot pipeline. |
| .github/scripts/trigger-uitest-pipeline.ps1 | Orchestrates detect → queue → monitor → summarize flow for UI tests. |
| .github/scripts/post-uitest-categories-comment.ps1 | Posts/updates a single PR comment summarizing UI test results and failures. |
| .github/scripts/post-ai-summary-comment.ps1 | Adds a UI Tests section to the unified AI summary comment. |
| .github/scripts/Review-PR.ps1 | Adds a preliminary category detection step and improves gate retry/report handling. |
| .github/pr-review/pr-preflight.md | Updates the PR pre-flight checklist to include identifying impacted UI test categories. |
Comments suppressed due to low confidence (1)

eng/pipelines/common/ui-tests-steps.yml:275

  • The iOS cleanup step runs whenever platform == ios, even when the new early category check set SHOULD_RUN_TESTS=False and all prior steps were skipped. This can waste time and may fail because test artifacts/simulators were never provisioned. Gate this step on SHOULD_RUN_TESTS as well (similar to the other tasks in this template).
- bash: |
    cat ${BASH_SOURCE[0]}
    pwsh ./build.ps1 --target=Cleanup -Script eng/devices/${{ parameters.platform }}.cake ---results="$(TestResultsDirectory)" ${{ parameters.cakeArgs }}
  displayName: Cleanup and Create Simulator Logs if Test Run Failed To
  condition: ${{ eq(parameters.platform, 'ios') }}
  continueOnError: true

Comment thread .github/pr-review/pr-preflight.md Outdated
> **Validation constraint:** The Step 7 prompt MUST NOT contain issue titles, root-cause descriptions, bug summaries, or any Part A content — only `PR #XXXXX`. If you find yourself adding context "to help" the sub-agent, you are violating independence-first.
> **Validation constraint:** The Step 8 prompt MUST NOT contain issue titles, root-cause descriptions, bug summaries, or any Part A content — only `PR #XXXXX`. If you find yourself adding context "to help" the sub-agent, you are violating independence-first.

7. **Invoke the code-review skill as a sub-agent:**

Copilot AI Apr 24, 2026

Step numbering is inconsistent after adding the new “Identify Impacted UI Test Categories” step: this section is now labeled “Part B: Code Review (Step 8)”, but the instruction below still says “7. Invoke the code-review skill as a sub-agent”. Update the numbering to avoid confusion when following the checklist.

Suggested change
7. **Invoke the code-review skill as a sub-agent:**
8. **Invoke the code-review skill as a sub-agent:**

Copilot uses AI. Check for mistakes.
Comment thread .github/scripts/Review-PR.ps1 Outdated
Comment on lines +464 to +467
if ([string]::IsNullOrWhiteSpace($uitestCategories) -or $uitestCategories -eq 'NONE') {
Write-Host " ℹ️ No UI test categories detected" -ForegroundColor DarkGray
} else {
Write-Host " 🎯 Detected categories: $uitestCategories" -ForegroundColor Green

Copilot AI Apr 24, 2026

detect-ui-test-categories.ps1 uses an empty UITestCategoryList output to mean “run the full matrix” (it returns without setting the variable). Here, an empty result is treated as “no UI test categories detected”, which is misleading in the common fallback-to-all case. Treat empty as “ALL/full matrix” and reserve NONE for “skip all UI tests”.
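A minimal sketch of the distinction this comment asks for, assuming the convention it describes: an empty or unset `UITestCategoryList` means "run the full matrix", and the literal `NONE` means "skip all UI tests". `Get-UITestCategoryMode` is a hypothetical helper, not part of Review-PR.ps1, and the usage lines assume they run where `$uitestCategories` holds the detector output.

```powershell
# Hypothetical helper: classify the detector's UITestCategoryList output.
# Assumed convention: empty/unset => run all; 'NONE' => skip; otherwise a
# comma-separated list of category names.
function Get-UITestCategoryMode {
    param([string]$CategoryList)

    if ([string]::IsNullOrWhiteSpace($CategoryList)) {
        return [pscustomobject]@{ Mode = 'All'; Categories = @() }
    }
    if ($CategoryList.Trim() -eq 'NONE') {
        return [pscustomobject]@{ Mode = 'None'; Categories = @() }
    }
    $cats = @($CategoryList -split ',' | ForEach-Object { $_.Trim() } | Where-Object { $_ })
    return [pscustomobject]@{ Mode = 'Subset'; Categories = $cats }
}

# Usage: report each case separately instead of folding "empty" into "none".
$result = Get-UITestCategoryMode -CategoryList $uitestCategories
switch ($result.Mode) {
    'All'    { Write-Host " ℹ️ No specific categories detected; full UI test matrix applies" }
    'None'   { Write-Host " ℹ️ No UI test categories impacted" }
    'Subset' { Write-Host " 🎯 Detected categories: $($result.Categories -join ', ')" }
}
```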

if (-not [string]::IsNullOrWhiteSpace($aiContent) -and $aiContent.Trim() -ne 'NONE') {
# Extract category names (lines like "Button — justification")
$aiCatLines = @($aiContent -split "`n" | ForEach-Object {
if ($_ -match '^([A-Za-z]+)') { $Matches[1] }

Copilot AI Apr 24, 2026

AI category parsing only captures leading letters (^([A-Za-z]+)), which will truncate categories that contain digits (e.g., Material3 becomes Material). Expand the pattern to include digits/underscores so AI-provided categories can round-trip correctly.

Suggested change
if ($_ -match '^([A-Za-z]+)') { $Matches[1] }
if ($_ -match '^([A-Za-z0-9_]+)') { $Matches[1] }
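The truncation is easy to reproduce. The sketch below runs both patterns over two sample AI response lines in the "Name - justification" shape the parser expects; the sample text and the `Get-LeadingNames` helper are made up for this check.

```powershell
# Compare the original and suggested patterns for pulling a category name off
# the start of each AI response line. Sample lines are invented for this demo.
$aiContent = "Material3 - touches Material 3 styling`nButton - changes Button padding"

function Get-LeadingNames {
    param([string]$Text, [string]$Pattern)
    @($Text -split "`n" | ForEach-Object {
        if ($_ -match $Pattern) { $Matches[1] }
    })
}

$lettersOnly = Get-LeadingNames -Text $aiContent -Pattern '^([A-Za-z]+)'
$withDigits  = Get-LeadingNames -Text $aiContent -Pattern '^([A-Za-z0-9_]+)'

Write-Host "Letters only: $($lettersOnly -join ', ')"   # Material, Button
Write-Host "With digits:  $($withDigits -join ', ')"    # Material3, Button
```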

Comment thread eng/pipelines/common/ui-tests-steps.yml Outdated
Comment on lines +20 to +23
# EARLY CHECK: Determine if this category group should run tests
# This runs FIRST to avoid wasting time on provisioning if no tests will run
# Also calculates matching categories to avoid duplicating this logic later
- pwsh: |

Copilot AI Apr 24, 2026

This early check claims it runs first to avoid wasting time on provisioning when a job is skipped, but the template still invokes provision.yml unconditionally later in the job. That means even skipped category jobs will still pay most of the provisioning cost, which undermines the intended runtime savings. Consider gating the provisioning step(s) on SHOULD_RUN_TESTS=True as well (or adding a condition hook parameter to the provisioning template).

Write-Host "Test file changes detected under '$TestRoot'." -ForegroundColor Green
}

$categoryPattern = '^\+\s*\[Category\((?<value>[^\)]*)\)\]'

Copilot AI Apr 24, 2026

The regex used to detect added [Category(...)] lines only matches attributes that end immediately with )]. Many tests use combined attributes like [Category(UITestCategories.X), Order(1)], which won't match and will cause category detection to miss relevant categories (leading to overly broad runs). Update the pattern to also match when Category(...) is followed by , or ] (and allow whitespace).

Suggested change
$categoryPattern = '^\+\s*\[Category\((?<value>[^\)]*)\)\]'
$categoryPattern = '^\+\s*\[Category\((?<value>[^\)]*)\)\s*(?:,|\])'
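A quick way to see the difference is to run the original and suggested patterns over a few added diff lines (the lines are invented for this check). Both patterns stop the captured value at the first `)`, so neither one matches a `nameof(...)` argument such as the third sample.

```powershell
# Check both diff-line patterns against added lines, including one that
# combines [Category(...)] with another attribute in the same brackets.
$oldPattern = '^\+\s*\[Category\((?<value>[^\)]*)\)\]'
$newPattern = '^\+\s*\[Category\((?<value>[^\)]*)\)\s*(?:,|\])'

$samples = @(
    '+        [Category(UITestCategories.Button)]',
    '+        [Category(UITestCategories.Shell), Order(1)]',
    '+        [Category(nameof(UITestCategories.Entry))]'
)

foreach ($line in $samples) {
    $old = if ($line -match $oldPattern) { $Matches['value'] } else { '<no match>' }
    $new = if ($line -match $newPattern) { $Matches['value'] } else { '<no match>' }
    Write-Host "$line`n  old: $old`n  new: $new"
}
```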

Comment on lines +311 to +322
$content = Get-Content $file -Raw
$fileMatches = [regex]::Matches($content, '\[Category\(([^\)]*)\)\]')
foreach ($m in $fileMatches) {
$rawValue = $m.Groups[1].Value.Trim()
if ([string]::IsNullOrWhiteSpace($rawValue)) { continue }
if ($rawValue -match '^UITestCategories\.(?<name>[A-Za-z0-9_]+)$') {
$cat = $Matches['name']
} elseif ($rawValue -match '^["''](?<name>[A-Za-z0-9_ -]+)["'']$') {
$cat = $Matches['name']
} elseif ($rawValue -match 'nameof\(UITestCategories\.(?<name>[A-Za-z0-9_]+)\)') {
$cat = $Matches['name']
} else { continue }

Copilot AI Apr 24, 2026

When scanning the full contents of modified test files, the \[Category\(([^\)]*)\)\] regex has the same limitation as the diff regex and will not match combined attributes like [Category(UITestCategories.X), Order(1)]. This will prevent Tier 1 fallback detection from finding existing categories in many files. Adjust the regex to capture the Category(...) argument even when additional attributes follow in the same bracket.
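One way to meet this request is to drop the requirement that `)]` follow immediately and to allow one level of nested parentheses, so a `nameof(...)` argument is captured whole as well. The sketch below is an illustration under that assumption, not the pattern the PR adopted; `Get-CategoryNamesFromContent` is a hypothetical name, and unrecognized expressions are skipped with a warning, mirroring the `continue` in the excerpt.

```powershell
# Hypothetical full-file scan: capture the Category(...) argument even when
# other attributes share the brackets, e.g. [Category(X), Order(1)], and allow
# one level of nested parentheses so nameof(UITestCategories.X) is kept whole.
function Get-CategoryNamesFromContent {
    param([string]$Content)

    $pattern = '\bCategory\((?<value>(?:[^()]|\([^()]*\))*)\)'
    $names = [System.Collections.Generic.List[string]]::new()

    foreach ($m in [regex]::Matches($Content, $pattern)) {
        $rawValue = $m.Groups['value'].Value.Trim()
        if ([string]::IsNullOrWhiteSpace($rawValue)) { continue }

        if ($rawValue -match '^UITestCategories\.(?<name>[A-Za-z0-9_]+)$') {
            $names.Add($Matches['name'])
        } elseif ($rawValue -match '^["''](?<name>[A-Za-z0-9_ -]+)["'']$') {
            $names.Add($Matches['name'])
        } elseif ($rawValue -match '^nameof\(UITestCategories\.(?<name>[A-Za-z0-9_]+)\)$') {
            $names.Add($Matches['name'])
        } else {
            # Unknown expression: warn and keep scanning instead of failing.
            Write-Warning "Unrecognized category expression '$rawValue'"
        }
    }
    return $names.ToArray()
}
```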

Comment on lines +79 to +80
} elseif (-not [string]::IsNullOrWhiteSpace($env:SYSTEM_ACCESSTOKEN)) {
$h['Authorization'] = "Bearer $env:SYSTEM_ACCESSTOKEN"

Copilot AI Apr 24, 2026

Get-GitHubHeaders falls back to SYSTEM_ACCESSTOKEN as a GitHub Bearer token. SYSTEM_ACCESSTOKEN is an Azure DevOps token and won't authenticate to the GitHub API, so PR metadata/label calls will fail unless GH_TOKEN is present. Consider removing this fallback and emitting a clear warning when GH_TOKEN is missing (or support a dedicated GitHub token env var).

Suggested change
} elseif (-not [string]::IsNullOrWhiteSpace($env:SYSTEM_ACCESSTOKEN)) {
$h['Authorization'] = "Bearer $env:SYSTEM_ACCESSTOKEN"
} else {
Write-Host "##[warning]GH_TOKEN is not set. GitHub API requests will be sent without authentication and may fail or be rate-limited."
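A minimal sketch of a header builder that follows the suggestion: it only ever sends `GH_TOKEN` to GitHub and warns when that variable is missing. The function body is an illustration rather than the one in the PR; the `Accept` and `X-GitHub-Api-Version` values are the ones GitHub recommends for its REST API.

```powershell
# Sketch: build GitHub REST API headers from GH_TOKEN only. An Azure DevOps
# SYSTEM_ACCESSTOKEN is deliberately never used, since GitHub rejects it.
function Get-GitHubHeaders {
    $h = @{
        'Accept'               = 'application/vnd.github+json'
        'X-GitHub-Api-Version' = '2022-11-28'
    }
    if (-not [string]::IsNullOrWhiteSpace($env:GH_TOKEN)) {
        $h['Authorization'] = "Bearer $env:GH_TOKEN"
    } else {
        # Unauthenticated calls work for public repos but are heavily rate-limited.
        Write-Host "##[warning]GH_TOKEN is not set. GitHub API requests will be sent without authentication and may fail or be rate-limited."
    }
    return $h
}
```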

@kubaflo kubaflo added area-ai-agents Copilot CLI agents, agent skills, AI-assisted development agentic-workflows and removed agentic-workflows labels Apr 24, 2026
@kubaflo kubaflo force-pushed the feature/detect-uitest-categories branch from 8898e29 to 8586797 Compare April 25, 2026 09:50
@kubaflo kubaflo changed the title Add smart UI test category detection to skip irrelevant test jobs Integrate UI test category detection into PR review workflow Apr 25, 2026
@kubaflo kubaflo force-pushed the feature/detect-uitest-categories branch 2 times, most recently from 231f9a9 to 1b54ca1 Compare April 25, 2026 10:25
@kubaflo kubaflo changed the title Integrate UI test category detection into PR review workflow Integrate UI test category detection into PR review and fix gate reliability Apr 25, 2026
PR review integration (depends on #35136 for pipeline changes):
- Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR,
  absolute path resolution, bad report format stripping
- post-uitest-categories-comment.ps1: Rich results with platform table,
  failure classification, supports -OutputFile for AI summary embedding
- trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor
- post-ai-summary-comment.ps1: UI Tests section (before gate)
- pr-preflight.md: Step 7 AI category identification, Step 8 code review
- ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo kubaflo force-pushed the feature/detect-uitest-categories branch from 1b54ca1 to 8413e08 Compare April 25, 2026 23:41
Two issues:
1. The copilot agent was overwriting gate/content.md with its own
   generic version that lacked actual error messages. Added explicit
   instruction not to touch gate/content.md.
2. The verify script's failure details were in a collapsed <details>
   block. Changed to an open #### heading so the error reason is
   immediately visible (e.g. screenshot baseline mismatch).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Contributor

github-actions Bot commented Apr 27, 2026

🔍 Skill Validation Results

✅ Static Checks Passed

Skills checked: 15 | Agents checked: 3

Full validator output
Found 1 skill(s)
[verify-tests-fail-without-fix] 📊 verify-tests-fail-without-fix: 2,271 BPE tokens [chars/4: 2,189] (detailed ✓), 26 sections, 7 code blocks
✅ All checks passed (1 skill(s))
Found 3 agent(s)
Validated 3 agent(s)

✅ All checks passed (3 agent(s))

❌ LLM Evaluation Failed

0/1 skill(s) passed validation

Skill Scenario Baseline Skilled Verdict

verify-tests-fail-without-fix: Eval scenario 'Regression: tests passing without fix means verification FAILED' prompt mentions target name 'verify-tests-fail-without-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. Eval scenario 'Edge case: no test files detected in the PR' prompt mentions target name 'verify-tests-fail-without-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs. Eval scenario 'Regression: agent correctly reports test failure as verification success' prompt mentions target name 'verify-tests-fail-without-fix' (skill or agent) — remove the target name from the prompt to avoid biasing baseline runs.

🔍 Full results and investigation steps

@github-actions

Contributor

@github-actions github-actions Bot left a comment

Expert Code Review: 8 findings posted inline (8 moderate). See summary comment for methodology and discarded findings.

Generated by Expert Code Review · ● 29.8M

Comment on lines +119 to +127
git remote add _detect_base $baseRepoCloneUrl
git fetch _detect_base "$TargetBranch" --no-tags --prune --depth=200 | Out-Null
git update-ref refs/remotes/origin/$TargetBranch _detect_base/$TargetBranch | Out-Null

# Fetch head commit (works for forks too) and check it out so the diff reflects the PR changes.
git remote remove _detect_head 2>$null | Out-Null
git remote add _detect_head $headRepoCloneUrl
git fetch _detect_head "$headSha" --no-tags --depth=200 | Out-Null
git checkout --quiet $headSha | Out-Null
Contributor

🟡 Moderate — 3/3 consensus | Git state not restored on failure

Manual PR mode performs git checkout --quiet $headSha (detached HEAD) and adds temporary remotes (_detect_base, _detect_head) without a try/finally to restore original state. If the script fails mid-execution, the working tree is left detached and stray remotes persist in .git/config.

Recommendation: Wrap the manual-mode git block in try/finally. In finally, restore the original HEAD (git checkout $originalRef) and remove both temporary remotes unconditionally.

Comment thread .github/scripts/Review-PR.ps1 Outdated
$gateOutput | ForEach-Object { Write-Host " $_" }

# Check if this was an ENV ERROR (emulator timeout, ADB failure, etc.)
$gateContentFile = Join-Path $gateOutputDir "verify-tests-fail/verification-report.md"
Contributor

🟡 Moderate — 2/3 consensus | Stale report file across retries

$gateContentFile (verification-report.md) is read each retry iteration to detect ENV ERROR, but is never cleared between attempts. If attempt 1 writes an ENV ERROR report and attempt 2 crashes before overwriting it, the stale file causes misclassification of the second attempt.

Recommendation: Delete or rename $gateContentFile at the start of each loop iteration before invoking the verify script:

if (Test-Path $gateContentFile) { Remove-Item $gateContentFile -Force }

$pr = Invoke-WithRetry -Uri $prUrl -Headers (Get-GitHubHeaders)
$TargetBranch = $pr.base.ref
$headRef = $pr.head.ref
$headSha = $pr.head.sha
Contributor

🟡 Moderate — 2/3 consensus | No null guards on API response fields

If $pr.head.repo is null (common when a fork is deleted), accessing .clone_url silently returns $null in PowerShell. Subsequent git remote add _detect_head "" and git fetch _detect_head "" produce confusing errors. The outer try/catch may or may not catch native command failures depending on $ErrorActionPreference.

Recommendation: Add explicit null checks before git operations:

if ([string]::IsNullOrWhiteSpace($headSha) -or [string]::IsNullOrWhiteSpace($headRepoCloneUrl)) {
    Write-Host "##[warning]Incomplete PR API response (fork may be deleted). Falling back."
    return
}

Start-Sleep -Seconds 30
}
}
if ($isEnvError) {
Contributor

🟡 Moderate — 2/3 consensus | $isEnvError post-loop invariant is non-obvious

The check if ($isEnvError) after the loop is technically correct (it can only be $true here if ALL iterations were env errors, since non-env-error iterations break). However, the variable name suggests "any" rather than "all", and if $maxGateAttempts is ever 0, $isEnvError would be undefined ($null → falsy, so safe but fragile).

Recommendation: Either add a clarifying comment explaining the invariant, or use a counter: if ($envErrorCount -eq $maxGateAttempts) { ... }.
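The loop below sketches the stale-report and counter recommendations together: the previous attempt's report is deleted before each run, and an explicit counter makes the "every attempt was an ENV ERROR" condition obvious. `Invoke-GateWithRetry`, the `$RunGate` script block, the `ENV ERROR` marker check, and treating a missing report as "stop retrying" are all assumptions for illustration, not what Review-PR.ps1 does verbatim.

```powershell
# Hypothetical retry wrapper for the gate. $RunGate stands in for the real
# verify-script call; it receives the attempt number and the report path and
# is expected to (re)write the report. The 'ENV ERROR' marker is assumed.
function Invoke-GateWithRetry {
    param(
        [Parameter(Mandatory)][scriptblock]$RunGate,
        [Parameter(Mandatory)][string]$ReportPath,
        [ValidateRange(1, 10)][int]$MaxAttempts = 3,
        [int]$DelaySeconds = 30
    )

    $envErrorCount = 0
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        # Delete the previous attempt's report so a crash cannot be misread
        # as another ENV ERROR.
        if (Test-Path $ReportPath) { Remove-Item $ReportPath -Force }

        & $RunGate $attempt $ReportPath | Out-Null

        $reportText = if (Test-Path $ReportPath) { Get-Content $ReportPath -Raw } else { '' }
        if ($reportText -notmatch 'ENV ERROR') {
            # Real pass, real failure, or no report: stop retrying.
            return [pscustomobject]@{ Attempts = $attempt; AllEnvErrors = $false }
        }

        $envErrorCount++
        if ($attempt -lt $MaxAttempts) { Start-Sleep -Seconds $DelaySeconds }
    }

    # Only reachable when every attempt reported ENV ERROR; the counter makes
    # that invariant explicit instead of relying on a loop-carried flag.
    [pscustomobject]@{ Attempts = $MaxAttempts; AllEnvErrors = ($envErrorCount -eq $MaxAttempts) }
}
```

`ValidateRange` also rules out the zero-attempt edge case the comment mentions.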

$uitestCategories = ""

$detectScript = Join-Path $RepoRoot "eng/scripts/detect-ui-test-categories.ps1"
if (Test-Path $detectScript) {
Contributor

🟡 Moderate — 2/3 consensus | Variable name shadowing

$detectScript is defined here for detect-ui-test-categories.ps1, then reassigned at line ~499 to Detect-TestsInDiff.ps1. Both live in the same script scope. Future edits could accidentally reference the wrong script path.

Recommendation: Use distinct names — e.g., $uitestDetectScript (Step 0.5) and $diffDetectScript (Step 1).

} else {
$message = "Unrecognized category expression '$rawValue'. Expected formats: UITestCategories.<Name>, nameof(UITestCategories.<Name>), or a quoted string."
Write-Host "##[error]$message"
throw $message
Contributor

🟡 Moderate — 2/3 consensus | throw on unrecognized category halts all detection

When the diff scan encounters a [Category()] expression that doesn't match the three supported patterns (e.g., a constant like [Category(MyConstants.Button)]), the script throws an exception. This kills the child pwsh process, causing the parent to receive zero categories — silently skipping Tier 2/3 fallback detection.

Note: The existing-file scan at line ~314 uses continue for unrecognized formats, creating an inconsistency.

Recommendation: Replace throw $message with Write-Host "##[warning]$message" + continue. Let Tier 2/3 fill gaps gracefully.

if ($touchesControls) {
# Changed files under src/Controls/ but couldn't map to specific categories — run all
Write-Host "Changed files touch Controls/Core/Essentials but no specific categories identified. Running all." -ForegroundColor Yellow
return
Contributor

🟡 Moderate — 2/3 consensus | "Run all" fallback paths never set output variables

This return (and several others at lines 52, 57, 130, 154, 165, 174, 180) exits without emitting ##vso[task.setvariable variable=UITestCategoryList;isOutput=true] or UITestCategoryMatrix. The log says "Running all" but the pipeline receives no output — which may be interpreted as "run nothing" rather than "run everything" depending on how the consuming YAML handles an unset matrix variable.

Recommendation: For each "run all" fallback, explicitly emit a sentinel (e.g., ALL) or the full category list, ensuring the pipeline YAML has a well-defined behavior for this case.
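A sketch of what an explicit "run all" exit could look like. The `##vso[task.setvariable variable=...;isOutput=true]` logging command is the standard Azure Pipelines way to publish an output variable; the `ALL` sentinel and the `Get-CategoryOutputCommands` helper are hypothetical, the matrix variable is left out, and the consuming YAML would need a matching condition.

```powershell
# Hypothetical helper: build the logging command for the category output.
# 'ALL' is an assumed sentinel for "run every category group"; 'NONE' means
# "skip all"; otherwise a sorted, de-duplicated comma-separated list.
function Get-CategoryOutputCommands {
    param(
        [string[]]$Categories,
        [switch]$RunAll
    )

    $value = if ($RunAll) {
        'ALL'
    } elseif (-not $Categories -or $Categories.Count -eq 0) {
        'NONE'
    } else {
        ($Categories | Sort-Object -Unique) -join ','
    }
    "##vso[task.setvariable variable=UITestCategoryList;isOutput=true]$value"
}

# Example "run all" exit path: publish the sentinel before returning.
# Get-CategoryOutputCommands -RunAll | ForEach-Object { Write-Host $_ }
# return
```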

# ============================================================================

if (-not [string]::IsNullOrWhiteSpace($diff)) {
foreach ($line in $diff -split "`n") {
Contributor

🟡 Moderate — 2/3 consensus | Line-by-line scan cannot detect multi-line attributes

The Tier 1 diff scan splits on \n and applies the category regex per-line. While current codebase conventions use single-line [Category(...)] attributes (making this safe today), the approach silently misses any future multi-line formatting. Under-detection is safe (falls back to broader detection) but worth noting.

Recommendation: Consider adding a comment documenting this assumption, or collapsing consecutive + lines before regex matching to handle wrapped attributes.
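If wrapped attributes ever need support, one option is to join each run of consecutive `+` lines into a single string before matching. The sketch assumes a unified diff as input; `Join-AddedLineRuns` is a hypothetical helper, and the simple category pattern in the usage check is for illustration only.

```powershell
# Hypothetical pre-pass: merge consecutive added lines ("+...") from a unified
# diff into one string each, so an attribute wrapped across lines can be
# matched by a single regex. File headers ("+++ b/...") are skipped.
function Join-AddedLineRuns {
    param([string]$Diff)

    $runs = [System.Collections.Generic.List[string]]::new()
    $current = [System.Text.StringBuilder]::new()

    foreach ($line in $Diff -split "`r?`n") {
        if ($line.StartsWith('+') -and -not $line.StartsWith('+++')) {
            # Drop the leading '+' and join the pieces with a single space.
            [void]$current.Append(' ').Append($line.Substring(1).Trim())
        } elseif ($current.Length -gt 0) {
            # Any non-added line ends the current run.
            $runs.Add($current.ToString().Trim())
            [void]$current.Clear()
        }
    }
    if ($current.Length -gt 0) { $runs.Add($current.ToString().Trim()) }
    return $runs.ToArray()
}
```

A joined run can hold several attributes, so matching it with `[regex]::Matches` rather than a single `-match` avoids missing later ones.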

@github-actions

Contributor

@github-actions github-actions Bot left a comment

Expert Code Review: 2 findings posted inline (1 warning, 1 suggestion). Both are minor — overall the implementation is solid with proper error handling, retry logic, and fail-safe fallbacks.

Note

🔒 Integrity filter blocked 5 items

The following items were blocked because they don't meet the GitHub integrity level.

  • 683daa4 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 8413e08 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • 388b82f list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • bd3a0e5 list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".
  • f49d3bd list_commits: has lower integrity than agent requires. The agent cannot read data with integrity below "approved".

To allow these resources, lower min-integrity in your GitHub frontmatter:

tools:
  github:
    min-integrity: approved  # merged | approved | unapproved | none

Generated by Expert Code Review · ● 40.6M

} else {
$message = "Unrecognized category expression '$rawValue'. Expected formats: UITestCategories.<Name>, nameof(UITestCategories.<Name>), or a quoted string."
Write-Host "##[error]$message"
throw $message
Contributor

⚠️ Warning — Aggressive abort on unrecognized [Category] format

If a test in the diff uses an uncommon [Category] expression format (e.g., a constant from another class, string interpolation, or concatenation), this throw aborts the entire detection script.

In practice the caller in Review-PR.ps1 treats this as non-fatal (category detection silently returns empty), so it's fail-safe. However, in the Azure Pipelines context (direct invocation), this terminates with an error and the ##vso output variables are never set — the pipeline would fall back to "run all categories" which is fine, but the ##[error] log line may cause confusion during triage.

Consider: Write-Host "##[warning]$message" + continue instead of throw, so detection continues for remaining diff lines and only logs a warning for the unrecognized expression.

$aiCatList = @($AiCategories -split '[,\n]' | ForEach-Object { ($_ -replace '\s*[-—].*$', '').Trim() } | Where-Object { $_ -and $_ -ne 'NONE' })
if ($aiCatList.Count -gt 0) {
Write-Host "Tier 3 (AI reasoning): $([string]::Join(', ', $aiCatList))" -ForegroundColor Green
foreach ($c in $aiCatList) { $addedCategories.Add($c) | Out-Null }
Contributor

💡 Suggestion — AI-provided categories added without validation

Tier 3 adds AI-suggested categories directly to $addedCategories without checking that they exist in UITestCategories.cs. If the AI hallucinates a category name (e.g., "Modal", "Toolbar"), it creates a matrix job that runs zero tests, wasting a CI slot and potentially masking detection failures.

Consider: Validate against the known category list (Tier 2's $pathToCategoryMap already enumerates valid categories), or emit a ##[warning] for unrecognized AI categories so they're visible in logs without silently creating empty jobs.
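A sketch of that validation, assuming UITestCategories.cs declares each category as `public const string Name = "Name";` (check the actual file). The regex and both helper names are assumptions, not the code from the follow-up commit. A case-insensitive dictionary is used instead of a plain set so a lookup also returns the canonical casing.

```powershell
# Hypothetical validation of Tier 3 (AI) suggestions against the constants in
# UITestCategories.cs. Assumes declarations like: public const string Button = "Button";
function Get-KnownCategories {
    param([string]$SourceText)

    # Case-insensitive keys; key and value are both the declared name.
    $known = [System.Collections.Generic.Dictionary[string,string]]::new([System.StringComparer]::OrdinalIgnoreCase)
    $pattern = 'public\s+const\s+string\s+(?<name>[A-Za-z_][A-Za-z0-9_]*)\s*='
    foreach ($m in [regex]::Matches($SourceText, $pattern)) {
        $name = $m.Groups['name'].Value
        $known[$name] = $name
    }
    return $known
}

function Select-ValidAiCategories {
    param(
        [string[]]$AiCategories,
        [System.Collections.Generic.Dictionary[string,string]]$Known
    )

    foreach ($c in $AiCategories) {
        if ($Known.ContainsKey($c)) {
            $Known[$c]   # emit the canonical name
        } else {
            Write-Host "##[warning]AI suggested unknown UI test category '$c'; skipping."
        }
    }
}
```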

…ats and AI hallucinations

- Replace 'throw' with '##[warning]' + continue when an unrecognized
  [Category(...)] expression is found in a diff. Previously a single
  unsupported expression (e.g., a constant from another class) aborted
  the whole detection script and silently fell back to running ALL
  categories. Now the script logs a warning and lets Tier 2 / Tier 3
  fill in the gaps.

- Validate Tier 3 (AI-suggested) categories against UITestCategories.cs
  before adding them to the matrix. Hallucinated category names would
  otherwise create matrix jobs that run zero tests, wasting a CI slot
  and masking detection failures. Invalid names are now skipped with a
  warning.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo
Contributor Author

kubaflo commented Apr 29, 2026

Addressed both findings from the latest expert review in 3db7ee4:

⚠️ Warning — Aggressive abort on unrecognized [Category] format (line 296)
Replaced throw with Write-Host "##[warning]" + continue. A single unrecognized category expression (e.g., a constant from another class) no longer aborts the entire detection script. Tier 2 (source paths) and Tier 3 (AI) still get a chance to fill in categories.

💡 Suggestion — AI-provided categories added without validation (line 369)
Tier 3 now reads UITestCategories.cs once, builds a HashSet of valid category names (case-insensitive), and skips any AI-suggested category that isn't defined there. Hallucinated names emit a ##[warning] instead of silently creating an empty matrix job. Validated locally against the file: 74 categories parsed, Material3 ✅, NotARealCategory ❌.

Did not adopt the prior review's findings (copilot-pull-request-reviewer and github-actions reviews from earlier today) — those cover separate issues (git state restoration on failure, stale gate report file across retries, etc.) which are worth addressing but are out of scope for this targeted pass on review #4200492243. Happy to follow up on those in a separate commit if you'd like.

This was referenced May 23, 2026
Labels

area-ai-agents Copilot CLI agents, agent skills, AI-assisted development

5 participants