Skip to content

Add UI test category detection to maui-pr-uitests pipeline#35136

Draft
kubaflo wants to merge 3 commits into
mainfrom
feature/uitest-category-pipeline
Draft

Add UI test category detection to maui-pr-uitests pipeline#35136
kubaflo wants to merge 3 commits into
mainfrom
feature/uitest-category-pipeline

Conversation

@kubaflo
Copy link
Copy Markdown
Contributor

@kubaflo kubaflo commented Apr 25, 2026

Note

Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!

What this does

Before: Every PR runs ~24,000 UI tests across all categories (~2+ hours).

After: The pipeline detects which UI test categories a PR actually touches and only runs those. A PR that changes Button code only runs Button tests (~400 tests, ~30 min). Docs-only PRs skip UI tests entirely.

How it works

  1. A new Discover stage analyzes the PR diff before test stages run
  2. It detects relevant categories using:
    • Test file attributes[Category(UITestCategories.X)] in changed/existing test files (supports combined attributes like [Category(X), Order(1)])
    • Source-path mapping — 60+ patterns map product code to categories (e.g. Shell/ → Shell, Button* → Button, DragGesture* → Gestures + DragAndDrop)
    • AI categories — optional -AiCategories parameter for external input
  3. Each test job checks if its category group matches — if not, it skips in seconds
  4. If NONE detected (docs-only, build scripts) — build stages are skipped too
  5. If no specific categories identified but src/Controls/ changed — runs full matrix (safe default)

Escape hatches

  • Add run-all-uitests label to any PR → forces full matrix
  • Queue manually with categories parameter → run specific categories
  • Queue with prNumber parameter → test detection against any PR

Files

File What it does
eng/scripts/detect-ui-test-categories.ps1 3-tier detection script with path mapping, retry on GitHub API failures
eng/pipelines/common/ui-tests.yml Discover stage, build stage skip on NONE, cross-stage variable propagation
eng/pipelines/common/ui-tests-steps.yml Per-job filter gate — early check before provisioning/test execution
eng/pipelines/ci-uitests.yml prNumber + categories queue-time parameters

Validation runs

Latest batch (Apr 26) — 6 builds, all succeeded ✅

Each build correctly detected categories from the PR diff. Non-matching jobs started but skipped all test steps in ~2min (vs 60-100min for matching jobs).

PR Categories Detected Build Result
#35031 (Shell memory leak) Shell 1397058 ✅ Shell jobs ran ~100min, all others skipped steps
#34997 (RadioButton gradient) RadioButton,ViewBaseTests 1397062
#34980 (DatePicker rotation) ViewBaseTests 1397065
#34974 (Picker CharacterSpacing) ViewBaseTests 1397067
#34923 (SwipeView threshold) SwipeView,ViewBaseTests 1397069
#34845 (RefreshView binding) RefreshView,ViewBaseTests 1397071

Earlier batch — 5 builds

PR Categories Detected Build Result
#35131 Shell 1396744 ✅ Succeeded
#35129 Shell 1396745 ✅ Succeeded
#35087 SwipeView,ViewBaseTests 1396746 ❌ Test failures
#35092 ViewBaseTests,WebView 1396747 ❌ Test failures
#35083 SafeAreaEdges,ViewBaseTests 1396748 ✅ Succeeded

Category detection accuracy (30+ PRs validated)

PR Detected Tests run vs Full matrix
#35009 ToolbarItem 467 24,000
#34997 RadioButton 496 24,000
#35079 SearchBar,Shell 2,218 24,000
#34637 Shape 505 24,000
#35072 Navigation,Shell 2,540 24,000
#34980 ViewBaseTests 553 24,000
#31672 CarouselView,CollectionView 4,451 24,000

Adds smart category detection to maui-pr-uitests pipeline. Instead of
running all ~24,000 UI tests (~2h+), the pipeline detects which
categories a PR touches and only runs those.

3-tier detection:
1. Test file [Category] attributes in the PR diff
2. Source-path-to-category mapping (60+ patterns)
3. AI-provided categories via -AiCategories parameter

Pipeline changes:
- New Discover stage runs detection before test stages
- Per-job filter gate skips non-matching matrix cells
- prNumber and categories queue-time parameters for manual trigger
- run-all-uitests label escape hatch

Validated on 30+ PRs. Targeted runs: 400-4,000 tests in ~30 min
vs full matrix: 24,000 tests in 2h+.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 25, 2026 09:47
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 25, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35136

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35136"

kubaflo added a commit that referenced this pull request Apr 25, 2026
PR review integration (depends on #35136 for pipeline changes):
- Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR,
  absolute path resolution, bad report format stripping
- post-uitest-categories-comment.ps1: Rich results with platform table,
  failure classification, supports -OutputFile for AI summary embedding
- trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor
- post-ai-summary-comment.ps1: UI Tests section (before gate)
- pr-preflight.md: Step 7 AI category identification, Step 8 code review
- ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds category-based UI test discovery so maui-pr-uitests can skip UI test jobs that aren’t relevant to the changes in a PR, reducing overall CI time by avoiding unnecessary test execution.

Changes:

  • Added a new PowerShell script to detect impacted UI test categories from PR diffs (test attribute scan + source-path mapping + optional AI input).
  • Introduced a “Discover UI Test Categories” stage and propagated detected categories to downstream UI test jobs.
  • Added an early per-job gate to skip execution when a job’s category doesn’t match the detected set, plus new queue-time parameters (prNumber, categories) for manual runs/overrides.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
eng/scripts/detect-ui-test-categories.ps1 New detection script that determines UI test categories to run based on PR changes and emits AzDO output variables.
eng/pipelines/common/ui-tests.yml Adds the discover stage and wires detected categories into UI test job variables/conditions.
eng/pipelines/common/ui-tests-steps.yml Adds an early “should this category run?” step and uses its result to gate many subsequent steps.
eng/pipelines/ci-uitests.yml Adds prNumber and categories parameters and passes them through to the shared UI test template.

Comment thread eng/pipelines/common/ui-tests-steps.yml Outdated
Comment on lines +140 to +141
parameters:
extraCondition: eq(variables['SHOULD_RUN_TESTS'], 'True')
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enable-kvm.yml is a step template with no parameters: block, but this call site passes a parameters: object. Azure Pipelines will fail template expansion with an unexpected parameter. Either remove the parameters here, or update eng/pipelines/common/enable-kvm.yml to accept a parameter and incorporate it into the step condition.

Suggested change
parameters:
extraCondition: eq(variables['SHOULD_RUN_TESTS'], 'True')

Copilot uses AI. Check for mistakes.
Comment on lines 152 to 170
- template: provision.yml
parameters:
skipJdk: ${{ ne(parameters.platform, 'android') }}
skipAndroidCommonSdks: ${{ ne(parameters.platform, 'android') }}
skipAndroidPlatformApis: true
onlyAndroidPlatformDefaultApis: true
skipAndroidEmulatorImages: ${{ ne(parameters.platform, 'android') }}
skipAndroidCreateAvds: true
androidEmulatorApiLevel: ${{ parameters.version }}
skipXcode: ${{ or(eq(parameters.platform, 'android'), eq(parameters.platform, 'windows'), eq(parameters.platform, 'catalyst')) }}
skipSimulatorSetup: ${{ or(eq(parameters.platform, 'android'), eq(parameters.platform, 'windows'), eq(parameters.platform, 'catalyst')) }}
provisionatorChannel: ${{ parameters.provisionatorChannel }}
${{ if parameters.skipProvisioning }}:
skipProvisionator: true
${{ else }}:
skipProvisionator: false
${{ if eq(parameters.platform, 'catalyst') }}:
openSslArgs: '' # Do not use legacy openssl for Catalyst builds

Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even when SHOULD_RUN_TESTS is set to false by the early category check, the provision.yml template is still invoked unconditionally. That means skipped jobs can still spend significant time provisioning (Xcode/SDKs/etc.), undermining the goal of “skip before provisioning”. Consider gating provisioning on SHOULD_RUN_TESTS (e.g., by adding a condition-like parameter to provision.yml that gets applied to its steps, or by moving the filtering decision into job/stage conditions so the job is skipped entirely).

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +214
@{ Pattern = 'src/Controls/src/Core/PointerGesture'; Category = 'Gestures' }
@{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'DragAndDrop' }
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path-to-category mapping assigns DragGesture/DropGesture changes to the DragAndDrop category, but many existing drag/drop UI tests are categorized under UITestCategories.Gestures (and will be skipped if only DragAndDrop is detected). To avoid missing relevant test coverage for gesture-related changes, consider also mapping these paths to Gestures (or otherwise ensuring the mapping aligns with how tests are actually categorized).

Suggested change
@{ Pattern = 'src/Controls/src/Core/PointerGesture'; Category = 'Gestures' }
@{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'DragAndDrop' }
@{ Pattern = 'src/Controls/src/Core/PointerGesture'; Category = 'Gestures' }
@{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'Gestures' }
@{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'DragAndDrop' }
@{ Pattern = 'src/Controls/src/Core/DropGesture'; Category = 'Gestures' }

Copilot uses AI. Check for mistakes.
Comment on lines 84 to +92
- stage: build_ui_tests
displayName: Build UITests Sample App
dependsOn: []
dependsOn:
- discover_ui_test_categories
condition: |
or(
ne(variables['Build.Reason'], 'PullRequest'),
in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped')
)
Copy link

Copilot AI Apr 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discover_ui_test_categories can output UITestCategoryList=NONE when a PR is determined to be UI-irrelevant, but the build stages still run (they only gate on the discover stage result, not the detected category list). This can keep consuming build time even for docs-only PRs where the intent is to skip all UI test work. Consider adding a condition to skip the UI test build stages when the detected category list is NONE.

Copilot uses AI. Check for mistakes.
kubaflo added a commit that referenced this pull request Apr 25, 2026
PR review integration (depends on #35136 for pipeline changes):
- Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR,
  absolute path resolution, bad report format stripping
- post-uitest-categories-comment.ps1: Rich results with platform table,
  failure classification, supports -OutputFile for AI summary embedding
- trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor
- post-ai-summary-comment.ps1: UI Tests section (before gate)
- pr-preflight.md: Step 7 AI category identification, Step 8 code review
- ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kubaflo added a commit that referenced this pull request Apr 25, 2026
PR review integration (depends on #35136 for pipeline changes):
- Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR,
  absolute path resolution, bad report format stripping
- post-uitest-categories-comment.ps1: Rich results with platform table,
  failure classification, supports -OutputFile for AI summary embedding
- trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor
- post-ai-summary-comment.ps1: UI Tests section (before gate)
- pr-preflight.md: Step 7 AI category identification, Step 8 code review
- ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kubaflo and others added 2 commits April 25, 2026 12:34
… skip

- Remove invalid parameters from enable-kvm.yml call (no params block)
- Map DragGesture/DropGesture to both DragAndDrop AND Gestures categories
- Skip build stages when UITestCategoryList=NONE (docs-only PRs)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Material3 stage was missing dependsOn: discover_ui_test_categories
and DETECTED_CATEGORIES variable, causing filter to not engage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
kubaflo added a commit that referenced this pull request Apr 25, 2026
PR review integration (depends on #35136 for pipeline changes):
- Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR,
  absolute path resolution, bad report format stripping
- post-uitest-categories-comment.ps1: Rich results with platform table,
  failure classification, supports -OutputFile for AI summary embedding
- trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor
- post-ai-summary-comment.ps1: UI Tests section (before gate)
- pr-preflight.md: Step 7 AI category identification, Step 8 code review
- ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@PureWeen
Copy link
Copy Markdown
Member

/review

1 similar comment
@PureWeen
Copy link
Copy Markdown
Member

/review

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 27, 2026

Expert Code Review completed successfully!

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expert Code Review — PR #35136

Methodology: 3 independent reviewers with adversarial consensus. Findings included only when ≥2/3 reviewers agree on the same root cause, with disputed findings verified via targeted follow-up.


Findings Summary

# Severity Finding Consensus File
1 🔴 CRITICAL stageDependencies invalid in stage conditions — cv1/carv1 filtering is a silent no-op 3/3 reviewers ui-tests.yml:379-380, 436-437
2 🔴 CRITICAL throw on unrecognized [Category] cascades to skip ALL UI tests 2/3 reviewers detect-ui-test-categories.ps1:296 + ui-tests.yml:92
3 🟡 MODERATE provision.yml not gated on SHOULD_RUN_TESTS — provisioning still runs for skipped jobs 3/3 reviewers ui-tests-steps.yml (existing template call)
4 🟡 MODERATE No GH_TOKEN mapped — label escape hatch unreliable under load 2/3 reviewers ui-tests.yml:75-78
5 🟡 MODERATE build_ui_tests_coreclr, _windows, _material3 stages not gated on NONE 3/3 reviewers ui-tests.yml:109-178
6 🟡 MODERATE categories param silently ignored when queued without prNumber 2/3 reviewers ui-tests.yml:59-62
7 🟡 MODERATE contains() is substring match — fragile for future category names 3/3 reviewers ui-tests.yml:380, 437
8 🟢 MINOR UITestCategoryMatrix output produced but never consumed 2/3 reviewers detect-ui-test-categories.ps1:403

Details for Non-Inline Findings

#3provision.yml not gated (3/3 reviewers, 🟡)

The early check comment says it "runs FIRST to avoid wasting time on provisioning if no tests will run", but provision.yml and enable-kvm.yml template inclusions in ui-tests-steps.yml have no SHOULD_RUN_TESTS condition. When a job determines it should skip, provisioning (Xcode, JDK, Android SDK) still executes — potentially 10+ minutes of wasted compute per skipped job. Consider this as a follow-up improvement.

#5 — Other build stages not NONE-gated (3/3 reviewers, 🟡)

The PR description states "If NONE detected (docs-only, build scripts) — build stages are skipped too", but only build_ui_tests (Mono) has the NONE condition. build_ui_tests_coreclr (line 109), build_ui_tests_nativeaot (line 127), build_ui_tests_windows (line 144), and build_ui_tests_material3 (line 160) all retain dependsOn: [] with no NONE check. For docs-only PRs, these stages still build sample apps (~20 min each) that will never run.


CI & Test Coverage

This PR changes pipeline infrastructure only — no product code or test code is modified. The PR description includes validation runs against 11 builds covering various real PRs, which is the appropriate validation approach for pipeline changes. No unit tests are expected for YAML/PowerShell pipeline scripts.


Verdict: NEEDS_DISCUSSION

Confidence: High

The overall architecture is well-designed — the 3-tier detection, safe fallback to full matrix, escape hatches, and per-job early gating are all sound patterns. However, the two 🔴 CRITICAL findings should be addressed before merge:

  1. The stageDependencies bug in cv1/carv1 conditions means the filtering optimization is silently broken for two of the most expensive test stages — they always run regardless of detected categories.
  2. The throw on unrecognized category expressions can cascade through the in(..., 'Succeeded', 'Skipped') condition to silently skip the entire UI test suite for a PR.

Both are straightforward fixes (use dependencies instead of stageDependencies; replace throw with continue + add 'Failed' fallback). The 🟡 findings are worth addressing but could be follow-ups.

Generated by Expert Code Review for issue #35136 · ● 24.2M

Comment on lines +379 to +380
eq(stageDependencies.discover_ui_test_categories.detect.outputs['DetectCategories.UITestCategoryList'], ''),
contains(stageDependencies.discover_ui_test_categories.detect.outputs['DetectCategories.UITestCategoryList'], 'CollectionView')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — stageDependencies is invalid in stage-level conditions; cv1/carv1 filtering is a silent no-op (3/3 reviewers)

stageDependencies is only valid in job-level contexts (job conditions, $[ ] variable expressions). In a stage-level condition: block, the correct object is dependencies. When stageDependencies is evaluated in a stage condition, it resolves to empty string, so:

  • eq(stageDependencies..., '')eq('', '')always true
  • contains(stageDependencies..., 'CollectionView')contains('', 'CollectionView')always false

Since the or() has eq(..., '') which is always true, the entire condition always passes — these two expensive stages always run regardless of detected categories. The optimization is silently broken for cv1 and carv1.

Note: the $[ stageDependencies... ] usage in job-level variables: blocks (e.g., line 397) is correct — the bug is specifically in these two stage-level conditions.

Additionally (3/3 reviewers, 🟡 MODERATE): contains() is a substring match in AzDO. A future category like CollectionViewLegacy would false-positive. Use delimiter-wrapped matching for exact element checking.

Suggested fix:

eq(dependencies.discover_ui_test_categories.outputs['detect.DetectCategories.UITestCategoryList'], ''),
contains(
  concat(',', dependencies.discover_ui_test_categories.outputs['detect.DetectCategories.UITestCategoryList'], ','),
  ',CollectionView,'
)

Same for the carv1 stage at line 436-437.

and(
or(
ne(variables['Build.Reason'], 'PullRequest'),
in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Related to throw-cascading bug above — Consider adding 'Failed' here so that a detection failure falls back to running all tests rather than silently skipping:

in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped', 'Failed')

This provides defense-in-depth: even if the detect script crashes (unrecognized [Category], network error, git issue), the test pipeline degrades gracefully to the full matrix instead of running zero tests.


$matrixJson = $matrix | ConvertTo-Json -Depth 5

Write-Host "##vso[task.setvariable variable=UITestCategoryMatrix;isOutput=true]$matrixJson"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 MINOR — UITestCategoryMatrix output is set but never consumed (2/3 reviewers)

This variable is generated here and at line 50 (the manual categories path) but no pipeline YAML references it. Only UITestCategoryList is actually consumed via DETECTED_CATEGORIES. If this is scaffolding for a planned dynamic-matrix feature, consider adding a comment; otherwise, removing it reduces debugging confusion.

} else {
$message = "Unrecognized category expression '$rawValue'. Expected formats: UITestCategories.<Name>, nameof(UITestCategories.<Name>), or a quoted string."
Write-Host "##[error]$message"
throw $message
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — throw on unrecognized category expression can cascade to skip ALL UI tests (2/3 reviewers)

If a test author writes a category in an unexpected format (e.g., [Category(MyConstants.ButtonCat)] or a computed expression), this throw crashes the entire detect task. A failed detect task means dependencies.discover_ui_test_categories.result is 'Failed'.

In ui-tests.yml line 92, the build_ui_tests condition checks in(...result, 'Succeeded', 'Skipped')'Failed' is not in that list, so the entire build + test pipeline is silently skipped for that PR.

A flaky GitHub API call hitting the throw at line 97 (inside Invoke-WithRetry) would cause the same cascade.

Suggested fix — replace throw with warning + continue:

} else {
    Write-Host "##[warning]Skipping unrecognized category expression '$rawValue' — add a handling case if needed."
    continue
}

And/or add 'Failed' to the in() check in ui-tests.yml:92 as defense in depth.

env:
SYSTEM_PULLREQUEST_TARGETBRANCH: $(System.PullRequest.TargetBranch)
MANUAL_PR_NUMBER: ${{ parameters.prNumber }}
MANUAL_CATEGORIES: ${{ parameters.categories }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — No GH_TOKEN mapped; label escape hatch and manual PR mode are unreliable (2/3 reviewers)

The detect script's Get-GitHubHeaders uses $env:GH_TOKEN for authenticated GitHub API calls (label check, manual PR metadata fetch). This env block doesn't map any token, so all GitHub API calls are unauthenticated — limited to 60 req/hour shared across all agents on the same egress IP.

Under heavy CI load, the run-all-uitests label escape hatch silently fails (falls through to auto-detection), and manual -PrNumber mode falls back to running all categories.

Suggested fix:

env:
  SYSTEM_PULLREQUEST_TARGETBRANCH: $(System.PullRequest.TargetBranch)
  MANUAL_PR_NUMBER: $\{\{ parameters.prNumber }}
  MANUAL_CATEGORIES: $\{\{ parameters.categories }}
  GH_TOKEN: $(System.AccessToken)

Comment on lines +59 to +62
condition: |
or(
eq(variables['Build.Reason'], 'PullRequest'),
ne(trim('${{ parameters.prNumber }}'), '')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MODERATE — categories parameter silently ignored when queued without prNumber (2/3 reviewers)

The discover stage only runs for PRs or when prNumber is non-empty. If a user queues with categories="Button" but leaves prNumber empty (and it's not a PR build), the discover stage is skipped entirely. Since the detect script handles categories at lines 40-52, but that code is unreachable when the stage is skipped, the categories parameter is silently ignored and the full matrix runs.

Suggested fix — add categories check to the condition:

condition: |
  or(
    eq(variables['Build.Reason'], 'PullRequest'),
    ne(trim('$\{\{ parameters.prNumber }}'), ''),
    ne(trim('$\{\{ parameters.categories }}'), '')
  )

kubaflo pushed a commit that referenced this pull request Apr 30, 2026


Inline review threads from #35133 (gh-aw bot reviews + 1 human comment).
Key fixes (most-impactful first):

🔴 CRITICAL — Step 0.5 left repo in detached HEAD before Step 1 gate
  Review-PR.ps1: Belt-and-suspenders `git checkout $reviewBranch` after
  Step 0.5 with a $LASTEXITCODE check that warns loudly if restore fails.
  The detect script's own try/finally still does the same restore, but if
  that finally is skipped (process killed, error before outer try opens)
  the gate would diff against the wrong tree.

🟡 MODERATE — silent native-git failures in manual-PR mode
  detect-ui-test-categories.ps1: Added Invoke-Git helper that throws on
  non-zero $LASTEXITCODE and wrapped every native git command in the
  manual-PR block (remote add, fetch x2, update-ref, checkout). Outer
  try/catch falls back to 'run all' on any failure.

🟡 MODERATE — Tier 3 AI categories was dead code in the local workflow
  Review-PR.ps1: After Step 2 writes `ai-categories.md`, re-invoke the
  detect script with `-AiCategories` (passed as a single string so it
  binds to [string]$AiCategories across pwsh -File) and rewrite
  `uitests/content.md` before Step 3 posts the unified comment.

🟡 MODERATE — 'run all' fallback paths never emitted UITestCategoryList
  detect-ui-test-categories.ps1: Centralized output via
  Write-CategoryListOutput helper and emitted explicit empty value at every
  run-all return (matches `eq(...UITestCategoryList, '')` in #35136 YAML).
  Review-PR.ps1: Updated parser regex from `(.+)$` → `(.*)$` so the
  empty marker is observable instead of being indistinguishable from
  'marker not seen'.

🟢 MINOR — nameof() regex branch was dead code
  detect-ui-test-categories.ps1 (both diff scan + file scan): The outer
  capture stops at first `)`, so rawValue for nameof(...) lacks the
  trailing paren. Anchored the inner check with `^…$` (no `)`) so the
  branch matches what the outer regex actually produces.

— pr-preflight.md: Replaced 70-name hardcoded category list with a link
  to UITestCategories.cs + explicit instruction to read it (per human
  reviewer comment — reduces drift risk and shrinks the instruction file).
— Review-PR.ps1: Added clarifying comment for the post-loop
  `if ($isEnvError)` invariant ('all attempts' not 'any attempt').
— detect-ui-test-categories.ps1: Documented Tier 1's per-line scan
  limitation (multi-line attributes are intentionally missed → falls
  through to Tier 2/3 instead of producing a wrong category).
— detect-ui-test-categories.ps1: Documented `-AiCategories` parameter
  as populated by either the AzDO pipeline or Review-PR.ps1's second
  invocation.

Validation:
  All three scripts re-parse cleanly under `pwsh`. Smoke-tested the
  detect script against PR #35133 itself: emitted NONE (no test files
  changed), HEAD restored to `feature/detect-uitest-categories`, no
  stray `_detect_*` remotes left in .git/config. Verified the parser
  regex now captures empty / NONE / specific values correctly. Verified
  Invoke-Git throws on a bogus fetch (exit 128).

Threads already addressed in 3db7ee4 / 118e22e (no action needed):
  — null guards on PR API response (Thread 9)
  — $detectScript variable shadowing (Thread 11)
  — try/finally for HEAD restore in detect script (Thread 17)
  — clearing stale gateContentFile between retries (Thread 21)
  — throw → continue on unrecognized [Category] format (Threads 22, 27)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PureWeen pushed a commit that referenced this pull request Apr 30, 2026
…ability (#35133)

<!-- Please let the below note in for people that find this PR -->
> [!NOTE]
> Are you waiting for the changes in this PR to be merged?
> It would be very helpful if you could [test the resulting
artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from
this PR and let us know in a comment if this change resolves your issue.
Thank you!

> **Depends on #35136** (pipeline category detection — should merge
first)

## What this does

Two things:

### 1. UI test category detection in PR review

During the PR review workflow, Step 0.5 detects which UI test categories
the PR impacts and writes the result to the AI summary comment. This
gives reviewers visibility into which UI tests are relevant.

**Detection** reuses the 3-tier script from #35136 (test attributes →
source paths → AI reasoning).

**AI summary** shows a new 🧪 UI Tests section with detected categories
before the gate section.

### 2. Gate reliability fixes

Multiple fixes to make the gate (`verify-tests-fail.ps1`) more
deterministic:

| Fix | Problem it solves |
|-----|-------------------|
| **Absolute path resolution** | Gate scripts not found on Linux CI
agents (`Resolve-Path`, `GetFullPath`) |
| **File existence check** | Instant cryptic failure when verify script
is missing — now logs clear error |
| **3x retry on ENV ERROR** | Emulator timeouts, ADB failures, app
crashes — transient issues that pass on retry |
| **Strip bad report blocks** | Old verify script produces `Passed:
False` with empty counts — stripped instead of shown |
| **Gate log in fallback** | When report is missing, shows last 20 lines
of gate output instead of just `❌ FAILED / Platform: IOS` |

## Files

| File | Changes |
|------|---------|
| `.github/scripts/Review-PR.ps1` | Step 0.5 category detection + all 5
gate fixes |
| `.github/scripts/post-ai-summary-comment.ps1` | Add `uitests` phase to
render detected categories |
| `.github/pr-review/pr-preflight.md` | Step 7: AI identifies impacted
UI test categories |

## Validation — PR reviewer builds (Apr 26)

10 builds against real PRs — all succeeded ✅. Category detection shown
in AI summary comment.

| PR | Categories Detected | Build | AI Summary |
|----|-------------------|-------|------------|
| #35037 (WebView theme) | `ViewBaseTests,WebView` |
[13940071](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940071)
|
[comment](#35037 (comment))
|
| #35031 (Shell memory leak) | `Shell` |
[13940072](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940072)
|
[comment](#35031 (comment))
|
| #35020 (XAML Hot Reload) | _(none — XAML only)_ |
[13940073](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940073)
| ✅ Shows "No UI test categories" |
| #35008 (Shell SearchHandler) | `Shell` |
[13940074](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940074)
| ✅ |
| #34997 (RadioButton gradient) | `RadioButton,ViewBaseTests` |
[13940075](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940075)
| ✅ |
| #34980 (DatePicker rotation) | `ViewBaseTests` |
[13940076](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940076)
| ✅ |
| #34974 (Picker CharacterSpacing) | `ViewBaseTests` |
[13940077](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940077)
| ✅ |
| #34923 (SwipeView threshold) | `SwipeView,ViewBaseTests` |
[13940078](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940078)
| ✅ |
| #34907 (CollectionView ScrollTo) | `CollectionView` |
[13940079](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940079)
| ✅ |
| #34845 (RefreshView binding) | `RefreshView,ViewBaseTests` |
[13940080](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940080)
| ✅ |

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@kubaflo kubaflo marked this pull request as draft May 13, 2026 20:43
@kubaflo
Copy link
Copy Markdown
Contributor Author

kubaflo commented May 24, 2026

/review -b feature/refactor-copilot-yml

@MauiBot
Copy link
Copy Markdown
Collaborator

MauiBot commented May 24, 2026

⚠️ Merge Conflict Detected — This PR has merge conflicts with its target branch. Please rebase onto the target branch and resolve the conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants