Add UI test category detection to maui-pr-uitests pipeline#35136
Add UI test category detection to maui-pr-uitests pipeline#35136kubaflo wants to merge 3 commits into
Conversation
Adds smart category detection to maui-pr-uitests pipeline. Instead of running all ~24,000 UI tests (~2h+), the pipeline detects which categories a PR touches and only runs those. 3-tier detection: 1. Test file [Category] attributes in the PR diff 2. Source-path-to-category mapping (60+ patterns) 3. AI-provided categories via -AiCategories parameter Pipeline changes: - New Discover stage runs detection before test stages - Per-job filter gate skips non-matching matrix cells - prNumber and categories queue-time parameters for manual trigger - run-all-uitests label escape hatch Validated on 30+ PRs. Targeted runs: 400-4,000 tests in ~30 min vs full matrix: 24,000 tests in 2h+. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35136Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35136" |
PR review integration (depends on #35136 for pipeline changes): - Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR, absolute path resolution, bad report format stripping - post-uitest-categories-comment.ps1: Rich results with platform table, failure classification, supports -OutputFile for AI summary embedding - trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor - post-ai-summary-comment.ps1: UI Tests section (before gate) - pr-preflight.md: Step 7 AI category identification, Step 8 code review - ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds category-based UI test discovery so maui-pr-uitests can skip UI test jobs that aren’t relevant to the changes in a PR, reducing overall CI time by avoiding unnecessary test execution.
Changes:
- Added a new PowerShell script to detect impacted UI test categories from PR diffs (test attribute scan + source-path mapping + optional AI input).
- Introduced a “Discover UI Test Categories” stage and propagated detected categories to downstream UI test jobs.
- Added an early per-job gate to skip execution when a job’s category doesn’t match the detected set, plus new queue-time parameters (
prNumber,categories) for manual runs/overrides.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
eng/scripts/detect-ui-test-categories.ps1 |
New detection script that determines UI test categories to run based on PR changes and emits AzDO output variables. |
eng/pipelines/common/ui-tests.yml |
Adds the discover stage and wires detected categories into UI test job variables/conditions. |
eng/pipelines/common/ui-tests-steps.yml |
Adds an early “should this category run?” step and uses its result to gate many subsequent steps. |
eng/pipelines/ci-uitests.yml |
Adds prNumber and categories parameters and passes them through to the shared UI test template. |
| parameters: | ||
| extraCondition: eq(variables['SHOULD_RUN_TESTS'], 'True') |
There was a problem hiding this comment.
enable-kvm.yml is a step template with no parameters: block, but this call site passes a parameters: object. Azure Pipelines will fail template expansion with an unexpected parameter. Either remove the parameters here, or update eng/pipelines/common/enable-kvm.yml to accept a parameter and incorporate it into the step condition.
| parameters: | |
| extraCondition: eq(variables['SHOULD_RUN_TESTS'], 'True') |
| - template: provision.yml | ||
| parameters: | ||
| skipJdk: ${{ ne(parameters.platform, 'android') }} | ||
| skipAndroidCommonSdks: ${{ ne(parameters.platform, 'android') }} | ||
| skipAndroidPlatformApis: true | ||
| onlyAndroidPlatformDefaultApis: true | ||
| skipAndroidEmulatorImages: ${{ ne(parameters.platform, 'android') }} | ||
| skipAndroidCreateAvds: true | ||
| androidEmulatorApiLevel: ${{ parameters.version }} | ||
| skipXcode: ${{ or(eq(parameters.platform, 'android'), eq(parameters.platform, 'windows'), eq(parameters.platform, 'catalyst')) }} | ||
| skipSimulatorSetup: ${{ or(eq(parameters.platform, 'android'), eq(parameters.platform, 'windows'), eq(parameters.platform, 'catalyst')) }} | ||
| provisionatorChannel: ${{ parameters.provisionatorChannel }} | ||
| ${{ if parameters.skipProvisioning }}: | ||
| skipProvisionator: true | ||
| ${{ else }}: | ||
| skipProvisionator: false | ||
| ${{ if eq(parameters.platform, 'catalyst') }}: | ||
| openSslArgs: '' # Do not use legacy openssl for Catalyst builds | ||
|
|
There was a problem hiding this comment.
Even when SHOULD_RUN_TESTS is set to false by the early category check, the provision.yml template is still invoked unconditionally. That means skipped jobs can still spend significant time provisioning (Xcode/SDKs/etc.), undermining the goal of “skip before provisioning”. Consider gating provisioning on SHOULD_RUN_TESTS (e.g., by adding a condition-like parameter to provision.yml that gets applied to its steps, or by moving the filtering decision into job/stage conditions so the job is skipped entirely).
| @{ Pattern = 'src/Controls/src/Core/PointerGesture'; Category = 'Gestures' } | ||
| @{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'DragAndDrop' } |
There was a problem hiding this comment.
The path-to-category mapping assigns DragGesture/DropGesture changes to the DragAndDrop category, but many existing drag/drop UI tests are categorized under UITestCategories.Gestures (and will be skipped if only DragAndDrop is detected). To avoid missing relevant test coverage for gesture-related changes, consider also mapping these paths to Gestures (or otherwise ensuring the mapping aligns with how tests are actually categorized).
| @{ Pattern = 'src/Controls/src/Core/PointerGesture'; Category = 'Gestures' } | |
| @{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'DragAndDrop' } | |
| @{ Pattern = 'src/Controls/src/Core/PointerGesture'; Category = 'Gestures' } | |
| @{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'Gestures' } | |
| @{ Pattern = 'src/Controls/src/Core/DragGesture'; Category = 'DragAndDrop' } | |
| @{ Pattern = 'src/Controls/src/Core/DropGesture'; Category = 'Gestures' } |
| - stage: build_ui_tests | ||
| displayName: Build UITests Sample App | ||
| dependsOn: [] | ||
| dependsOn: | ||
| - discover_ui_test_categories | ||
| condition: | | ||
| or( | ||
| ne(variables['Build.Reason'], 'PullRequest'), | ||
| in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped') | ||
| ) |
There was a problem hiding this comment.
discover_ui_test_categories can output UITestCategoryList=NONE when a PR is determined to be UI-irrelevant, but the build stages still run (they only gate on the discover stage result, not the detected category list). This can keep consuming build time even for docs-only PRs where the intent is to skip all UI test work. Consider adding a condition to skip the UI test build stages when the detected category list is NONE.
PR review integration (depends on #35136 for pipeline changes): - Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR, absolute path resolution, bad report format stripping - post-uitest-categories-comment.ps1: Rich results with platform table, failure classification, supports -OutputFile for AI summary embedding - trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor - post-ai-summary-comment.ps1: UI Tests section (before gate) - pr-preflight.md: Step 7 AI category identification, Step 8 code review - ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR review integration (depends on #35136 for pipeline changes): - Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR, absolute path resolution, bad report format stripping - post-uitest-categories-comment.ps1: Rich results with platform table, failure classification, supports -OutputFile for AI summary embedding - trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor - post-ai-summary-comment.ps1: UI Tests section (before gate) - pr-preflight.md: Step 7 AI category identification, Step 8 code review - ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… skip - Remove invalid parameters from enable-kvm.yml call (no params block) - Map DragGesture/DropGesture to both DragAndDrop AND Gestures categories - Skip build stages when UITestCategoryList=NONE (docs-only PRs) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Material3 stage was missing dependsOn: discover_ui_test_categories and DETECTED_CATEGORIES variable, causing filter to not engage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR review integration (depends on #35136 for pipeline changes): - Review-PR.ps1: Step 0.5 detects categories, gate retry on ENV ERROR, absolute path resolution, bad report format stripping - post-uitest-categories-comment.ps1: Rich results with platform table, failure classification, supports -OutputFile for AI summary embedding - trigger-uitest-pipeline.ps1: Orchestrator for detect → queue → monitor - post-ai-summary-comment.ps1: UI Tests section (before gate) - pr-preflight.md: Step 7 AI category identification, Step 8 code review - ci-copilot.yml: DNCENG_PUBLIC_PAT for cross-org build queuing Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review |
1 similar comment
|
/review |
|
✅ Expert Code Review completed successfully! |
There was a problem hiding this comment.
Expert Code Review — PR #35136
Methodology: 3 independent reviewers with adversarial consensus. Findings included only when ≥2/3 reviewers agree on the same root cause, with disputed findings verified via targeted follow-up.
Findings Summary
| # | Severity | Finding | Consensus | File |
|---|---|---|---|---|
| 1 | 🔴 CRITICAL | stageDependencies invalid in stage conditions — cv1/carv1 filtering is a silent no-op |
3/3 reviewers | ui-tests.yml:379-380, 436-437 |
| 2 | 🔴 CRITICAL | throw on unrecognized [Category] cascades to skip ALL UI tests |
2/3 reviewers | detect-ui-test-categories.ps1:296 + ui-tests.yml:92 |
| 3 | 🟡 MODERATE | provision.yml not gated on SHOULD_RUN_TESTS — provisioning still runs for skipped jobs |
3/3 reviewers | ui-tests-steps.yml (existing template call) |
| 4 | 🟡 MODERATE | No GH_TOKEN mapped — label escape hatch unreliable under load |
2/3 reviewers | ui-tests.yml:75-78 |
| 5 | 🟡 MODERATE | build_ui_tests_coreclr, _windows, _material3 stages not gated on NONE |
3/3 reviewers | ui-tests.yml:109-178 |
| 6 | 🟡 MODERATE | categories param silently ignored when queued without prNumber |
2/3 reviewers | ui-tests.yml:59-62 |
| 7 | 🟡 MODERATE | contains() is substring match — fragile for future category names |
3/3 reviewers | ui-tests.yml:380, 437 |
| 8 | 🟢 MINOR | UITestCategoryMatrix output produced but never consumed |
2/3 reviewers | detect-ui-test-categories.ps1:403 |
Details for Non-Inline Findings
#3 — provision.yml not gated (3/3 reviewers, 🟡)
The early check comment says it "runs FIRST to avoid wasting time on provisioning if no tests will run", but provision.yml and enable-kvm.yml template inclusions in ui-tests-steps.yml have no SHOULD_RUN_TESTS condition. When a job determines it should skip, provisioning (Xcode, JDK, Android SDK) still executes — potentially 10+ minutes of wasted compute per skipped job. Consider this as a follow-up improvement.
#5 — Other build stages not NONE-gated (3/3 reviewers, 🟡)
The PR description states "If NONE detected (docs-only, build scripts) — build stages are skipped too", but only build_ui_tests (Mono) has the NONE condition. build_ui_tests_coreclr (line 109), build_ui_tests_nativeaot (line 127), build_ui_tests_windows (line 144), and build_ui_tests_material3 (line 160) all retain dependsOn: [] with no NONE check. For docs-only PRs, these stages still build sample apps (~20 min each) that will never run.
CI & Test Coverage
This PR changes pipeline infrastructure only — no product code or test code is modified. The PR description includes validation runs against 11 builds covering various real PRs, which is the appropriate validation approach for pipeline changes. No unit tests are expected for YAML/PowerShell pipeline scripts.
Verdict: NEEDS_DISCUSSION
Confidence: High
The overall architecture is well-designed — the 3-tier detection, safe fallback to full matrix, escape hatches, and per-job early gating are all sound patterns. However, the two 🔴 CRITICAL findings should be addressed before merge:
- The
stageDependenciesbug in cv1/carv1 conditions means the filtering optimization is silently broken for two of the most expensive test stages — they always run regardless of detected categories. - The
throwon unrecognized category expressions can cascade through thein(..., 'Succeeded', 'Skipped')condition to silently skip the entire UI test suite for a PR.
Both are straightforward fixes (use dependencies instead of stageDependencies; replace throw with continue + add 'Failed' fallback). The 🟡 findings are worth addressing but could be follow-ups.
Generated by Expert Code Review for issue #35136 · ● 24.2M
| eq(stageDependencies.discover_ui_test_categories.detect.outputs['DetectCategories.UITestCategoryList'], ''), | ||
| contains(stageDependencies.discover_ui_test_categories.detect.outputs['DetectCategories.UITestCategoryList'], 'CollectionView') |
There was a problem hiding this comment.
🔴 CRITICAL — stageDependencies is invalid in stage-level conditions; cv1/carv1 filtering is a silent no-op (3/3 reviewers)
stageDependencies is only valid in job-level contexts (job conditions, $[ ] variable expressions). In a stage-level condition: block, the correct object is dependencies. When stageDependencies is evaluated in a stage condition, it resolves to empty string, so:
eq(stageDependencies..., '')→eq('', '')→ alwaystruecontains(stageDependencies..., 'CollectionView')→contains('', 'CollectionView')→ alwaysfalse
Since the or() has eq(..., '') which is always true, the entire condition always passes — these two expensive stages always run regardless of detected categories. The optimization is silently broken for cv1 and carv1.
Note: the $[ stageDependencies... ] usage in job-level variables: blocks (e.g., line 397) is correct — the bug is specifically in these two stage-level conditions.
Additionally (3/3 reviewers, 🟡 MODERATE): contains() is a substring match in AzDO. A future category like CollectionViewLegacy would false-positive. Use delimiter-wrapped matching for exact element checking.
Suggested fix:
eq(dependencies.discover_ui_test_categories.outputs['detect.DetectCategories.UITestCategoryList'], ''),
contains(
concat(',', dependencies.discover_ui_test_categories.outputs['detect.DetectCategories.UITestCategoryList'], ','),
',CollectionView,'
)Same for the carv1 stage at line 436-437.
| and( | ||
| or( | ||
| ne(variables['Build.Reason'], 'PullRequest'), | ||
| in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped') |
There was a problem hiding this comment.
🔴 Related to throw-cascading bug above — Consider adding 'Failed' here so that a detection failure falls back to running all tests rather than silently skipping:
in(dependencies.discover_ui_test_categories.result, 'Succeeded', 'Skipped', 'Failed')This provides defense-in-depth: even if the detect script crashes (unrecognized [Category], network error, git issue), the test pipeline degrades gracefully to the full matrix instead of running zero tests.
|
|
||
| $matrixJson = $matrix | ConvertTo-Json -Depth 5 | ||
|
|
||
| Write-Host "##vso[task.setvariable variable=UITestCategoryMatrix;isOutput=true]$matrixJson" |
There was a problem hiding this comment.
🟢 MINOR — UITestCategoryMatrix output is set but never consumed (2/3 reviewers)
This variable is generated here and at line 50 (the manual categories path) but no pipeline YAML references it. Only UITestCategoryList is actually consumed via DETECTED_CATEGORIES. If this is scaffolding for a planned dynamic-matrix feature, consider adding a comment; otherwise, removing it reduces debugging confusion.
| } else { | ||
| $message = "Unrecognized category expression '$rawValue'. Expected formats: UITestCategories.<Name>, nameof(UITestCategories.<Name>), or a quoted string." | ||
| Write-Host "##[error]$message" | ||
| throw $message |
There was a problem hiding this comment.
🔴 CRITICAL — throw on unrecognized category expression can cascade to skip ALL UI tests (2/3 reviewers)
If a test author writes a category in an unexpected format (e.g., [Category(MyConstants.ButtonCat)] or a computed expression), this throw crashes the entire detect task. A failed detect task means dependencies.discover_ui_test_categories.result is 'Failed'.
In ui-tests.yml line 92, the build_ui_tests condition checks in(...result, 'Succeeded', 'Skipped') — 'Failed' is not in that list, so the entire build + test pipeline is silently skipped for that PR.
A flaky GitHub API call hitting the throw at line 97 (inside Invoke-WithRetry) would cause the same cascade.
Suggested fix — replace throw with warning + continue:
} else {
Write-Host "##[warning]Skipping unrecognized category expression '$rawValue' — add a handling case if needed."
continue
}And/or add 'Failed' to the in() check in ui-tests.yml:92 as defense in depth.
| env: | ||
| SYSTEM_PULLREQUEST_TARGETBRANCH: $(System.PullRequest.TargetBranch) | ||
| MANUAL_PR_NUMBER: ${{ parameters.prNumber }} | ||
| MANUAL_CATEGORIES: ${{ parameters.categories }} |
There was a problem hiding this comment.
🟡 MODERATE — No GH_TOKEN mapped; label escape hatch and manual PR mode are unreliable (2/3 reviewers)
The detect script's Get-GitHubHeaders uses $env:GH_TOKEN for authenticated GitHub API calls (label check, manual PR metadata fetch). This env block doesn't map any token, so all GitHub API calls are unauthenticated — limited to 60 req/hour shared across all agents on the same egress IP.
Under heavy CI load, the run-all-uitests label escape hatch silently fails (falls through to auto-detection), and manual -PrNumber mode falls back to running all categories.
Suggested fix:
env:
SYSTEM_PULLREQUEST_TARGETBRANCH: $(System.PullRequest.TargetBranch)
MANUAL_PR_NUMBER: $\{\{ parameters.prNumber }}
MANUAL_CATEGORIES: $\{\{ parameters.categories }}
GH_TOKEN: $(System.AccessToken)| condition: | | ||
| or( | ||
| eq(variables['Build.Reason'], 'PullRequest'), | ||
| ne(trim('${{ parameters.prNumber }}'), '') |
There was a problem hiding this comment.
🟡 MODERATE — categories parameter silently ignored when queued without prNumber (2/3 reviewers)
The discover stage only runs for PRs or when prNumber is non-empty. If a user queues with categories="Button" but leaves prNumber empty (and it's not a PR build), the discover stage is skipped entirely. Since the detect script handles categories at lines 40-52, but that code is unreachable when the stage is skipped, the categories parameter is silently ignored and the full matrix runs.
Suggested fix — add categories check to the condition:
condition: |
or(
eq(variables['Build.Reason'], 'PullRequest'),
ne(trim('$\{\{ parameters.prNumber }}'), ''),
ne(trim('$\{\{ parameters.categories }}'), '')
)Inline review threads from #35133 (gh-aw bot reviews + 1 human comment). Key fixes (most-impactful first): 🔴 CRITICAL — Step 0.5 left repo in detached HEAD before Step 1 gate Review-PR.ps1: Belt-and-suspenders `git checkout $reviewBranch` after Step 0.5 with a $LASTEXITCODE check that warns loudly if restore fails. The detect script's own try/finally still does the same restore, but if that finally is skipped (process killed, error before outer try opens) the gate would diff against the wrong tree. 🟡 MODERATE — silent native-git failures in manual-PR mode detect-ui-test-categories.ps1: Added Invoke-Git helper that throws on non-zero $LASTEXITCODE and wrapped every native git command in the manual-PR block (remote add, fetch x2, update-ref, checkout). Outer try/catch falls back to 'run all' on any failure. 🟡 MODERATE — Tier 3 AI categories was dead code in the local workflow Review-PR.ps1: After Step 2 writes `ai-categories.md`, re-invoke the detect script with `-AiCategories` (passed as a single string so it binds to [string]$AiCategories across pwsh -File) and rewrite `uitests/content.md` before Step 3 posts the unified comment. 🟡 MODERATE — 'run all' fallback paths never emitted UITestCategoryList detect-ui-test-categories.ps1: Centralized output via Write-CategoryListOutput helper and emitted explicit empty value at every run-all return (matches `eq(...UITestCategoryList, '')` in #35136 YAML). Review-PR.ps1: Updated parser regex from `(.+)$` → `(.*)$` so the empty marker is observable instead of being indistinguishable from 'marker not seen'. 🟢 MINOR — nameof() regex branch was dead code detect-ui-test-categories.ps1 (both diff scan + file scan): The outer capture stops at first `)`, so rawValue for nameof(...) lacks the trailing paren. Anchored the inner check with `^…$` (no `)`) so the branch matches what the outer regex actually produces. — pr-preflight.md: Replaced 70-name hardcoded category list with a link to UITestCategories.cs + explicit instruction to read it (per human reviewer comment — reduces drift risk and shrinks the instruction file). — Review-PR.ps1: Added clarifying comment for the post-loop `if ($isEnvError)` invariant ('all attempts' not 'any attempt'). — detect-ui-test-categories.ps1: Documented Tier 1's per-line scan limitation (multi-line attributes are intentionally missed → falls through to Tier 2/3 instead of producing a wrong category). — detect-ui-test-categories.ps1: Documented `-AiCategories` parameter as populated by either the AzDO pipeline or Review-PR.ps1's second invocation. Validation: All three scripts re-parse cleanly under `pwsh`. Smoke-tested the detect script against PR #35133 itself: emitted NONE (no test files changed), HEAD restored to `feature/detect-uitest-categories`, no stray `_detect_*` remotes left in .git/config. Verified the parser regex now captures empty / NONE / specific values correctly. Verified Invoke-Git throws on a bogus fetch (exit 128). Threads already addressed in 3db7ee4 / 118e22e (no action needed): — null guards on PR API response (Thread 9) — $detectScript variable shadowing (Thread 11) — try/finally for HEAD restore in detect script (Thread 17) — clearing stale gateContentFile between retries (Thread 21) — throw → continue on unrecognized [Category] format (Threads 22, 27) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ability (#35133) <!-- Please let the below note in for people that find this PR --> > [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you! > **Depends on #35136** (pipeline category detection — should merge first) ## What this does Two things: ### 1. UI test category detection in PR review During the PR review workflow, Step 0.5 detects which UI test categories the PR impacts and writes the result to the AI summary comment. This gives reviewers visibility into which UI tests are relevant. **Detection** reuses the 3-tier script from #35136 (test attributes → source paths → AI reasoning). **AI summary** shows a new 🧪 UI Tests section with detected categories before the gate section. ### 2. Gate reliability fixes Multiple fixes to make the gate (`verify-tests-fail.ps1`) more deterministic: | Fix | Problem it solves | |-----|-------------------| | **Absolute path resolution** | Gate scripts not found on Linux CI agents (`Resolve-Path`, `GetFullPath`) | | **File existence check** | Instant cryptic failure when verify script is missing — now logs clear error | | **3x retry on ENV ERROR** | Emulator timeouts, ADB failures, app crashes — transient issues that pass on retry | | **Strip bad report blocks** | Old verify script produces `Passed: False` with empty counts — stripped instead of shown | | **Gate log in fallback** | When report is missing, shows last 20 lines of gate output instead of just `❌ FAILED / Platform: IOS` | ## Files | File | Changes | |------|---------| | `.github/scripts/Review-PR.ps1` | Step 0.5 category detection + all 5 gate fixes | | `.github/scripts/post-ai-summary-comment.ps1` | Add `uitests` phase to render detected categories | | `.github/pr-review/pr-preflight.md` | Step 7: AI identifies impacted UI test categories | ## Validation — PR reviewer builds (Apr 26) 10 builds against real PRs — all succeeded ✅. Category detection shown in AI summary comment. | PR | Categories Detected | Build | AI Summary | |----|-------------------|-------|------------| | #35037 (WebView theme) | `ViewBaseTests,WebView` | [13940071](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940071) | [comment](#35037 (comment)) | | #35031 (Shell memory leak) | `Shell` | [13940072](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940072) | [comment](#35031 (comment)) | | #35020 (XAML Hot Reload) | _(none — XAML only)_ | [13940073](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940073) | ✅ Shows "No UI test categories" | | #35008 (Shell SearchHandler) | `Shell` | [13940074](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940074) | ✅ | | #34997 (RadioButton gradient) | `RadioButton,ViewBaseTests` | [13940075](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940075) | ✅ | | #34980 (DatePicker rotation) | `ViewBaseTests` | [13940076](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940076) | ✅ | | #34974 (Picker CharacterSpacing) | `ViewBaseTests` | [13940077](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940077) | ✅ | | #34923 (SwipeView threshold) | `SwipeView,ViewBaseTests` | [13940078](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940078) | ✅ | | #34907 (CollectionView ScrollTo) | `CollectionView` | [13940079](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940079) | ✅ | | #34845 (RefreshView binding) | `RefreshView,ViewBaseTests` | [13940080](https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=13940080) | ✅ | --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/review -b feature/refactor-copilot-yml |
|
|
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
What this does
Before: Every PR runs ~24,000 UI tests across all categories (~2+ hours).
After: The pipeline detects which UI test categories a PR actually touches and only runs those. A PR that changes
Buttoncode only runs Button tests (~400 tests, ~30 min). Docs-only PRs skip UI tests entirely.How it works
[Category(UITestCategories.X)]in changed/existing test files (supports combined attributes like[Category(X), Order(1)])Shell/→ Shell,Button*→ Button,DragGesture*→ Gestures + DragAndDrop)-AiCategoriesparameter for external inputNONEdetected (docs-only, build scripts) — build stages are skipped toosrc/Controls/changed — runs full matrix (safe default)Escape hatches
run-all-uitestslabel to any PR → forces full matrixcategoriesparameter → run specific categoriesprNumberparameter → test detection against any PRFiles
eng/scripts/detect-ui-test-categories.ps1eng/pipelines/common/ui-tests.ymleng/pipelines/common/ui-tests-steps.ymleng/pipelines/ci-uitests.ymlprNumber+categoriesqueue-time parametersValidation runs
Latest batch (Apr 26) — 6 builds, all succeeded ✅
Each build correctly detected categories from the PR diff. Non-matching jobs started but skipped all test steps in ~2min (vs 60-100min for matching jobs).
ShellRadioButton,ViewBaseTestsViewBaseTestsViewBaseTestsSwipeView,ViewBaseTestsRefreshView,ViewBaseTestsEarlier batch — 5 builds
ShellShellSwipeView,ViewBaseTestsViewBaseTests,WebViewSafeAreaEdges,ViewBaseTestsCategory detection accuracy (30+ PRs validated)