[CI] Refactor ci-copilot pipeline: scope env vars per task#35324
Conversation
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.sh | bash -s -- 35324Or
iex "& { $(irm https://raw.githubusercontent.com/dotnet/maui/main/eng/scripts/get-maui-pr.ps1) } 35324" |
f910161 to
8267986
Compare
kubaflo
left a comment
There was a problem hiding this comment.
Note: posting as a formal review since the previous two issue comments with the same content were deleted shortly after posting. HEAD is unchanged at
82679867since 10:09:19Z (no new commits, identical blob SHAs9f8a1fb…/698b0d3…), so the findings below are unchanged from my prior comments — included again here for completeness so the PR has a durable record of the analysis.
The token-scoping refactor is the right shape: per-task env: blocks, two distinct trusted phases (Setup/Post), an untrusted Gate with no env, and a CopilotReview phase that gets only COPILOT_GITHUB_TOKEN. persistCredentials: false, removal of the gh auth login step, and --secret-env-vars=GH_TOKEN,COPILOT_GITHUB_TOKEN,GITHUB_TOKEN are all good belt-and-braces moves. The four issues below are the ones I'd want fixed before this lands.
🔴 1. trusted-scripts/ is writable by PR code
eng/pipelines/ci-copilot.yml 567-572 (Setup, Task 1):
TRUSTED="$(Agent.TempDirectory)/trusted-scripts"
mkdir -p "$TRUSTED"
cp -r .github/scripts "$TRUSTED/scripts"
cp -r .github/skills "$TRUSTED/skills"
cp -r eng/scripts "$TRUSTED/eng-scripts"Agent.TempDirectory is owned by the agent user, and nothing makes the copies read-only. Task 2 (Gate) runs dotnet build / dotnet test on PR-controlled code and can chmod +w and overwrite anything under $TRUSTED/ — for example scripts/post-ai-summary-comment.ps1 or eng-scripts/detect-ui-test-categories.ps1. Task 4 (Post) then re-invokes Review-PR.ps1 -Phase Post from that directory with GH_TOKEN (line 704) and runs the modified script.
So an attacker PR with a dotnet test payload that rewrites $AGENT_TEMPDIRECTORY/trusted-scripts/... re-acquires GH_COMMENT_TOKEN in Task 4 — defeating the whole partition.
Fix: lock the directory after copy.
cp -r .github/scripts "$TRUSTED/scripts"
cp -r .github/skills "$TRUSTED/skills"
cp -r eng/scripts "$TRUSTED/eng-scripts"
chmod -R a-w "$TRUSTED" # at minimum
# or better: own as a different user (e.g. root via sudo) and chmod 0555Even better, copy into a path outside the agent user's writable space (e.g. /opt/maui-ci-trusted/$(Build.BuildId), root-owned via the image bootstrap).
🔴 2. $gateOutputDir referenced in Setup-only path → Join-Path $null crash
.github/scripts/Review-PR.ps1 306-314:
if ($Phase -eq 'Setup') {
$sentinelDir = if ($TrustedScriptsDir) { Split-Path $TrustedScriptsDir -Parent } else { $gateOutputDir }
"OK" | Set-Content (Join-Path $sentinelDir "setup-complete") -Encoding UTF8
...
exit 0
}$gateOutputDir is only defined at line 575, inside if ($runGate) { ... }. In CI this happens to work because -TrustedScriptsDir is always passed, so the first branch is taken. But the comment block at lines 91-93 explicitly advertises a "no -Phase for backward-compat for local development" mode, and pwsh Review-PR.ps1 -Phase Setup -PRNumber 1 (no trusted dir) hits the else branch and resolves $gateOutputDir to $null, then Join-Path $null "setup-complete" throws.
Fix: make the fallback well-defined, e.g.
$sentinelDir = if ($TrustedScriptsDir) {
Split-Path $TrustedScriptsDir -Parent
} else {
Join-Path $RepoRoot "CustomAgentLogsTmp/PRState/$PRNumber/PRAgent"
}
New-Item -ItemType Directory -Force -Path $sentinelDir | Out-NullSame pattern would help line 666 / 786 where $gateVerdictDir has the same fallback.
🟠 3. Tier 3 category refresh deleted, never reinstated
Review-PR.ps1 829-831:
# NOTE: Tier 3 category refresh (re-running detect-ui-test-categories.ps1 with AI
# categories from pre-flight) has been moved to the Post phase so that PR-controlled
# scripts don't run in a task that has COPILOT_GITHUB_TOKEN in scope.The note is correct about why it was removed from CopilotReview, but the refresh code was never actually added to the if ($runPost) { ... } block (lines 840-897) — only the AI-Summary post and label application live there. Net effect: when the agent writes ai-categories.md, nothing reads it back, and **Detected UI test categories** in the AI Summary comment will reflect only Tier-1/2 (filename heuristics), missing the AI tier on every PR.
Either (a) re-add the refresh block at the top of if ($runPost), reading $aiCategoriesFile and re-running detect-ui-test-categories.ps1 -AiCategories ..., or (b) drop the misleading NOTE and document that Tier 3 is intentionally gone.
🟠 4. ##vso[] filter is bypassable
ci-copilot.yml 615 / 645:
... | sed 's/^##vso\[/##_vso[/'The Azure DevOps agent matches logging commands with surprisingly forgiving rules — leading whitespace, leading \r (common in Windows-encoded output and ANSI-colored streams), and embedded ANSI escapes can all let a ##vso[...] slip past ^##vso\[. Examples that bypass this regex but are still parsed by the agent on at least some configurations:
\r##vso[task.setvariable variable=GH_TOKEN;isSecret=false]leak
##vso[task.prependpath]/tmp/evil
\e[0m##vso[task.logissue type=error]anything
Tighter version:
... | tr -d '\r' | sed -E 's/^[[:space:]]*##vso\[/##_vso[/'(Strip \r first, then anchor after optional whitespace.) Worth adding to Setup and Post too — both echo PR-derived content (Setup runs the merge, Post echoes gate/content.md and friends, both written by Gate/CopilotReview which were untrusted).
🟡 Smaller things
-
Setup failure → no PR comment, just a red X. If Task 1 throws (merge conflict, gh transient failure), the sentinel isn't written, Tasks 3 and 4
exit 0silently (lines 628-637, 675-678), and the PR author sees "maui-pr failed" with no AI Summary comment to explain. The job result is at least red because of Task 1, so this is UX, not security — but consider posting a "Setup failed: $reason" comment from a small failure-only step that still hasGH_COMMENT_TOKEN. -
Cross-phase state via working tree.
gate-result.txtis persisted toSplit-Path $TrustedScriptsDir -Parent(good — outside the working tree), butai-categories.mdandgate/content.mdlive underCustomAgentLogsTmp/PRState/...in the working tree, and Gate runs PR code that can rewrite either. Post should treat anything it reads from there as untrusted (it's posted verbatim into a PR comment, so XSS-in-Markdown / spoofed gate verdict are both reachable). Worth a sanitization pass when reading. -
cp -r CustomAgentLogsTmp $(Build.ArtifactStagingDirectory)/copilot-logs/in Task 4. The comment "exclude .copilot session state — may contain auth metadata" is spot on for~/.copilot, but the same caveat applies toCustomAgentLogsTmp/— Copilot's intermediate output files there can include tool transcripts. If they're meant to be public artifacts, fine; if not, strip them before publish. -
Setup also has
GH_TOKENand runsgit mergeon PR code. Out of scope for this PR, but worth noting: a malicious.gitattributeswith amerge=driver pointing at a script in the PR can run that script during merge withGH_TOKENin env. Mitigations:git -c core.attributesFile=/dev/null merge ...orgit config merge.<driver>.driver falsebefore the merge.
🟢 What I really like
- Per-task
env:partition with# env: deliberately emptyon Gate. Crystal clear, easy to audit, easy to break the wrong way and notice in review. persistCredentials: false+ removal ofgh auth login(no~/.config/gh/hosts.ymlleft on disk for dotnet subprocesses to read).--secret-env-vars=...on thecopilotinvocation as defense-in-depth even after the env: scoping.- Sentinel-based gating between tasks is the right pattern for "did Setup succeed?" (just needs the not-silent-on-failure tweak above).
- Phase routing in PowerShell with backward-compat for local runs — the
$run<Phase>flags read very nicely.
Summary
| # | Severity | Where | Issue |
|---|---|---|---|
| 1 | 🔴 High | ci-copilot.yml 567-572 |
Trusted scripts dir writable by Gate (PR code) → Post task re-acquires GH_TOKEN |
| 2 | 🔴 High | Review-PR.ps1 309 |
$gateOutputDir undefined in -Phase Setup local mode → Join-Path $null |
| 3 | 🟠 Med | Review-PR.ps1 829 / runPost block |
Tier 3 AI category refresh deleted with "moved to Post" comment, never re-added |
| 4 | 🟠 Med | ci-copilot.yml 615/645 |
##vso[] filter regex bypassable by \r, leading whitespace, ANSI escapes |
| 5 | 🟡 Low | ci-copilot.yml 628-637, 675-678 |
Setup failure → silent skip of Post → no PR comment explaining |
| 6 | 🟡 Low | Post phase | Treat working-tree content written by Gate/CopilotReview as untrusted |
| 7 | 🟡 Low | Setup phase | git merge of PR code with GH_TOKEN in env — mitigate .gitattributes merge drivers |
Happy to elaborate on any of these. The overall direction is great — the env-scoping refactor was overdue.
PureWeen
left a comment
There was a problem hiding this comment.
🔍 Adversarial PR Review
Methodology: 3 independent reviewers with adversarial consensus
CI: ✅ 30/30 checks passing
Summary
| # | Sev | Consensus | Category | Finding |
|---|---|---|---|---|
| 1 | ❌ | 2/3 | Security | trusted-scripts/ writable → Post runs tampered scripts with GH_TOKEN |
| 2 | ❌ | 3/3 | Security | gate-result.txt spoofable by PR test code |
| 3 | ❌ | 3/3 | Logic | $gateOutputDir undefined in Setup without -TrustedScriptsDir |
| 4 | ❌ | 3/3 | Regression | Tier 3 category refresh deleted, never reinstated |
| 5 | ❌ | Verified | Logic | eng-scripts path off by one level → category detection silently skipped |
| 6 | 3/3 | Security | ##vso[] filter bypassable |
|
| 7 | 2/3 | Config Impact | PRNumber type change is breaking |
Verdict: 5 blocking issues, 2 warnings. The security architecture (per-task env scoping) is the right direction, but the implementation has gaps that undermine the security boundary — particularly #1 and #2 which allow Gate-phase PR code to tamper with files that Post executes with GH_TOKEN.
⚠️ Author's Claimed Fixes Not Present
The author comment states all 4 issues from kubaflo's review are "addressed in current HEAD (8267986)." All 3 reviewers independently confirmed none of the 4 fixes are in the code on disk. The commit hash is unchanged.
Discarded Findings (1/3, no consensus)
setup-completesentinel writable by Gate (single reviewer)$testDetectScriptuses$PSScriptRoot(single reviewer)- Post applies gate-failed label when gate never ran (single reviewer)
$reviewBranchcheckout warns but continues (single reviewer)post-ai-summary-comment.ps1called twice (single reviewer — intentional idempotent design)
8267986 to
d4d7bdb
Compare
|
Addressing all findings from both reviews. Current HEAD: d4d7bdb.
Waived items (not in scope for this PR):
|
There was a problem hiding this comment.
Pull request overview
This PR refactors the ci-copilot Azure DevOps pipeline to reduce secret exposure by splitting the previous single “Run PR Reviewer Agent” step into distinct phases with tightly scoped env: blocks, and updates Review-PR.ps1 to support per-phase invocation via a new -Phase parameter (plus -TrustedScriptsDir) so trusted scripts can be copied before merging PR code.
Changes:
- Split the pipeline into 4 tasks (Setup / Gate / CopilotReview / Post) with scoped environment variables and a “trusted-scripts” copy staged in
$(Agent.TempDirectory). - Add phase routing to
.github/scripts/Review-PR.ps1, including log file suffixing per phase and cross-task state handoff via sentinel/result files. - Harden Copilot invocation by passing
--secret-env-varsand move gate verdict persistence to$(Agent.TempDirectory)to avoid PR-writeable state.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| eng/pipelines/ci-copilot.yml | Splits the monolithic reviewer step into 4 tasks with scoped secrets and trusted-script staging. |
| .github/scripts/Review-PR.ps1 | Adds -Phase/-TrustedScriptsDir to enable per-task execution and cross-task state sharing. |
Comments suppressed due to low confidence (1)
eng/pipelines/ci-copilot.yml:88
- Since PRNumber is now a numeric parameter, the current validation only checks for an empty string. It will accept 0 (or other invalid values) and proceed to run against a non-existent PR. Consider validating PRNumber is > 0 (and possibly an integer) before continuing.
- bash: |
echo "Validating PR Number parameter..."
if [ -z "${{ parameters.PRNumber }}" ]; then
echo "##vso[task.logissue type=error]PRNumber parameter is required"
exit 1
fi
Split the monolithic 'Run PR Reviewer Agent' bash task into 4 sequential tasks, each with exactly the env vars it needs: Task 1 (Setup): GH_TOKEN only — branch checkout, PR merge Task 2 (Gate): NO tokens — dotnet build/test, gate verification Task 3 (CopilotReview): COPILOT_GITHUB_TOKEN — expert review + try-fix Task 4 (Post): GH_TOKEN only — comments, labels, summary Review-PR.ps1 gains -Phase (Setup|Gate|CopilotReview|Post) and -TrustedScriptsDir parameters so each pipeline task invokes a single phase. Backward-compatible: omitting -Phase runs all steps sequentially. Security improvements: - persistCredentials: false (credentials no longer available to all tasks) - Removed gh auth login step (GH_TOKEN used directly as env var) - --secret-env-vars strips tokens from copilot subprocess environments - Trusted scripts copied once in Setup, reused by all phases - PRNumber type changed to 'number' for AzDO parameter validation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
cc4e511 to
7d87222
Compare
Code Review — PR #35324Independent AssessmentWhat this changes: Splits the monolithic CI pipeline task into 4 sequential tasks with scoped environment variables. Each task calls Inferred motivation: Security hardening — the Gate phase (dotnet build/test) and CopilotReview phase previously had access to Findings
|
- Persist regression risks, tests, and platform to files in Gate phase - Restore regression data + detect script path in CopilotReview phase - Fix stale RunReview references in comments (now RunGate/RunPost) - Fix misleading RunPost step name comment in ci-copilot.yml Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Tier 3 AI refresh in CopilotReview phase emits detectedCategories under step RunReview, but downstream RunDeepUITests was only reading RunGate. Use coalesce() so AI-refreshed categories are preferred when available, falling back to Gate-detected categories otherwise. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add TrustedScriptsDir null guard with local fallback in both CopilotReview and Post phase restoration blocks (prevents ParameterBindingException when running locally with -Phase) - Add setup-complete sentinel verification before Gate/CopilotReview/Post phases to fail fast with clear error if Setup didn't complete Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Step 3 (Run Detected UI Tests) was running BuildAndRunHostApp.ps1 per detected category inside the ReviewPR stage, duplicating the same work that RunDeepUITests (Stage 2) does. This caused UI tests to run twice. Now the ReviewPR stage only runs targeted PR-specific tests via the Gate (verify-tests-fail.ps1), and full-category runs happen exclusively in the RunDeepUITests stage. Renumbered steps: old 4→3, 5→4, 6→5, 7→6, 8→7. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Detect category-wide fixture setup failures in deep UI TRX aggregation and render them as setup failures instead of duplicated failed tests. Keep the deep UI task successful so the summary comment stage can publish the artifact-backed result. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Skip rendering UI test and regression cross-reference sections when their content only reports no actionable work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
/azp run |
|
Azure Pipelines successfully started running 3 pipeline(s). |
PureWeen
left a comment
There was a problem hiding this comment.
Adversarial multi-model review — Round 2
Methodology: 3 independent reviewers with adversarial consensus. Every finding below was verified against the source at head bbd60558. Consensus counts shown per finding. This is a COMMENT review, not an approval or change-request.
The author addressed all findings from the prior round — these are new findings against the per-task refactor and the new Remove-StaleMauiBotComments.ps1 cleanup script.
❌/⚠️ Findings
1.
gate-result.txt, detect-script-path.txt, regression-*.json, and uitest-categories.txt are written to Split-Path $TrustedScriptsDir -Parent, i.e. $(Build.ArtifactStagingDirectory) — the writable parent of the chmod -R a-w trusted-github/ directory (Review-PR.ps1:1258-1288). Three consequences:
detect-script-path.txtis read back and executed in CopilotReview (& pwsh -NoProfile -File $detectScript ..., line 1542). A tampered path runs withCOPILOT_GITHUB_TOKEN. TheTest-Path $detectScriptguard (line 1535) checks existence, not trust.gate-result.txtcould be forged toPASSED, yielding thes/agent-gate-passedlabel.- A missing
gate-result.txtfails open toSKIPPED(lines 1309, 1594) rather than failing closed.
Fix: persist cross-phase state into a dedicated subdirectory that is also chmod a-w after writing; re-derive the detect script from $TrustedScriptsDir/eng-scripts/... instead of persisting and executing a stored path (or validate the restored path starts with $TrustedScriptsDir); treat a missing gate result as a failure, not SKIPPED.
2. ##vso[...] is not stripped from PR-controlled stdout (2/3)
The new security doc (rule 6, line 28) and its checklist (line 41) require piping PR-controlled output through tr -d '\r' | sed -E 's/##vso\[[^]]*\]//g', but no such stripping exists anywhere in the pipeline. Gate and CopilotReview run dotnet build/dotnet test, whose stdout flows unfiltered to Azure Pipelines, which will honor an injected ##vso[task.setvariable variable=GateFailed]false (or similar) from PR test/build output.
Fix: pipe the inner PR-controlled subprocess output through the documented stripper, keeping the script's own ##vso emissions outside the pipe.
3.
Remove-StaleMauiBotComments.ps1:92-94: $matchesGeneratedMarker deletes any comment whose body contains <!-- AI Summary --> or <!-- AI Gate -->, regardless of author — unlike $matchesBotOnlyContent (96-100), which is gated by Test-IsMauiBotCommentAuthor. A maintainer who quotes the marker in their own comment would have it deleted on the next run.
Fix: AND the marker check with Test-IsMauiBotCommentAuthor, mirroring the bot-only-content path.
4. gh api --paginate | ConvertFrom-Json breaks past 100 items (1/3, reproduced)
Remove-StaleMauiBotComments.ps1:52,58,128,134 and post-ai-summary-comment.ps1. gh api --paginate concatenates one JSON array per page ([...][...]), which ConvertFrom-Json rejects (Additional text encountered after finished reading JSON content). The catch returns @(), so stale-comment/review cleanup silently no-ops on PRs with >100 comments or reviews. It fails safe (nothing wrongly deleted) but does not perform its function at scale.
Fix: use gh api --paginate --slurp (merges pages into one array) or --jq '.[]' and re-wrap.
5. 💡 Testing — Safety-critical new scripts are under-tested (2/3)
Remove-StaleMauiBotComments.ps1 deletes comments and dismisses reviews but has no Pester coverage; the new test files don't cover paginated parsing or missing/corrupt cross-phase state files. Suggest tests for: non-bot comments with the marker are NOT deleted (after fix #3), missing gate-result.txt behavior, and multi-page API responses.
Minor (single-reviewer, verified, non-blocking): Setup invokes Review-PR.ps1 from the worktree path rather than $TRUSTED (ci-copilot.yml:628; pre-merge, so low risk but inconsistent with the trusted-copy model); Setup uses set +e with no exit, so a Setup failure doesn't short-circuit later tasks (it surfaces later via CopilotFailed, but downstream phases still run on a possibly-bad state); Detect-TestsInDiff.ps1 declares [string]$PRNumber while callers now pass [int] (auto-converts; cosmetic).
CI status
maui-pr (build / pack / integration tests) is green. However maui-pr-uitests (Android Controls API 30 CollectionView) is failing — this looks like an unrelated flake, as the PR does not touch CollectionView — and Skill Validation is failing (LLM evaluation failed), which may be related since this PR adds a new instructions doc. Not a clean/mergeable verdict until those are understood and green.
Test Failure Review
|
| Failure | Verdict | Evidence |
|---|---|---|
Skill Validation check |
Needs human investigation | The Check Run at 78328354172 failed with output title LLM evaluation failed and links to Actions run 26584935471 for head bbd6055. That run evaluated verify-tests-fail-without-fix, and this PR changes .github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1; however the gathered context does not include the detailed LLM evaluation text needed to identify the exact assertion. |
maui-pr-uitests / Android Controls (API 30) CollectionView |
Likely unrelated | AzDO build 1442480 failed on refs/pull/35324/merge in maui-pr-uitests, with failed job/logs for Controls (API 30) CollectionView (logId 4315/4325). The logs contain 23 deduplicated CollectionView UI failures, mostly System.TimeoutException : Timed out waiting for element..., plus one VerifyGroupItemScrollToByIndexWithEndPositionAndHorizontalGrid_Potato expected-value failure. The PR scope is 10 .github/eng/pipelines/ci-copilot.yml automation files, with no inferred platform, Android, Controls, or CollectionView changes; recent base build 1444191 for the same definition also failed on main. |
Build Analysis check |
Insufficient data | The check failed, but the gathered context only has the generic Build Analysis documentation URL and no build-specific known-issue match, failure title, or log excerpt. It cannot be responsibly classified beyond noting that the PR also has a failed maui-pr-uitests build. |
Optional authenticated AzDO _apis/test records |
Likely unrelated | Authenticated test-result enrichment attached 23 deduplicated non-MAUI failures from run names such as coreclr Linux x64 Checked jitstress1, coreclr windows ..., and Microsoft.CodeAnalysis.... These do not match the maui-pr-uitests Android CollectionView job or the PR's changed .github/CI files, so they should not be counted as PR-caused MAUI failures. |
Recommended action
Inspect the Skill Validation LLM-evaluation result for the changed verify-tests-fail-without-fix skill/script; treat the Android CollectionView UI-test failures and unrelated Roslyn/CoreCLR enrichment as non-blocking for this PR unless a rerun reproduces a failure tied to the PR scope.
Evidence details
PR scope from the gathered context:
- PR [CI] Refactor ci-copilot pipeline: scope env vars per task #35324:
[CI] Refactor ci-copilot pipeline: scope env vars per task - Head:
feature/refactor-copilot-ymlatbbd6055 - Labels:
s/agent-reviewed,area-ai-agents - Changed files:
.github/instructions/ci-copilot-pipeline-security.instructions.md,.github/scripts/Post-AISummaryComment.Tests.ps1,.github/scripts/Review-PR.ps1,.github/scripts/post-ai-summary-comment.ps1,.github/scripts/shared/Aggregate-UITestArtifacts.Tests.ps1,.github/scripts/shared/Aggregate-UITestArtifacts.ps1,.github/scripts/shared/Get-AggregatedTrxFromDirectory.ps1,.github/scripts/shared/Remove-StaleMauiBotComments.ps1,.github/skills/verify-tests-fail-without-fix/scripts/verify-tests-fail.ps1,eng/pipelines/ci-copilot.yml - Inferred platforms from files: none
Failing checks in context:
Skill Validation: completed/failure, details https://github.com/dotnet/maui/runs/78328354172maui-pr-uitests: completed/failure, build 1442480maui-pr-uitests (Android UITests Controls (API 30) CollectionView): completed/failure, job URL Android CollectionView logsBuild Analysis: completed/failure, details URL points to generic documentation
AzDO build evidence:
- Build 1442480:
maui-pr-uitests(definition 313),failed / completed, branchrefs/pull/35324/merge, source version6f90925459a6651dca5258480cbae7ebb499b6b6 - Failed timeline records include
Android UITests, phaseControls (API 30), jobControls (API 30) CollectionView, taskControls (API 30) CollectionView, andPublish the android_ui_tests_controls_30 test results - The CollectionView log failures reference
Microsoft.Maui.TestCases.Tests.CollectionView_GroupingFeatureTestsandUITest.Appiumwaits/teardown, not changed PR files - Recent base-branch builds for the same definition include 1444191 failed on
refs/heads/mainat 2026-06-02T02:29:23Z, plus older failed main builds 1431731, 1419912, and 1419503
Limitations:
- The local gatherer used authenticated Azure CLI AzDO access. The workflow normally relies on public build/timeline/log APIs unless
AZDO_TOKENis present. - The gathered context has no detailed Build Analysis result payload and no detailed
Skill ValidationLLM-evaluation assertion text. - Helix aggregate data was not checked (
helix.checked=false), but the observed actionable failure is frommaui-pr-uitests, notmaui-pr-devicetests. - Authenticated
_apis/testresults are optional enrichment; in this context they include unrelated Roslyn/CoreCLR test runs and were not used to infer MAUI PR causality.
…, fail-closed Original PR dotnet#35324 review #4403922251 (3/3 reviewers) flagged: - gate-result.txt etc. landed in 'Split-Path \$TrustedScriptsDir -Parent' which is the writable Build.ArtifactStagingDirectory - any later task could forge 'PASSED' between Gate exit and CopilotReview restore. - detect-script-path.txt was persisted then *executed* under COPILOT_GITHUB_TOKEN. Test-Path checked existence, not trust - a tampered path is direct RCE. - Missing gate-result.txt defaulted to 'SKIPPED' (fail open). This commit: 1. Adds Get-PhaseStateDir returning a dedicated 'phase-state' sibling of trusted-github. Used at all 5 sites (setup sentinel R/W, gate write, CopilotReview/Post restore). 2. Drops detect-script-path.txt persist+restore entirely. CopilotReview re-derives `Join-Path \$EngScriptsDir 'detect-ui-test-categories.ps1'` (same expression Gate uses). 3. Adds Restore-GateResultOrFailClosed which throws on missing / non-whitelisted verdict. Replaces both fail-open SKIPPED defaults. 4. ci-copilot.yml Gate task chmods the state dir read-only after the pwsh call completes (belt-and-braces; Linux/Mac). 5. Adds 6 Pester tests for the new helpers covering: path safety (not the writable parent), idempotency, local-dev fallback path, fail-closed on missing/empty/garbage, trim-after-newline. Tests: Review-PR.Tests.ps1 22/22 green.
Original PR dotnet#35324 review #4403922251 (2/3 reviewers, opus-4.7x noted severity is reduced because the cleanup now MINIMIZES rather than deletes — but UX is still broken: any maintainer who quotes `<!-- AI Summary -->` or `<!-- AI Gate -->` in their own comment gets silently collapsed on the next cleanup pass). Before: $matchesGeneratedMarker = ($IncludeAISummary -and (Test-IsAISummaryCommentBody $body)) -or ($IncludeLegacyGate -and $body.Contains($script:AiGateCommentMarker)) After (mirrors the $matchesBotOnlyContent shape just below): $matchesGeneratedMarker = (Test-IsMauiBotCommentAuthor $comment) -and ( ($IncludeAISummary -and (Test-IsAISummaryCommentBody $body)) -or ($IncludeLegacyGate -and $body.Contains($script:AiGateCommentMarker)) ) Tests (Remove-StaleMauiBotComments.Tests.ps1, +5 cases, 10/10 green): - bot comment with AI Summary marker IS hidden - contributor comment quoting AI Summary marker is NOT hidden - contributor comment quoting AI Gate marker is NOT hidden - bot comment with AI Gate marker IS hidden - mixed batch: bot hidden, contributor preserved
Original PR dotnet#35324 review #4403922251 minor findings: M1 (3/3) - Setup invoked Review-PR.ps1 from the worktree path .github/scripts/Review-PR.ps1 while Tasks 2-4 used "$TRUSTED/scripts/Review-PR.ps1". The trusted copy was already created (and chmod'd a-w) at lines 621-626 BEFORE the Setup pwsh call. Switching Setup to "$TRUSTED/scripts/Review-PR.ps1" makes all four phases run the same script bytes. M2 (3/3) - Setup wrapped pwsh in "set +e" and on failure set CopilotFailed=true but did not exit, so subsequent tasks ran on a possibly half-set-up worktree (no merged PR HEAD, no sentinel). The error surfaced later via CopilotFailed but only after a Gate that read stale state. Fix is two-layered: (a) Setup now exits with SETUP_EXIT after recording the failure. (b) A dedicated SetupFailed variable (distinct from CopilotFailed, which later phases also set) gates the follow-on tasks: - Task 2 Gate: condition and(succeeded(), ne(variables.SetupFailed, true)) - Task 3 Review: condition and(succeeded(), ne(variables.SetupFailed, true)) - Task 4 Post: condition ne(variables.SetupFailed, true) Using a Setup-specific variable keeps Task 4 running after Gate/ Review failures so the partial AI summary / gate-result comment still surfaces on the PR; only a Setup failure (where there is no state to post) skips Post.
… fail-closed Gate state, HMAC verdict, Phase-Audit (#35804) > [!NOTE] > Are you waiting for the changes in this PR to be merged? > It would be very helpful if you could [test the resulting artifacts](https://github.com/dotnet/maui/wiki/Testing-PR-Builds) from this PR and let us know in a comment if this change resolves your issue. Thank you! ## Stacked on #35677 This PR layers a security-hardening stack on top of @kubaflo's #35677 (`feature/enhanced-reviewer`). When merged here, the combined work flows to `main` via #35677. ## Scope Re-applies and updates the security audit work originally landed in #35324 against the latest #35677 base, plus three new defenses caught by adversarial review rounds. | # | Change | Why | |---|---|---| | F1 | Cross-phase state hardening — dedicated `phase-state/` subdir; CopilotReview re-derives `` instead of restoring a persisted path; fail-closed on missing/garbage Gate verdict | Original review said writable `Split-Path -Parent` of `\` allowed PR test code to forge `gate-result.txt` and to redirect `& pwsh -File` execution under `COPILOT_GITHUB_TOKEN` | | F1.A | HMAC-sign `gate-result.txt` via per-job `openssl rand -hex 32` key passed to CopilotReview/Post through an AzDO secret variable | Adversarial reviewer demonstrated empirically that `chmod -R a-w` is bypassable same-user (`mv state state.bak && mkdir state && echo PASSED > ...`). Key generated AFTER pwsh exits → PR-spawned daemons cannot read it | | F2 | `Out-SafePRSubprocessLine` helper + inline `-replace '(?i)##(vso\[\|\[)', '##~SANITIZED~\'` at every PR-controlled `Write-Host` site + `& pwsh -NoProfile -File` subprocess wraps at every shared runner site | Prevents PR test code from forging `##vso[task.setvariable / task.complete / task.uploadfile]` AzDO logging commands through stdout | | F3 | Marker-comment hiding now requires `Test-IsMauiBotCommentAuthor` | A maintainer who quotes `<!-- AI Summary -->` in their own comment was being silently hidden | | D1 / D1.A | Whitelist `^(NONE\|ALL\|[A-Za-z0-9_,]+)\$` before re-emitting `detectedCategories`; `^[A-Za-z0-9_\-]+\$` for `aiSummaryReviewNodeId` | Closes a future-proofing class where PR-controlled text could inject a second `##vso` directive via setvariable echo | | M1 / M2 | Setup invokes `\/scripts/Review-PR.ps1` (not the worktree); `exit \` on failure; Tasks 2-4 gated on `ne(variables['SetupFailed'], 'true')` so a half-set-up worktree can't keep running | Original review said the worktree-vs-trusted asymmetry was inconsistent and the lack of short-circuit let Gate run on stale state | | Phase-Audit | AST-based static analyzer of the `` call-graph closure: default-deny `gh`-write-verb classifier + YAML token-binding audit, run as a Pester suite | Locks the contract that Gate's GH_TOKEN is read-only; any future write-verb regression fails CI | | Pester CI | `security-scripts-pester.yml` workflow runs all 9 `.Tests.ps1` files on every PR touching `.github/scripts`, `.github/skills`, `eng/scripts`, or `eng/pipelines/ci-copilot.yml`. Required-files allowlist + minPassed floor + zero-skipped tolerance → silent test deletion / Pester skip cannot land green | Pre-existing PS tests had no CI runner; this is the first Pester gate in the repo | The original sanitizer / phase-state work passed an adversarial review with three independent reviewers (opus-4.8 + opus-4.7-xhigh + gpt-5.5); the rebase onto #35677's new Copilot-token-usage commits was re-validated by the same set. `gpt-5.5` caught one pre-existing F2 gap (UI/Device runner `\` re-emit at L1526/1539 was raw `Write-Host`) which is fixed in this branch. ## Validation - **Pester: 233/233 green** (50s runtime) - **Phase-Audit Axis A: 21/21** — no `gh` write-verb regressions on real Gate-reachable scripts - **Phase-Audit Axis B: 3/3** — every YAML task's `GH_TOKEN` binding is the documented PAT alias; `CopilotReview` and the new `AnalyzeCopilotTokenUsage` stage have no GH_TOKEN - **openssl ↔ .NET HMAC cross-check** verified locally (`-mac HMAC -macopt hexkey:` round-trips with `Convert.FromHexString`) - **YAML lint**: `yaml.safe_load` clean ## Note for #35677 owner The PR title and description should be reviewed against the actual diff before merge; this stack only adds defenses, no functional changes to the review pipeline beyond the trust-boundary tightening. cc @kubaflo --------- Co-authored-by: Copilot <Copilot@users.noreply.github.com>
Note
Are you waiting for the changes in this PR to be merged?
It would be very helpful if you could test the resulting artifacts from this PR and let us know in a comment if this change resolves your issue. Thank you!
Description
Refactors the
ci-copilotpipeline (eng/pipelines/ci-copilot.yml) by splitting the monolithicRun PR Reviewer Agentbash task into 4 separate tasks with scopedenv:blocks. Each task receives only the environment variables it needs, improving security and clarity.Pipeline Architecture (Before → After)
Before: Single bash task with all tokens (
GH_TOKEN,COPILOT_GITHUB_TOKEN) available to every step.After: 4 isolated tasks with scoped credentials:
GH_TOKENGH_TOKEN(read-only)dotnet build/test, regression analysisCOPILOT_GITHUB_TOKENGH_TOKEN— agent cannot post)GH_TOKENKey Changes
eng/pipelines/ci-copilot.ymlenv:blocksPRNumberparameter type changed fromstringtonumberpersistCredentialsset tofalsefor securitygh auth loginstep (tokens passed via env vars instead)coalesce()fordetectedCategoriesoutput variable routing (supports both Gate and AI-refreshed values).github/scripts/Review-PR.ps1-Phaseparameter (Setup,Gate,CopilotReview,Post) for per-task invocation-TrustedScriptsDirparameter for trusted script snapshot supportTrustedScriptsDirnull guards with local fallback (preventsSplit-Path $nullcrash)Security Model
The primary security improvement is isolating the Copilot agent (Task 3) from
GH_TOKEN. This prevents the AI agent from directly posting comments or pushing code. OnlyCOPILOT_GITHUB_TOKENis available in Task 3, which is scoped to Copilot operations only.Task 2 (Gate) has
GH_TOKENfor read-only GitHub API access needed byDetect-TestsInDiff.ps1to fetch PR metadata and labels.Cross-Phase Data Flow
Since each task runs in a separate process, variables must be persisted to files: