Update agent fix with user feedback, use SendMessageWithRetry, and classify fixable errors #7401
Update agent fix with user feedback, use SendMessageWithRetry, and classify fixable errors #7401hemarina merged 5 commits intoAzure:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the Copilot-driven error handling middleware to streamline agent interactions (including retry UX), adds a new “Fix this error” troubleshooting category, and adjusts templates/tests to support the updated flow.
Changes:
- Replace error “category” classification with a boolean
fixableErrorgate and add a newfixtroubleshooting category that uses the fix template directly. - Switch agent messaging to
SendMessageWithRetryfor improved interactive retry-on-error behavior. - Update prompt templates and tests to reflect the new troubleshooting/fix flow.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| cli/azd/cmd/middleware/templates/troubleshoot_fixable.tmpl | Removes the old fixable troubleshooting template (no longer referenced). |
| cli/azd/cmd/middleware/templates/fix.tmpl | Adds a provision-common-error prompt step and requests post-fix user feedback. |
| cli/azd/cmd/middleware/templates/explain.tmpl | Small reordering/clarification of “stop here” instructions. |
| cli/azd/cmd/middleware/middleware_coverage_test.go | Updates coverage tests to use fixableError instead of classifyError. |
| cli/azd/cmd/middleware/error.go | Adds fixableError, new fix category, switches to SendMessageWithRetry, and changes error flow gating. |
| cli/azd/cmd/middleware/error_test.go | Reworks classification tests into fixable/non-fixable coverage for fixableError and adds fix category constant coverage. |
jongio
left a comment
There was a problem hiding this comment.
PR Review - #7401
Update agent fix with user feedback, use SendMessageWithRetry, and classify fixable errors by @hemarina
Summary
What: Replaces the 3-category error classifier (classifyError -> ErrorCategory) with a boolean gate (fixableError -> bool). Adds a "Fix this error" option so users can skip explain/guidance and go straight to the agent fix. Switches to SendMessageWithRetry.
Why: Fixes #7104. The tri-state classification wasn't driving different behavior - only "fixable vs not" matters for the agent flow. Direct-fix option is a solid UX improvement.
Assessment: The simplification is the right call. categoryFix flow is well-structured and SendMessageWithRetry is a good upgrade. One logic bug: the retry-limit safety net is bypassed in the new fix path because previousError isn't set. Also dead code left behind.
Findings
| Category | High | Medium | Low |
|---|---|---|---|
| Logic | 1 | 0 | 0 |
| Quality | 0 | 1 | 0 |
| Tests | 0 | 2 | 0 |
| Total | 1 | 3 | 0 |
Key Findings
- [HIGH] Logic:
previousErrornever set forcategoryFixpath - 3-attempt retry limit bypassed when users pick "Fix this error" + retry repeatedly (see inline). - [MEDIUM] Quality:
ErrorCategorytype and its 3 constants (AzureContextAndOtherError,MachineContextError,UserContextError) are dead code - no references remain afterclassifyErrorwas replaced byfixableError. - [MEDIUM] Tests: No
categoryFixcase inTest_BuildPromptForCategory- doesn't verify the fix template is selected. - [MEDIUM] Tests: No
Runintegration test for thecategoryFixflow or retry-limit enforcement.
Test Coverage Estimate
fixableError: well covered with direct + wrapped error testscategoryFixconstant: covered inTest_TroubleshootCategory_ConstantscategoryFixinbuildPromptForCategory: NOT coveredcategoryFixinRunmethod: NOT covered - retry-limit bypass undetectable
What's Done Well
- Clean simplification from 3-state enum to boolean - matches actual usage
categoryFixcorrectly skips the redundant "Want to fix?" prompt- Good test restructuring with the
fixable boolfield in the sentinel table SendMessageWithRetryis the right abstraction for interactive messaging
1 inline comment below.
wbreza
left a comment
There was a problem hiding this comment.
PR Review — #7401
Update agent fix with user feedback, use SendMessageWithRetry, and classify fixable errors by @hemarina
Summary
What: Simplifies error classification from a 3-category enum to a boolean fixableError gate. Adds "Fix this error" as a direct troubleshooting option, upgrades to SendMessageWithRetry, and adds a user feedback step after agent fixes.
Why: Fixes #7104 — the tri-state classification wasn't driving different behavior. Only "fixable vs not" matters for the agent flow. Direct-fix option improves UX.
Assessment: Clean simplification with good test restructuring. One logic bug in the retry safety net (previousError not set for categoryFix), dead code left behind, and a behavioral change where return nil, nil swallows the exit code for skip/non-fixable paths.
Findings
| Priority | Count |
|---|---|
| High | 1 |
| Medium | 3 |
| Low | 1 |
| Total | 5 |
Prior Review Verification
@jongio's HIGH finding (previousError infinite loop) — verified and agreed. See inline comment.
What's Done Well
- Clean simplification from 3-category enum to boolean — matches actual usage perfectly
categoryFixlets users skip the explain-then-ask round-trip — excellent UX improvementSendMessageWithRetryis the right upgrade for agent communication resilience- Test restructuring with
fixable boolfield is clean and extensible - User feedback step in
fix.tmplenables iterative fix improvement
Review performed with GitHub Copilot CLI
Azure Dev CLI Install InstructionsInstall scriptsMacOS/Linux
bash: pwsh: WindowsPowerShell install MSI install Standalone Binary
MSI
Documentationlearn.microsoft.com documentationtitle: Azure Developer CLI reference
|
jongio
left a comment
There was a problem hiding this comment.
All previous findings addressed in db92243. The previousError infinite-loop bug is fixed, exit codes are preserved for skip/non-fixable paths, dead ErrorCategory type removed, and categoryFix test case added. No new issues found in the incremental changes. LGTM.
- Rebase onto latest main to pick up error.go refactor (PR #7401) - Replace classifyError/ErrorCategory tests with fixableError tests - Fix ExtensionRunError struct fields (ExitCode -> ExtensionId+Err) - Add COVERAGE_MIN override tip to CONTRIBUTING.md (reviewer feedback) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…er workflow (#7424) * Improve code coverage pipeline: Phase 1 tests, coverage tooling, and developer workflow - Add 16 new test files for cmd/, middleware/, mcp/, and github packages - Create local coverage orchestrator (Get-LocalCoverageReport.ps1) with -UnitOnly, -MergeWithCI hybrid mode, -Html report options - Add generated code filter (Filter-GeneratedCoverage.ps1) for *.pb.go - Integrate filter into CI pipeline (code-coverage-upload.yml) - Raise CI coverage gate from 52% to 55% (release-cli.yml) - Add Code Coverage section to CONTRIBUTING.md with 4-mode matrix - Create cli/azd/docs/code-coverage-guide.md deep-dive reference - Cross-reference all 4 coverage scripts to each other and guide Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Phase 2: Add coverage tests for auth, keyvault, pipeline, cosmosdb, sqldb, armmsi - pkg/auth: 7 new test files, 54.2% -> 80.0% (+25.8pp) Covers CredentialForCurrentUser branches, loadClaims/saveClaims, Mode/SetBuiltInAuthMode, Logout cleanup, RemoteCredential, caches - pkg/keyvault: ParseKeyVaultAppReference edge cases, validation, 37.8% -> 41.5% - pkg/pipeline: Template rendering, provider lifecycle, pure logic, 27.8% -> 30.1% - pkg/cosmosdb: New test file from 0% -> 86.7% (mock ARM client) - pkg/sqldb: New test file from 0% -> 94.4% (mock ARM client) - pkg/armmsi: New test file from 0% -> 76.7% (mock ARM client) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * docs: fill documentation gaps for coverage scripts and guide - Add missing .PARAMETER PipelineDefinitionId to Get-CICoverageReport.ps1 help - Add working directory note to CONTRIBUTING.md Code Coverage section - Add Scripts Reference table to code-coverage-guide.md listing all 7 scripts Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: resolve preflight issues (gofmt, unused types, line length) - Run gofmt -s on all new test files - Remove unused fakeHTTPClient from credential_providers_coverage_test.go - Remove unused jwtTokenCredential from wave3_coverage_test.go - Break long line in middleware_coverage2_test.go (lll) - Fix trailing whitespace in pipeline_coverage_test.go Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * feat: add mage coverage targets and address review feedback - Add Coverage namespace to magefile with 6 targets (unit, full, hybrid, ci, html, check) - Update CONTRIBUTING.md and code-coverage-guide.md with mage target docs - Fix stderr redirect in Get-LocalCoverageReport.ps1 token acquisition (review thread #1) - Fix tautological sentinel test in keyvault_coverage_test.go (review thread #3) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: resolve CI failures - cspell words and gosec nolint for path traversal Agent-Logs-Url: https://github.com/Azure/azure-dev/sessions/75a62821-a5d2-4f5b-8372-b3b8fe8d1802 Co-authored-by: jongio <2163001+jongio@users.noreply.github.com> * fix: CI test failures — env var isolation and CI skip - middleware_coverage2_test: skip TestLoginGuard_EnsureLogin_ConfirmError in CI where IsRunningOnCI() short-circuits before console Confirm - manager_coverage_test: use t.Setenv+os.Unsetenv for proper env var isolation (code uses os.LookupEnv which distinguishes empty vs unset) - Remove bare os.Unsetenv calls that mutated shared process state Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: lower coverage:check threshold to 50% for unit-only mode The coverage:check mage target uses unit-only mode (52.7%) but was defaulting to 55% threshold (designed for CI combined unit+integration). Lower to 50% which is appropriate for unit-only, and update docs to clarify the distinction. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * fix: rebase onto main, update tests for fixableError API change - Rebase onto latest main to pick up error.go refactor (PR #7401) - Replace classifyError/ErrorCategory tests with fixableError tests - Fix ExtensionRunError struct fields (ExitCode -> ExtensionId+Err) - Add COVERAGE_MIN override tip to CONTRIBUTING.md (reviewer feedback) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
This pull request refactors the error handling middleware to introduce a clearer distinction between "fixable" and "non-fixable" errors, and adds a new "Fix this error" troubleshooting option. It updates the logic for classifying errors, modifies the user interaction flow, and improves the templates and tests to reflect these changes.
Error classification and handling:
classifyErrorfunction withfixableError, which determines if an error is fixable by the agent. The logic for what constitutes a fixable error has been updated, and the error classification now directly controls whether agent-driven troubleshooting is offered. [1] [2]User interaction and troubleshooting options:
categoryFix, allowing users to directly request an agent-driven fix for an error. Updates the prompt and selection logic to include this new category. [1] [2] [3]Templates and agent prompts:
fix.tmpltemplate to call an additional tool (azd_provision_common_error) before troubleshooting and adds a step for user feedback after the fix is applied. [1] [2]explain.tmpltemplate to clarify the instructions for the agent, ensuring no fix steps are proposed in the explanation-only mode.troubleshoot_fixable.tmpltemplate.Testing updates:
fixableErrorlogic, ensuring all error types are properly classified as fixable or not, and updates tests to reflect the new troubleshooting flow and constants. [1] [2] [3]Code cleanup: