OCPBUGS-74573: Add validation for AzureManaged boot diagnostics on Azure Stack Hub #183

Merged
openshift-merge-bot[bot] merged 1 commit into openshift:main from gpei:fix-OCPBUGS-74573
Mar 9, 2026

Conversation

gpei (Contributor) commented Feb 7, 2026

Azure Stack Hub does not support AzureManaged boot diagnostics storage and requires CustomerManaged storage accounts. Without validation, users get a cryptic 400 error from the Azure API when they attempt to use AzureManaged boot diagnostics on Stack Hub.

This change adds validation in the createDiagnosticsConfig function: when AzureManaged boot diagnostics is configured on Azure Stack Hub, the function now returns a clear InvalidMachineConfiguration error.
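
The check described above can be sketched in a self-contained form. This is only an illustration under stated assumptions: `machineScope`, `bootDiagnostics`, and `invalidConfigError` are simplified stand-ins invented here, not the real `actuators.MachineScope`, `machinev1`, or `machinecontroller` types, and the real function also builds and returns a diagnostics profile.

```go
package main

import (
	"errors"
	"fmt"
)

// storageAccountType mirrors the two boot diagnostics modes named in this PR.
type storageAccountType string

const (
	azureManaged    storageAccountType = "AzureManaged"
	customerManaged storageAccountType = "CustomerManaged"
)

// bootDiagnostics is a hypothetical stand-in for the boot diagnostics block
// of the machine provider spec.
type bootDiagnostics struct {
	StorageAccountType storageAccountType
}

// machineScope is a hypothetical stand-in that only records the environment.
type machineScope struct {
	stackHub bool
}

// IsStackHub reports whether the machine targets Azure Stack Hub.
func (s *machineScope) IsStackHub() bool { return s.stackHub }

// invalidConfigError is a hypothetical stand-in for the error produced by
// InvalidMachineConfiguration.
type invalidConfigError struct{ msg string }

func (e *invalidConfigError) Error() string { return e.msg }

// validateBootDiagnostics rejects AzureManaged boot diagnostics on Azure
// Stack Hub before any request is sent to the Azure API.
func validateBootDiagnostics(scope *machineScope, boot *bootDiagnostics) error {
	if boot == nil {
		return nil // boot diagnostics are not configured
	}
	if boot.StorageAccountType == azureManaged && scope.IsStackHub() {
		return &invalidConfigError{msg: "AzureManaged boot diagnostics is not supported on Azure Stack Hub"}
	}
	return nil
}

func main() {
	scope := &machineScope{stackHub: true}
	err := validateBootDiagnostics(scope, &bootDiagnostics{StorageAccountType: azureManaged})

	// errors.As lets a caller distinguish a configuration error from a transient one.
	var cfgErr *invalidConfigError
	fmt.Println(errors.As(err, &cfgErr), err)

	// CustomerManaged is accepted on Stack Hub by this check.
	fmt.Println(validateBootDiagnostics(scope, &bootDiagnostics{StorageAccountType: customerManaged}))
}
```

Failing locally with a typed error, rather than forwarding the request, is what replaces the API's 400 response with an actionable message.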

Summary by CodeRabbit

  • Bug Fixes
    • Enhanced, environment-aware validation for boot diagnostics on Azure Stack Hub to block unsupported configurations.
    • Managed diagnostics storage settings are now checked earlier to prevent invalid deployments.
    • Error messages and failure handling improved to give clearer feedback when diagnostics configurations are not supported.

@openshift-ci-robot added the following labels on Feb 7, 2026: jira/severity-moderate (Referenced Jira bug's severity is moderate for the branch this PR is targeting), jira/valid-reference (Indicates that this PR references a valid Jira ticket of any type), jira/invalid-bug (Indicates that a referenced Jira bug is invalid for the branch this PR is targeting).
@openshift-ci-robot

@gpei: This pull request references Jira Issue OCPBUGS-74573, which is invalid:

  • expected the bug to target the "4.22.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

gpei (Contributor, Author) commented Feb 7, 2026

/jira refresh

@openshift-ci-robot added the jira/valid-bug label (Indicates that a referenced Jira bug is valid for the branch this PR is targeting) and removed the jira/invalid-bug label on Feb 7, 2026.
@openshift-ci-robot

@gpei: This pull request references Jira Issue OCPBUGS-74573, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huali9

@gpei force-pushed the fix-OCPBUGS-74573 branch from a48a34e to 194549c on February 28, 2026 12:10
coderabbitai bot commented Feb 28, 2026

Walkthrough

The createDiagnosticsConfig signature now accepts a MachineScope and uses it to validate diagnostics configuration; specifically, it rejects Azure-managed boot diagnostics on Azure Stack Hub. Call sites and unit tests were updated to supply the scope and cover Stack Hub scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Diagnostics Configuration Validation: pkg/cloud/azure/actuators/machine/reconciler.go | Function signature changed to createDiagnosticsConfig(scope *actuators.MachineScope, config *machinev1.AzureMachineProviderSpec); adds scope-based validation to error on Azure Stack Hub when AzureManagedAzureDiagnosticsStorage is used; call sites updated to pass scope. |
| Test Coverage Updates: pkg/cloud/azure/actuators/machine/reconciler_test.go | Tests updated to construct and pass tc.scope into createDiagnosticsConfig; new cases added for Azure Stack Hub success and failure paths; expected configs/errors adjusted accordingly. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding validation for AzureManaged boot diagnostics on Azure Stack Hub, which aligns with the function signature update and new validation logic introduced. |
| Stable And Deterministic Test Names | ✅ Passed | All 44 test case names in the modified file are stable, deterministic static strings with no dynamic values, timestamps, UUIDs, or random identifiers. |
| Test Structure And Quality | ✅ Passed | TestCreateDiagnosticsConfig uses table-driven test design with clear test cases, proper assertions using the Gomega library, and tests multiple scenarios including Azure environments and boot diagnostics configurations. |

@openshift-ci-robot

@gpei: This pull request references Jira Issue OCPBUGS-74573, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huali9

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (2)
pkg/cloud/azure/actuators/machine/reconciler_test.go (1)

604-622: Consider explicitly setting CloudEnv for public Azure test case.

The test case name implies it's testing "public Azure" behavior, but the scope doesn't explicitly set CloudEnv. While this works (empty cloudEnv is not Stack Hub), consider using NewFakeMachineScope with CloudEnv: string(configv1.AzurePublicCloud) for consistency with the new Stack Hub test cases and to make the test's intent clearer.

🔧 Suggested improvement for explicitness
 		{
 			name: "with an Azure managed storage account on public Azure",
-			scope: &actuators.MachineScope{
-				Machine: &machinev1.Machine{},
-			},
+			scope: actuators.NewFakeMachineScope(actuators.FakeMachineScopeParams{
+				Machine:  &machinev1.Machine{},
+				CloudEnv: string(configv1.AzurePublicCloud),
+				Context:  context.Background(),
+			}),
 			config: &machinev1.AzureMachineProviderSpec{
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cloud/azure/actuators/machine/reconciler_test.go` around lines 604 - 622,
The test case for "with an Azure managed storage account on public Azure" should
explicitly set the CloudEnv to public Azure for clarity; update the test to
construct the scope via NewFakeMachineScope (or set scope.CloudEnv) and assign
CloudEnv: string(configv1.AzurePublicCloud) on the MachineScope used in the test
so the intent matches the other Stack Hub cases and the name accurately reflects
the environment being exercised (refer to MachineScope, NewFakeMachineScope, and
configv1.AzurePublicCloud).
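
The remark that an empty CloudEnv is not treated as Stack Hub can be illustrated with a small table of cases. This is a sketch, not the repository's code: `isStackHub` is a hypothetical stand-in for `scope.IsStackHub()`, the case-insensitive comparison is an assumption, and the two environment names are copied as plain strings from the `configv1.AzurePublicCloud` and `configv1.AzureStackCloud` constants.

```go
package main

import (
	"fmt"
	"strings"
)

// Environment names copied as plain strings from the OpenShift config API
// constants configv1.AzurePublicCloud and configv1.AzureStackCloud.
const (
	azurePublicCloud = "AzurePublicCloud"
	azureStackCloud  = "AzureStackCloud"
)

// isStackHub is a hypothetical stand-in for scope.IsStackHub().
// Assumption: the environment name is compared case-insensitively.
func isStackHub(cloudEnv string) bool {
	return strings.EqualFold(cloudEnv, azureStackCloud)
}

func main() {
	// Table-driven cases in the style of the tests discussed in this review.
	cases := []struct {
		name     string
		cloudEnv string
		want     bool
	}{
		{"unset environment", "", false},
		{"explicit public Azure", azurePublicCloud, false},
		{"Azure Stack Hub", azureStackCloud, true},
	}
	for _, tc := range cases {
		got := isStackHub(tc.cloudEnv)
		fmt.Printf("%-22s stackHub=%v ok=%v\n", tc.name, got, got == tc.want)
	}
}
```

Under this assumption the unset and the explicit public Azure cases behave identically, so setting CloudEnv explicitly changes only how clearly the test states its intent.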
pkg/cloud/azure/actuators/machine/reconciler.go (1)

912-950: Consider adding webhook validation for earlier user feedback.

Per the previous review discussion, a validating webhook could catch this invalid configuration earlier and provide immediate feedback to users, avoiding error loops in the controller. If the codebase has similar webhook validations for other Azure Stack Hub constraints, this would be a good candidate to add there as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cloud/azure/actuators/machine/reconciler.go` around lines 912 - 950, Add
proactive validation via the AzureMachine validating webhook to reject
configurations that will fail at runtime: detect when
spec.Diagnostics.Boot.StorageAccountType ==
machinev1.AzureManagedAzureDiagnosticsStorage and the cluster is an Azure Stack
Hub (same logic used in createDiagnosticsConfig and scope.IsStackHub()) and
return a clear admission error; also validate customer-managed configs (require
CustomerManaged != nil and StorageAccountURI non-empty) mirroring the checks in
createDiagnosticsConfig so the webhook rejects missing/invalid StorageAccountURI
up-front. Ensure the webhook handler uses the AzureMachineProviderSpec type and
error messages consistent with the controller (same wording as in
createDiagnosticsConfig) so users receive immediate feedback rather than
controller error loops.
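
A rough sketch of what such an admission-time check might look like follows. Everything in it is assumed for illustration: the type and function names are hypothetical, the way a real webhook would learn that the cluster runs on Azure Stack Hub is not shown, and the CustomerManaged error wording is invented rather than taken from createDiagnosticsConfig.

```go
package main

import (
	"errors"
	"fmt"
)

// customerManagedSpec and bootDiagnosticsSpec are hypothetical, trimmed-down
// stand-ins for the boot diagnostics fields of AzureMachineProviderSpec.
type customerManagedSpec struct {
	StorageAccountURI string
}

type bootDiagnosticsSpec struct {
	StorageAccountType string // "AzureManaged" or "CustomerManaged"
	CustomerManaged    *customerManagedSpec
}

// validateAtAdmission mirrors the controller-side checks so that an invalid
// spec is rejected when it is submitted instead of when the VM is created.
func validateAtAdmission(isStackHub bool, boot *bootDiagnosticsSpec) error {
	if boot == nil {
		return nil // nothing to validate
	}
	switch boot.StorageAccountType {
	case "AzureManaged":
		if isStackHub {
			return errors.New("AzureManaged boot diagnostics is not supported on Azure Stack Hub")
		}
	case "CustomerManaged":
		// Illustrative wording only; the controller's message may differ.
		if boot.CustomerManaged == nil || boot.CustomerManaged.StorageAccountURI == "" {
			return errors.New("CustomerManaged boot diagnostics requires a non-empty storageAccountURI")
		}
	default:
		return fmt.Errorf("unknown boot diagnostics storageAccountType %q", boot.StorageAccountType)
	}
	return nil
}

func main() {
	fmt.Println(validateAtAdmission(true, &bootDiagnosticsSpec{StorageAccountType: "AzureManaged"}))
	fmt.Println(validateAtAdmission(true, &bootDiagnosticsSpec{StorageAccountType: "CustomerManaged"}))
	fmt.Println(validateAtAdmission(true, &bootDiagnosticsSpec{
		StorageAccountType: "CustomerManaged",
		CustomerManaged:    &customerManagedSpec{StorageAccountURI: "https://example.invalid/"},
	}))
}
```

Keeping the rule in one shared function that both the webhook and the controller call would be one way to keep the two error messages identical.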

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 60f6443 and 194549c.

📒 Files selected for processing (2)
  • pkg/cloud/azure/actuators/machine/reconciler.go
  • pkg/cloud/azure/actuators/machine/reconciler_test.go

… Azure Stack Hub

Azure Stack Hub does not support Azure-managed boot diagnostics storage
and requires customer-managed storage accounts. Without validation, users
experience cryptic 400 errors from the Azure API when attempting to use
AzureManaged boot diagnostics on Stack Hub.

This change adds validation in the createDiagnosticsConfig function to
detect when AzureManaged boot diagnostics is configured on Azure Stack Hub
and return a clear InvalidMachineConfiguration error with an actionable
message directing users to use CustomerManaged boot diagnostics instead.

Fixes: OCPBUGS-74573

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

@gpei force-pushed the fix-OCPBUGS-74573 branch from 194549c to 1ccb43e on March 1, 2026 06:46

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
pkg/cloud/azure/actuators/machine/reconciler.go (1)

912-925: Consider guarding scope before calling scope.IsStackHub().

This helper now dereferences scope; adding a defensive check avoids a panic if a future caller passes nil.

Suggested hardening diff
 func createDiagnosticsConfig(scope *actuators.MachineScope, config *machinev1.AzureMachineProviderSpec) (*armcompute.DiagnosticsProfile, error) {
 	boot := config.Diagnostics.Boot
 	if boot == nil {
 		return nil, nil
 	}

 	switch boot.StorageAccountType {
 	case machinev1.AzureManagedAzureDiagnosticsStorage:
+		if scope == nil {
+			return nil, fmt.Errorf("cannot validate AzureManaged boot diagnostics: machine scope is nil")
+		}
 		// Validate that AzureManaged boot diagnostics is not used on Azure Stack Hub
 		if scope.IsStackHub() {
 			return nil, machinecontroller.InvalidMachineConfiguration(
 				"AzureManaged boot diagnostics is not supported on Azure Stack Hub",
 			)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cloud/azure/actuators/machine/reconciler.go` around lines 912 - 925, The
helper createDiagnosticsConfig dereferences scope when calling
scope.IsStackHub(), so add a defensive nil check for scope at the start of that
function (createDiagnosticsConfig) and handle a nil scope gracefully (return a
clear error via machinecontroller.InvalidMachineConfiguration or nil as
appropriate) before any calls to scope.IsStackHub(); ensure all early returns
use the same error-handling pattern used elsewhere in the function so callers
get a consistent error when scope is nil.
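
The reason for the guard can be shown in isolation. In Go, a method with a pointer receiver can be called on a nil pointer, but it panics as soon as it reads a field. The sketch below uses a hypothetical `machineScope` with a single field and is not the repository's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// machineScope is a hypothetical stand-in with one field.
type machineScope struct {
	cloudEnv string
}

// IsStackHub reads a field, so calling it on a nil *machineScope would panic
// with a nil pointer dereference.
func (s *machineScope) IsStackHub() bool {
	return s.cloudEnv == "AzureStackCloud"
}

// errNilScope is returned instead of panicking when no scope is supplied.
var errNilScope = errors.New("cannot validate AzureManaged boot diagnostics: machine scope is nil")

// checkAzureManaged guards the scope before dereferencing it.
func checkAzureManaged(scope *machineScope) error {
	if scope == nil {
		return errNilScope
	}
	if scope.IsStackHub() {
		return errors.New("AzureManaged boot diagnostics is not supported on Azure Stack Hub")
	}
	return nil
}

func main() {
	fmt.Println(checkAzureManaged(nil))
	fmt.Println(checkAzureManaged(&machineScope{cloudEnv: "AzureStackCloud"}))
	fmt.Println(checkAzureManaged(&machineScope{cloudEnv: "AzurePublicCloud"}))
}
```

Whether the guard is worth adding depends on whether any caller can realistically pass a nil scope; for an unexported helper with a single call site it is purely defensive.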

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 194549c and 1ccb43e.

📒 Files selected for processing (2)
  • pkg/cloud/azure/actuators/machine/reconciler.go
  • pkg/cloud/azure/actuators/machine/reconciler_test.go

@damdo damdo (Member) left a comment

/approve
/lgtm

@openshift-ci bot added the lgtm label (Indicates that a PR is ready to be merged) on Mar 7, 2026
openshift-ci bot (Contributor) commented Mar 7, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damdo

@openshift-ci bot added the approved label (Indicates a PR has been approved by an approver from all required OWNERS files) on Mar 7, 2026
damdo (Member) commented Mar 7, 2026

@sunzhaohua2 @huali9 could you please coordinate with @gpei for verification? Thanks!

sunzhaohua2 (Contributor) commented

/verified by @sunzhaohua2
Before the fix, creating a machine with storageAccountType: AzureManaged on Azure Stack Hub (ASH) fails with a 400 response from the Azure API:

          diagnostics:
            boot:
              storageAccountType: AzureManaged
        RESPONSE 400: 400 Bad Request
        ERROR CODE: InvalidParameter
        --------------------------------------------------------------------------------
        {
          "error": {
            "code": "InvalidParameter",
            "message": "Required parameter 'bootDiagnostics.storageAccountUri' is missing (null).",
            "target": "bootDiagnostics.storageAccountUri"
          }
        }

After the fix, the new worker machine goes to the Failed phase with a clear error message:

$ oc get machine                                                                                                   
NAME                                    PHASE     TYPE              REGION   ZONE   AGE
zhsun-az391-27nwc-master-0              Running   Standard_DS4_v2   mtcazs   0      82m
zhsun-az391-27nwc-master-1              Running   Standard_DS4_v2   mtcazs   1      82m
zhsun-az391-27nwc-master-2              Running   Standard_DS4_v2   mtcazs   0      82m
zhsun-az391-27nwc-worker-1-d59xb        Failed                                      2m18s

  errorMessage: 'failed to reconcile machine "zhsun-az391-27nwc-worker-1-d59xb": failed
    to create vm zhsun-az391-27nwc-worker-1-d59xb: failed to configure diagnostics
    profile: AzureManaged boot diagnostics is not supported on Azure Stack Hub'
  errorReason: InvalidConfiguration
  lastUpdated: "2026-03-09T08:32:48Z"
  phase: Failed

@openshift-ci-robot added the verified label (Signifies that the PR passed pre-merge verification criteria) on Mar 9, 2026
@openshift-ci-robot

@sunzhaohua2: This PR has been marked as verified by @sunzhaohua2.

openshift-ci bot (Contributor) commented Mar 9, 2026

@gpei: all tests passed!

@openshift-merge-bot openshift-merge-bot bot merged commit f24dc4c into openshift:main Mar 9, 2026
9 checks passed
@openshift-ci-robot

@gpei: Jira Issue Verification Checks: Jira Issue OCPBUGS-74573
✔️ This pull request was pre-merge verified.
✔️ All associated pull requests have merged.
✔️ All associated, merged pull requests were pre-merge verified.

Jira Issue OCPBUGS-74573 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓

openshift-merge-robot (Contributor) commented

Fix included in accepted release 4.22.0-0.nightly-2026-03-10-100251
