Add AKS starter deployment E2E test #14351

Merged
mitchdenny merged 21 commits into main from aks-e2e-deployment
Feb 6, 2026

@mitchdenny
Member

Summary

This PR adds a new end-to-end deployment test that validates deploying Aspire applications to Azure Kubernetes Service (AKS).

Phase 1 (This PR)

Infrastructure foundation only:

  • Creates resource group, Azure Container Registry (ACR), and AKS cluster
  • Attaches ACR to AKS for seamless image pulls
  • Configures kubectl credentials
  • Verifies cluster connectivity (kubectl get nodes, kubectl cluster-info)
  • Cleans up resources after test (fire-and-forget)

Future Phases

  • Phase 2: Add Aspire project creation, Kubernetes hosting package, and Helm chart generation
  • Phase 3: Full deployment with pod verification
  • Phase 4: Documentation and polish

Test Details

  • Uses minimal AKS configuration (1 node, Standard_B2s) to minimize cost
  • ~45 minute timeout to accommodate AKS provisioning (~10-15 min)
  • Follows existing deployment E2E test patterns

Testing

Request /deployment-test to run the new test against Azure infrastructure.

This adds a new end-to-end deployment test that validates Azure Kubernetes Service (AKS) infrastructure creation:

- Creates resource group, ACR, and AKS cluster
- Configures kubectl credentials
- Verifies cluster connectivity
- Cleans up resources after test

Phase 1 focuses on infrastructure only - Aspire deployment will be added in subsequent phases.
Copilot AI review requested due to automatic review settings February 5, 2026 02:45
@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 14351

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 14351"

@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🚀 Deployment tests starting on PR #14351...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@github-actions github-actions bot temporarily deployed to deployment-testing February 5, 2026 02:50 Inactive
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a Phase 1 end-to-end test for deploying Aspire applications to Azure Kubernetes Service (AKS). The test validates the infrastructure provisioning foundation by creating a resource group, Azure Container Registry (ACR), and AKS cluster, then verifying cluster connectivity.

Changes:

  • Adds AksStarterDeploymentTests.cs with infrastructure-only E2E test
  • Creates minimal AKS cluster (1 node, Standard_B2s) with attached ACR
  • Verifies cluster health using kubectl commands
  • Implements fire-and-forget resource cleanup

/// </summary>
public sealed class AksStarterDeploymentTests(ITestOutputHelper output)
{
// Timeout set to 45 minutes to allow for AKS provisioning (~10-15 min) plus deployment.

Copilot AI Feb 5, 2026

The timeout comment mentions "plus deployment" but Phase 1 (current implementation) only performs infrastructure provisioning and verification, not deployment. The actual operations sum to approximately 25 minutes maximum (20 min AKS creation + ~5 min for other operations). While the 45-minute timeout may be intended for future phases, the comment could be misleading for the current implementation. Consider updating the comment to reflect the current phase or noting that the timeout accounts for future deployment steps.

Suggested change
- // Timeout set to 45 minutes to allow for AKS provisioning (~10-15 min) plus deployment.
+ // Timeout set to 45 minutes to allow for AKS provisioning and verification (currently ~25 min),
+ // with additional headroom reserved for future phases that will include deployment.

Copilot uses AI. Check for mistakes.
Comment on lines +195 to +219
private static void TriggerCleanupResourceGroup(string resourceGroupName, ITestOutputHelper output)
{
    var process = new System.Diagnostics.Process
    {
        StartInfo = new System.Diagnostics.ProcessStartInfo
        {
            FileName = "az",
            Arguments = $"group delete --name {resourceGroupName} --yes --no-wait",
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        }
    };

    try
    {
        process.Start();
        output.WriteLine($"Cleanup triggered for resource group: {resourceGroupName}");
    }
    catch (Exception ex)
    {
        output.WriteLine($"Failed to trigger cleanup: {ex.Message}");
    }
}

Copilot AI Feb 5, 2026

The cleanup method pattern is inconsistent with the rest of the codebase. Most other deployment tests (AzureContainerRegistryDeploymentTests, AzureAppConfigDeploymentTests, AzureEventHubsDeploymentTests, etc.) use a private async method named CleanupResourceGroupAsync that awaits WaitForExitAsync() on the process. This test uses a synchronous TriggerCleanupResourceGroup method that doesn't wait for process completion or dispose the process object.

This inconsistency makes the codebase harder to maintain and could lead to resource leaks since the Process object is never disposed. Consider refactoring to match the established pattern in files like AzureContainerRegistryDeploymentTests.cs:204-237.
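
A minimal sketch of the awaited and disposed pattern this comment points to. It is not the code from AzureContainerRegistryDeploymentTests.cs, which is not shown on this page. `RunAsync` and `BuildDeleteArguments` are hypothetical names, and the sketch assumes .NET 5 or later for `WaitForExitAsync`. Output is deliberately not redirected, so the child process cannot block on a full pipe while the caller waits.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: builds the argument string for `az group delete`.
// When `wait` is false, `--no-wait` is appended so the CLI returns before deletion finishes.
static string BuildDeleteArguments(string resourceGroupName, bool wait) =>
    $"group delete --name {resourceGroupName} --yes" + (wait ? "" : " --no-wait");

// Hypothetical helper: starts a process, awaits its exit, and disposes it via `using`.
// Returns the exit code so the caller can report actual success or failure.
static async Task<int> RunAsync(string fileName, string arguments)
{
    using var process = new Process
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = fileName,
            Arguments = arguments,
            UseShellExecute = false,
            CreateNoWindow = true
        }
    };

    process.Start();
    await process.WaitForExitAsync();
    return process.ExitCode;
}

// Usage (requires the Azure CLI on PATH and a signed-in session):
//   int exitCode = await RunAsync("az", BuildDeleteArguments("my-rg", wait: false));
//   bool cleanupSucceeded = exitCode == 0;
Console.WriteLine(BuildDeleteArguments("my-rg", wait: false));
```

With `--no-wait`, an exit code of 0 only confirms that Azure accepted the delete request, so a status message such as "Cleanup initiated" is still more accurate than "success".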

Comment on lines +197 to +218
    var process = new System.Diagnostics.Process
    {
        StartInfo = new System.Diagnostics.ProcessStartInfo
        {
            FileName = "az",
            Arguments = $"group delete --name {resourceGroupName} --yes --no-wait",
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,
            CreateNoWindow = true
        }
    };

    try
    {
        process.Start();
        output.WriteLine($"Cleanup triggered for resource group: {resourceGroupName}");
    }
    catch (Exception ex)
    {
        output.WriteLine($"Failed to trigger cleanup: {ex.Message}");
    }

Copilot AI Feb 5, 2026

The Process object created in the cleanup method is never disposed, which can lead to resource leaks. The process should be wrapped in a using statement or explicitly disposed after starting. This pattern is correctly implemented in other deployment tests like AzureContainerRegistryDeploymentTests.cs where the process is properly awaited and implicitly disposed through proper async patterns.

// Clean up the resource group we created (includes AKS cluster and ACR)
output.WriteLine($"Triggering cleanup of resource group: {resourceGroupName}");
TriggerCleanupResourceGroup(resourceGroupName, output);
DeploymentReporter.ReportCleanupStatus(resourceGroupName, success: true, "Cleanup triggered (fire-and-forget)");

Copilot AI Feb 5, 2026

The cleanup status is reported as successful (success: true) even though the cleanup process is fire-and-forget and doesn't wait for completion or check if the deletion actually succeeded. This could be misleading in logs and reports. While this pattern is also used in AcaStarterDeploymentTests.cs:281, most other deployment tests (like AzureContainerRegistryDeploymentTests, AzureAppConfigDeploymentTests, etc.) use an async cleanup method that waits for the process and checks the exit code before reporting status. Consider either: (1) waiting for the process to complete and reporting actual success/failure, or (2) using a more accurate message like "Cleanup initiated" and a neutral status indicator.

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🎬 CLI E2E Test Recordings

The following terminal recordings are available for commit dd62bbf:

Test Recording
AgentCommands_AllHelpOutputs_AreCorrect ▶️ View Recording
AgentInitCommand_MigratesDeprecatedConfig ▶️ View Recording
Banner_DisplayedOnFirstRun ▶️ View Recording
Banner_DisplayedWithExplicitFlag ▶️ View Recording
CreateAndDeployToDockerCompose ▶️ View Recording
CreateAndDeployToDockerComposeInteractive ▶️ View Recording
CreateAndPublishToKubernetes ▶️ View Recording
CreateAndRunAspireStarterProject ▶️ View Recording
CreateAndRunJsReactProject ▶️ View Recording
CreateAndRunPythonReactProject ▶️ View Recording
CreateEmptyAppHostProject ▶️ View Recording
CreateStartAndStopAspireProject ▶️ View Recording
CreateTypeScriptAppHostWithViteApp ▶️ View Recording
DoctorCommand_DetectsDeprecatedAgentConfig ▶️ View Recording
DoctorCommand_WithSslCertDir_ShowsTrusted ▶️ View Recording
DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted ▶️ View Recording
LogsCommandShowsResourceLogs ▶️ View Recording
PsCommandListsRunningAppHost ▶️ View Recording
ResourcesCommandShowsRunningResources ▶️ View Recording

📹 Recordings uploaded automatically from CI run #21740059443

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

Deployment E2E Tests failed

Summary: 10 passed, 3 failed, 0 cancelled

View workflow run

Passed Tests

  • ✅ AuthenticationTests
  • ✅ AzureStorageDeploymentTests
  • ✅ PythonFastApiDeploymentTests
  • ✅ AzureServiceBusDeploymentTests
  • ✅ AcaStarterDeploymentTests
  • ✅ AzureContainerRegistryDeploymentTests
  • ✅ AzureAppConfigDeploymentTests
  • ✅ AzureLogAnalyticsDeploymentTests
  • ✅ AzureKeyVaultDeploymentTests
  • ✅ AzureEventHubsDeploymentTests

Failed Tests

  • ❌ AppServicePythonDeploymentTests
  • ❌ AksStarterDeploymentTests
  • ❌ AppServiceReactDeploymentTests

🎬 Terminal Recordings

Test Recording
DeployAzureAppConfigResource ▶️ View Recording
DeployAzureContainerRegistryResource ▶️ View Recording
DeployAzureEventHubsResource ▶️ View Recording
DeployAzureKeyVaultResource ▶️ View Recording
DeployAzureLogAnalyticsResource ▶️ View Recording
DeployAzureServiceBusResource ▶️ View Recording
DeployAzureStorageResource ▶️ View Recording
DeployPythonFastApiTemplateToAzureContainerApps ▶️ View Recording
DeployStarterTemplateToAks ▶️ View Recording
DeployStarterTemplateToAzureContainerApps ▶️ View Recording

Add step to register Microsoft.ContainerService and Microsoft.ContainerRegistry
resource providers before attempting to create AKS resources. This fixes the
MissingSubscriptionRegistration error when the subscription hasn't been
configured for AKS usage.
@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🚀 Deployment tests starting on PR #14351...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@github-actions github-actions bot temporarily deployed to deployment-testing February 5, 2026 03:25 Inactive
@github-actions github-actions bot temporarily deployed to deployment-testing February 5, 2026 05:49 Inactive
@github-actions
Contributor

github-actions bot commented Feb 5, 2026

Deployment E2E Tests failed

Summary: 10 passed, 3 failed, 0 cancelled

View workflow run

Passed Tests

  • ✅ AzureContainerRegistryDeploymentTests
  • ✅ AzureServiceBusDeploymentTests
  • ✅ AzureAppConfigDeploymentTests
  • ✅ AzureLogAnalyticsDeploymentTests
  • ✅ AuthenticationTests
  • ✅ AcaStarterDeploymentTests
  • ✅ AzureEventHubsDeploymentTests
  • ✅ AzureKeyVaultDeploymentTests
  • ✅ AzureStorageDeploymentTests
  • ✅ PythonFastApiDeploymentTests

Failed Tests

  • ❌ AppServiceReactDeploymentTests
  • ❌ AksStarterDeploymentTests
  • ❌ AppServicePythonDeploymentTests

🎬 Terminal Recordings

Test Recording
DeployAzureAppConfigResource ▶️ View Recording
DeployAzureContainerRegistryResource ▶️ View Recording
DeployAzureEventHubsResource ▶️ View Recording
DeployAzureKeyVaultResource ▶️ View Recording
DeployAzureLogAnalyticsResource ▶️ View Recording
DeployAzureServiceBusResource ▶️ View Recording
DeployAzureStorageResource ▶️ View Recording
DeployPythonFastApiTemplateToAzureContainerApps ▶️ View Recording
DeployStarterTemplateToAks ▶️ View Recording
DeployStarterTemplateToAzureContainerApps ▶️ View Recording

When multiple endpoints resolve to the same port number, the Service
manifest generator was creating duplicate port entries, which Kubernetes
rejects as invalid. This fix deduplicates ports by (port, protocol)
before adding them to the Service spec.

Fixes the error:
  Service 'xxx-service' is invalid: spec.ports[1]: Duplicate value
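
The deduplication this commit message describes can be sketched as follows. This is an illustration under assumptions, not the publisher's actual code: the tuple shape and the `DeduplicatePorts` name are hypothetical stand-ins for the real Service port types, which are not shown on this page. The sketch keeps the first entry for each (port, protocol) pair and preserves the original order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Service port entries: (Name, Port, Protocol) tuples.
// Keeps the first entry seen for each (port, protocol) key, in input order.
static List<(string Name, int Port, string Protocol)> DeduplicatePorts(
    IEnumerable<(string Name, int Port, string Protocol)> entries)
{
    var seen = new HashSet<(int Port, string Protocol)>();
    var result = new List<(string Name, int Port, string Protocol)>();
    foreach (var entry in entries)
    {
        // HashSet.Add returns false when the key is already present.
        if (seen.Add((entry.Port, entry.Protocol)))
        {
            result.Add(entry);
        }
    }
    return result;
}

var servicePorts = new[]
{
    (Name: "http", Port: 8080, Protocol: "TCP"),
    (Name: "https", Port: 8080, Protocol: "TCP"),   // same key as "http": dropped
    (Name: "metrics", Port: 8080, Protocol: "UDP"), // same port, different protocol: kept
};

foreach (var servicePort in DeduplicatePorts(servicePorts))
{
    Console.WriteLine($"{servicePort.Name} {servicePort.Port}/{servicePort.Protocol}");
}
// Prints:
//   http 8080/TCP
//   metrics 8080/UDP
```

Keying on the pair rather than the port number alone means a TCP and a UDP entry on the same port both survive.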
@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🚀 Deployment tests starting on PR #14351...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

Deployment E2E Tests failed

Summary: 10 passed, 3 failed, 0 cancelled

View workflow run

Passed Tests

  • ✅ AzureAppConfigDeploymentTests
  • ✅ AzureLogAnalyticsDeploymentTests
  • ✅ AcaStarterDeploymentTests
  • ✅ AzureEventHubsDeploymentTests
  • ✅ AuthenticationTests
  • ✅ AzureKeyVaultDeploymentTests
  • ✅ AzureContainerRegistryDeploymentTests
  • ✅ AzureStorageDeploymentTests
  • ✅ AzureServiceBusDeploymentTests
  • ✅ PythonFastApiDeploymentTests

Failed Tests

  • ❌ AksStarterDeploymentTests
  • ❌ AppServiceReactDeploymentTests
  • ❌ AppServicePythonDeploymentTests

🎬 Terminal Recordings

Test Recording
DeployAzureAppConfigResource ▶️ View Recording
DeployAzureContainerRegistryResource ▶️ View Recording
DeployAzureEventHubsResource ▶️ View Recording
DeployAzureKeyVaultResource ▶️ View Recording
DeployAzureLogAnalyticsResource ▶️ View Recording
DeployAzureServiceBusResource ▶️ View Recording
DeployAzureStorageResource ▶️ View Recording
DeployPythonFastApiTemplateToAzureContainerApps ▶️ View Recording
DeployStarterTemplateToAks ▶️ View Recording
DeployStarterTemplateToAzureContainerApps ▶️ View Recording

Added Step 6 to explicitly run 'az aks update --attach-acr' after AKS
cluster creation to ensure the AcrPull role assignment has properly
propagated. This addresses potential image pull permission issues where
AKS cannot pull images from the attached ACR.

Also renumbered all subsequent steps to maintain proper ordering.
@mitchdenny
Member Author

/deployment-test

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

🚀 Deployment tests starting on PR #14351...

This will deploy to real Azure infrastructure. Results will be posted here when complete.

View workflow run

@github-actions
Contributor

github-actions bot commented Feb 5, 2026

Deployment E2E Tests failed

Summary: 10 passed, 3 failed, 0 cancelled

View workflow run

Passed Tests

  • ✅ AcaStarterDeploymentTests
  • ✅ AzureAppConfigDeploymentTests
  • ✅ AuthenticationTests
  • ✅ AzureContainerRegistryDeploymentTests
  • ✅ AzureKeyVaultDeploymentTests
  • ✅ PythonFastApiDeploymentTests
  • ✅ AzureStorageDeploymentTests
  • ✅ AzureEventHubsDeploymentTests
  • ✅ AzureServiceBusDeploymentTests
  • ✅ AzureLogAnalyticsDeploymentTests

Failed Tests

  • ❌ AppServiceReactDeploymentTests
  • ❌ AksStarterDeploymentTests
  • ❌ AppServicePythonDeploymentTests

🎬 Terminal Recordings

Test Recording
DeployAzureAppConfigResource ▶️ View Recording
DeployAzureContainerRegistryResource ▶️ View Recording
DeployAzureEventHubsResource ▶️ View Recording
DeployAzureKeyVaultResource ▶️ View Recording
DeployAzureLogAnalyticsResource ▶️ View Recording
DeployAzureServiceBusResource ▶️ View Recording
DeployAzureStorageResource ▶️ View Recording
DeployPythonFastApiTemplateToAzureContainerApps ▶️ View Recording
DeployStarterTemplateToAks ▶️ View Recording
DeployStarterTemplateToAzureContainerApps ▶️ View Recording

},
};

// Deduplicate ports by port number and protocol to avoid invalid Service specs
Member

Are we allocating ports properly? There should be no dupes unless the user explicitly added them.

Mitch Denny added 10 commits February 6, 2026 11:00
The Kubernetes publisher was generating duplicate Service/container ports
(both 8080/TCP) for ProjectResources with default http+https endpoints.
The root cause is that GenerateDefaultProjectEndpointMapping assigns the
same default port 8080 to every endpoint with None target port.

The proper fix mirrors the core framework's SetBothPortsEnvVariables()
behavior: skip the DefaultHttpsEndpoint (which the container won't listen
on — TLS termination happens at ingress/service mesh). The https endpoint
still gets an EndpointMapping (for service discovery) but reuses the http
endpoint's HelmValue, so no duplicate K8s port is generated.

Added Aspire.Hosting.Kubernetes to InternalsVisibleTo to access
ProjectResource.DefaultHttpsEndpoint. The downstream dedup in ToService()
and WithContainerPorts() remains as defense-in-depth.

Fixes #14029
Mitch Denny added 3 commits February 6, 2026 15:57
Validates the Aspire starter template with Redis cache enabled deploys
correctly to AKS. Exercises the full pipeline: webfrontend → apiservice
→ Redis by hitting the /weather page (SSR, uses Redis output caching).

Key differences from the base AKS test:
- Selects 'Yes' for Redis Cache in aspire new prompts
- Redis uses public container image (no ACR push needed)
- Verifies /weather page content (confirms Redis integration works)

Both AKS tests generated the same ACR name from RunId+RunAttempt.
Use different prefixes (acrs/acrr) to ensure uniqueness.
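
A rough sketch of why a per-test prefix resolves the collision. The exact name format the tests use is not shown on this page, so `BuildAcrName` and its concatenation order are assumptions. The run ID in the usage is the CI run number quoted earlier in this conversation, and the run attempt of "1" is made up for illustration. ACR names must be 5 to 50 alphanumeric characters, so the sketch lowercases, strips other characters, and truncates.

```csharp
using System;
using System.Linq;

// Hypothetical name builder: prefix + run ID + run attempt, lowercased and
// reduced to letters and digits, then capped at the 50-character ACR limit.
static string BuildAcrName(string prefix, string runId, string runAttempt)
{
    var raw = $"{prefix}{runId}{runAttempt}".ToLowerInvariant();
    var cleaned = new string(raw
        .Where(c => (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
        .ToArray());
    return cleaned.Length <= 50 ? cleaned : cleaned[..50];
}

// Same run ID and attempt, different prefixes: the two names no longer collide.
Console.WriteLine(BuildAcrName("acrs", "21740059443", "1")); // prints "acrs217400594431"
Console.WriteLine(BuildAcrName("acrr", "21740059443", "1")); // prints "acrr217400594431"
```
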
Work around K8s publisher bug where cross-resource secret references create
Helm value paths under the consuming resource instead of referencing the
owning resource's secret. The webfrontend template expects
secrets.webfrontend.cache_password but values.yaml only has
secrets.cache.REDIS_PASSWORD. Provide the missing value via --set.