
AKS E2E tests: Redis variant, port fix, and reliability improvements #14371

Merged
mitchdenny merged 36 commits into main from aks-e2e-deployment
Feb 9, 2026


@mitchdenny
Member

Summary

Follow-up to #14351 (merged). Adds the remaining AKS deployment test improvements:

Changes

  1. Fix duplicate K8s ports (KubernetesResource.cs): Skip DefaultHttpsEndpoint in ProcessEndpoints() to prevent duplicate Service/container ports. This is the proper upstream fix for #14029 ("(code provided) Minimal k8s aspire application generates invalid YAML assets: duplicate ports, invalid values, etc."): the Kubernetes publisher now matches the core framework's behavior of treating the default HTTPS endpoint as a listener alias for the HTTP port.

  2. Add AKS + Redis E2E test (AksStarterWithRedisDeploymentTests.cs): New deployment test that creates an Aspire starter app with Redis enabled, deploys to AKS, and verifies the /weather page works end-to-end (webfrontend → apiservice → Redis output cache).

  3. Fix ACR name collision: Both AKS tests generated the same ACR name from RunId+RunAttempt. Now use different prefixes (acrs/acrr) to ensure global uniqueness.

  4. Work around K8s publisher secret bug (#14370, "Kubernetes publisher: cross-resource secret references generate broken Helm value paths"): The publisher generates broken Helm value paths for cross-resource secret references. Added --set secrets.webfrontend.cache_password="" as a temporary workaround until #14370 is fixed.

  5. Fix OIDC token expiration: Moved az acr login to immediately after ACR creation (before the 10-15 min AKS cluster creation), since the OIDC federated token expires after ~5 minutes.

Related Issues

Testing

  • AksStarterDeploymentTests: ✅ passed in run 21740063429
  • AksStarterWithRedisDeploymentTests: needs re-run with OIDC token fix

Mitch Denny added 22 commits February 5, 2026 13:44
This adds a new end-to-end deployment test that validates Azure Kubernetes Service (AKS) infrastructure creation:

- Creates resource group, ACR, and AKS cluster
- Configures kubectl credentials
- Verifies cluster connectivity
- Cleans up resources after test

Phase 1 focuses on infrastructure only - Aspire deployment will be added in subsequent phases.
Add step to register Microsoft.ContainerService and Microsoft.ContainerRegistry
resource providers before attempting to create AKS resources. This fixes the
MissingSubscriptionRegistration error when the subscription hasn't been
configured for AKS usage.
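
A minimal sketch of the provider registration step described above. The two provider namespaces come from the commit message; the function name `register_aks_providers` is hypothetical, and the use of `--wait` (block until registration finishes) is an assumption about how the test sequences the step.

```shell
# Sketch: register the resource providers AKS needs before creating resources.
# Assumes the Azure CLI is installed and already logged in.
register_aks_providers() {
  local ns
  for ns in Microsoft.ContainerService Microsoft.ContainerRegistry; do
    # --wait blocks until the registration completes.
    az provider register --namespace "$ns" --wait || return 1
  done
}

# Usage: register_aks_providers
```

Registering up front is what avoids the MissingSubscriptionRegistration error on subscriptions that have never hosted AKS.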
The subscription in westus3 doesn't have access to Standard_B2s, only
the v2 series VMs. Changed to Standard_B2s_v2 which is available.
The subscription has zero quota for B-series VMs in westus3. Changed to
Standard_D2s_v3 which is a widely-available D-series VM with typical quota.
…AKS deployment

Phase 2 additions:
- Create Aspire starter project using 'aspire new'
- Add Aspire.Hosting.Kubernetes package via 'aspire add'
- Modify AppHost.cs to call AddKubernetesEnvironment() with ACR config
- Login to ACR for Docker image push
- Run 'aspire publish' to generate Helm charts and push images

Phase 3 additions:
- Deploy Helm chart to AKS using 'helm install'
- Verify pods are running with kubectl
- Verify deployments are healthy

This completes the full end-to-end flow: AKS cluster creation -> Aspire project
creation -> Helm chart generation -> Deployment to Kubernetes
Changes:
- Remove invalid ContainerRegistry property from AddKubernetesEnvironment
- Add pragma warning disable for experimental ASPIREPIPELINES001
- Add container build step using dotnet publish /t:PublishContainer
- Push container images to ACR before Helm deployment
- Override Helm image values with ACR image references

The Kubernetes publisher generates Helm charts but doesn't build containers.
We need to build and push containers separately using dotnet publish.
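
The separate container build step could look roughly like the sketch below. It assumes the .NET SDK container publishing properties `ContainerRegistry`, `ContainerRepository` and `ContainerImageTag`, and that Docker credentials for the registry already exist (for example from `az acr login`). The function name and its argument layout are hypothetical, not taken from the test code.

```shell
# Sketch: build one project as a container image and push it to a registry.
# $1 = project path, $2 = registry host (e.g. <name>.azurecr.io),
# $3 = repository name, $4 = image tag. All four values are placeholders.
publish_container() {
  dotnet publish "$1" -c Release /t:PublishContainer \
    -p:ContainerRegistry="$2" \
    -p:ContainerRepository="$3" \
    -p:ContainerImageTag="$4"
}

# Usage: publish_container ./MyApp.ApiService myregistry.azurecr.io apiservice latest
```

The resulting image references are what the Helm image values then need to be overridden with at install time.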
When multiple endpoints resolve to the same port number, the Service
manifest generator was creating duplicate port entries, which Kubernetes
rejects as invalid. This fix deduplicates ports by (port, protocol)
before adding them to the Service spec.

Fixes the error:
  Service 'xxx-service' is invalid: spec.ports[1]: Duplicate value
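
To illustrate the deduplication idea only (the actual fix lives in the C# publisher, which is not shown here), the sketch below keeps the first occurrence of each (port, protocol) pair. The `port/protocol` line format and the function name are assumptions made for the example.

```shell
# Sketch: drop repeated (port, protocol) pairs, preserving first-seen order.
# Input: one "port/protocol" entry per line on stdin.
dedup_ports() {
  # awk prints a line only the first time it is seen.
  awk '!seen[$0]++'
}

# Usage: printf '8080/TCP\n8080/TCP\n' | dedup_ports
```

Keying on the whole pair rather than the port alone means 8080/TCP and 8080/UDP both survive, which Kubernetes accepts.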
Added Step 6 to explicitly run 'az aks update --attach-acr' after AKS
cluster creation to ensure the AcrPull role assignment has properly
propagated. This addresses potential image pull permission issues where
AKS cannot pull images from the attached ACR.

Also renumbered all subsequent steps to maintain proper ordering.
The Kubernetes publisher was generating duplicate Service/container ports
(both 8080/TCP) for ProjectResources with default http+https endpoints.
The root cause is that GenerateDefaultProjectEndpointMapping assigns the
same default port 8080 to every endpoint whose target port is unset (None).

The proper fix mirrors the core framework's SetBothPortsEnvVariables()
behavior: skip the DefaultHttpsEndpoint (which the container won't listen
on — TLS termination happens at ingress/service mesh). The https endpoint
still gets an EndpointMapping (for service discovery) but reuses the http
endpoint's HelmValue, so no duplicate K8s port is generated.

Added Aspire.Hosting.Kubernetes to InternalsVisibleTo to access
ProjectResource.DefaultHttpsEndpoint. The downstream dedup in ToService()
and WithContainerPorts() remains as defense-in-depth.

Fixes #14029
Validates the Aspire starter template with Redis cache enabled deploys
correctly to AKS. Exercises the full pipeline: webfrontend → apiservice
→ Redis by hitting the /weather page (SSR, uses Redis output caching).

Key differences from the base AKS test:
- Selects 'Yes' for Redis Cache in aspire new prompts
- Redis uses public container image (no ACR push needed)
- Verifies /weather page content (confirms Redis integration works)
Both AKS tests generated the same ACR name from RunId+RunAttempt.
Use different prefixes (acrs/acrr) to ensure uniqueness.
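
A sketch of the naming scheme, assuming the name is simply prefix + run ID + run attempt (the exact concatenation used by the tests is not shown in this PR text). The length and character checks reflect the ACR rule that registry names are 5-50 alphanumeric characters. The function name is hypothetical.

```shell
# Sketch: build an ACR name from a per-test prefix, run ID, and run attempt.
# $1 = prefix (e.g. acrs or acrr), $2 = run ID, $3 = run attempt.
acr_name() {
  local name="$1$2$3"
  case "$name" in
    *[!a-zA-Z0-9]*) echo "invalid ACR name: $name" >&2; return 1 ;;
  esac
  if [ "${#name}" -lt 5 ] || [ "${#name}" -gt 50 ]; then
    echo "ACR name must be 5-50 characters: $name" >&2
    return 1
  fi
  printf '%s\n' "$name"
}

# Usage: acr_name acrs 21740063429 1
```

With the same run ID and attempt, the two prefixes yield different names, so the two tests no longer compete for one globally unique registry name.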
Work around K8s publisher bug where cross-resource secret references create
Helm value paths under the consuming resource instead of referencing the
owning resource's secret. The webfrontend template expects
secrets.webfrontend.cache_password but values.yaml only has
secrets.cache.REDIS_PASSWORD. Provide the missing value via --set.
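
The workaround amounts to supplying the missing value at install time. In the sketch below, only the `--set` argument is taken from the PR description; the release name and chart path are placeholders, and the function name is hypothetical.

```shell
# Sketch: install the generated chart while supplying the value the
# webfrontend template expects but values.yaml does not define.
# $1 = release name, $2 = chart directory (both placeholders).
install_chart() {
  helm install "$1" "$2" --set secrets.webfrontend.cache_password=""
}

# Usage: install_chart my-release ./path/to/chart
```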
The OIDC federated token expires after ~5 minutes, but AKS cluster
creation takes 10-15 minutes. By the time the test reaches az acr login,
the token assertion has already expired. Moving ACR auth to right after ACR creation
ensures the OIDC token is still fresh, and Docker credentials persist
in ~/.docker/config.json for later use.
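
The reordering can be sketched as follows. The resource names are placeholders, the function name is hypothetical, and the exact flags the test passes to `az aks create` are an assumption. Only the order matters here (ACR login before the slow cluster creation), along with the VM size and the later `--attach-acr` step mentioned in the commits.

```shell
# Sketch: log in to ACR while the OIDC token is still fresh, then do the slow part.
# $1 = resource group, $2 = ACR name, $3 = AKS cluster name (all placeholders).
provision() {
  local rg="$1" acr="$2" aks="$3"
  az acr create --resource-group "$rg" --name "$acr" --sku Basic || return 1
  # Docker credentials written here persist for the later image push.
  az acr login --name "$acr" || return 1
  # Cluster creation takes 10-15 minutes, longer than the token lifetime.
  az aks create --resource-group "$rg" --name "$aks" \
    --node-count 1 --node-vm-size Standard_D2s_v3 --generate-ssh-keys || return 1
  az aks update --resource-group "$rg" --name "$aks" --attach-acr "$acr" || return 1
}

# Usage: provision my-rg myacrname my-aks
```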
Copilot AI review requested due to automatic review settings February 6, 2026 06:16
@github-actions
Contributor

github-actions bot commented Feb 6, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 14371

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/dotnet/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 14371"

@mitchdenny mitchdenny temporarily deployed to deployment-testing February 6, 2026 07:55 — with GitHub Actions Inactive
…tion open)

Blazor SSR streaming rendering keeps the HTTP connection open to stream
updates to the browser. curl waits indefinitely for the response to
complete, causing the WaitForSuccessPrompt to time out. Adding --max-time
ensures curl returns after receiving the initial 200 status code.
Mitch Denny added 5 commits February 6, 2026 20:32
…aming

curl --max-time exits with code 28 (timeout) even when HTTP 200 was
received, because Blazor SSR streaming keeps the connection open. This
causes the && chain to fail, so echo/break never execute. Fix by using
semicolons and capturing the status code in a variable, then checking
it explicitly with [ "$S" = "200" ].
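
A sketch of the polling pattern described in the two commits above. It captures the status code in a variable and ignores curl's exit code, here with `|| true` rather than the semicolons the commit mentions. The function name, retry count, sleep interval, and 10-second `--max-time` are assumptions.

```shell
# Sketch: poll a URL until it answers HTTP 200, tolerating curl timeouts.
# $1 = URL, $2 = maximum attempts (default 30).
wait_for_200() {
  local url="$1" attempts="${2:-30}" i=0 status
  while [ "$i" -lt "$attempts" ]; do
    # With a streaming response curl can exit 28 (timeout) after the 200
    # status has already arrived, so its exit code is deliberately ignored.
    status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url" || true)
    if [ "$status" = "200" ]; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep 5
  done
  return 1
}

# Usage: wait_for_200 "http://<service-address>/weather"
```

Chaining with `&&` would treat the timeout as a failure and never reach the success branch, which is the behavior the second commit fixes.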
…text

The K8s publisher was not setting ExecutionContext when creating the
CommandLineArgsCallbackContext in ProcessArgumentsAsync, causing it to
default to Run mode. This made Redis's WithArgs callback produce
individual args instead of a single -c shell command string, resulting
in '/bin/sh redis-server' (open as script) instead of
'/bin/sh -c "redis-server ..."' (execute as command).

Matches the Docker Compose publisher which correctly sets
ExecutionContext = executionContext.

Also updates the Redis E2E test to wait for all pods (including cache)
and verify Redis responds to PING.
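
The difference between the two invocation shapes can be reproduced with any shell, independent of Redis. The sketch below is an illustration only; the helper names are hypothetical.

```shell
# Sketch: why '/bin/sh redis-server' and '/bin/sh -c "redis-server ..."' differ.

# Without -c, the first argument is treated as a script file to open and run.
run_as_script() {
  /bin/sh "$1"
}

# With -c, the argument is a command string for the shell to execute.
run_as_command() {
  /bin/sh -c "$1"
}

# Usage:
#   run_as_command 'echo hello'   # runs the command string
#   run_as_script  'echo hello'   # fails: looks for a script file with that name
```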
Expand $REDIS_PASSWORD inside the container shell instead of extracting
it from the K8s secret on the host. Also use --no-auth-warning to
suppress redis-cli's password-on-command-line warning.
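
A sketch of the in-container check. The single quotes matter: they keep `$REDIS_PASSWORD` from expanding on the host, so it is resolved by the shell inside the container. The workload name passed in is a placeholder and the function name is hypothetical.

```shell
# Sketch: ask Redis to PING from inside its own container.
# $1 = workload to exec into, e.g. a pod name or deploy/<name> (placeholder).
redis_ping() {
  # Single quotes: $REDIS_PASSWORD is expanded by the container's shell.
  kubectl exec "$1" -- sh -c 'redis-cli -a "$REDIS_PASSWORD" --no-auth-warning PING'
}

# Usage: redis_ping deploy/cache
```

A healthy, authenticated Redis replies with PONG.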
Mitch Denny added 3 commits February 7, 2026 10:52
… key (REDIS_PASSWORD)

The K8s publisher AllocateParameter creates Helm expressions using the parameter
name (cache-password -> cache_password), but AddValuesToHelmSectionAsync writes
values using the env var key (REDIS_PASSWORD). The template references
.Values.secrets.cache.cache_password but values.yaml has REDIS_PASSWORD, so the
password is always empty and Redis crashes with 'requirepass' having no argument.
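
The mismatch can be stated as a small check. The sketch assumes the only transformation applied to the parameter name is replacing hyphens with underscores, as in the cache-password -> cache_password example above; both helper names are hypothetical.

```shell
# Sketch: compare the key a template would reference with the key written to values.yaml.

# Derive the Helm key from a parameter name (hyphens become underscores).
helm_key_from_param() {
  printf '%s\n' "$1" | sed 's/-/_/g'
}

# $1 = parameter name, $2 = key actually written to values.yaml.
secret_key_matches() {
  [ "$(helm_key_from_param "$1")" = "$2" ]
}

# Usage:
#   secret_key_matches cache-password REDIS_PASSWORD   # mismatch: returns non-zero
#   secret_key_matches cache-password cache_password   # match: returns zero
```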

Development

Successfully merging this pull request may close these issues.

(code provided) Minimal k8s aspire application generates invalid YAML assets: duplicate ports, invalid values, etc.

3 participants